Use cuda-python bindings for getting device properties. #4830
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,4 @@ | ||
# Copyright (c) 2021, NVIDIA CORPORATION. | ||
# Copyright (c) 2021-2025, NVIDIA CORPORATION. | ||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||
# you may not use this file except in compliance with the License. | ||
# You may obtain a copy of the License at | ||
|
@@ -19,7 +19,6 @@ | |
from cugraph.utilities.path_retrieval cimport get_traversed_cost as c_get_traversed_cost | ||
from cugraph.structure.graph_primtypes cimport * | ||
from libc.stdint cimport uintptr_t | ||
from numba import cuda | ||
import cudf | ||
This was an unused import.
||
import numpy as np | ||
|
||
|
Original file line number | Diff line number | Diff line change | ||
---|---|---|---|---|
@@ -1,4 +1,4 @@ | ||||
# Copyright (c) 2020-2024, NVIDIA CORPORATION. | ||||
# Copyright (c) 2020-2025, NVIDIA CORPORATION. | ||||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||||
# you may not use this file except in compliance with the License. | ||||
# You may obtain a copy of the License at | ||||
|
@@ -15,13 +15,10 @@ | |||
import os | ||||
import shutil | ||||
|
||||
from numba import cuda | ||||
|
||||
import cudf | ||||
from cudf.core.column import as_column | ||||
|
||||
from cuda.cudart import cudaDeviceAttr | ||||
from rmm._cuda.gpu import getDeviceAttribute | ||||
from cuda.bindings import runtime | ||||
|
||||
from warnings import warn | ||||
|
||||
|
@@ -210,45 +207,42 @@ def get_traversed_path_list(df, id): | |||
return answer | ||||
|
||||
|
||||
def is_cuda_version_less_than(min_version=(10, 2)): | ||||
def is_cuda_version_less_than(min_version): | ||||
""" | ||||
Based on the function name, this should not have a default value. Its default was also outdated. This function also appears to be unused. Do we want to keep it?
Removing it seems like a good idea to me.
||||
Returns True if the version of CUDA being used is less than min_version | ||||
""" | ||||
this_cuda_ver = cuda.runtime.get_version() # returns (<major>, <minor>) | ||||
if this_cuda_ver[0] > min_version[0]: | ||||
return False | ||||
if this_cuda_ver[0] < min_version[0]: | ||||
return True | ||||
if this_cuda_ver[1] < min_version[1]: | ||||
return True | ||||
return False | ||||
status, version = runtime.getLocalRuntimeVersion() | ||||
if status != runtime.cudaError_t.cudaSuccess: | ||||
raise RuntimeError("Could not get CUDA runtime version.") | ||||
major = version // 1000 | ||||
minor = (version % 1000) // 10 | ||||
return (major, minor) < min_version | ||||
|
||||
|
||||
def is_device_version_less_than(min_version=(7, 0)): | ||||
def is_device_version_less_than(min_version): | ||||
""" | ||||
Based on the function name, this should not have a default value. It appears this is only used once, to guard a test against running on Pascal. However, we dropped Pascal a year ago. Can we remove that guard from the test? Then, can we delete this function, since it would be unused?
|
||||
Returns True if the version of CUDA being used is less than min_version | ||||
""" | ||||
major_version = getDeviceAttribute( | ||||
cudaDeviceAttr.cudaDevAttrComputeCapabilityMajor, 0 | ||||
) | ||||
minor_version = getDeviceAttribute( | ||||
cudaDeviceAttr.cudaDevAttrComputeCapabilityMinor, 0 | ||||
) | ||||
if major_version > min_version[0]: | ||||
return False | ||||
if major_version < min_version[0]: | ||||
return True | ||||
if minor_version < min_version[1]: | ||||
return True | ||||
return False | ||||
status, device_id = runtime.cudaGetDevice() | ||||
if status != runtime.cudaError_t.cudaSuccess: | ||||
raise RuntimeError("Could not get CUDA device.") | ||||
status, device_prop = runtime.cudaGetDeviceProperties(device_id) | ||||
if status != runtime.cudaError_t.cudaSuccess: | ||||
raise RuntimeError("Could not get CUDA device properties.") | ||||
return (device_prop.major, device_prop.minor) < min_version | ||||
|
||||
|
||||
def get_device_memory_info(): | ||||
""" | ||||
Returns the total amount of global memory on the device in bytes | ||||
""" | ||||
meminfo = cuda.current_context().get_memory_info() | ||||
return meminfo[1] | ||||
status, device_id = runtime.cudaGetDevice() | ||||
if status != runtime.cudaError_t.cudaSuccess: | ||||
raise RuntimeError("Could not get CUDA device.") | ||||
status, device_prop = runtime.cudaGetDeviceProperties(device_id) | ||||
if status != runtime.cudaError_t.cudaSuccess: | ||||
raise RuntimeError("Could not get CUDA device properties.") | ||||
return device_prop.totalGlobalMem | ||||
|
||||
|
||||
# FIXME: if G is a Nx type, the weight attribute is assumed to be "weight", if | ||||
|
I implemented what was here before, but I would double-check this logic: are our notebooks still failing on Ampere and newer? Does this check need to be removed?
LGTM. I'm in favor of committing what's here now then checking the notebooks that use the "# Does not run on Ampere" comment on Ampere to see if it's still needed. If not needed, we can have a followup PR to remove it and/or the comments.
@rlratzel Sounds good. I will follow up with a PR that removes this, and our CI will cover the check. Our ARM runners use Ampere.
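For reviewers checking the arithmetic in the new `is_cuda_version_less_than`, here is a minimal sketch of the same decode-and-compare logic with the CUDA call factored out, so it can be exercised without a GPU. The helper names `decode_cuda_version` and `version_less_than` are hypothetical and not part of this PR; the sketch assumes the usual CUDA integer encoding of `1000 * major + 10 * minor`.

```python
def decode_cuda_version(version):
    """Split a CUDA integer version into (major, minor).

    Assumes the encoding 1000 * major + 10 * minor, e.g. 12040 for 12.4.
    Hypothetical helper for illustration only.
    """
    major = version // 1000
    minor = (version % 1000) // 10
    return (major, minor)


def version_less_than(version, min_version):
    """Return True if the decoded version is below min_version.

    Python compares tuples lexicographically, so (major, minor) tuples
    order the same way the removed chain of if-statements did.
    """
    return decode_cuda_version(version) < tuple(min_version)


if __name__ == "__main__":
    print(decode_cuda_version(12040))
    print(version_less_than(11080, (12, 0)))
```

The tuple comparison replaces the three explicit branches in the old code: the major versions are compared first, and the minor versions only break ties.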