Use PyNVML 12 #345
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here.
/ok to test
One wheel-tests job is failing with segfaults.
parser.c:2305 UCX WARN (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning)
parser.c:1339 UCX ERROR Invalid value for SEG_SIZE: 'invalid-size'. Expected: memory units: [b|kb|mb|gb], "inf", or "auto"
Fatal Python error: Segmentation fault
Full stack trace:
Fatal Python error: Segmentation fault
Thread 0x00007f215df596c0 (most recent call first):
File "/pyenv/versions/3.10.16/lib/python3.10/site-packages/ucxx/_lib_async/notifier_thread.py", line 37 in _notifierThread
File "/pyenv/versions/3.10.16/lib/python3.10/threading.py", line 953 in run
File "/pyenv/versions/3.10.16/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
File "/pyenv/versions/3.10.16/lib/python3.10/threading.py", line 973 in _bootstrap
Current thread 0x00007f216eea8b80 (most recent call first):
File "/pyenv/versions/3.10.16/lib/python3.10/site-packages/ucxx/_lib_async/endpoint.py", line 34 in _finalizer
File "/pyenv/versions/3.10.16/lib/python3.10/weakref.py", line 591 in __call__
File "/pyenv/versions/3.10.16/lib/python3.10/site-packages/ucxx/_lib_async/listener.py", line 209 in _listener_handler_coroutine
File "/pyenv/versions/3.10.16/lib/python3.10/asyncio/events.py", line 80 in _run
File "/pyenv/versions/3.10.16/lib/python3.10/asyncio/base_events.py", line 1909 in _run_once
File "/pyenv/versions/3.10.16/lib/python3.10/asyncio/base_events.py", line 603 in run_forever
File "/pyenv/versions/3.10.16/lib/python3.10/asyncio/base_events.py", line 636 in run_until_complete
File "/pyenv/versions/3.10.16/lib/python3.10/site-packages/pytest_asyncio/plugin.py", line 906 in inner
File "/pyenv/versions/3.10.16/lib/python3.10/site-packages/_pytest/python.py", line 194 in pytest_pyfunc_call
File "/pyenv/versions/3.10.16/lib/python3.10/site-packages/pluggy/_callers.py", line 103 in _multicall
File "/pyenv/versions/3.10.16/lib/python3.10/site-packages/pluggy/_manager.py", line 120 in _hookexec
File "/pyenv/versions/3.10.16/lib/python3.10/site-packages/pluggy/_hooks.py", line 513 in __call__
File "/pyenv/versions/3.10.16/lib/python3.10/site-packages/_pytest/python.py", line 1792 in runtest
File "/pyenv/versions/3.10.16/lib/python3.10/site-packages/pytest_asyncio/plugin.py", line 440 in runtest
File "/pyenv/versions/3.10.16/lib/python3.10/site-packages/_pytest/runner.py", line 169 in pytest_runtest_call
File "/pyenv/versions/3.10.16/lib/python3.10/site-packages/pluggy/_callers.py", line 103 in _multicall
File "/pyenv/versions/3.10.16/lib/python3.10/site-packages/pluggy/_manager.py", line 120 in _hookexec
File "/pyenv/versions/3.10.16/lib/python3.10/site-packages/pluggy/_hooks.py", line 513 in __call__
File "/pyenv/versions/3.10.16/lib/python3.10/site-packages/_pytest/runner.py", line 262 in <lambda>
File "/pyenv/versions/3.10.16/lib/python3.10/site-packages/_pytest/runner.py", line 341 in from_call
File "/pyenv/versions/3.10.16/lib/python3.10/site-packages/_pytest/runner.py", line 261 in call_runtest_hook
File "/pyenv/versions/3.10.16/lib/python3.10/site-packages/_pytest/runner.py", line 222 in call_and_report
File "/pyenv/versions/3.10.16/lib/python3.10/site-packages/_pytest/runner.py", line 133 in runtestprotocol
File "/pyenv/versions/3.10.16/lib/python3.10/site-packages/_pytest/runner.py", line 114 in pytest_runtest_protocol
File "/pyenv/versions/3.10.16/lib/python3.10/site-packages/pluggy/_callers.py", line 103 in _multicall
File "/pyenv/versions/3.10.16/lib/python3.10/site-packages/pluggy/_manager.py", line 120 in _hookexec
File "/pyenv/versions/3.10.16/lib/python3.10/site-packages/pluggy/_hooks.py", line 513 in __call__
File "/pyenv/versions/3.10.16/lib/python3.10/site-packages/_pytest/main.py", line 350 in pytest_runtestloop
File "/pyenv/versions/3.10.16/lib/python3.10/site-packages/pluggy/_callers.py", line 103 in _multicall
File "/pyenv/versions/3.10.16/lib/python3.10/site-packages/pluggy/_manager.py", line 120 in _hookexec
File "/pyenv/versions/3.10.16/lib/python3.10/site-packages/pluggy/_hooks.py", line 513 in __call__
File "/pyenv/versions/3.10.16/lib/python3.10/site-packages/_pytest/main.py", line 325 in _main
File "/pyenv/versions/3.10.16/lib/python3.10/site-packages/_pytest/main.py", line 271 in wrap_session
File "/pyenv/versions/3.10.16/lib/python3.10/site-packages/_pytest/main.py", line 318 in pytest_cmdline_main
File "/pyenv/versions/3.10.16/lib/python3.10/site-packages/pluggy/_callers.py", line 103 in _multicall
File "/pyenv/versions/3.10.16/lib/python3.10/site-packages/pluggy/_manager.py", line 120 in _hookexec
File "/pyenv/versions/3.10.16/lib/python3.10/site-packages/pluggy/_hooks.py", line 513 in __call__
File "/pyenv/versions/3.10.16/lib/python3.10/site-packages/_pytest/config/__init__.py", line 169 in main
File "/pyenv/versions/3.10.16/lib/python3.10/site-packages/_pytest/config/__init__.py", line 192 in console_main
File "/pyenv/versions/3.10.16/lib/python3.10/site-packages/pytest/__main__.py", line 5 in <module>
File "/pyenv/versions/3.10.16/lib/python3.10/runpy.py", line 86 in _run_code
File "/pyenv/versions/3.10.16/lib/python3.10/runpy.py", line 196 in _run_module_as_main
Could that be a result of these changes?
Given the previous commit passed and the last one is simply merging in the latest from |
@pentschev does the error above in James' comment look familiar to you?
Those are normal when we test invalid arguments; they are errors that we expect to catch.
That is unfortunately known. I thought it had been fixed a while ago and had not seen this one in particular for weeks, but it seems there's still something flaky. 😞 Nevertheless, it should not be related, so I triggered a rerun.
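To illustrate why an "Invalid value" error can show up in a passing test log, here is a minimal sketch of a test that deliberately feeds a bad size string to a parser and expects it to be rejected. `parse_memory_units`, its unit table, and the 1024-based multipliers are hypothetical and exist only for this illustration; this is not UCX's parser or the ucxx test suite.

```python
import re

# Hypothetical unit table, loosely modeled on the "[b|kb|mb|gb]" hint in the log message.
_UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3}


def parse_memory_units(text):
    """Parse strings such as '4kb', 'inf' or 'auto'; raise ValueError otherwise."""
    value = text.strip().lower()
    if value in ("inf", "auto"):
        return value
    match = re.fullmatch(r"(\d+)(b|kb|mb|gb)", value)
    if match is None:
        raise ValueError(f"Invalid value: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def invalid_size_is_rejected():
    """Return True if the parser rejects a deliberately invalid value."""
    try:
        parse_memory_units("invalid-size")
    except ValueError:
        # The error is the expected outcome, so the check passes.
        return True
    return False
```

In a real suite the same check would more likely be written with `pytest.raises`; the point is only that the logged error is the behavior under test, not a failure.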
Alright, thanks @pentschev. I'll approve this then, and we can merge if CI passes.
LGTM, thanks @jakirkham.
Thanks Peter and James! 🙏 Looks like the rerun worked 🎉 Let's keep an eye on it and we can follow up as needed.
/merge
Bump pynvml from 11 to 12. This version of pynvml also now depends on nvidia-ml-py for core functionality.
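One way to confirm which versions an environment actually resolved after the bump is to query the installed distribution metadata. The sketch below uses only the standard library; the helper names `major_version` and `installed_major` are made up for this example, and it assumes the distributions are published under the names `pynvml` and `nvidia-ml-py` as mentioned above.

```python
from importlib import metadata


def major_version(version):
    """Return the leading integer of a version string such as '12.0.0'."""
    return int(version.split(".")[0])


def installed_major(dist_name):
    """Return the installed major version of dist_name, or None if it is not installed."""
    try:
        return major_version(metadata.version(dist_name))
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints None for any distribution that is not present in this environment.
    for name in ("pynvml", "nvidia-ml-py"):
        print(name, installed_major(name))
```

Running this in the test environment shows whether `pynvml` resolved to major version 12 and whether `nvidia-ml-py` was pulled in alongside it.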