[dask] Fix LTR with empty partition and NCCL error. #11152

Merged: 5 commits, Jan 10, 2025

@trivialfis trivialfis commented Jan 8, 2025

Close #11147.

There are still some flaky timeout errors, but this should fix the hang caused by the LTR test.

Ref #11154.

@trivialfis trivialfis requested a review from hcho3 January 8, 2025 19:11
@trivialfis
Member Author

Hmm, segfault at a completely unknown location:

(gdb) bt
#0  0x00007074406efce5 in ?? ()
#1  0x00007074673ffe00 in ?? ()
#2  0x00007074673ffdf8 in ?? ()
#3  0x00007074674005c0 in ?? ()
#4  0x0000000000000000 in ?? ()

@trivialfis
Member Author

Something funky in the dlopen.

@trivialfis trivialfis changed the title [dask] Fix LTR with empty partition. [dask] Fix LTR with empty partition and NCCL error. Jan 9, 2025
@trivialfis
Member Author

@hcho3 This should fix the existing MGPU errors. There are some other flaky errors in the CPU tests; I will follow up on those in separate PRs. Please take a look when you are available.

@trivialfis trivialfis merged commit 51b3c46 into dmlc:master Jan 10, 2025
58 checks passed
@trivialfis trivialfis deleted the dask-ltr-empty branch January 10, 2025 17:12

Successfully merging this pull request may close these issues.

[CI] Dask test hangs.