Make nvToolsExt conditional on WITH_CUDA_PROFILING #428
d2bb331 to a14e7b1
retest this please
Can one of the admins verify this patch?
Codecov Report
@@ Coverage Diff @@
## develop #428 +/- ##
=======================================
Coverage 63.1% 63.1%
=======================================
Files 86 86
Lines 25625 25625
=======================================
Hits 16190 16190
Misses 9435 9435
Flags with carried forward coverage won't be shown.
GNU test by hand:
That's up 6% from https://object.cscs.ch/v1/AUTH_40b5d92b316940098ceb15cf46fb815e/dbcsr-artifacts/logs/build-679/gnu.test.out; not sure if that's significant.
From CI: 2783.59s 😕 +30%
Curious, because I ran it like this:
and
It seems to be on the right commit.
Indeed, but can we now remove the output and see if it goes down? (Hans' trick)
Wait a second, all tests are now slower. Comparison with the old run: https://object.cscs.ch/v1/AUTH_40b5d92b316940098ceb15cf46fb815e/dbcsr-artifacts/logs/build-679/gnu.test.out For example, the
No, I can't... But I'm starting to think it is a Daint issue too...
What comes to mind is a recent bug I've seen in Slurm after the latest Daint upgrade (fixed on Dom, but not yet on Daint) about thread pinning when using
However, I couldn't trigger that bug with
Well, this is an interesting consideration (affinity). We actually tested it years ago...
(xthi does print the affinity mask) and I get:
This is wrong! We run on the HT cores.
Instead of 3.
instead of Could you do that? Let's see if we can go faster...
Ok, I'm back at this again.
How do you know which affinity numbers correspond to which physical core?
0-11 are the physical cores, 12-23 are the HT cores. Therefore, 0 and 12 sit on the same core.
where you get something like:
So for Core L#0 you have P#0 and P#12...
The code I use to check the affinity (xthi) is at https://github.com/olcf/XC30-Training/blob/master/affinity/Xthi.c
Right. I've checked on Dom, which runs
indeed no multithreading. But
is not the way to do it, apparently. On Daint I can do this:
and it behaves as expected (dropping --exclusive because of the bug). I think we can just drop
retest this please
Should I open another issue with the Cray / Slurm people about
retest this please
This is a known problem, sorry for that...
2377.19 sec for GNU now; that looks better, but it's still +12% w.r.t. https://object.cscs.ch/v1/AUTH_40b5d92b316940098ceb15cf46fb815e/dbcsr-artifacts/logs/build-679/gnu.test.out...
Let me try 21dae0f by hand with the current state of Daint for reference.
To wrap this up: I've run the CI script once more by hand on the latest commit from this PR for GNU and got 2155s, which is within 2% of the previous CI runs, so it seems fine. The old commit 21dae0f on the current state of Daint gives 2235s, with some tests that are just flaky (maybe bad thread pinning, then).
In conclusion, CI execution on Daint is still slower than before. Old:
1/23 Test #1: dbcsr_perf:inputs/test_H2O.perf ....................... Passed 4.67 sec
I've opened #429 to track the issue. I will investigate...
As pointed out by @hfp in #425