Ifpack2/KokkosKernels? Performance regression with 4.3.1 in AdditiveSchwarz initialization with overlap #13013

Closed
brian-kelley opened this issue May 15, 2024 · 8 comments
Labels
client: Sierra All issues that primarily impacts SNL Sierra codes impacting: performance pkg: Ifpack2 type: bug The primary issue is a bug in Trilinos code or tests

Comments

@brian-kelley
Contributor

Between 45d800 and 53b714, the "remainder" component of preconditioner initialization (AdditiveSchwarz + ILUk, overlap level 1) slowed down significantly on eclipse and amber (for example, from 14.2 s to 17.9 s on amber for one abnormal energy problem). The remainder is not supposed to include any of the expensive steps such as RILUK setup, filter construction, or overlapping row matrix construction.

@brian-kelley brian-kelley added type: bug The primary issue is a bug in Trilinos code or tests pkg: Ifpack2 impacting: performance labels May 15, 2024
@brian-kelley brian-kelley self-assigned this May 15, 2024

Automatic mention of the @trilinos/ifpack2 team

@brian-kelley brian-kelley added the client: Sierra All issues that primarily impacts SNL Sierra codes label May 15, 2024

@csiefer2
Member

csiefer2 commented May 15, 2024

@kliegeois @brian-kelley We saw substantial increases in disk read time for the sparc problem as well, though we couldn't isolate that to the Kokkos/KK promotion.

Kim reworked the test to remove the disk read from Remainder.

@brian-kelley
Contributor Author

@csiefer2 Sounds like that should be a separate issue, if it's reproducible and caused by a code change.
Good that we're not measuring that in the stacked timer anymore though.

@kliegeois
Contributor

@csiefer2 To be precise, the disk read was not in the remainder before #12997.
The PR removed the allocation of one vector (and its initialization to zero), the computation of initial norms, the conversion of the line_info vector to parts arrays, and a block spmv used to make sure that the data had moved from host to device.

@brian-kelley
Contributor Author

brian-kelley commented May 29, 2024

@csiefer2 @kliegeois I think I figured both of these things out (but they're not related):

  • This issue appears to be 100% due to software environment changes on amber and eclipse. On amber we had to explicitly change the compiler we use to intel 24 because intel 21 went away, and this happened on the exact day of the spike. Not sure about the details on eclipse. I wasn't able to replicate the old (good) time on either machine. Both the good and bad SHAs above, built with the current environment, give the (bad) time on both machines.
  • The sparc single_gpu remainder slowdown was due to cusparse BSR spmv being extremely slow on the first call. Before, this didn't use the TPL at all. I'm talking to NVIDIA about what could be done about it, but going back to our native impl on Cuda seems like a reasonable workaround.

@brian-kelley
Contributor Author

brian-kelley commented May 29, 2024

BTW, the sparc slowdown was not because of host-device copying either, because replacing cusparse with native (also on GPU, obviously) decreases the time of that warmup apply from 2.36 s to 0.0038 s. So the matrix was already synced to device.
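
For anyone who wants to check for this kind of first-call cost in their own code, here is a minimal sketch of timing the first (warmup) call separately from the calls that follow. It is generic C++ and does not touch Trilinos, Kokkos Kernels, or cuSPARSE; `WarmupTiming` and `timeFirstAndSteady` are hypothetical names made up for this illustration, and `op` stands in for whatever apply or spmv you are measuring.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper for illustration only; not part of Trilinos or Kokkos Kernels.
struct WarmupTiming {
  double first_call_s;  // wall time of the first call, in seconds
  double steady_avg_s;  // mean wall time of the subsequent calls, in seconds
};

// Times one call to `op`, then `steady_calls` further calls, and reports both.
// Assumption: `op` is synchronous. For a GPU kernel, `op` would need to end
// with a device fence so the elapsed time covers the actual work.
inline WarmupTiming timeFirstAndSteady(const std::function<void()>& op,
                                       int steady_calls) {
  assert(steady_calls > 0);
  using clock = std::chrono::steady_clock;
  auto seconds = [](clock::duration d) {
    return std::chrono::duration<double>(d).count();
  };

  const auto t0 = clock::now();
  op();  // first call: may include one-time setup inside a library
  const auto t1 = clock::now();
  for (int i = 0; i < steady_calls; ++i) {
    op();  // later calls: steady-state cost
  }
  const auto t2 = clock::now();

  return {seconds(t1 - t0), seconds(t2 - t1) / steady_calls};
}
```

A large gap between `first_call_s` and `steady_avg_s` with the data already on the device points at one-time setup inside the call rather than at data movement, which is the distinction drawn above.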

@brian-kelley
Contributor Author

But I'll close this one as "not a code issue"
