Optimize compute_kda_ma for memory and speed #857
Conversation
Codecov Report

Attention:

Additional details and impacted files:

```diff
@@            Coverage Diff             @@
##             main     #857      +/-   ##
==========================================
- Coverage   88.48%   88.22%   -0.26%
==========================================
  Files          48       48
  Lines        6337     6342       +5
==========================================
- Hits         5607     5595      -12
- Misses        730      747      +17
```

☔ View full report in Codecov by Sentry.
@jdkent these improvements make a small difference to run speed, but it's a pretty hard thing to optimize further; ultimately, it's just a slow and fairly expensive process.
* Resolve merge
* Add sum across studies
* Remove @Profile
* Only enable sum across studies for MKDA Chi Squared
* Run black
* Return dense for MKDAChiSquared
* Update nimare/meta/utils.py (Co-authored-by: James Kent <[email protected]>)
* Run black
* Update nimare/meta/utils.py (Co-authored-by: James Kent <[email protected]>)
* Format suggestion
* Change how number of studies and active voxels are found
* Add explicit dtype when creating image
* Make the comment clearer
* Add the kernel argument to the dictionary
* Bump minimum required versions
* Alternative way to sum across studies in a general way (see the sketch after this list)
* Fix arguments and style
* Pin minimum seaborn version

Co-authored-by: Alejandro de la Vega <[email protected]>
Co-authored-by: James Kent <[email protected]>
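For context, here is a minimal sketch of what summing across studies "in a general way" might look like, assuming per-study MA maps are held as a 4D pydata/sparse array; the variable names are illustrative, not NiMARE's actual code:

```python
# Hedged sketch: summing MA values across studies when the per-study maps
# are stored as one sparse (n_studies, x, y, z) array. Names are illustrative.
import sparse

n_studies, shape = 3, (10, 10, 10)
ma_maps = sparse.random((n_studies,) + shape, density=0.01)

# Summing over the study axis collapses the 4D array to a single 3D map,
# independent of which kernel produced the per-study values.
summed = ma_maps.sum(axis=0).todense()
print(summed.shape)  # (10, 10, 10)
```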
Even more speed increases come from indexing the ijks array within numba.
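As a rough illustration of that change, here is a minimal numba sketch in which the full ijks array is indexed inside the jitted loop rather than sliced per study in Python beforehand; the function name and arguments are hypothetical, not NiMARE's API:

```python
# Hypothetical sketch of indexing ijks inside a numba-compiled function.
import numpy as np
from numba import njit


@njit
def _sum_counts(ijks, start, end, volume):
    # Index the full ijks array inside the compiled loop instead of
    # slicing it in Python, which avoids intermediate copies.
    for i in range(start, end):
        x, y, z = ijks[i, 0], ijks[i, 1], ijks[i, 2]
        volume[x, y, z] += 1
    return volume


ijks = np.array([[10, 12, 9], [10, 12, 9], [4, 5, 6]], dtype=np.int32)
vol = np.zeros((50, 60, 50), dtype=np.int32)
_sum_counts(ijks, 0, ijks.shape[0], vol)
print(vol[10, 12, 9])  # 2
```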
LGTM! It's added and tested. @JulioAPeraza will be adding results for additional confirmation that the results look the same using the updated method.
Awesome. Let's merge once everything passes, and cut a release soon.
@adelavega, the new changes are fantastic! In the past, I could not train the decoder with more than 16 cores. I used 40 cores this time, and it took 11 minutes to train an LDA-based decoder with 200 topics, and it only required 1.7 GB of memory. @jdkent, I compared the decoders' results, and the output maps' values are exactly the same. I also tested the new changes with different dataset sizes based on Neurosynth. See below:
Julio, thanks for running this comparison. This is fantastic :)
The largest gains are for MKDAKernel, which convolves kernels with coordinates in a dense 3D volume, summing counts in place; this saves substantial memory and compute. For large-scale MKDAChi2 (i.e., using the Neurosynth dataset), the memory footprint is reduced ~18-20x (25 GB to 1.2 GB), and computation is sped up ~3.2x.
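To make the in-place counting idea concrete, here is a self-contained sketch that stamps a binary sphere around each coordinate into a single dense volume, instead of allocating one map per study. This is an illustration of the technique under stated assumptions, not NiMARE's implementation; `sum_spheres` and its `radius` parameter are hypothetical:

```python
# Illustrative sketch (not NiMARE's code): sum spherical kernels in place
# in one dense volume rather than building a separate map per study.
import numpy as np


def sum_spheres(ijks, shape, radius=2):
    """Add a binary sphere of `radius` voxels around each ijk coordinate."""
    volume = np.zeros(shape, dtype=np.int32)
    # Precompute the sphere's voxel offsets once.
    grid = np.mgrid[-radius:radius + 1, -radius:radius + 1, -radius:radius + 1]
    offsets = grid.reshape(3, -1).T
    offsets = offsets[(offsets ** 2).sum(axis=1) <= radius ** 2]
    for ijk in ijks:
        coords = ijk + offsets
        # Keep only voxels that fall inside the volume.
        mask = np.all((coords >= 0) & (coords < np.array(shape)), axis=1)
        xs, ys, zs = coords[mask].T
        volume[xs, ys, zs] += 1  # sum counts in place
    return volume


vol = sum_spheres(np.array([[20, 20, 20], [21, 20, 20]]), (40, 40, 40))
print(vol.max())  # 2 where the two spheres overlap
```

Because the counts accumulate directly into one array, peak memory stays proportional to the volume size rather than to the number of studies, which is the source of the memory savings described above.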