Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize compute_kda_ma for memory and speed #857

Merged
merged 23 commits into from
Jan 10, 2024
Merged

Optimize compute_kda_ma for memory and speed #857

merged 23 commits into from
Jan 10, 2024

Conversation

adelavega
Copy link
Member

@adelavega adelavega commented Jan 5, 2024

  • Implement "summary_array" return type for MKDAKernel, which convolves kernels to coordinates in a 3D dense volume, summing counts in place, saving substantial memory and compute.
  • Using numba to speed up sphere kernel convolution
  • @jdkent set types to int to reduce memory usage
  • Minor improvements throughout

For large-scale MKDAChi2 (i.e. using Neurosynth dataset), memory footprint is reduced ~18-20x (25GB to 1.2GB), and computation is sped up ~3.2x.

Copy link

codecov bot commented Jan 5, 2024

Codecov Report

Attention: 16 lines in your changes are missing coverage. Please review.

Comparison is base (efae75e) 88.48% compared to head (a89c16d) 88.22%.
Report is 1 commits behind head on main.

❗ Current head a89c16d differs from pull request most recent head 6236f6c. Consider uploading reports for the commit 6236f6c to get more accurate results

Files Patch % Lines
nimare/meta/utils.py 67.44% 14 Missing ⚠️
nimare/meta/kernel.py 87.50% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #857      +/-   ##
==========================================
- Coverage   88.48%   88.22%   -0.26%     
==========================================
  Files          48       48              
  Lines        6337     6342       +5     
==========================================
- Hits         5607     5595      -12     
- Misses        730      747      +17     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@adelavega
Copy link
Member Author

@jdkent these improvements make a small difference to run speed, but ultimately it's a pretty hard thing to optimize.

ultimately, it's just slow and fairly expensive process

@adelavega adelavega changed the title WIP: Optimize compute_kda_ma Optimize compute_kda_ma for memory and speed Jan 5, 2024
adelavega and others added 7 commits January 5, 2024 17:18
* Resolve merge

* Add sum aross studies

* Remove @Profile

* Only enable sum across studies for MKDA Chi Squared

* Run black

* Return dense for MKDACHiSquared

* Update nimare/meta/utils.py

Co-authored-by: James Kent <[email protected]>

* Run black

* Update nimare/meta/utils.py

Co-authored-by: James Kent <[email protected]>

* Format suggestion

* change how number of studies and active voxels are found

* add explicit dtype when creating image

* make the comment clearer

* add the kernel argument to the dictionary

* bump minimum required versions

* alternative way to sum across studies in a general way

* fix arguments and style

* pin minimum seaborn version

---------

Co-authored-by: Alejandro de la Vega <[email protected]>
Co-authored-by: James Kent <[email protected]>
@adelavega
Copy link
Member Author

Even more speed increases by indexing the ijks array within numba

Copy link
Member

@jdkent jdkent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! it's added and tested. @JulioAPeraza will be adding results for additional confirmation the results looks the same using the updated method.

@adelavega
Copy link
Member Author

Awesome. Let's merge once everything passes, and cut a release soon.

@jdkent jdkent merged commit 1fa0603 into main Jan 10, 2024
17 checks passed
@JulioAPeraza
Copy link
Collaborator

@adelavega, the new changes are fantastic! In the past, I could not train the decoder with more than 16 cores.

I used 40 cores this time, and it took 11 minutes to train an LDA-based decoder with 200 topics, and it only required 1.7 GB of memory.

@jdkent, I compared the decoders' results, and the output maps' values are exactly the same.

I also tested the new changes with different dataset sizes based on Neurosynth. See below:

MKDAChi2_comparison

@adelavega
Copy link
Member Author

Julio, thank for your running this comparison. This is fantastic :)
This should make it much easier for people to replicate the results of your new paper.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants