
Add sum_across_studies to kda #859

Merged: 18 commits into optimize_kda on Jan 9, 2024

Conversation

@adelavega (Member) commented Jan 9, 2024

sum_across_studies returns an array in which study counts are summed sequentially into a single dense matrix, rather than compiling an exhaustive matrix containing all study-level count data.

This is much more efficient in both memory and compute, enabling MKDA Chi-squared with Neurosynth as the reference dataset using less than 1 GB of memory.

@jdkent (Member) commented Jan 9, 2024

Oh shoot, do we actually not need the individual matrices for FWE correction? Looking over the code it seems like we don't, and the first thing done for MKDAChi2 is summing across all the studies.

@adelavega (Member, Author)

Alright @JulioAPeraza @jdkent @tsalo, this is really exciting.

For MKDAChi2 we don't actually need study-level modeled activation. This realization means we can save a massive amount of memory and substantial time (a 2-3x speedup over an already optimized PR I was working on).

When looping over studies, instead of saving every study-level (M)KDA modeled activation (MA) map, we can sum them in place into a single MNI-sized dense volume.

This means there is no need to run np.unique, which is slow, and memory usage is O(1) with respect to the number of studies.

I initially enabled this for all MKDA kernels, but it seems to result in incorrect statistics for MKDADensity. We could potentially implement it there as well, but we need to look at the stats more carefully; we might be able to save intermediary statistics to get correct results.
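
To make this concrete, here is a minimal sketch of the accumulation scheme (illustrative only, not NiMARE's actual implementation; the function name sum_ma_maps and the per-study input format are made up for this example):

import numpy as np

def sum_ma_maps(per_study_ijk, volume_shape):
    # Accumulate per-study MKDA counts into one dense, MNI-sized volume.
    # per_study_ijk: list of (n_voxels_i, 3) integer arrays, one per study,
    # giving the voxel indices covered by that study's spheres (toy format).
    summed = np.zeros(volume_shape, dtype=np.int32)
    for ijk in per_study_ijk:
        # Binary MA map for this study: a voxel counts at most once per study.
        study_map = np.zeros(volume_shape, dtype=np.int32)
        study_map[ijk[:, 0], ijk[:, 1], ijk[:, 2]] = 1
        # Add in place; only one study-sized scratch array is ever alive,
        # so memory stays O(1) with respect to the number of studies.
        summed += study_map
    return summed

Because the per-study maps are never kept around, the voxel-wise study counts needed by MKDAChi2 come straight out of the running sum.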

@adelavega (Member, Author)

On a measly MacBook Air with 8 GB of RAM, I was able to run MKDAChi2 comparing the example pain dataset against all of Neurosynth in 32 seconds, using only a little over 1 GB of RAM.

[Screenshot: timing output from the run described above, Jan 8, 2024]

Someone check my logic in case I'm missing something!

nimare/meta/utils.py (outdated review thread, resolved)
@adelavega (Member, Author)

I was able to simplify the code further by just returning a dense matrix. Saves ~2 more seconds.

I'm open to other stylistic suggestions though.

all_values += study_values

# Set voxels outside the mask to zero.
all_values[~mask_data] = 0
Review comment (Member):

and this step functionally replaces this:

sphere_idx_inside_mask = np.where(mask_data[tuple(all_spheres.T)])[0]
all_spheres = all_spheres[sphere_idx_inside_mask, :]
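
A quick way to convince yourself the two approaches agree, on toy data (illustrative names only; this is not NiMARE code):

import numpy as np

rng = np.random.default_rng(0)
shape = (10, 10, 10)
mask_data = rng.random(shape) > 0.3                # toy boolean brain mask
all_spheres = rng.integers(0, 10, size=(500, 3))   # toy sphere voxel coordinates

# Old approach: drop out-of-mask coordinates before accumulating.
inside = np.where(mask_data[tuple(all_spheres.T)])[0]
values_old = np.zeros(shape, dtype=np.int32)
np.add.at(values_old, tuple(all_spheres[inside].T), 1)

# New approach: accumulate everything, then zero voxels outside the mask.
values_new = np.zeros(shape, dtype=np.int32)
np.add.at(values_new, tuple(all_spheres.T), 1)
values_new[~mask_data] = 0

assert np.array_equal(values_old, values_new)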


del ma_maps1
n_selected = self.dataset1.coordinates["id"].unique().shape[0]
Review comment (Member):

We should probably have _collect_ma_maps return the n_selected value: all of the coordinates for a particular experiment could fall outside the mask, in which case that experiment would not actually be included in the count.
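
As a hypothetical illustration of that concern (the in_mask column and the tiny table below are made up for this example, not NiMARE data structures):

import pandas as pd

# Two experiments; study-2's only focus falls outside the mask, so it
# contributes no MA map, yet it still has a row in the coordinates table.
coords = pd.DataFrame(
    {
        "id": ["study-1", "study-1", "study-2"],
        "in_mask": [True, True, False],
    }
)

n_from_ids = coords["id"].unique().shape[0]                          # 2
n_actually_selected = coords.loc[coords["in_mask"], "id"].nunique()  # 1
print(n_from_ids, n_actually_selected)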

@jdkent (Member) left a review comment:

This line:

https://github.com/neurostuff/NiMARE/pull/859/files#diff-953de5154c5b38a0a853444e8525f6d8edc0eef147cb69c54130eb389fc92cf5L187

needs an explicit dtype added to the image creation:

img = nib.Nifti1Image(kernel_data, mask.affine, dtype=kernel_data.dtype)

This should fix some of the failing tests.
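
For reference, a self-contained example of the suggested call, assuming a nibabel version that accepts the dtype keyword (the toy array and affine below stand in for NiMARE's kernel output and mask.affine):

import numpy as np
import nibabel as nib

kernel_data = np.zeros((4, 4, 4), dtype=np.int32)  # stand-in for the kernel output
affine = np.eye(4)                                 # stand-in for mask.affine

# Passing dtype explicitly keeps the NIfTI header dtype in sync with the array.
img = nib.Nifti1Image(kernel_data, affine, dtype=kernel_data.dtype)
print(img.get_data_dtype())  # int32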

@adelavega (Member, Author)

Your changes look good to me, James.

@jdkent merged commit a4f434e into optimize_kda on Jan 9, 2024 (15 of 17 checks passed)
@jdkent (Member) commented Jan 9, 2024

Fantastic work! Merging it in.

jdkent added a commit that referenced this pull request Jan 10, 2024
* Simplify stacking

* Fix typo

* Remove vstack

* Fix stacking

* Remove @Profile

* Use jit for _convolve_sphere

* switch arrays to int32 where possible

* reduce memory consumption

* fix style

* Simplify numba

* Run black

* Mask outside space in numba

* Add indicator later

* Set value to input

* Add `sum_across_studies` to kda (#859)

* Resolve merge

* Add sum across studies

* Remove @Profile

* Only enable sum across studies for MKDA Chi Squared

* Run black

* Return dense for MKDAChiSquared

* Update nimare/meta/utils.py

Co-authored-by: James Kent <[email protected]>

* Run black

* Update nimare/meta/utils.py

Co-authored-by: James Kent <[email protected]>

* Format suggestion

* change how number of studies and active voxels are found

* add explicit dtype when creating image

* make the comment clearer

* add the kernel argument to the dictionary

* bump minimum required versions

* alternative way to sum across studies in a general way

* fix arguments and style

* pin minimum seaborn version

---------

Co-authored-by: Alejandro de la Vega <[email protected]>
Co-authored-by: James Kent <[email protected]>

* manage minimum dependencies

* Index within numba

* Only allow sum_overlap if not sum_across_studies

* Add unique index back

* Remove @Profile

* run black

* ensure the methods for creating the kernel are equivalent

---------

Co-authored-by: James Kent <[email protected]>
Co-authored-by: Alejandro de la Vega <[email protected]>