Making an anndata object to predict ATAC values at per-base or bin level #83
Comments
Hey Matthew, you're not missing anything, it's simply not implemented (yet). Currently CREsted only works with sequence-to-scalar predictions. You're right that you run into problems with the anndata dimensionality, but you can get around these in a couple of different ways (storing in adata layers instead, wrapping the anndata, using mudata, etc.). You'll notice quickly, though, that this is only the tip of the iceberg, since you'll also run into issues with the dataloaders, calculation of the contribution scores, etc. further along the CREsted pipeline. It's definitely useful though, which is why we're actively working on this exact feature, but I can't say yet when we would be releasing this to the public.
That's great. I did some thinking about how to achieve this in a seamless manner. My thought is to create a custom class which spoofs anndata into thinking it's a 2D matrix of the correct size, but when an element is indexed, it retrieves row elements from an HDF5-backed matrix (might need anndata shadows or something to prevent adata.uns['tracks'] from loading into memory), effectively using a lazy matrix as the third dimension of an on-disk tensor.
Not sure if that is helpful, but thought I'd share because it seems like an interesting problem.
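The idea above can be sketched roughly as follows. This is only a minimal illustration, not CREsted's or anndata's implementation: the class name `LazyTrackMatrix` is hypothetical, and `numpy.memmap` is swapped in for an HDF5-backed dataset so the sketch depends on NumPy alone. Whether AnnData would accept such an object as `X` is not something this sketch addresses.

```python
import os
import tempfile

import numpy as np


class LazyTrackMatrix:
    """Hypothetical wrapper: reports a 2D shape, but each element is a track on disk."""

    def __init__(self, path, n_obs, n_vars, n_bins, dtype=np.float32):
        # The real storage is 3D; np.memmap only reads the slices that get indexed.
        # (Stand-in for an HDF5-backed dataset, which is an assumption of this sketch.)
        self._data = np.memmap(path, dtype=dtype, mode="r", shape=(n_obs, n_vars, n_bins))

    @property
    def shape(self):
        # Expose only the first two axes, hiding the bin axis from callers.
        return self._data.shape[:2]

    def __getitem__(self, key):
        # Indexing [i, j] copies that element's full track (length n_bins) into memory.
        return np.array(self._data[key])


n_obs, n_vars, n_bins = 2, 3, 4
path = os.path.join(tempfile.mkdtemp(), "tracks.dat")

# Write a small synthetic 3D tensor to disk once.
writer = np.memmap(path, dtype=np.float32, mode="w+", shape=(n_obs, n_vars, n_bins))
writer[:] = np.arange(n_obs * n_vars * n_bins, dtype=np.float32).reshape(n_obs, n_vars, n_bins)
writer.flush()
del writer

lazy = LazyTrackMatrix(path, n_obs, n_vars, n_bins)
track = lazy[1, 2]  # per-bin values for observation 1, region 2
```

Here `lazy.shape` is `(2, 3)` even though the file holds a `(2, 3, 4)` tensor, and only the requested track is loaded when an element is indexed.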
Nice, that's quite an elegant solution. I need to look at this a bit more in depth and compare it to how we're implementing it now. I'll get back to you soon on this, thanks for sharing!
Description of feature
I'm looking at your `import_bigwigs` function, and thinking about how to go beyond ATAC value summary statistics like 'mean', 'max', 'count', or 'logcount'. I assumed it would be as easy as adding the below to the `_extract_values_from_bigwig` i/o function:
However, on reflection, predicting at a per-base or bin level means the AnnData.X would need to be 3-dimensional, and I don't see a way to work around anndata's limitations there.
Am I missing something? Is there no reason to try to predict ATAC values at a per-base or per-bin level? Is there some other work-around you have in crested that I've missed?
Thanks for your input :)
Matthew
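To make the dimensionality issue in the description concrete, here is a minimal sketch using synthetic random values in place of real bigWig signal. The helper `bin_signal` and all array sizes are hypothetical and are not part of CREsted; the point is only that a summary statistic yields one scalar per (track, region) pair, while per-bin values add a third axis.

```python
import numpy as np


def bin_signal(per_base, bin_size):
    """Average a per-base signal (last axis) into consecutive bins of bin_size."""
    width = per_base.shape[-1]
    if width % bin_size != 0:
        raise ValueError("region width must be a multiple of bin_size")
    n_bins = width // bin_size
    # Split the last axis into (n_bins, bin_size) and average within each bin.
    binned = per_base.reshape(*per_base.shape[:-1], n_bins, bin_size)
    return binned.mean(axis=-1)


rng = np.random.default_rng(0)
n_tracks, n_regions, region_width = 2, 5, 8
per_base = rng.random((n_tracks, n_regions, region_width))  # synthetic signal

summary = per_base.mean(axis=-1)            # one scalar per (track, region): 2D
per_bin = bin_signal(per_base, bin_size=4)  # one vector per (track, region): 3D
```

`summary` has shape `(2, 5)` and fits a 2D matrix, whereas `per_bin` has shape `(2, 5, 2)` and does not.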