Multi-module KeyData interface #337
Conversation
This pull request introduces 1 alert when merging d750db7 into 1ba78dd - view on LGTM.com
This pull request introduces 1 alert when merging a138efa into 1ba78dd - view on LGTM.com
This pull request introduces 1 alert when merging fd4ad9a into 1ba78dd - view on LGTM.com
Thanks for this nice PR. Indeed the components API did feel slightly out of place, especially when using it for the first time after KeyData was introduced. This will nicely fill the gap.
Given the fairly complex axes-shifting and index-juggling machinery in here, I don't feel able to properly review the logic as such, but I tried to give some constructive comments and ideas here and there.
This pull request introduces 1 alert when merging 8c1b922 into 9b36ce4 - view on LGTM.com
Thank you, LGTM!
Thanks for the review 👍
This brings an API similar to KeyData to multi-module detector data, so instead of

    agipd.get_array('image.data')

you can do something like

    agipd['image.data'].xarray()

I hope this will make using EXtra-data more consistent, at least once people are used to these newer APIs.

This also adds the ability to get an unlabelled numpy array for multi-module detector data; previously you had to get an xarray even if you didn't want the labels. I still think dimension labels are a good idea, but xarray is certainly an extra level of complexity, and many people are more familiar with numpy arrays.
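To illustrate the access pattern the PR describes (without real detector files), here is a minimal numpy-based sketch. These mock classes are NOT the real EXtra-data implementation - the class names, `dims` tuple, and constructor are invented for illustration; only the `detector['key'].ndarray()` shape of the API comes from the description above.

```python
import numpy as np

class MockKeyData:
    """Illustrative stand-in for the KeyData-style object the PR adds."""
    def __init__(self, data, dims):
        self._data = data
        self._dims = dims

    def ndarray(self):
        # Unlabelled numpy array - the new capability described in the PR
        return self._data

    def dims(self):
        # Hypothetical accessor for the dimension labels (assumption)
        return self._dims

class MockDetector:
    """Illustrative stand-in for a multi-module detector object."""
    def __init__(self, arrays):
        self._arrays = arrays

    def __getitem__(self, key):
        # detector['image.data'] -> KeyData-like object, as in the PR
        return MockKeyData(*self._arrays[key])

# 16 modules, 10 frames, 8x8 pixels (toy sizes)
data = np.zeros((16, 10, 8, 8))
agipd = MockDetector({
    'image.data': (data, ('module', 'train_pulse', 'slow_scan', 'fast_scan')),
})

arr = agipd['image.data'].ndarray()
print(arr.shape)  # (16, 10, 8, 8)
```

The point of the indexing-based API is that the same `detector[key]` object can offer `.ndarray()`, `.xarray()` and so on, mirroring how KeyData works for single sources.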
Finally, this changes how Dask arrays are created for multi-module data. Rather than making chunks directly based on the files, so that each frame is split across multiple chunks, it now uses the .split_trains() method to break the data into chunks to load, so each Dask chunk spans all modules. This simplifies the task graph for the common case where you want to process data frame-wise with all modules together. I had hoped that this would solve all our issues with Dask and make #333 obsolete, but unfortunately that doesn't seem to be the case - we might want some variant of that PR as well.
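The chunking idea can be sketched in plain Python. This is a hedged illustration, not the actual EXtra-data code: the helper name and signature below are invented, and it only computes chunk boundaries analogous to what a .split_trains()-style split might produce, where each chunk covers all modules for a contiguous slice of trains.

```python
def frame_spanning_chunks(n_modules, n_trains, trains_per_chunk):
    """Compute chunk slices where each chunk spans all modules.

    Illustrative only: mimics the scheme described in the PR, where the
    train axis is split into pieces rather than chunking by file (which
    would split each frame across module-wise chunks).
    """
    chunks = []
    for start in range(0, n_trains, trains_per_chunk):
        stop = min(start + trains_per_chunk, n_trains)
        # (module slice, train slice): every chunk covers all modules
        chunks.append((slice(0, n_modules), slice(start, stop)))
    return chunks

chunks = frame_spanning_chunks(n_modules=16, n_trains=100, trains_per_chunk=32)
print(len(chunks))  # 4 chunks: 32 + 32 + 32 + 4 trains, each spanning all 16 modules
```

Because every chunk contains all 16 modules for its trains, a frame-wise operation touches exactly one chunk per frame, which is what keeps the task graph simple.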