Multi-module KeyData interface #337
Conversation
This pull request introduces 1 alert when merging d750db7 into 1ba78dd - view on LGTM.com
This pull request introduces 1 alert when merging a138efa into 1ba78dd - view on LGTM.com
This pull request introduces 1 alert when merging fd4ad9a into 1ba78dd - view on LGTM.com
Thanks for this nice PR. Indeed the components API did feel slightly out of place, especially when using it for the first time after KeyData was introduced. This will nicely fill the gap.
Given the fairly complex axes-shifting and index-juggling machinery in here, I don't feel able to properly review the logic as such, but I tried to give some constructive comments and ideas here and there.
This pull request introduces 1 alert when merging 8c1b922 into 9b36ce4 - view on LGTM.com
Thank you, LGTM!
Thanks for the review 👍
This brings an API similar to KeyData to multi-module detector data, so instead of

    agipd.get_array('image.data')

you can do something like

    agipd['image.data'].xarray()

I hope this will make using EXtra-data more consistent, at least once people are used to these newer APIs.

This also adds the ability to get an unlabelled numpy array for multi-module detector data; previously you had to get an xarray even if you didn't want the labels. I still think dimension labels are a good idea, but xarray is certainly an extra level of complexity, and many people are more familiar with numpy arrays.
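To illustrate the access pattern the PR describes (without real detector files), here is a minimal numpy-based sketch. These mock classes are NOT the real EXtra-data implementation - the class names, `dims` tuple, and constructor are invented for illustration; only the `detector['key'].ndarray()` shape of the API comes from the description above.

```python
import numpy as np

class MockKeyData:
    """Illustrative stand-in for the KeyData-style object the PR adds."""
    def __init__(self, data, dims):
        self._data = data
        self._dims = dims

    def ndarray(self):
        # Unlabelled numpy array - the new capability described in the PR
        return self._data

    def dims(self):
        # Hypothetical accessor for the dimension labels (assumption)
        return self._dims

class MockDetector:
    """Illustrative stand-in for a multi-module detector object."""
    def __init__(self, arrays):
        self._arrays = arrays

    def __getitem__(self, key):
        # detector['image.data'] -> KeyData-like object, as in the PR
        return MockKeyData(*self._arrays[key])

# 16 modules, 10 frames, 8x8 pixels (toy sizes)
data = np.zeros((16, 10, 8, 8))
agipd = MockDetector({
    'image.data': (data, ('module', 'train_pulse', 'slow_scan', 'fast_scan')),
})

arr = agipd['image.data'].ndarray()
print(arr.shape)  # (16, 10, 8, 8)
```

The point of the indexing-based API is that the same `detector[key]` object can offer `.ndarray()`, `.xarray()` and so on, mirroring how KeyData works for single sources.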
Finally, this changes how Dask arrays are created for multi-module data. Rather than making chunks directly based on the files, so that each frame is split across multiple chunks, it now uses the .split_trains() method to break the data into chunks to load, so each Dask chunk spans all modules. This simplifies the task graph for the common case where you want to process data frame-wise with all modules together. I had hoped that this would solve all our issues with Dask and make #333 obsolete, but unfortunately that doesn't seem to be the case - we might want some variant of that PR as well.
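The chunking idea can be sketched in plain Python. This is a hedged illustration, not the actual EXtra-data code: the helper name and signature below are invented, and it only computes chunk boundaries analogous to what a .split_trains()-style split might produce, where each chunk covers all modules for a contiguous slice of trains.

```python
def frame_spanning_chunks(n_modules, n_trains, trains_per_chunk):
    """Compute chunk slices where each chunk spans all modules.

    Illustrative only: mimics the scheme described in the PR, where the
    train axis is split into pieces rather than chunking by file (which
    would split each frame across module-wise chunks).
    """
    chunks = []
    for start in range(0, n_trains, trains_per_chunk):
        stop = min(start + trains_per_chunk, n_trains)
        # (module slice, train slice): every chunk covers all modules
        chunks.append((slice(0, n_modules), slice(start, stop)))
    return chunks

chunks = frame_spanning_chunks(n_modules=16, n_trains=100, trains_per_chunk=32)
print(len(chunks))  # 4 chunks: 32 + 32 + 32 + 4 trains, each spanning all 16 modules
```

Because every chunk contains all 16 modules for its trains, a frame-wise operation touches exactly one chunk per frame, which is what keeps the task graph simple.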