Merge pull request #572 from European-XFEL/doc/architecture-update
Update architecture page in docs
takluyver authored Nov 22, 2024
2 parents 935894f + d60620a commit c2153c6
Showing 3 changed files with 34 additions and 27 deletions.
41 changes: 23 additions & 18 deletions docs/architecture.rst
@@ -9,35 +9,39 @@ Architecture
Objects
-------

The :class:`.DataCollection` class is the central piece of EXtra-data. It
represents a collection of XFEL data sources and their keys, for a set of train
IDs. It refers to data in one or more files (a run directory is often the
starting point). A subset of its sources/keys or train IDs may be selected to
make a new, more restricted :class:`.DataCollection`.

:class:`.KeyData` represents data for a single source & key, selected from a
``DataCollection`` like ``run[source, key]``. This data may still be spread
across several files. The data can be loaded into a NumPy array, among other
types.

:class:`.FileAccess` manages access to a single EuXFEL format HDF5 file,
including caching index information. There should only be one ``FileAccess``
object per file on disk, even if multiple ``DataCollection`` and ``KeyData``
objects refer to it.
There are three classes making up the core API of EXtra-data:

- :class:`.DataCollection` is what you get from :ref:`opening a run or file
<opening-files>`: data for several sources over some range of pulse trains
(i.e. time). It has methods to :ref:`select a subset of that data
<selecting-combining>`.
- :class:`.SourceData` comes from ``run[source]``, representing one source, such
as a motor or a detector module. Each source has a set of keys.
- :class:`.KeyData` comes from ``run[source, key]``, representing data for a
single source & key. This has a dtype and a shape like a NumPy array, but
the data is not in memory. It has methods to load the data as a NumPy array,
an Xarray DataArray, or a Dask array.
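The relationship between the three core classes can be sketched with a toy model. This is only an illustration of the indexing pattern (``run[source]`` and ``run[source, key]``), not the EXtra-data implementation: the ``Toy*`` classes, the ``load`` method and the source and key names are all hypothetical, and plain lists stand in for data that the real library reads from HDF5 files.

```python
class ToyKeyData:
    """Stand-in for KeyData: data for a single source & key."""

    def __init__(self, source, key, values):
        self.source = source
        self.key = key
        self._values = values  # assumption: a list stands in for data on disk

    @property
    def shape(self):
        # Shape can be reported without copying the data out.
        return (len(self._values),)

    def load(self):
        # Hypothetical loader name, used here only for illustration.
        return list(self._values)


class ToySourceData:
    """Stand-in for SourceData: one source with a set of keys."""

    def __init__(self, source, keys):
        self.source = source
        self._keys = keys

    def keys(self):
        return set(self._keys)

    def __getitem__(self, key):
        return ToyKeyData(self.source, key, self._keys[key])


class ToyDataCollection:
    """Stand-in for DataCollection: several sources together."""

    def __init__(self, sources):
        self._sources = sources

    def __getitem__(self, item):
        # run[source, key] goes via run[source], then [key].
        if isinstance(item, tuple):
            source, key = item
            return self[source][key]
        return ToySourceData(item, self._sources[item])

    def select(self, source_names):
        # Selecting a subset gives a new, more restricted collection.
        return ToyDataCollection({s: self._sources[s] for s in source_names})


run = ToyDataCollection({
    "MOTOR_A": {"position.value": [1.0, 1.5, 2.0]},
    "DET_MODULE_0": {"image.data": [10, 20, 30]},
})
kd = run["MOTOR_A", "position.value"]
print(kd.shape, kd.load())
```

In this sketch each level only narrows the selection. Nothing is copied until ``load`` is called, which mirrors the description of ``KeyData`` as having a shape while the data is not in memory.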

Component classes for :doc:`multi-module detectors <agipd_lpd_data>` build on
top of this core to work more conveniently with major data sources. There are
more component classes in the `EXtra package <https://extra.readthedocs.io/en/latest/>`_.

:class:`.FileAccess` is a lower-level class to manage access to a single
:doc:`EuXFEL format HDF5 file <data_format>`, including caching index information.
There should only be one ``FileAccess`` object per file on disk, even if
multiple ``DataCollection``, ``SourceData`` and ``KeyData`` objects refer to it.
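One common way to get "one object per file" behaviour is a registry keyed by the file path that holds only weak references. The sketch below shows that general pattern under stated assumptions: ``ToyFileAccess``, ``get_file_access`` and ``_registry`` are hypothetical names, and the diff does not say that ``FileAccess`` is implemented this way.

```python
import os
import weakref

# Hypothetical registry: entries vanish once nothing else refers to the object.
_registry = weakref.WeakValueDictionary()


class ToyFileAccess:
    """Stand-in for a per-file access object that caches index information."""

    def __init__(self, path):
        self.path = path
        self.index_cache = {}  # assumption: placeholder for cached index data


def get_file_access(path):
    # Normalise the path so different spellings map to the same entry.
    key = os.path.abspath(path)
    fa = _registry.get(key)
    if fa is None:
        fa = ToyFileAccess(key)
        _registry[key] = fa
    return fa


a = get_file_access("run/file-0001.h5")
b = get_file_access("./run/file-0001.h5")
print(a is b)
```

Because the registry holds weak references, it does not keep file objects alive on its own. Any cached index information is shared by every object that refers to the same file.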

Modules
-------

- ``cli`` contains command-line interfaces (so far only
:ref:`cmd-make-virtual-cxi`).
- ``cli`` contains command-line interfaces.
- ``components`` provides interfaces that bring together data from several
similar sources, i.e. multi-module detectors where each module is a separate
source.
- ``exceptions`` defines some custom error classes.
- ``export`` sends data from files over ZMQ in the Karabo Bridge format.
- ``file_access`` contains :class:`.FileAccess` (described above), along with
machinery to keep the number of open files under a limit.
- ``h5index`` lists datasets in an HDF5 file. Deprecated.
- ``keydata`` contains :class:`.KeyData` (described above).
- ``locality`` can check whether files are available on disk or on tape
in a `dCache <https://www.dcache.org/>`_ filesystem.
@@ -47,6 +51,7 @@ Modules
- ``read_machinery`` is a collection of pieces that support ``reader``.
- ``run_files_map`` manages caching metadata about the files of a run in a
JSON file, to speed up opening the run.
- ``sourcedata`` contains :class:`.SourceData` (described above).
- ``stacking`` has functions for stacking multiple arrays into one, another
option for working with multi-module detector data.
- ``utils`` is miscellaneous pieces that don't fit anywhere else.
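The ``file_access`` entry mentions machinery to keep the number of open files under a limit. A minimal sketch of one way to do that, closing the least recently used handle when the limit is reached, is shown below. ``OpenFilesLimiter``, ``touch`` and ``FakeHandle`` are hypothetical names for illustration and do not reflect the module's actual code.

```python
from collections import OrderedDict


class OpenFilesLimiter:
    """Keep at most `maxfiles` handles open, closing the least recently used."""

    def __init__(self, maxfiles):
        self.maxfiles = maxfiles
        self._open = OrderedDict()  # path -> handle, least recently used first

    def touch(self, path, opener):
        if path in self._open:
            # Already open: mark it as most recently used.
            self._open.move_to_end(path)
            return self._open[path]
        # Make room before opening another file.
        while len(self._open) >= self.maxfiles:
            _, oldest = self._open.popitem(last=False)
            oldest.close()
        handle = opener(path)
        self._open[path] = handle
        return handle


class FakeHandle:
    """Stand-in for an open file; records whether close() was called."""

    def __init__(self, path):
        self.path = path
        self.closed = False

    def close(self):
        self.closed = True


limiter = OpenFilesLimiter(maxfiles=2)
h1 = limiter.touch("a.h5", FakeHandle)
h2 = limiter.touch("b.h5", FakeHandle)
limiter.touch("a.h5", FakeHandle)       # "a.h5" becomes most recently used
h3 = limiter.touch("c.h5", FakeHandle)  # limit reached, so "b.h5" is closed
print(h1.closed, h2.closed, h3.closed)
```

A closed file can simply be reopened the next time it is touched, so callers never need to track which handles are currently open.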
11 changes: 6 additions & 5 deletions docs/reading_files.rst
@@ -7,16 +7,15 @@ Opening files
-------------

You will normally access data from a run, which is stored as a directory
containing HDF5 files. You can open a run using :func:`RunDirectory` with the
path of the directory, or using :func:`open_run` with the proposal number and
run number to look up the standard data paths on the Maxwell cluster.
containing HDF5 files. You can open a run by proposal & run number using
:func:`open_run`, or from a directory path using :func:`RunDirectory`.

.. module:: extra_data

.. autofunction:: RunDirectory

.. autofunction:: open_run

.. autofunction:: RunDirectory

You can also open a single file. The methods described below all work for either
a run or a single file.

@@ -313,6 +312,8 @@ they will read data for all sources in the run, which may be very slow.

.. automethod:: train_from_index

.. _selecting-combining:

Selecting & combining data
--------------------------

9 changes: 5 additions & 4 deletions extra_data/reader.py
@@ -1882,13 +1882,13 @@ def RunDirectory(
path, include='*', file_filter=locality.lc_any, *, inc_suspect_trains=True,
parallelize=True, _use_voview=True,
):
"""Open data files from a 'run' at European XFEL.
"""Open a European XFEL run directory.
::
run = RunDirectory("/gpfs/exfel/exp/XMPL/201750/p700000/raw/r0001")
A 'run' is a directory containing a number of HDF5 files with data from the
A run directory contains a number of HDF5 files with data from the
same time period.
Returns a :class:`DataCollection` object.
@@ -1950,13 +1950,14 @@ def open_run(
inc_suspect_trains=True, parallelize=True, aliases=DEFAULT_ALIASES_FILE,
_use_voview=True,
):
"""Access EuXFEL data on the Maxwell cluster by proposal and run number.
"""Access European XFEL data by proposal and run number.
::
run = open_run(proposal=700000, run=1)
Returns a :class:`DataCollection` object.
Returns a :class:`DataCollection` object. This finds the run directory in
standard paths on EuXFEL infrastructure.
Parameters
----------