Merge pull request #528 from sjspielman/sjspielman/517-module-gha
Add module GHA doc
sjspielman authored Jun 18, 2024
2 parents a4156b3 + eb394c7 commit 002e013
Showing 6 changed files with 61 additions and 3 deletions.
2 changes: 2 additions & 0 deletions .github/components/dictionary.txt
@@ -58,6 +58,8 @@ fibroblasts
formatters
Generis
GFM
GHA
GHAs
GitHub
GitKraken
GitKraken's
@@ -62,7 +62,8 @@ The `create-analysis-module.py` script will also create two additional files bes
These files, stored in the repository folder `.github/workflows`, are [GitHub Action workflow files](https://docs.github.com/en/actions) that the OpenScPCA project uses to ensure module reproducibility.
The workflows are disabled by default.

- - `run_{my-module-name}.yml` contains a skeleton workflow for running the analysis module
+ - `run_{my-module-name}.yml` contains a skeleton workflow for testing the analysis module.
+   [Learn more about module testing workflows here.](../../ensuring-repro/workflows/run-module-gha.md)
- `docker_{my-module-name}.yml` contains a skeleton workflow for building the analysis module's [Dockerfile](../../ensuring-repro/docker/index.md)
Please [commit these files](../working-with-git/making-commits.md) as part of your first [pull request](../creating-pull-requests/index.md), and we'll take care of the rest!
@@ -2,7 +2,7 @@

All modules should contain [clear documentation in the `README.md` file](documenting-analysis.md) about how to run them, including:

- - Information about software and compute requirements
+ - Information about [software dependencies](./module-dependencies.md) and [compute requirements](./compute-requirements.md)
- What command(s) to issue to run the full module

!!! note
3 changes: 3 additions & 0 deletions docs/ensuring-repro/workflows/index.md
@@ -0,0 +1,3 @@
# OpenScPCA workflows

_Content forthcoming._
49 changes: 49 additions & 0 deletions docs/ensuring-repro/workflows/run-module-gha.md
@@ -0,0 +1,49 @@
# Automated module testing

To maintain module functionality over time, we use [GitHub Actions](https://docs.github.com/en/actions) (GHAs) to periodically run each module and confirm that the module code runs to completion without errors.
We refer to these workflows as "module testing GHAs".

!!! info
    For more information about how we run modules to generate official results and OpenScPCA releases, please see our documentation on the `OpenScPCA-nf` workflow. <!-- openscpca-nf STUB_LINK -->

Module testing GHAs are automatically run in two circumstances:

- When a pull request is filed with changes to any module files
    - This GHA will need to pass without errors for [pull requests](../../contributing-to-analyses/pr-review-and-merge/index.md) to be approved
- On a periodic schedule
    - This ensures that changes in data or other code do not break tests within each module
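
The two triggers above might look roughly like the following in a workflow file. This is a minimal sketch, not a copy of an OpenScPCA workflow: the module name `my-module`, the `analyses/` directory, the `main` branch, and the cron schedule are all assumptions made for illustration.

```yaml
# Illustrative trigger section for a module testing GHA.
# `my-module`, the `analyses/` path, and the schedule are placeholders.
name: Run my-module analysis module

on:
  # Trigger 1: a pull request that changes the module's files
  pull_request:
    branches:
      - main
    paths:
      - "analyses/my-module/**"
      - ".github/workflows/run_my-module.yml"
  # Trigger 2: a periodic schedule (cron times are evaluated in UTC)
  schedule:
    - cron: "0 6 * * 1" # Mondays at 06:00 UTC; chosen only as an example
```

The `paths` filter limits pull request runs to changes that touch the module or its workflow file, so unrelated pull requests do not trigger the test.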

For examples of existing module testing GHAs, see the workflows for the example Python and R modules, [`run_hello-python.yml`](https://github.com/AlexsLemonade/OpenScPCA-analysis/blob/main/.github/workflows/run_hello-python.yml) and [`run_hello-R.yml`](https://github.com/AlexsLemonade/OpenScPCA-analysis/blob/main/.github/workflows/run_hello-R.yml), respectively.

To make GHAs run efficiently, the tests should run the module code with the [simulated test data](../../getting-started/accessing-resources/getting-access-to-data.md#accessing-test-data).
This means that it's important to write your module code with sufficient flexibility to allow for test data to be used.
You should read in files from the `data/current` directory, which will be automatically directed to test data during module testing GHA runs.

It's also helpful for your module to have a single entry point for running all module scripts and/or notebooks in their intended order, e.g. a [shell script](../../contributing-to-analyses/analysis-modules/running-a-module.md).
This way, the module testing GHA can directly call this script to execute the entire module.


## Writing a module testing GHA

!!! tip
    The Data Lab will generally write and maintain module testing GHAs, but you are welcome to write them as well if you are interested!
    See the GitHub documentation on [workflow syntax for GHAs](https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions) to learn more.

When you [create a new module](../../contributing-to-analyses/analysis-modules/creating-a-module.md), a GHA workflow file is created at `.github/workflows/run_{module-name}.yml`.
This initial file is inactive, meaning it will not run automatically on the two aforementioned triggers.
As an analysis module matures, the Data Lab staff will activate this workflow file so the module can be regularly tested.

### GHA steps

Each module testing GHA is initially created with these steps, which should be updated to reflect the given module's needs:

- Check out the repository
- Download test data
    - Use the [`download-data.py`](../../getting-started/accessing-resources/getting-access-to-data.md#using-the-download-data-script) and/or [`download-results.py`](../../getting-started/accessing-resources/getting-access-to-data.md#accessing-scpca-module-results) scripts to specify the set of input files you need, with the `--test-data` flag to download the test data.
    - After this step, the `data/current` directory will point to the test data, ensuring the module GHA runs using the test data.
- Set up the module environment
    - Depending on [the flags used when creating your module](../../contributing-to-analyses/analysis-modules/creating-a-module.md#module-creation-script-flags), this will include the steps needed to install the [`renv` and/or conda environment](../managing-software/index.md) from existing environment files (`renv.lock` and/or `conda-lock.yml`, respectively).
- Run the analysis module
    - Generally, this will involve calling the [module's run script](../../contributing-to-analyses/analysis-modules/running-a-module.md).
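
As a rough illustration, the four steps above could be laid out as a job like the one below. This sketch assumes an R-based module that uses `renv`; the module name `my-module`, the `analyses/` directory, the run script name `run_my-module.sh`, the location of `download-data.py` at the repository root, and the action versions are assumptions rather than details taken from the generated skeleton. A real workflow will likely pass additional options to the download script.

```yaml
# Illustrative job for a module testing GHA (R module using renv).
# Names and paths containing `my-module` are placeholders.
jobs:
  run-module:
    runs-on: ubuntu-latest
    steps:
      - name: Check out the repository
        uses: actions/checkout@v4

      - name: Download test data
        # Assumes the script sits at the repository root; other options omitted
        run: ./download-data.py --test-data

      - name: Set up R
        uses: r-lib/actions/setup-r@v2

      - name: Set up the renv environment
        # Restores packages from the module's renv.lock file
        uses: r-lib/actions/setup-renv@v2
        with:
          working-directory: analyses/my-module

      - name: Run the analysis module
        # Calls the module's single entry point (hypothetical script name)
        working-directory: analyses/my-module
        run: bash run_my-module.sh
```

A conda-based module would replace the two R steps with steps that install the environment from `conda-lock.yml`.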

As an analysis module matures, the GHA will be updated to run the analysis in the module's Docker image, rather than using the `renv` and/or conda environment files.
Module testing GHAs can use their module's Docker images once the image has been built and pushed to the registry. <!-- STUB LINK building/updating docker images -->
5 changes: 4 additions & 1 deletion mkdocs.yml
@@ -132,7 +132,10 @@ nav:
        - contributing-to-analyses/pr-review-and-merge/index.md
        - contributing-to-analyses/pr-review-and-merge/respond-to-review.md
    - Ensuring reproducibility:
-     - index.md
+     - ensuring-repro/index.md
+     - OpenScPCA workflows:
+       - ensuring-repro/workflows/index.md
+       - ensuring-repro/workflows/run-module-gha.md
      - Managing module software:
        - ensuring-repro/managing-software/index.md
        - ensuring-repro/managing-software/using-renv.md