Regenerate wilms notebooks: Code updates #913

sjspielman · 2024-11-27T13:11:03Z

This is the first of 2 PRs to address #906 (but might be more if GitHub UI gets mad) to update notebooks in the cell-type-wilms-tumor-06 module. This PR covers a smidge of reorg and code changes that cropped up while regenerating notebooks with the 2024-11-25 data release.

I changed a few things to better handle the exploratory CNV notebooks, which are not going to get regenerated:
- I created a new directory cnv-exploratory-notebooks for them and their output to live in. I also had to update the path accordingly in scripts/explore-cnv-methods.sh.
- I added a flag to the 00_run_workflow.sh script to specifically handle logic for regenerating these notebooks. Even though we will want to run other exploratory steps in the future, we almost certainly won't want to run these!

I moved 00b_characterize_fetal_kidney_reference_Stewart.Rmd out of notebook_template and into notebook, since it is not a template.
- I updated 00_run_workflow.sh with the right new path, and I also renamed the script variable notebook_output_dir --> notebook_dir, since the directory is named "notebook".

I realized (WOMP), while running the workflow through, that scripts/06b_build-normal_reference.R was missing from it! The script also needed a Seurat library load, which snuck past during review because this module was mostly not being run in CI. The script is now run, loads Seurat, and was styled by pre-commit.

The 03 clustering notebook had some Seurat-related bugs which appeared with the new data release: AddModuleScore doesn't work with defaults when there are fewer cells (<500), so I updated some spots to get this code running. This is now relevant because at least one sample lost about 20-30 cells.

README updates along the way.
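
As a rough illustration of the flag described above, here is a minimal sketch of gating infrequently run steps behind an environment variable. It is not the actual contents of 00_run_workflow.sh: the function name `maybe_run_cnv_exploratory` and the default of `0` are assumptions made for this example, and the steps are only echoed rather than executed.

```shell
# Sketch only: gate rarely-run steps behind a flag.
# `maybe_run_cnv_exploratory` and the default value of 0 are hypothetical,
# not taken from the real workflow script.
maybe_run_cnv_exploratory() {
  # Treat a missing argument as "do not run"
  flag="${1:-0}"
  if [ "$flag" -eq 1 ]; then
    # Stand-ins for the real steps; this sketch only reports what would run
    echo "would run: Rscript scripts/06b_build-normal_reference.R"
    echo "would run: ./scripts/explore-cnv-methods.sh"
  else
    echo "skipping CNV exploratory steps"
  fi
}

# Read the flag from the environment, falling back to 0 when it is unset
maybe_run_cnv_exploratory "${RUN_CNV_EXPLORATORY:-0}"
```

With this shape, something like `RUN_CNV_EXPLORATORY=1 ./00_run_workflow.sh` would opt in to the extra steps, while a plain invocation skips them.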

Please let me know what I can clarify!!


allyhawkins

I left a few comments about the approach here. Based on #906 (comment), I was thinking we would completely remove generating these notebooks from the workflow. Since we have a script that is already used to run two scripts and generate the notebooks then I don't think it needs to be present in the main workflow.

The other comment is about how we organize the exploratory analysis. Since the supplemental-notebooks folder exists, we should put everything that's exploratory/not run in the workflow in that folder.

allyhawkins · 2024-11-27T15:55:39Z

analyses/cell-type-wilms-tumor-06/00_run_workflow.sh

+  # These steps are only run if RUN_CNV_EXPLORATORY is true
+  if [[ $RUN_CNV_EXPLORATORY -eq 1 ]]; then
+    # Create the pooled normal reference for inferCNV
+    Rscript scripts/06b_build-normal_reference.R
+
+    # Run infercnv and copykat for a selection of samples
+    # This script calls scripts/05_copyKAT.R and scripts/06_infercnv.R and associated exploratory CNV notebooks in `cnv-exploratory-notebooks/`
+    # By default, copyKAT as called by this script uses 32 cores
+    THREADS=${THREADS} TESTING=${TESTING} ./scripts/explore-cnv-methods.sh
+  fi


I almost think we could just remove this from the main workflow rather than do this. It looks like you are just running a single script that already runs two other scripts to actually generate the notebooks so I don't see why this is needed at all.

No complaints from me.

allyhawkins · 2024-11-27T15:56:47Z

analyses/cell-type-wilms-tumor-06/README.md

+
+- `scripts` contain analysis scripts used in the module workflow.
+See `scripts/README.md` for more information
+- `notebook_template` contains R Markdown notebooks meant to be run as a template across module samples


Just a small annoyance with the fact that this is _ and not - when all the other folder names have -, but not worth the large diffs to fix 😢

It should now all be _ :)

allyhawkins · 2024-11-27T15:58:30Z

analyses/cell-type-wilms-tumor-06/README.md

+- `cnv-exploratory-notebooks` contains R Markdown notebooks and their HTML outputs specifically from exploratory steps during CNV analysis in the module workflow
+- `supplemental-notebooks` contains exploratory notebooks comparing Azimuth label transfer to an Azimuth-adapted approach which is used in this module.


So since we have this supplemental folder that contains notebooks not used as part of the workflow and are also exploratory why don't we move them into this folder instead? So you would have exploratory-notebooks as a folder and then maybe two folders inside that, one for cnv and one for azimuth?

I agree, but... for context, I originally did something sort of like this #912, and the diff resulting from moving notebooks was simply not viewable on any platform. So, I decided to let it go.....

allyhawkins · 2024-11-27T15:59:52Z

analyses/cell-type-wilms-tumor-06/scripts/README.md

Why is this file being changed? GitHub won't let me preview the changes but I don't think it should be changed?

Boo, GitHub.
This was changed because 06b_build-normal_reference.R wasn't documented, and I had a brief moment of recreational revision to make 06 scripts listed after 05 scripts.

Here's the diff which renders for me:


sjspielman · 2024-11-27T17:22:34Z

Fresh changes:

All the "not part of this workflow" notebooks are now in supplemental_notebooks (check out that underscore! Most of the new diff comes from renaming this directory to have an underscore).
- This includes the CNV exploration notebooks, and an exploratory notebook which was used to classify one of the label transfer references, but indeed is wholly unused in cell typing itself.

marker-sets --> marker_sets, for more consistency. You can have all _, but not all -.

I updated the workflow script to not run anything that lives in supplemental_notebooks, and made it clearer that the RUN_EXPLORATORY flag can be used to regenerate all notebooks that live in the repo.

README updates accordingly.
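
Renames like these can leave stale paths behind in scripts and READMEs. The following is a minimal sketch of one way to check for leftovers, not something taken from this PR: the helper name `find_stale_refs` is hypothetical, and it assumes a `grep` that supports recursive search with `-r` (as GNU and BSD grep do).

```shell
# Sketch only: list files under a directory that still mention an old name.
# `find_stale_refs` is a hypothetical helper, not part of the module.
find_stale_refs() {
  old_name="$1"
  search_dir="$2"
  # -r: recurse, -l: print matching file names only, -F: match a fixed string
  # `|| true` keeps a "no matches" result from being treated as a failure
  grep -rlF -- "$old_name" "$search_dir" || true
}

# Example usage, run from the repository root (left commented out here):
# find_stale_refs "supplemental-notebooks" analyses/cell-type-wilms-tumor-06
# find_stale_refs "marker-sets" analyses/cell-type-wilms-tumor-06
```

An empty result suggests no file still refers to the pre-rename directory name.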

I recommend looking directly at the two commits I added since your last review, since that will make identifying the specific changes easier. Again, please let me know where I can clarify!

allyhawkins

New changes look good 👍


sjspielman added 12 commits November 27, 2024 07:47

Establish separate cnv-exploratory-notebooks directory for notebooks 05 and 06

40040cd

readme for the new cnv directory

54718b5

update path name

d019f4a

add section for directory contents

5f2ecce

Move 00b notebook since its not a template, and fix README typo

c92a13b

Fix bugs that appeared with new data release: Seurat needs at least 500 cells for certain steps

6756978

script was missing Seurat library load, and script styled

565f6dc

add missing step to script

b1905db

add an additional flag to cover the cnv step which is way less frequently run

1904c2c

README updates and file name fix

36df0bb

update README

508819b

path update

ced1616

sjspielman marked this pull request as ready for review November 27, 2024 14:07

sjspielman requested a review from jaclyn-taroni as a code owner November 27, 2024 14:07

sjspielman requested review from allyhawkins and removed request for jaclyn-taroni November 27, 2024 14:07

allyhawkins reviewed Nov 27, 2024


sjspielman added 2 commits November 27, 2024 12:07

Relocate exploratory notebooks to the supplemental notebooks directory, and underscorify some folders. READMEs and related scripts updated to match these changes

36fd9e4

rip out some exploratory that doesn't need to be here

38efa39

Wrong notebook name:

eafbdbd

sjspielman requested a review from allyhawkins November 27, 2024 18:13

allyhawkins approved these changes Nov 27, 2024


sjspielman merged commit a31945e into AlexsLemonade:main Nov 27, 2024
3 checks passed

sjspielman deleted the sjspielman/906-regenerate-notebooks-1 branch November 27, 2024 18:17

sjspielman added a commit to sjspielman/OpenScPCA-analysis that referenced this pull request Nov 27, 2024

remove reference to notebook that was relocated to the supplemental directory in AlexsLemonade#913

c97bbfb

sjspielman mentioned this pull request Nov 27, 2024

Regenerate wilms-tumor-06 notebooks 2/N #916

Merged
