Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regenerate wilms notebooks: Code updates #913

Merged

Conversation

sjspielman
Copy link
Member

@sjspielman sjspielman commented Nov 27, 2024

This is the first of 2 PRs to address #906 (but might be more if GitHub UI gets mad) to update notebooks in the cell-type-wilms-tumor-06. This PR covers a smidge of reorg and code changes that cropped up while regenerating notebooks with the 2024-11-25 data release.

  1. I changed a few things to better handle dealing with the exploratory CNV notebooks which are not going to get regenerated:
    • I created a new directory cnv-exploratory-notebooks for them and their output to live in. I also had to update the path accordingly in scripts/explore-cnv-methods.sh.
    • I added a flag to the 00_run_workflow.sh script to specifically handle logic for regenerating these notebooks. Even though we will want to run other exploratory steps in the future, we almost certainly won't want to run these!
  2. I moved 00b_characterize_fetal_kidney_reference_Stewart.Rmd out of notebook_template and into notebook, since it is not a template.
    • I updated 00_run_workflow.sh with the right new path, and also I also renamed that script variable notebook_output_dir --> notebook_dir, since the directory is named "notebook".
  3. I realized (WOMP) that scripts/06b_build-normal_reference.R was missing from the workflow when running the workflow through! The script also needed a Seurat library load which snuck past during review because this module was mostly not being run in CI. The script is now run, loads Seurat, and was styled by pre-commit.
  4. The 03 clustering notebook had some Seurat-related bugs which appeared with the new data release: AddModuleScore doesn't work with defaults when there are fewer cells (<500), so I updates some spots to get this code running. This is now relevant because at least one sample lost about 20-30 cells.
  5. README updates along the way

Please let me know what I can clarify!!

@sjspielman sjspielman marked this pull request as ready for review November 27, 2024 14:07
@sjspielman sjspielman requested review from allyhawkins and removed request for jaclyn-taroni November 27, 2024 14:07
Copy link
Member

@allyhawkins allyhawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a few comments about the approach here. Based on #906 (comment), I was thinking we would completely remove generating these notebooks from the workflow. Since we have a script that is already used to run two scripts and generate the notebooks then I don't think it needs to be present in the main workflow.

The other comment is about how we organize the exploratory analysis. Since the supplemental-notebooks exists, then we should put everything that's exploratory/ not run in the workflow in that folder.

Comment on lines 152 to 161
# These steps are only run if RUN_CNV_EXPLORATORY is true
if [[ $RUN_CNV_EXPLORATORY -eq 1 ]]; then
# Create the pooled normal reference for inferCNV
Rscript scripts/06b_build-normal_reference.R

# Run infercnv and copykat for a selection of samples
# This script calls scripts/05_copyKAT.R and scripts/06_infercnv.R and associated exploratory CNV notebooks in `cnv-exploratory-notebooks/`
# By default, copyKAT as called by this script uses 32 cores
THREADS=${THREADS} TESTING=${TESTING} ./scripts/explore-cnv-methods.sh
fi
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I almost think we could just remove this from the main workflow rather than do this. It looks like you are just running a single script that already runs two other scripts to actually generate the notebooks so I don't see why this is needed at all.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No complaints from me.


- `scripts` contain analysis scripts used in the module workflow.
See `scripts/README.md` for more information
- `notebook_template` contains R Markdown notebooks meant to be run as a template across module samples
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a small annoyance with the fact that this is _ and not - when all the other folder names have -, but not worth the large diffs to fix 😢

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should now all be _ :)

Comment on lines 33 to 34
- `cnv-exploratory-notebooks` contains R Markdown notebooks and their HTML outputs specifically from exploratory steps during CNV analysis in the module workflow
- `supplemental-notebooks` contains exploratory notebooks comparing Azimuth label transfer to an Azimuth-adapted approach which is used in this module.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So since we have this supplemental folder that contains notebooks not used as part of the workflow and are also exploratory why don't we move them into this folder instead? So you would have exploratory-notebooks as a folder and then maybe two folders inside that, one for cnv and one for azimuth?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, but... for context, I originally did something sort of like this #912, and the diff resulting from moving notebooks was simply not viewable on any platform. So, I decided to let it go.....

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this file being changed? Github won't let me preview the changes but I don't think it should be changed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Boo, GitHub.
This was changed because 06b_build-normal_reference.R wasn't documented, and I had a brief moment of recreational revision to make 06 scripts listed after 05 scripts.

Here's the diff which renders for me:
Screenshot 2024-11-27 at 11 15 39 AM

…y, and underscorify some folders. READMEs and related scripts updated to match these changes
@sjspielman
Copy link
Member Author

Fresh changes:

  • All the "not part of this workflow" notebooks are now in supplemental_notebooks (check out that underscore! This is most of the new diffs - renaming this directory to have an underscore)
    • This includes the cnv exploration notebooks, and an exploratory notebook which was used to classify one of the label transfer references, but indeed is wholly unused in cell typing itself.
  • marker-sets --> marker_sets, for more consistency. You can have all _, but not all -.
  • I updated the workflow script to not run anything that lives in supplemental_notebooks, and made it clearer that the RUN_EXPLORATORY flag can be used to regenerate all notebooks that live in the repo.
  • README updates accordingly.

I think I recommend looking directly at the two commits I added since your last review to review this, since it will make identifying the specific changes easier. Again, please let me know where I can clarify!

Copy link
Member

@allyhawkins allyhawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New changes look good 👍

@sjspielman sjspielman merged commit a31945e into AlexsLemonade:main Nov 27, 2024
3 checks passed
@sjspielman sjspielman deleted the sjspielman/906-regenerate-notebooks-1 branch November 27, 2024 18:17
sjspielman added a commit to sjspielman/OpenScPCA-analysis that referenced this pull request Nov 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants