Merge pull request #891 from sjspielman/sjspielman/2024-11-18_merge_main
Merge main into feature/wilms-tumor-06-azimuth
sjspielman authored Nov 19, 2024
2 parents a275005 + 4472566 commit 0e055c1
Showing 227 changed files with 12,475 additions and 4,825 deletions.
4 changes: 2 additions & 2 deletions .github/workflows/docker_cell-type-ewings.yml
@@ -40,7 +40,7 @@ jobs:
 test-build:
 name: Test Build Docker Image
 if: github.event_name == 'pull_request' || (contains(github.event_name, 'workflow_') && !inputs.push-ecr)
-runs-on: ubuntu-latest
+runs-on: openscpca-22.04-big-disk

 steps:
 - name: Set up Docker Buildx
@@ -49,7 +49,7 @@ jobs:
 - name: Build image
 uses: docker/build-push-action@v6
 with:
-context: "{{defaultContext}}:analyses/simulate-sce"
+context: "{{defaultContext}}:analyses/cell-type-ewings"
 push: false
 cache-from: type=gha
 cache-to: type=gha,mode=max
16 changes: 8 additions & 8 deletions .github/workflows/docker_metacells.yml
@@ -13,14 +13,14 @@ concurrency:
 cancel-in-progress: true

 on:
-# pull_request:
-# branches:
-# - main
-# paths:
-# - "analyses/metacells/Dockerfile"
-# - "analyses/metacells/.dockerignore"
-# - "analyses/metacells/renv.lock"
-# - "analyses/metacells/conda-lock.yml"
+pull_request:
+branches:
+- main
+paths:
+- "analyses/metacells/Dockerfile"
+- "analyses/metacells/.dockerignore"
+- "analyses/metacells/renv.lock"
+- "analyses/metacells/conda-lock.yml"
 workflow_dispatch:
 inputs:
 push-ecr:
15 changes: 14 additions & 1 deletion .github/workflows/run_cell-type-ETP-ALL-03.yml
@@ -87,6 +87,19 @@ jobs:
 run: |
 cd ${MODULE_PATH}
 # run module script(s) here
+printf "\n\nRunning 00-01_processing_rds.R\n"
 Rscript scripts/00-01_processing_rds.R
+printf "\n\nRunning 02-03_annotation.R\n"
 Rscript scripts/02-03_annotation.R
-Rscript scripts/multipanel_plot.R
+printf "\n\nRunning 04_multipanel_plot.R\n"
+Rscript scripts/04_multipanel_plot.R
+printf "\n\nRunning 05_cluster_evaluation.R\n"
+Rscript scripts/05_cluster_evaluation.R
+printf "\n\nRunning 06_sctype_exploration.R\n"
+Rscript scripts/06_sctype_exploration.R
+printf "\n\nRunning 07_run_copykat.R\n"
+Rscript scripts/07_run_copykat.R
+printf "\n\nRunning markerGenes_submission.R\n"
+Rscript scripts/markerGenes_submission.R
+printf "\n\nRunning writeout_submission.R\n"
+Rscript scripts/writeout_submission.R
15 changes: 14 additions & 1 deletion .github/workflows/run_cell-type-nonETP-ALL-03.yml
@@ -59,7 +59,8 @@ jobs:
 libfontconfig1-dev \
 libharfbuzz-dev \
 libfribidi-dev \
-libtiff5-dev
+libtiff5-dev \
+jags
 - name: Set up renv
 uses: r-lib/actions/setup-renv@v2
@@ -87,7 +88,19 @@ jobs:
 run: |
 cd ${MODULE_PATH}
 # run module script(s) here
+printf "\n\nRunning 00-01_processing_rds.R\n"
 Rscript scripts/00-01_processing_rds.R
+printf "\n\nRunning 02-03_annotation.R\n"
 Rscript scripts/02-03_annotation.R
+printf "\n\nRunning 04_multipanel_plot.R\n"
+Rscript scripts/04_multipanel_plot.R
+printf "\n\nRunning 05_cluster_evaluation.R\n"
+Rscript scripts/05_cluster_evaluation.R
+printf "\n\nRunning 06_sctype_exploration.R\n"
+Rscript scripts/06_sctype_exploration.R
+printf "\n\nRunning 07_run_copykat.R\n"
+Rscript scripts/07_run_copykat.R
+printf "\n\nRunning markerGenes_submission.R\n"
 Rscript scripts/markerGenes_submission.R
+printf "\n\nRunning writeout_submission.R\n"
 Rscript scripts/writeout_submission.R
41 changes: 11 additions & 30 deletions .github/workflows/run_cell-type-wilms-tumor-14.yml
@@ -33,41 +33,22 @@ jobs:
 run-module:
 if: github.repository_owner == 'AlexsLemonade'
 runs-on: ubuntu-latest
+container: public.ecr.aws/openscpca/cell-type-wilms-tumor-14:latest

 steps:
-- name: Checkout repo
-uses: actions/checkout@v4
-
-- name: Set up R
-uses: r-lib/actions/setup-r@v2
-with:
-r-version: 4.4.0
-use-public-rspm: true
-
-- name: Set up pandoc
-uses: r-lib/actions/setup-pandoc@v2
-
-- name: Install system dependencies
+- name: Install git
 run: |
-sudo apt-get install -y \
-jags \
-libcurl4-openssl-dev \
-libfribidi-dev \
-libglpk40 \
-libharfbuzz-dev \
-libhdf5-dev \
-libmagick++-dev \
-libtiff5-dev
+apt-get update
+apt-get install -y git
-- name: Set up renv
-uses: r-lib/actions/setup-renv@v2
-with:
-working-directory: ${{ env.MODULE_PATH }}

-- name: Initialize zellkonverter environment
+- name: Install aws-cli
 run: |
-cd ${MODULE_PATH}
-Rscript -e "proc <- basilisk::basiliskStart(env = zellkonverter::zellkonverterAnnDataEnv(), testload = 'anndata'); basilisk::basiliskStop(proc)"
+curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
+unzip -q awscliv2.zip
+./aws/install
+- name: Checkout repo
+uses: actions/checkout@v4

 # Update this step as needed to download the desired data
 - name: Download test data and results
38 changes: 0 additions & 38 deletions .github/workflows/test_ropenscpca.yml

This file was deleted.

27 changes: 14 additions & 13 deletions analyses/cell-type-ETP-ALL-03/README.md
@@ -1,23 +1,29 @@
# ETP T-ALL Annotation (SCPCP000003)

This analysis module will include code to annotate cell types and tumor/normal status in ETP T-ALL from SCPCP000003 (n=30) present on the ScPCA portal.
This analysis module will include code to annotate cell types and tumor/normal status in ETP T-ALL from SCPCP000003 (n=31) present on the ScPCA portal.

## Description

We first aim to annotate the cell types in ETP T-ALL, and then use the annotated B cells in each sample as the "normal" cells to identify tumor cells, since T-ALL is caused by the clonal proliferation of immature T cells [<https://www.nature.com/articles/s41375-018-0127-8>].

- We use the cell type marker file (`Azimuth_BM_level1.csv`) from the [Azimuth Human Bone Marrow reference](https://azimuth.hubmapconsortium.org/references/#Human%20-%20Bone%20Marrow). In total, there are 14 cell types: B, CD4T, CD8T, Other T, DC, Monocytes, Macrophages, NK, Early Erythrocytes, Late Erythrocytes, Plasma, Platelet, Stromal, and Hematopoietic Stem and Progenitor Cells (HSPC). Based on the exploratory analysis, we believe that most of the cells in these samples do not express adequate markers to be distinguished at a finer cell type level (e.g., naive vs. memory, CD14 vs. CD16), and that the majority of the cells should be T cells. In addition, we include the marker genes for blast cells [[Bhasin et al. (2023)](https://www.nature.com/articles/s41598-023-39152-z)] as well as for erythroid precursors and cancer cells in the immune system [[ScType](https://sctype.app/database.php) database].

\*\*`Azimuth_BM_level1.csv` is converted to `submission_markerGenes.tsv` for the final submission format.

- Since ScType annotates cell types at the cluster level using marker genes provided by the user or taken from its built-in database, we employ the [self-assembling manifold](https://github.com/atarashansky/self-assembling-manifold/tree/master) (SAM) algorithm, a soft feature selection strategy, for better separation of homogeneous cell types.

- After cell type annotation, we provide the B cells in the sample, if there are any, as the normal cells to [CopyKat](https://github.com/navinlabcode/copykat) for identification of tumor cells.
- After cell type annotation, we fine-tune the annotated B cells by applying a 99th percentile cutoff of the non-B ScType score to the "B cell clusters". We then use the new B cells (i.e., those cells which passed the cutoff) as the normal cells when running [CopyKat](https://github.com/navinlabcode/copykat) for the identification of tumor cells.
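
As a rough illustration of the fine-tuning step, the sketch below (Python, standard library only) derives a 99th percentile cutoff from the B-cell ScType scores of cells outside the B cell clusters, then keeps only the cells inside B cell clusters that score above it. This is one possible reading of the description above and not the module's R implementation in `scripts/06_sctype_exploration.R`. The field names (`barcode`, `cluster_label`, `b_score`), the label `"B"`, and the linear-interpolation percentile are all assumptions made for the example.

```python
import math


def percentile(values, q):
    """Return the q-th percentile (0-100) of values using linear interpolation."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile() needs at least one value")
    pos = (len(ordered) - 1) * q / 100.0
    lower = math.floor(pos)
    upper = math.ceil(pos)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def refine_b_cells(cells, q=99.0):
    """Keep cells in B clusters whose B score exceeds the non-B cutoff.

    `cells` is a list of dicts with the hypothetical keys
    'barcode', 'cluster_label' and 'b_score'.
    """
    # Assumption: the cutoff comes from the B score of cells outside B clusters.
    non_b_scores = [c["b_score"] for c in cells if c["cluster_label"] != "B"]
    cutoff = percentile(non_b_scores, q)
    kept = [
        c["barcode"]
        for c in cells
        if c["cluster_label"] == "B" and c["b_score"] > cutoff
    ]
    return cutoff, kept
```

The barcodes returned by `refine_b_cells()` would then play the role of the "new B cells" passed to CopyKat as the normal reference.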

Here are the steps in the module:

1. Generating a processed rds file for each sample using SAM (`scripts/00-01_processing_rds.R`)

2. Annotating cell type using ScType and identifying tumor cells using CopyKat (`scripts/02-03_annotation.R`)

3. Fine-tuning the B cells (`scripts/06_sctype_exploration.R`)

4. Re-running CopyKat (`scripts/07_run_copykat.R`)

## Usage

Before running the R scripts in R or RStudio, we first need to prepare the input files as shown in the next section, and run the following commands in the terminal to install the required libraries:
@@ -27,6 +33,7 @@ Before running Rscripts in R or Rstudio, we first need to prepare the input file
sudo apt install libglpk40
sudo apt install libcurl4-openssl-dev #for Seurat
sudo apt-get install libxml2-dev libfontconfig1-dev libharfbuzz-dev libfribidi-dev libtiff5-dev #for devtools
sudo apt-get install r-cran-rjags #for InferCNV, if wish to run
conda-lock install --name openscpca-cell-type-ETP-ALL-03 conda-lock.yml
Rscript -e "renv::restore()"
@@ -44,21 +51,15 @@ The `scripts/00-01_processing_rds.R` requires the processed SingleCellExperiment

As for the annotation, `scripts/02-03_annotation.R` requires the cell type marker gene file, `Azimuth_BM_level1.csv`, as an input for ScType. This CSV file contains a list of positive marker genes as Ensembl IDs under `ensembl_id_positive_marker` for each cell type; *TMEM56* and *CD235a* are not detected in our dataset, so they have been removed from the markers for Late Eryth and Pre Eryth, respectively. As of now, there are no negative marker genes provided under `ensembl_id_negative_marker`.
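
To make the marker file layout concrete, here is a minimal Python sketch that reads a table shaped like the one described above and drops markers that are not detected in a dataset. Only the column names `ensembl_id_positive_marker` and `ensembl_id_negative_marker` come from this README. The `cell_type` column name, the comma separator inside the marker field, and the placeholder gene IDs are assumptions for illustration, and the real file may differ.

```python
import csv
import io

POSITIVE_COLUMN = "ensembl_id_positive_marker"  # column name from the README
CELL_TYPE_COLUMN = "cell_type"  # hypothetical column name


def load_positive_markers(csv_text, detected_ids, sep=","):
    """Map each cell type to its positive markers that are present in detected_ids."""
    markers = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Assumption: one field holds all IDs for a cell type, separated by `sep`.
        ids = [g.strip() for g in row[POSITIVE_COLUMN].split(sep) if g.strip()]
        markers[row[CELL_TYPE_COLUMN]] = [g for g in ids if g in detected_ids]
    return markers


# Placeholder IDs, not real Ensembl identifiers.
example_csv = (
    "cell_type,ensembl_id_positive_marker,ensembl_id_negative_marker\n"
    'B,"GENE_ID_1,GENE_ID_2",\n'
    'Late Eryth,"GENE_ID_3,GENE_ID_4",\n'
)
detected = {"GENE_ID_1", "GENE_ID_2", "GENE_ID_3"}
filtered_markers = load_positive_markers(example_csv, detected)
```

Filtering against the detected IDs mirrors the removal of undetected markers described above, without hard-coding which genes are dropped.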

## Output files

Running `scripts/00-01_processing_rds.R` will generate two types of output:

- `rds` objects in `scratch/`

- umap plots showing leiden clustering in `plots/`
## Important output files

Running `scripts/02-03_annotation.R` will generate several outputs:
- `rds` objects in `results/rds`

- updated `rds` objects in `scratch/`
- ScType results of top 10 possible cell types in a cluster (`results/_sctype_top10_celltypes_perCluster.txt`) and ScType score (`results/_sctype_scores.txt`)

- umap plots showing cell type and CopyKat prediction (if there is any) and dotplots showing the features added with `AddModuleScore()` in `plots/`
- location of fine-tuned B cells in umap (`plots/sctype_exploration/_newBcells.png`) and the cell type assignment with added fine-tuned B cells (`results/_newB-normal-annotation.txt`)

- ScType results of top 10 possible cell types in a cluster (`_sctype_top10_celltypes_perCluster.txt`) and metadata file tabulating leiden cluster, cell type, low confidence cell type, and CopyKat prediction for each cell (`_metadata.txt`) in `results/`
- final submission table (`results/submission_table/_metadata.tsv`) and the umap plots showing cell_type_assignment from ScType and tumor_cell_classification from CopyKat using fine-tuned B cells (`results/submission_table/multipanels_.png`)

## Software requirements
