final submission script
UTSouthwesternDSSR committed Oct 30, 2024
1 parent 3588896 commit 32423db
Showing 3 changed files with 67 additions and 12 deletions.
24 changes: 12 additions & 12 deletions analyses/cell-type-nonETP-ALL-03/README.md
@@ -8,16 +8,22 @@ We first aim to annotate the cell types in non-ETP T-ALL, and use the annotated

- We use the cell type markers (`Azimuth_BM_level1.csv`) from the [Azimuth Human Bone Marrow reference](https://azimuth.hubmapconsortium.org/references/#Human%20-%20Bone%20Marrow). In total, there are 14 cell types: B, CD4T, CD8T, Other T, DC, Monocytes, Macrophages, NK, Early Erythrocytes, Late Erythrocytes, Plasma, Platelet, Stromal, and Hematopoietic Stem and Progenitor Cells (HSPC). Based on the exploratory analysis, we believe that most cells in these samples do not express enough markers to be distinguished at a finer cell type level (e.g., naive vs. memory, CD14 vs. CD16), and that the majority of the cells are T cells. In addition, we include marker genes for blast cells [[Bhasin et al. (2023)](https://www.nature.com/articles/s41598-023-39152-z)] as well as for erythroid precursors and cancer cells in the immune system [[ScType](https://sctype.app/database.php) database].

\*\*`Azimuth_BM_level1.csv` is converted to `submission_markerGenes.tsv`, in the final submission format.

- Since ScType annotates cell types at the cluster level using marker genes provided by the user or taken from its built-in database, we employ the [self-assembling manifold](https://github.com/atarashansky/self-assembling-manifold/tree/master) (SAM) algorithm, a soft feature selection strategy, for better separation of homogeneous cell types.

- After cell type annotation, we provide B cells as the normal cells in the sample, if there are any, to [CopyKat](https://github.com/navinlabcode/copykat) for the identification of tumor cells.
- After cell type annotation, we fine-tune the annotated B cells by applying a 99th percentile cutoff of the non-B ScType score to the "B cell clusters". We then use the new B cells (i.e., the cells that pass the cutoff) as the normal cells when running [CopyKat](https://github.com/navinlabcode/copykat) to identify tumor cells. We could not detect a strong B cell signal in `SCPCL000082`.
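
The fine-tuning step described above can be read as a simple percentile filter. The base R sketch below is one possible interpretation, not the module's actual code: it assumes each cell carries a B-cell ScType score, takes the 99th percentile of that score among cells outside the B cell clusters as the cutoff, and keeps only the B-cluster cells that score above it. The data frame, its column names (`cluster_label`, `b_score`, `new_B`), and all values are hypothetical.

```r
# Hypothetical per-cell table: 20 cells in "B cell clusters", 180 elsewhere.
# Scores are evenly spaced toy values, not real ScType output.
scores <- data.frame(
  cell_barcode  = paste0("cell", 1:200),
  cluster_label = rep(c("B", "CD4 T"), times = c(20, 180)),
  b_score       = c(seq(0.5, 4, length.out = 20),   # cells in B clusters
                    seq(-1, 1, length.out = 180))   # all other cells
)

# Assumed cutoff: 99th percentile of the B score among non-B cells.
cutoff <- quantile(scores$b_score[scores$cluster_label != "B"], probs = 0.99)

# Keep a cell as a "new B cell" only if it sits in a B cluster AND beats the cutoff.
scores$new_B <- scores$cluster_label == "B" & scores$b_score > cutoff

table(scores$cluster_label, scores$new_B)
```

Cells flagged in `new_B` would then play the role of the normal reference cells passed to CopyKat; a library with no flagged cells would have no such reference.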

Here are the steps in the module:

1. Generating a processed rds file for each sample using SAM (`scripts/00-01_processing_rds.R`)

2. Annotating cell type using ScType and identifying tumor cells using CopyKat (`scripts/02-03_annotation.R`)

3. Fine-tuning the B cells (`scripts/06_sctype_exploration.R`)

4. Re-running CopyKat (`scripts/07_run_copykat.R`)

## Usage

Before running the R scripts in R or RStudio, prepare the input files as described in the next section, then run the following commands in the terminal to install the required libraries:
@@ -44,21 +50,15 @@ The `scripts/00-01_processing_rds.R` requires the processed SingleCellExperiment

As for the annotation, `scripts/02-03_annotation.R` requires the cell type marker gene file, `Azimuth_BM_level1.csv`, as input for ScType. This CSV file contains a list of positive marker genes as Ensembl IDs under `ensembl_id_positive_marker` for each cell type; *TMEM56* and *CD235a* are not detected in our dataset, so they have been removed from the markers for Late Eryth and Pre Eryth, respectively. As of now, no negative marker genes are provided under `ensembl_id_negative_marker`.
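
As a rough illustration of how such a marker file could be turned into per-cell-type gene lists, here is a base R sketch. It assumes the markers of a cell type are stored comma-separated within a single quoted cell, which is an assumption about the layout rather than something the README states. The `cellName` and `ontologyID` columns are the ones `scripts/writeout_submission.R` reads; the gene and ontology IDs, and the helper `split_ids()`, are placeholders invented for this example.

```r
# Hypothetical two-row excerpt in the assumed layout (placeholder IDs only).
csv_text <- paste(
  'cellName,ontologyID,ensembl_id_positive_marker,ensembl_id_negative_marker',
  'B,CL:placeholder-1,"ENSG_A,ENSG_B",',
  'NK,CL:placeholder-2,"ENSG_C",',
  sep = "\n"
)
markers <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Split a comma-separated marker column into one character vector per row;
# empty or missing cells become zero-length vectors.
split_ids <- function(x) {
  x <- as.character(x)
  x[is.na(x)] <- ""
  lapply(strsplit(x, ",", fixed = TRUE), trimws)
}

positive <- setNames(split_ids(markers$ensembl_id_positive_marker), markers$cellName)
negative <- setNames(split_ids(markers$ensembl_id_negative_marker), markers$cellName)

str(positive)
```

With no negative markers provided, every element of `negative` is an empty character vector, which matches the README's note that the negative marker column is currently unused.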

## Output files

Running `scripts/00-01_processing_rds.R` will generate two types of output:

- `rds` objects in `scratch/`

- umap plots showing leiden clustering in `plots/`
## Important output files

Running `scripts/02-03_annotation.R` will generate several outputs:
- `rds` objects in `results/rds`

- updated `rds` objects in `scratch/`
- ScType results of top 10 possible cell types in a cluster (`results/_sctype_top10_celltypes_perCluster.txt`) and ScType score (`results/_sctype_scores.txt`)

- umap plots showing cell type and CopyKat prediction (if there is any) and dotplots showing the features added with `AddModuleScore()` in `plots/`
- location of fine-tuned B cells in umap (`plots/sctype_exploration/_newBcells.png`) and the cell type assignment with added fine-tuned B cells (`results/_newB-normal-annotation.txt`)

- ScType results of top 10 possible cell types in a cluster (`_sctype_top10_celltypes_perCluster.txt`) and metadata file tabulating leiden cluster, cell type, low confidence cell type, and CopyKat prediction for each cell (`_metadata.txt`) in `results/`
- final submission table (`results/submission_table/_metadata.tsv`) and the umap plots showing cell_type_assignment from ScType and tumor_cell_classification from CopyKat using fine-tuned B cells (`results/submission_table/multipanels_.png`)

## Software requirements

2 changes: 2 additions & 0 deletions analyses/cell-type-nonETP-ALL-03/results/README.md
@@ -12,8 +12,10 @@ These are the generated outputs for each sample in the S3 bucket:
- `rds` objects: `s3://researcher-650251722463-us-east-2/cell-type-nonETP-ALL-03/results/rds`
- metadata and ScType results: `s3://researcher-650251722463-us-east-2/cell-type-nonETP-ALL-03/results/`
- CopyKat results: `s3://researcher-650251722463-us-east-2/cell-type-nonETP-ALL-03/results/copykat_output`
- InferCNV results: `s3://researcher-650251722463-us-east-2/cell-type-nonETP-ALL-03/results/infercnv_output`
- evaluating cluster separation, stability, and purity: `s3://researcher-650251722463-us-east-2/cell-type-nonETP-ALL-03/results/evalClus`
- umap and dot plots: `s3://researcher-650251722463-us-east-2/cell-type-nonETP-ALL-03/plots`
- violin and stacked bar plots for exploring the results of CopyKat prediction: `s3://researcher-650251722463-us-east-2/cell-type-nonETP-ALL-03/plots/copykat_exploration`
- final submission `tsv` files and `png` for cell type and/or tumor cell classification: `s3://researcher-650251722463-us-east-2/cell-type-nonETP-ALL-03/results/submission_table`

\*\*All the plots can also be found in the repository under `plots/`.
53 changes: 53 additions & 0 deletions analyses/cell-type-nonETP-ALL-03/scripts/writeout_submission.R
@@ -0,0 +1,53 @@
#!/usr/bin/env Rscript

library(Seurat)
library(ggplot2)

# Write the submission table and a multi-panel UMAP figure for one library
writeout <- function(ind.lib, ct.colors = ct_color, project.ID = projectID, n.row = 1){
  seu <- readRDS(file.path(out_loc, "results/rds", paste0(ind.lib, ".rds")))
  voi <- c('newB.copykat.pred', 'sctype_classification')
  changeName.voi <- c('tumor_cell_classification', 'cell_type_assignment')
  # fetch the variables of interest, keeping the cell barcodes as a column
  tryCatch({
    voi_df <- data.frame(FetchData(seu, vars = voi)) |> tibble::rownames_to_column(var = "cell_barcode")
  }, error=function(e){})
  # rename the fetched columns to the submission column names
  colnames(voi_df)[2:ncol(voi_df)] <- changeName.voi[match(colnames(voi_df)[2:ncol(voi_df)], voi)]
  final.df <- data.frame(scpca_sample_id = rep(project.ID, nrow(voi_df)), voi_df,
                         CL_ontology_id = gene.df$ontologyID[match(voi_df$cell_type_assignment, gene.df$cellName)])
  write.table(final.df, sep = "\t", quote = FALSE, row.names = FALSE,
              file = file.path(out_loc, "results/submission_table", paste0(ind.lib, "_metadata.tsv")))

  ## plotting the variables
  plot.list <- list()
  for (plot.type in voi){
    # use the fixed cell type palette for ScType labels, default colors otherwise
    if (plot.type == "sctype_classification"){
      clrs <- ct.colors
    } else{
      clrs <- NULL
    }
    # a variable missing from the object is skipped instead of stopping the run
    tryCatch({
      plot.list[[plot.type]] <- DimPlot(seu, reduction = "Xumap_", group.by = plot.type,
                                        label = TRUE, cols = clrs, repel = TRUE) +
        ggtitle(changeName.voi[match(plot.type, voi)])
    }, error=function(e){})
  }
  # build the figure explicitly so ggsave() does not depend on last_plot()
  p <- cowplot::plot_grid(plotlist = plot.list, nrow = n.row) + patchwork::plot_annotation(title = ind.lib) &
    theme(plot.title = element_text(hjust = 0.5, size = 18, face = "bold"))
  ggsave(file.path(out_loc, "results/submission_table", paste0("multipanels_", ind.lib, ".png")),
         plot = p, width = 12, height = 5, bg = "white", dpi = 150)
}

project_root <- rprojroot::find_root(rprojroot::is_git_root)
projectID <- "SCPCP000003"
out_loc <- file.path(project_root, "analyses/cell-type-nonETP-ALL-03")
data_loc <- file.path(project_root, "data/current",projectID)

gene.df <- read.table(file.path(out_loc, "Azimuth_BM_level1.csv"), sep = ",", header = TRUE)
ct_color <- c("darkorchid","skyblue2","dodgerblue2","gold","beige","sienna1","green4","navy",
              "chocolate4","red","darkred","#6A3D9A","maroon","yellow4","grey35","black","lightpink","grey80")
names(ct_color) <- c("B","CD4 T","CD8 T","DC","HSPC","Mono","NK","Other T","Macrophage",
                     "Early Eryth","Late Eryth","Plasma","Platelet","Stromal","Blast","Cancer","Pre Eryth","Unknown")

metadata <- read.table(file.path(data_loc, "single_cell_metadata.tsv"), sep = "\t", header = TRUE)
metadata <- metadata[which(metadata$scpca_project_id == projectID &
                           metadata$diagnosis == "Non-early T-cell precursor T-cell acute lymphoblastic leukemia"), ]
libraryID <- metadata$scpca_library_id
purrr::walk(libraryID, ~ writeout(ind.lib = .x))
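
The column renaming and ontology lookup inside `writeout()` both rely on `match()`, and can be tried in isolation without Seurat. The sketch below uses small stand-in data frames in place of the `FetchData()` result and the marker table; the barcodes and ontology IDs are placeholders, and the scenario where only the ScType column is present is an assumption made for illustration.

```r
# Same vectors as in writeout(): object column names and their submission names.
voi <- c("newB.copykat.pred", "sctype_classification")
changeName.voi <- c("tumor_cell_classification", "cell_type_assignment")

# Stand-in for the fetched data, assuming only the ScType column was available.
voi_df <- data.frame(
  cell_barcode = c("AAAC", "AAAG", "AACT"),
  sctype_classification = c("B", "CD4 T", "Unknown")
)

# match() maps each column that is present to its new name, whatever the order.
found <- colnames(voi_df)[-1]
colnames(voi_df)[-1] <- changeName.voi[match(found, voi)]

# Stand-in marker table; labels without an entry (here "Unknown") yield NA.
gene.df <- data.frame(
  cellName   = c("B", "CD4 T"),
  ontologyID = c("CL:placeholder-1", "CL:placeholder-2")
)
voi_df$CL_ontology_id <- gene.df$ontologyID[match(voi_df$cell_type_assignment, gene.df$cellName)]

print(voi_df)
```

Because the renaming is driven by the columns actually present, the same code handles a library with both variables or with only one of them.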
