Script to run AUCell on all samples using EWS-FLI1 high/low gene signatures #998

Open: allyhawkins wants to merge 9 commits into base: main


@allyhawkins allyhawkins commented Jan 17, 2025

Purpose/implementation Section

Please link to the GitHub issue that this pull request addresses.

Closes #985

What is the goal of this pull request?

Here I am adding two new scripts, one to run AUCell on a single SCE object using a set of custom gene signatures for defining tumor cell states (mainly EWS-FLI1 high and EWS-FLI1 low) and a second script to run the first script on all samples in the Ewing project. Ultimately, we want to use the results from AUCell to help label cells based on the specific cell state they are in.

I am using the two custom gene sets stored in references/gene_signatures, which are marker gene lists for EWS-FLI1 high and EWS-FLI1 low, along with a set of MSigDB gene sets that we identified from the literature as potentially useful. AUCell runs quickly, so I figured it wouldn't hurt to use every gene set in that list.

I did not include the genes in our two marker gene lists, visser-all-marker-genes.tsv and tumor-cell-state-markers.tsv, since the gene lists for each cell type there are quite small and would probably skew the AUC results.

Briefly describe the general approach you took to achieve this goal.

  • Although we previously had a script for AUCell, it was for a pretty specific use case where we are defining tumor cells. I didn't want to have to make huge edits in that workflow and the code was different enough that I chose to make a new script for this.
  • This script takes in a single SCE object, the custom gene sets, and a percentage to use for determining the aucMaxRank, which I set to the default of 0.01.
  • The output of this script is a TSV file with AUC values for each cell and each gene set. I also chose to include the AUC threshold value that is reported by AUCell. I don't know that we will use it, but I think it could be helpful when we go to plot this data.
  • I then wrote a wrapper script that runs AUCell using these gene signatures on every sample in the project and saves each TSV to the results directory.
  • The last thing I did was update the documentation where necessary. I know this technically isn't a new "workflow", since we're just running one script, but I still followed a similar format in how I documented it and included it in the main README. Once we create our notebooks that go sample by sample, I would like to reorganize this module and move anything that is no longer used to a subfolder so that things are less crowded and easier to understand. For now, I just added it to the existing documentation.
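
For readers less familiar with AUCell, here is a rough Python sketch of the idea behind the max rank and the per-cell AUC. It is not the code in this PR (the scripts here call the AUCell R package) and it is a simplified recovery-curve calculation, not AUCell's exact implementation. The function names are hypothetical, and treating 0.01 as a fraction of all genes is an assumption.

```python
import math


def auc_max_rank(n_genes, max_rank_fraction=0.01):
    """Number of top-ranked genes to consider, as a fraction of all genes.

    Assumption: the value passed to the script is a fraction (0.01 = top 1%).
    """
    return max(1, math.ceil(n_genes * max_rank_fraction))


def gene_set_auc(expression, gene_set, max_rank):
    """Simplified recovery-curve AUC for one cell and one gene set.

    expression: dict mapping gene name -> expression value for a single cell.
    gene_set:   set of gene names in the signature.
    max_rank:   only the top `max_rank` genes of the ranking are considered.
    """
    # Rank genes from highest to lowest expression (ties broken by name).
    ranked = sorted(expression, key=lambda g: (-expression[g], g))
    n_in_set = sum(1 for g in gene_set if g in expression)
    if n_in_set == 0:
        return 0.0

    # Walk down the ranking and accumulate the area under the recovery curve.
    hits = 0
    area = 0
    for gene in ranked[:max_rank]:
        if gene in gene_set:
            hits += 1
        area += hits

    # Best possible area: every signature gene sits at the top of the ranking.
    max_hits = min(n_in_set, max_rank)
    max_area = sum(min(i, max_hits) for i in range(1, max_rank + 1))
    return area / max_area
```

With a small max rank, a signature only scores well if its genes sit near the top of a cell's ranking. Very short gene lists give a coarse score under this scheme, since each gene that is found or missed moves the AUC by a large step.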

If known, do you anticipate filing additional pull requests to complete this analysis module?

Yes, see #993.

Results

What is the name of your results bucket on S3?

s3://researcher-211125375652-us-east-2/cell-type-ewings/aucell-ews-signatures

What types of results does your code produce (e.g., table, figure)?

TSV files with the AUC values
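
As a rough illustration of how the reported threshold could be used downstream, the sketch below reads one such table and lists the cell and gene set pairs whose AUC exceeds the threshold. The column names (`barcodes`, `gene_set`, `auc`, `auc_threshold`), the gene set labels, and the values are all hypothetical; the actual TSV layout produced by the script may differ.

```python
import csv
import io


def cells_above_threshold(tsv_text):
    """Return (barcode, gene_set) pairs whose AUC exceeds the reported threshold.

    Assumes a long-format table with one row per cell and gene set
    (hypothetical column names, not confirmed by the PR).
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        (row["barcodes"], row["gene_set"])
        for row in reader
        if float(row["auc"]) > float(row["auc_threshold"])
    ]


# Made-up example rows, for illustration only.
example = "\n".join(
    [
        "barcodes\tgene_set\tauc\tauc_threshold",
        "AAAC\tews_high\t0.31\t0.20",
        "AAAC\tews_low\t0.05\t0.15",
        "CCGT\tews_high\t0.10\t0.20",
        "CCGT\tews_low\t0.22\t0.15",
    ]
)

above = cells_above_threshold(example)
```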

What is your summary of the results?

Coming next!

Author checklists

Check all those that apply.
Note that you may find it easier to check off these items after the pull request is actually filed.

Analysis module and review

Reproducibility checklist

  • Code in this pull request has been added to the GitHub Action workflow that runs this module.
  • The dependencies required to run the code in this pull request have been added to the analysis module Dockerfile.
  • If applicable, the dependencies required to run the code in this pull request have been added to the analysis module conda environment.yml file.
  • If applicable, R package dependencies required to run the code in this pull request have been added to the analysis module renv.lock file.

@allyhawkins allyhawkins removed the request for review from jaclyn-taroni January 17, 2025 22:20
Successfully merging this pull request may close these issues.

Script to run AUCell on all marker gene sets in Ewings module