Advanced Usage
introduction simplification
#378
base: staging
Changes from all commits
a42344c
4e232f6
c6494ac
131e007
de2eca5
571a3dc
8c5ceeb
5263697
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -26,164 +26,96 @@ As with many clustering and network methods, there are some parameters that may | |
|
||
# How to run this example | ||
|
||
For general information about our tutorials and the basic software packages you will need, please see our ['Getting Started' section](https://alexslemonade.github.io/refinebio-examples/01-getting-started/getting-started.html#how-this-tutorial-is-structured). | ||
We recommend taking a look at our [Resources for Learning R](https://alexslemonade.github.io/refinebio-examples/01-getting-started/getting-started.html#resources-for-learning-r) if you have not written code in R before. | ||
For general information about our tutorials and the basic software packages required, please see our ['Getting Started' pages](https://alexslemonade.github.io/refinebio-examples/01-getting-started/getting-started.html). | ||
Below are some brief instructions about the files and directory structure this example expects. | ||
If you need more detailed instructions about how to obtain the data files from refine.bio, please consult one of the earlier examples, such as this one about [clustering and heatmaps](https://alexslemonade.github.io/refinebio-examples/03-rnaseq/clustering_rnaseq_01_heatmap.html). | ||
|
||
## Obtain the `.Rmd` file | ||
|
||
To run this example yourself, [download the `.Rmd` for this analysis by clicking this link](https://alexslemonade.github.io/refinebio-examples/03-rnaseq/differential_expression_rnaseq_01_rnaseq.Rmd). | ||
## Directory structure and required files | ||
|
||
Clicking this link will most likely send this to your downloads folder on your computer. | ||
Move this `.Rmd` file to where you would like this example and its files to be stored. | ||
To run this example yourself, [download the `.Rmd` for this analysis by clicking this link](https://alexslemonade.github.io/refinebio-examples/03-rnaseq/differential_expression_rnaseq_01_rnaseq.Rmd) and move the `.Rmd` file to your preferred analysis folder. | ||
|
||
You can open this `.Rmd` file in RStudio and follow the rest of these steps from there. (See our [section about getting started with R notebooks](https://alexslemonade.github.io/refinebio-examples/01-getting-started/getting-started.html#how-to-get-and-use-rmds) if you are unfamiliar with `.Rmd` files.) | ||
|
||
## Set up your analysis folders | ||
|
||
Good file organization is helpful for keeping your data analysis project on track! | ||
We have set up some code that will automatically set up a folder structure for you. | ||
Run this next chunk to set up your folders! | ||
|
||
If you have trouble running this chunk, see our [introduction to using `.Rmd`s](https://alexslemonade.github.io/refinebio-examples/01-getting-started/getting-started.html#how-to-get-and-use-rmds) for more resources and explanations. | ||
|
||
```{r} | ||
# Create the data folder if it doesn't exist | ||
if (!dir.exists("data")) { | ||
dir.create("data") | ||
} | ||
|
||
# Define the file path to the plots directory | ||
plots_dir <- "plots" # Can replace with path to desired output plots directory | ||
|
||
# Create the plots folder if it doesn't exist | ||
if (!dir.exists(plots_dir)) { | ||
dir.create(plots_dir) | ||
} | ||
|
||
# Define the file path to the results directory | ||
results_dir <- "results" # Can replace with path to desired output results directory | ||
|
||
# Create the results folder if it doesn't exist | ||
if (!dir.exists(results_dir)) { | ||
dir.create(results_dir) | ||
} | ||
``` | ||
|
||
In the same place you put this `.Rmd` file, you should now have three new empty folders called `data`, `plots`, and `results`! | ||
|
||
## Obtain the dataset from refine.bio | ||
|
||
For general information about downloading data for these examples, see our ['Getting Started' section](https://alexslemonade.github.io/refinebio-examples/01-getting-started/getting-started.html#how-to-get-the-data). | ||
|
||
Go to this [dataset's page on refine.bio](https://www.refine.bio/experiments/SRP140558). | ||
|
||
Click the "Download Now" button on the right side of this screen. | ||
|
||
<img src="https://github.com/AlexsLemonade/refinebio-examples/raw/40e47f4d3f39283effbd9843a457168061be9680/template/screenshots/download-now.png" width=200> | ||
|
||
Fill out the pop-up window with your email and agree to our Terms and Conditions: | ||
|
||
<img src="https://github.com/AlexsLemonade/refinebio-examples/raw/40e47f4d3f39283effbd9843a457168061be9680/template/screenshots/download-email.png" width=500> | ||
|
||
We are going to use non-quantile normalized data for this analysis. | ||
Review comments:

I'm worried about deleting this piece in particular. We need them to know whether or not they should download quantile normalized data and where to find that.

I do cover that on line 40, but it could be made more prominent.

I think screenshots help. I guess the bigger question is: just because someone's an "advanced topics" user, can we assume they know how to download data from refine.bio and know the refine.bio options more readily?

I guess I think even advanced topics users will appreciate screenshots. Though I do agree with cutting back on the file path hand holding.

That was my thought, or they could go to another example to get this information. 🤷🏼

If we think they might do this, then we should probably just keep the screenshots here too. I think screenshots help decrease brain glucose usage for simple things.
||
To get this data, you will need to check the box that says "Skip quantile normalization for RNA-seq samples". | ||
Note that this option will only be available for RNA-seq datasets. | ||
The data we are using is from the SRA project SRP140558 as processed by refine.bio. | ||
To obtain this data, go to the [dataset's page on refine.bio](https://www.refine.bio/experiments/SRP140558/identification-of-transcription-factor-relationships-associated-with-androgen-deprivation-therapy-response-and-metastatic-progression-in-prostate-cancer) and click the "Download Now" button. | ||
Follow the instructions, being sure to select "Skip quantile normalization for RNA-seq samples". | ||
|
||
<img src="https://github.com/AlexsLemonade/refinebio-examples/raw/40e47f4d3f39283effbd9843a457168061be9680/template/screenshots/skip-quantile-normalization.png" width=500> | ||
|
||
It may take a few minutes for the dataset to process. | ||
You will get an email when it is ready. | ||
|
||
## About the dataset we are using for this example | ||
|
||
For this example analysis, we will use this [acute viral bronchiolitis dataset](https://www.refine.bio/experiments/SRP140558). | ||
The data that we downloaded from refine.bio for this analysis has 62 paired peripheral blood mononuclear cell RNA-seq samples obtained from 31 patients. | ||
Samples were collected at two time points: during the patients' first, acute bronchiolitis visit (abbreviated "AV") and during their recovery, post-convalescence visit (abbreviated "CV"). | ||
|
||
## Place the dataset in your new `data/` folder | ||
|
||
refine.bio will send you a download button in the email when it is ready. | ||
Follow the prompt to download a zip file that has a name with a series of letters and numbers and ends in `.zip`. | ||
Double clicking should unzip this for you and create a folder of the same name. | ||
|
||
<img src="https://github.com/AlexsLemonade/refinebio-examples/raw/40e47f4d3f39283effbd9843a457168061be9680/template/screenshots/download-folder-structure.png" width=400> | ||
|
||
For more details on the contents of this folder see [these docs on refine.bio](http://docs.refine.bio/en/latest/main_text.html#downloadable-files). | ||
|
||
The `<experiment_accession_id>` folder has the data and metadata TSV files you will need for this example analysis. | ||
Experiment accession ids usually look something like `GSE1235` or `SRP12345`. | ||
|
||
Copy and paste the `SRP140558` folder into your newly created `data/` folder. | ||
Once you have downloaded and unzipped the refine.bio dataset, place the `SRP140558` folder in a `data` subdirectory of your analysis folder. | ||
|
||
## Check out our file structure! | ||
We will also create `plots` and `results` folders for future use. | ||
The analysis folder will have the following content: | ||
|
||
Your new analysis folder should contain: | ||
|
||
- The example analysis `.Rmd` you downloaded | ||
- A folder called "data" which contains: | ||
- The example analysis `.Rmd` notebook | ||
- A folder called `data` which contains: | ||
- The `SRP140558` folder which contains: | ||
- The gene expression TSV | ||
- The metadata TSV | ||
- A folder for `plots` (currently empty) | ||
- A folder for `results` (currently empty) | ||
- A `plots` folder | ||
- A `results` folder | ||
|
||
Your example analysis folder should now look something like this (except with the respective experiment accession ID and analysis notebook name you are using): | ||
Your analysis folder will end up looking something like this (except with the respective experiment accession ID and analysis notebook name you are using): | ||
|
||
<img src="https://github.com/AlexsLemonade/refinebio-examples/raw/40e47f4d3f39283effbd9843a457168061be9680/template/screenshots/analysis-folder-structure.png" width=400> | ||
|
||
In order for our example here to run without a hitch, we need these files to be in these locations so we've constructed a test to check before we get started with the analysis. | ||
These chunks will declare your file paths and double check that your files are in the right place. | ||
## Define file paths | ||
|
||
First we will declare our file paths to our data and metadata files, which should be in our data directory. | ||
This is handy to do because if we want to switch the dataset (see next section for more on this) we are using for this analysis, we will only have to change the file path here to get started. | ||
We will define variables for the files and directories we are using in the chunk below. | ||
|
||
```{r} | ||
# Define the file path to the data directory | ||
data_dir <- file.path("data", "SRP140558") # Replace with accession number which will be the name of the folder the files will be in | ||
# Define the file path to the accession data directory | ||
data_dir <- file.path("data", "SRP140558") | ||
|
||
# path to the refine.bio expression matrix | ||
data_file <- file.path(data_dir, "SRP140558.tsv") | ||
# file path to the refine.bio metadata file | ||
metadata_file <- file.path(data_dir, "metadata_SRP140558.tsv") | ||
|
||
# Declare the file path to the gene expression matrix file using the data directory saved as `data_dir` | ||
data_file <- file.path(data_dir, "SRP140558.tsv") # Replace with file path to your dataset | ||
# Define the file path to the plots directory | ||
# (create it if missing) | ||
plots_dir <- "plots" | ||
if (!dir.exists(plots_dir)) { | ||
dir.create(plots_dir) | ||
} | ||
|
||
# Declare the file path to the metadata file using the data directory saved as `data_dir` | ||
metadata_file <- file.path(data_dir, "metadata_SRP140558.tsv") # Replace with file path to your metadata | ||
# Define the file path to the results directory | ||
results_dir <- "results" | ||
if (!dir.exists(results_dir)) { | ||
dir.create(results_dir) | ||
} | ||
``` | ||
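The directory setup above can also be written as a small reusable function. This is only a sketch: `ensure_dirs()` and `demo_root` are hypothetical names that do not appear in the notebook, and the demonstration writes to a temporary directory rather than the analysis folder.

```r
# Hypothetical helper (not part of the original notebook): create each
# directory under `root` only if it does not already exist.
ensure_dirs <- function(dirs, root = ".") {
  paths <- file.path(root, dirs)
  for (path in paths) {
    if (!dir.exists(path)) {
      dir.create(path, recursive = TRUE)
    }
  }
  # Return the paths invisibly so they can be stored in variables
  invisible(paths)
}

# Demonstrate in a temporary directory so nothing is written to the project
demo_root <- file.path(tempdir(), "analysis_demo")
created <- ensure_dirs(c("data", "plots", "results"), root = demo_root)
```

Calling `ensure_dirs(c("data", "plots", "results"))` from the analysis folder would create the three folders next to the `.Rmd` file, and running it a second time changes nothing.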
|
||
Now that our file paths are declared, we can use the `file.exists()` function to check that the files are where we specified above. | ||
It is always worth checking that the paths we defined above are correct and the files are where we expect! | ||
|
||
```{r} | ||
# Check if the gene expression matrix file is at the file path stored in `data_file` | ||
# Check the gene expression matrix file | ||
file.exists(data_file) | ||
|
||
# Check if the metadata file is at the file path stored in `metadata_file` | ||
# Check for the metadata file | ||
file.exists(metadata_file) | ||
``` | ||
|
||
If the chunk above printed out `FALSE` to either of those tests, you won't be able to run this analysis _as is_ until those files are in the appropriate place. | ||
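If it helps to see which file is missing rather than a bare `FALSE`, the check could be wrapped as sketched below. `find_missing_files()` is a hypothetical helper, not something the notebook defines, and the paths are repeated here only so the sketch runs on its own.

```r
# Hypothetical helper: return the subset of `paths` that do not exist
find_missing_files <- function(paths) {
  paths[!file.exists(paths)]
}

# Same paths as in the notebook, repeated so this sketch is self-contained
data_dir <- file.path("data", "SRP140558")
data_file <- file.path(data_dir, "SRP140558.tsv")
metadata_file <- file.path(data_dir, "metadata_SRP140558.tsv")

# Report which expected input files, if any, are not where we expect
missing <- find_missing_files(c(data_file, metadata_file))
if (length(missing) > 0) {
  message("Missing input files: ", paste(missing, collapse = ", "))
} else {
  message("All input files found.")
}
```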
|
||
If the concept of a "file path" is unfamiliar to you, we recommend taking a look at our [section about file paths](https://alexslemonade.github.io/refinebio-examples/01-getting-started/getting-started.html#an-important-note-about-file-paths-and-Rmds). | ||
|
||
# Using a different refine.bio dataset with this analysis? | ||
## Using a different refine.bio dataset with this analysis? | ||
|
||
If you'd like to adapt an example analysis to use a different dataset from [refine.bio](https://www.refine.bio/), we recommend placing the files in the `data/` directory you created and changing the filenames and paths in the notebook to match these files (we've put comments to signify where you would need to change the code). | ||
We suggest saving plots and results to `plots/` and `results/` directories, respectively, as these are automatically created by the notebook. | ||
From here you can customize this analysis example to fit your own scientific questions and preferences. | ||
|
||
### Sample size | ||
|
||
Keep in mind when using a different refine.bio dataset with this example, that WGCNA requires at least 15 samples to produce a meaningful result [according to its authors](https://horvath.genetics.ucla.edu/html/CoexpressionNetwork/Rpackages/WGCNA/faq.html). | ||
Keep in mind when using a different refine.bio dataset with this example that WGCNA requires at least 15 samples to produce a meaningful result [according to its authors](https://horvath.genetics.ucla.edu/html/CoexpressionNetwork/Rpackages/WGCNA/faq.html). | ||
So you will need to make sure the dataset you use is sufficiently large. | ||
However, note that very large datasets will be difficult to run locally (on a personal laptop) due to the required computing power. | ||
While you can adjust some parameters to make this more doable on a laptop, it may decrease the reliability of your result if taken to an extreme (more on this parameter, called `maxBlockSize`, in the [`Run WGCNA!` section](#run-wgcna)). | ||
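A quick way to check the sample-size guideline before adapting the example is sketched below. It assumes the expression data is a matrix with one column per sample; `has_enough_samples()` is a hypothetical helper, and the toy matrices stand in for real data.

```r
# Hypothetical helper: TRUE if the matrix has at least `min_samples` columns.
# Assumes samples are columns and genes are rows.
has_enough_samples <- function(expression_matrix, min_samples = 15) {
  ncol(expression_matrix) >= min_samples
}

# Toy matrices standing in for real data (5 genes each)
large_enough <- matrix(0, nrow = 5, ncol = 62)
too_small <- matrix(0, nrow = 5, ncol = 8)

has_enough_samples(large_enough)
has_enough_samples(too_small)
```

The default of 15 mirrors the minimum quoted above from the WGCNA authors. Passing this check says nothing about whether a dataset is small enough to run comfortably on a laptop.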
|
||
### Microarray vs RNA-seq | ||
### Microarray vs. RNA-seq | ||
|
||
WGCNA can be used with both RNA-seq and microarray datasets so long as they are well normalized and filtered. | ||
In this example we use RNA-seq and [normalize and transform the data with DESeq2's `vst()`](https://alexslemonade.github.io/refinebio-examples/03-rnaseq/00-intro-to-rnaseq.html#deseq2-transformation-methods), which is not only a method and package we recommend in general, but also the [authors' specific recommendation for using WGCNA with RNA-seq data](https://horvath.genetics.ucla.edu/html/CoexpressionNetwork/Rpackages/WGCNA/faq.html#:~:text=Can%20WGCNA%20be%20used%20to,Yes.&text=Whether%20one%20uses%20RPKM%2C%20FPKM,were%20processed%20the%20same%20way.). | ||
|
||
If you end up wanting to run WGCNA with a microarray dataset, the normalization done by refine.bio _should_ be sufficient, but you will likely want to [apply a minimum expression filter](#define-a-minimum-counts-cutoff) as we do in this example. | ||
If you have trouble finding a `power` parameter that yields a sufficient R^2 even after applying a stringent cutoff, you may want to look into using a different dataset. | ||
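As a rough illustration of a minimum expression filter, the sketch below keeps only genes whose summed values across samples reach a cutoff. `filter_low_expression()`, the toy matrix, and the cutoff of 50 are all assumptions made for illustration. The right statistic and threshold depend on the data type, particularly for normalized microarray values.

```r
# Hypothetical helper: keep genes (rows) whose summed expression across
# samples is at least `min_total`. The cutoff is an illustrative assumption.
filter_low_expression <- function(expression_matrix, min_total) {
  keep <- rowSums(expression_matrix) >= min_total
  expression_matrix[keep, , drop = FALSE]
}

# Toy matrix: three genes (rows) by three samples (columns)
toy <- matrix(
  c(0, 1, 0,
    30, 40, 50,
    5, 5, 5),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2", "s3"))
)

filtered <- filter_low_expression(toy, min_total = 50)
```

Using `drop = FALSE` keeps the result a matrix even when only one gene passes the filter.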
|
||
*** | ||
|
||
<!-- Do not delete this line --> <a name="analysis" style="padding-top:56px;margin-top:-56px;"> </a> | ||
|
||
|
We still will need to create these directories though. (Unless you've put this part somewhere else)