Complete sequencing pipeline for differential gene expression analysis of bulk RNA from human latent membrane protein 1 (LMP1) knockout (KO) lymphoblastoid cell lines (LCL).
The control transcriptome data come from GM12878, a well-established LCL derived from the blood of a female donor by Epstein-Barr virus (EBV) transformation and sequenced as part of the ENCODE project. The control consists of raw paired-end reads from GM12878 replicates 2 and 3, which can be downloaded from the command line:
#Replicate 2
wget http://hgdownload.soe.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeCaltechRnaSeq/wgEncodeCaltechRnaSeqGm12892R2x75Il200FastqRd1Rep2V2.fastq.gz
wget http://hgdownload.soe.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeCaltechRnaSeq/wgEncodeCaltechRnaSeqGm12892R2x75Il200FastqRd2Rep2V2.fastq.gz
#Replicate 3
wget http://hgdownload.soe.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeCaltechRnaSeq/wgEncodeCaltechRnaSeqGm12892R2x75Il200FastqRd1Rep3V2.fastq.gz
wget http://hgdownload.soe.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeCaltechRnaSeq/wgEncodeCaltechRnaSeqGm12892R2x75Il200FastqRd2Rep3V2.fastq.gz
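Large downloads are sometimes truncated, so it can help to confirm that each archive is an intact gzip file before running the pipeline. The following is a minimal sketch; check_fastq_gz is a hypothetical helper introduced here for illustration and is not part of script.sh.

```shell
# Sketch: verify that downloaded FASTQ archives are intact gzip files.
# check_fastq_gz is a hypothetical helper, not taken from script.sh.
check_fastq_gz() {
    local f status=0
    for f in "$@"; do
        # gzip -t decompresses to nowhere and only reports integrity
        if gzip -t "$f" 2>/dev/null; then
            echo "OK      $f"
        else
            echo "CORRUPT $f"
            status=1
        fi
    done
    # Non-zero exit status if at least one file failed the check
    return "$status"
}
```

For example, `check_fastq_gz wgEncodeCaltechRnaSeqGm12892R2x75Il200FastqRd*Rep*V2.fastq.gz` would check all four control files in the current directory.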
For replicate 1, the BAM file of aligned reads was used instead, because the FASTQ files were too large for the available hardware:
http://hgdownload.soe.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeCaltechRnaSeq/wgEncodeCaltechRnaSeqGm12892R2x75Il200AlignsRep1V2.bam
The RNA-Seq data for the LMP1 knockout LCLs come from Mitra et al. (2023) (GEO accession GSE228167). They were generated from GM12878 cells and consist of three replicates. The data can be obtained with:
#Replicate 1
#wget -O reads1.fastq.gz ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR239/010/SRR23957810/SRR23957810_1.fastq.gz
#wget -O reads2.fastq.gz ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR239/010/SRR23957810/SRR23957810_2.fastq.gz
#Replicate 2
#wget -O reads1.fastq.gz ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR239/011/SRR23957811/SRR23957811_1.fastq.gz
#wget -O reads2.fastq.gz ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR239/011/SRR23957811/SRR23957811_2.fastq.gz
#Replicate 3
#wget -O reads1.fastq.gz ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR239/012/SRR23957812/SRR23957812_1.fastq.gz
#wget -O reads2.fastq.gz ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR239/012/SRR23957812/SRR23957812_2.fastq.gz
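The commands above save every replicate as reads1.fastq.gz and reads2.fastq.gz, so each download replaces the previous one. If all three replicates should be kept side by side, a loop along the following lines could be used. This is only a sketch: the function names and the KO_rep<N>_<mate>.fastq.gz naming scheme are assumptions made for illustration and do not come from script.sh.

```shell
# Sketch: fetch the three LMP1 KO replicates under distinct file names.
# Function names and output file names are illustrative assumptions.

# Print the ENA URL for replicate N (1-3) and mate M (1 or 2).
# Replicates 1-3 correspond to runs SRR23957810-SRR23957812.
ko_fastq_url() {
    local rep="$1" mate="$2"
    local run="SRR2395781$((rep - 1))"
    printf 'ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR239/01%d/%s/%s_%s.fastq.gz\n' \
        "$((rep - 1))" "$run" "$run" "$mate"
}

# Download both mates of every replicate.
# Nothing is downloaded until this function is called.
download_ko_fastq() {
    local rep mate
    for rep in 1 2 3; do
        for mate in 1 2; do
            wget -O "KO_rep${rep}_${mate}.fastq.gz" "$(ko_fastq_url "$rep" "$mate")"
        done
    done
}
```

Calling `download_ko_fastq` would then produce KO_rep1_1.fastq.gz through KO_rep3_2.fastq.gz in the current directory.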
fastqc contains quality reports for raw and processed reads.
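Reports of this kind could be regenerated roughly as follows, assuming FastQC is installed and available on the PATH. The run_fastqc wrapper and the optional FASTQC variable (used to point at a different executable) are hypothetical names for this sketch; the exact invocation used in script.sh may differ.

```shell
# Sketch: run FastQC on a set of FASTQ files and collect reports in one place.
# run_fastqc and the FASTQC override variable are illustrative assumptions.
run_fastqc() {
    local outdir="$1"
    shift
    # FastQC does not create the output directory itself
    mkdir -p "$outdir"
    # Remaining arguments are the FASTQ files to analyse
    "${FASTQC:-fastqc}" --outdir "$outdir" "$@"
}
```

For instance, `run_fastqc fastqc reads1.fastq.gz reads2.fastq.gz` would write an HTML report and a zip archive per input file into fastqc.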
The bash script for the complete RNA-Seq pipeline can be found in script.sh.
References:
- Mitra et al. (2023). Characterization of Target Gene Regulation by the Two Epstein-Barr Virus Oncogene LMP1 Domains Essential for B-cell Transformation.
- RNA-seq Data Analysis: A Practical Approach