It's a tab-separated values (TSV) file with one of two column layouts, either (for FASTQ input):
subject gender status sample lane fastq1 fastq2
or (for BAM input):
subject gender status sample bam bai
The columns are:
subject
designates the subject; it should be the ID of the patient or, if you don't have one, the ID of the normal sample.
gender
is the gender of the patient (XX or XY).
status
is the status of the sample (0 for Normal, 1 for Tumor).
sample
designates the sample; it should be the ID of the sample (a patient may have more than one tumor sample).
lane
is used when the sample is multiplexed on several lanes (one row per lane, i.e. per read group).
fastq1
is the path to the first file of the FASTQ pair.
fastq2
is the path to the second file of the FASTQ pair.
bam
is the path to the BAM file.
bai
is the path to the BAM index.
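As a rough illustration of the two layouts, the Python sketch below splits a line on tabs, chooses the layout from the number of fields, and checks the gender and status codes. It is not CAW's own parser (CAW is a Nextflow pipeline); `SampleRow`, `parse_row` and the column-list constants are hypothetical names used only here.

```python
from dataclasses import dataclass
from typing import Optional

# The two column layouts described above (hypothetical constant names).
FASTQ_COLUMNS = ["subject", "gender", "status", "sample", "lane", "fastq1", "fastq2"]
BAM_COLUMNS = ["subject", "gender", "status", "sample", "bam", "bai"]


@dataclass
class SampleRow:
    subject: str
    gender: str
    status: int                   # 0 = Normal, 1 = Tumor
    sample: str
    lane: Optional[str] = None    # FASTQ layout only
    fastq1: Optional[str] = None
    fastq2: Optional[str] = None
    bam: Optional[str] = None     # BAM layout only
    bai: Optional[str] = None


def parse_row(line: str) -> SampleRow:
    """Split one tab-delimited line, pick the layout by field count, check codes."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) == len(FASTQ_COLUMNS):
        values = dict(zip(FASTQ_COLUMNS, fields))
    elif len(fields) == len(BAM_COLUMNS):
        values = dict(zip(BAM_COLUMNS, fields))
    else:
        # A space-delimited line lands here, since it splits into a single field.
        raise ValueError(f"expected 6 or 7 tab-separated fields, got {len(fields)}")
    if values["gender"] not in ("XX", "XY"):
        raise ValueError(f"gender must be XX or XY, got {values['gender']!r}")
    if values["status"] not in ("0", "1"):
        raise ValueError(f"status must be 0 or 1, got {values['status']!r}")
    return SampleRow(**{**values, "status": int(values["status"])})


row = parse_row("G15511\tXX\t1\tD0ENMT\tpathToFiles/G15511.D0ENMT.md.real.bam\tpathToFiles/G15511.D0ENMT.md.real.bai")
print(row.sample, row.status, row.lane)  # D0ENMT 1 None
```

Dispatching on the field count works because the two layouts differ by exactly one column (lane plus two FASTQ paths versus two BAM/BAI paths).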
In this example, the normal sample has 3 read groups and the tumor sample has 2. Absolute paths to the paired FASTQ files are recommended, but relative paths should also work. Note that the delimiter is the tab (\t) character:
G15511 XX 0 C09DFN C09DF_1 pathToFiles/C09DFACXX111207.1_1.fastq.gz pathToFiles/C09DFACXX111207.1_2.fastq.gz
G15511 XX 0 C09DFN C09DF_2 pathToFiles/C09DFACXX111207.2_1.fastq.gz pathToFiles/C09DFACXX111207.2_2.fastq.gz
G15511 XX 0 C09DFN C09DF_3 pathToFiles/C09DFACXX111207.3_1.fastq.gz pathToFiles/C09DFACXX111207.3_2.fastq.gz
G15511 XX 1 D0ENMT D0ENM_1 pathToFiles/D0ENMACXX111207.1_1.fastq.gz pathToFiles/D0ENMACXX111207.1_2.fastq.gz
G15511 XX 1 D0ENMT D0ENM_2 pathToFiles/D0ENMACXX111207.2_1.fastq.gz pathToFiles/D0ENMACXX111207.2_2.fastq.gz
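The read-group count in this example can be checked mechanically. A minimal Python sketch, assuming a strictly tab-delimited file in the seven-column FASTQ layout; `lanes_per_sample` is a hypothetical helper, not part of CAW.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# The FASTQ example rows above, joined with real tab characters.
EXAMPLE_ROWS = [
    ["G15511", "XX", "0", "C09DFN", "C09DF_1", "pathToFiles/C09DFACXX111207.1_1.fastq.gz", "pathToFiles/C09DFACXX111207.1_2.fastq.gz"],
    ["G15511", "XX", "0", "C09DFN", "C09DF_2", "pathToFiles/C09DFACXX111207.2_1.fastq.gz", "pathToFiles/C09DFACXX111207.2_2.fastq.gz"],
    ["G15511", "XX", "0", "C09DFN", "C09DF_3", "pathToFiles/C09DFACXX111207.3_1.fastq.gz", "pathToFiles/C09DFACXX111207.3_2.fastq.gz"],
    ["G15511", "XX", "1", "D0ENMT", "D0ENM_1", "pathToFiles/D0ENMACXX111207.1_1.fastq.gz", "pathToFiles/D0ENMACXX111207.1_2.fastq.gz"],
    ["G15511", "XX", "1", "D0ENMT", "D0ENM_2", "pathToFiles/D0ENMACXX111207.2_1.fastq.gz", "pathToFiles/D0ENMACXX111207.2_2.fastq.gz"],
]
EXAMPLE_TSV = "\n".join("\t".join(row) for row in EXAMPLE_ROWS)


def lanes_per_sample(tsv_text: str) -> Dict[Tuple[str, str], List[str]]:
    """Map (status, sample) to the lanes (read groups) listed for that sample."""
    groups: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for line in tsv_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Unpacking into seven names raises ValueError if a row has another width.
        _subject, _gender, status, sample, lane, _fq1, _fq2 = line.split("\t")
        groups[(status, sample)].append(lane)
    return dict(groups)


for (status, sample), lanes in lanes_per_sample(EXAMPLE_TSV).items():
    kind = "Tumor" if status == "1" else "Normal"
    print(f"{sample} ({kind}): {len(lanes)} read groups: {', '.join(lanes)}")
```

Run as-is, this prints `C09DFN (Normal): 3 read groups: C09DF_1, C09DF_2, C09DF_3` followed by `D0ENMT (Tumor): 2 read groups: D0ENM_1, D0ENM_2`.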
If instead you have BAM files (T/N pairs that were not realigned together) and their indexes, use a structure like:
G15511 XX 0 C09DFN pathToFiles/G15511.C09DFN.md.real.bam pathToFiles/G15511.C09DFN.md.real.bai
G15511 XX 1 D0ENMT pathToFiles/G15511.D0ENMT.md.real.bam pathToFiles/G15511.D0ENMT.md.real.bai
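Since absolute paths are recommended, it can help to normalise a TSV before launching. A small Python sketch, assuming a tab-delimited file where, as in both layouts, the last two columns are file paths (fastq1/fastq2 or bam/bai); `absolutize_paths` and its `base_dir` argument are hypothetical.

```python
import os


def absolutize_paths(tsv_text: str, base_dir: str) -> str:
    """Rewrite the two file columns of every row as absolute paths.

    Relative paths are resolved against base_dir; paths that are already
    absolute are left pointing where they did (os.path.join keeps them).
    """
    base = os.path.abspath(base_dir)
    out_lines = []
    for line in tsv_text.splitlines():
        if not line.strip():
            continue  # drop blank lines
        fields = line.split("\t")
        for i in (-2, -1):  # the two file-path columns
            fields[i] = os.path.normpath(os.path.join(base, fields[i]))
        out_lines.append("\t".join(fields))
    return "\n".join(out_lines) + "\n"


bam_tsv = "G15511\tXX\t0\tC09DFN\tpathToFiles/G15511.C09DFN.md.real.bam\tpathToFiles/G15511.C09DFN.md.real.bai\n"
print(absolutize_paths(bam_tsv, "/data/run1"), end="")
```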
All the files will be created in the Preprocessing/NonRealigned/ directory, and by default a corresponding TSV file will also be deposited there. MuTect1 and Strelka calls on these preprocessed files can then generally be obtained with:
nextflow run SciLifeLab/CAW --sample Preprocessing/NonRealigned/mysample.tsv --step realign --tools MuTect1,Strelka
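When such runs are scripted, building the argument list programmatically avoids shell-quoting mistakes. A minimal Python sketch; `caw_command` is a hypothetical helper, and it uses only the options that appear in the command above (`--sample`, `--step`, `--tools`).

```python
import shlex
from typing import List


def caw_command(sample_tsv: str, step: str, tools: List[str]) -> List[str]:
    """Assemble a CAW invocation of the form shown above as an argument list."""
    return [
        "nextflow", "run", "SciLifeLab/CAW",
        "--sample", sample_tsv,
        "--step", step,
        "--tools", ",".join(tools),  # one comma-separated argument, as in the example
    ]


cmd = caw_command("Preprocessing/NonRealigned/mysample.tsv", "realign", ["MuTect1", "Strelka"])
print(shlex.join(cmd))
# To launch it (requires Nextflow on PATH):
#   import subprocess; subprocess.run(cmd, check=True)
```

Passing a list rather than a single string to `subprocess.run` means no shell is involved, so paths containing spaces need no extra quoting.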
Similarly, if you have recalibrated BAMs (T/N pairs that were realigned together) and their indexes, use a structure like:
G15511 XX 0 C09DFN pathToFiles/G15511.C09DFN.md.real.bam pathToFiles/G15511.C09DFN.md.real.bai
G15511 XX 1 D0ENMT pathToFiles/G15511.D0ENMT.md.real.bam pathToFiles/G15511.D0ENMT.md.real.bai
All the files will be in the Preprocessing/Recalibrated/ directory, and by default a corresponding TSV file will also be deposited there. MuTect1 and Strelka calls on the recalibrated files can then generally be obtained with:
nextflow run SciLifeLab/CAW --sample Preprocessing/Recalibrated/mysample.tsv --step skipPreprocessing --tools MuTect1,Strelka
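MuTect1 and Strelka are used here as tumor/normal somatic callers, so a quick pre-flight check of the pairing can save a failed run. The Python sketch below is hypothetical and not part of CAW; it assumes each subject has exactly one normal sample and at least one tumor sample, which matches the column descriptions (several tumor samples per patient are allowed).

```python
from collections import defaultdict
from typing import Dict, List, Set


def check_tumor_normal_pairs(tsv_text: str) -> Dict[str, List[str]]:
    """Return problems per subject; an empty dict means every subject is paired.

    Assumption: one normal sample (status 0) and at least one tumor
    sample (status 1) per subject.
    """
    samples: Dict[str, Dict[str, Set[str]]] = defaultdict(lambda: {"0": set(), "1": set()})
    for line in tsv_text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        if len(fields) < 4:
            raise ValueError(f"too few tab-separated fields: {line!r}")
        subject, status, sample = fields[0], fields[2], fields[3]
        if status not in ("0", "1"):
            raise ValueError(f"unexpected status {status!r} for sample {sample}")
        samples[subject][status].add(sample)  # a set, so multi-lane FASTQ rows count once
    problems: Dict[str, List[str]] = {}
    for subject, by_status in samples.items():
        issues = []
        if len(by_status["0"]) != 1:
            issues.append(f"expected one normal sample, found {len(by_status['0'])}")
        if not by_status["1"]:
            issues.append("no tumor sample")
        if issues:
            problems[subject] = issues
    return problems


tsv = "\n".join([
    "G15511\tXX\t0\tC09DFN\tpathToFiles/G15511.C09DFN.md.real.bam\tpathToFiles/G15511.C09DFN.md.real.bai",
    "G15511\tXX\t1\tD0ENMT\tpathToFiles/G15511.D0ENMT.md.real.bam\tpathToFiles/G15511.D0ENMT.md.real.bai",
])
print(check_tumor_normal_pairs(tsv))  # {}
```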