A Nextflow MinION-based pipeline for tracking species biodiversity
ONTrack2 is a Nextflow implementation of the ONTrack pipeline, a rapid and accurate MinION-based barcoding pipeline for tracking species biodiversity on site. Starting from MinION sequence reads in fastq format, the ONTrack2 pipeline provides accurate consensus sequences in ~10 minutes per sample on a standard laptop. Compared to the original version, polishing is now performed with Racon and Medaka.
Prerequisites
- Nextflow
- Docker or Singularity
- NCBI nt database (optional, in case you want to perform a local Blast analysis of your consensus sequences)
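To download the database:
mkdir NCBI_nt_db
cd NCBI_nt_db
# Record the download date for future reference
date +%Y-%m-%d > download_date.txt
# Quote the URL so the shell passes the wildcard to wget unexpanded
wget "ftp://ftp.ncbi.nlm.nih.gov/blast/db/nt*"
# Extract each archive, then remove it together with its checksum file
for f in ./*.tar.gz; do
  tar -xzvf "$f"
  rm "$f" "$f.md5"
done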
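The loop above deletes the .md5 files without using them. As an optional variant, the sketch below checks each archive against its checksum before extracting it. It assumes GNU md5sum is installed and that every archive has a companion file named <archive>.tar.gz.md5 in md5sum format; verify_and_extract is a hypothetical helper name, not part of ONTrack2.

```shell
#!/usr/bin/env bash
# Sketch: verify each downloaded archive against its .md5 file before extracting.
# Assumption: every <archive>.tar.gz has a matching <archive>.tar.gz.md5
# in the same directory, written in the format produced by md5sum.

verify_and_extract() {
  local dir="$1"
  local status=0
  local f name
  for f in "$dir"/*.tar.gz; do
    [ -e "$f" ] || continue   # no archives found: nothing to do
    name=$(basename "$f")
    # Run md5sum inside the directory so the relative name in the .md5 file resolves
    if (cd "$dir" && md5sum -c --quiet "$name.md5"); then
      tar -xzf "$f" -C "$dir" && rm -f "$f" "$f.md5"
    else
      # Keep the archive so it can be downloaded again
      echo "Checksum check failed, keeping $name" >&2
      status=1
    fi
  done
  return "$status"
}
```

Calling `verify_and_extract .` from inside NCBI_nt_db would extract only the archives that pass the check and return a non-zero status if any of them failed.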
Installation
git clone https://github.com/MaestSi/ONTrack2.git
cd ONTrack2
chmod 755 *
Before running the ONTrack2 pipeline, open the ONTrack2.conf configuration file and set the desired options. You can then run the pipeline in either a Docker or a Singularity environment by specifying a value for the -profile option.
Usage:
nextflow -c ONTrack2.conf run ONTrack2.nf --fastq_files "/path/to/files*.fastq" --scripts_dir "/path/to/scripts_dir" --results_dir "/path/to/results_dir" -profile docker
Mandatory argument:
-profile                   Configuration profile to use. Available: docker, singularity
Other mandatory arguments, which may be specified in the ONTrack2.conf file:
--fastq_files              Path to fastq files; use wildcards to select multiple samples
--results_dir              Path to the folder where results are stored
--scripts_dir              Path to the directory containing all scripts
--subsampling_flag         Set subsampling_flag = true to subsample reads and reduce running time
--subsampled_reads         Number of reads subsampled for each sample when subsampling_flag = true
--minQ                     Minimum Q value for read filtering
--minLen                   Minimum read length for read filtering
--maxLen                   Maximum read length for read filtering
--target_reads_consensus   Maximum number of reads used for consensus calling
--target_reads_polishing   Maximum number of reads used for consensus polishing
--clustering_id_threshold  Identity threshold for clustering the preliminary allele assembly
--plurality                Cut-off for the number of positive matches in the multiple sequence alignment below which there is no consensus
--fast_alignment_flag      Set fast_alignment_flag=1 to perform fast multiple sequence alignment; otherwise set fast_alignment_flag=0
--primers_length           Number of bases trimmed from consensus sequences
--medaka_model             Medaka model for consensus polishing
--blast_db                 Path to the Blast-indexed database for Blasting consensus sequences
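As an illustration of how the arguments above fit together, the following sketch assembles the command line and rejects any profile other than the two listed. build_ontrack2_cmd is a hypothetical helper, not part of the ONTrack2 repository, and it assumes the paths contain no single quotes. It only prints the command, so you can inspect it before running it.

```shell
#!/usr/bin/env bash
# Sketch: assemble the ONTrack2 command line after validating the profile.
# build_ontrack2_cmd is a hypothetical helper name used for illustration only.

build_ontrack2_cmd() {
  local fastq_glob="$1"
  local scripts_dir="$2"
  local results_dir="$3"
  local profile="$4"
  case "$profile" in
    docker|singularity) ;;
    *)
      echo "Unknown profile '$profile' (expected docker or singularity)" >&2
      return 1
      ;;
  esac
  # Single quotes keep the wildcard unexpanded so Nextflow receives it as written
  # (assumption: none of the paths contains a single quote)
  printf "nextflow -c ONTrack2.conf run ONTrack2.nf --fastq_files '%s' --scripts_dir '%s' --results_dir '%s' -profile %s\n" \
    "$fastq_glob" "$scripts_dir" "$results_dir" "$profile"
}
```

For example, `build_ontrack2_cmd "/path/to/files*.fastq" "/path/to/scripts_dir" "/path/to/results_dir" singularity` prints the equivalent of the usage line with the Singularity profile selected.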
To run the analysis in interactive mode straight after live base-calling and demultiplexing, the helper script Run_ONTrack2.R is also available. It concatenates the fastq files for each barcode and runs the ONTrack2 pipeline on each resulting file using the Docker profile. The script should be executed with Rscript.
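The concatenation step can also be done by hand. The sketch below is not the logic of Run_ONTrack2.R, only a shell illustration of the same idea. It assumes demultiplexed reads are stored in one subdirectory per barcode (barcode01, barcode02, ...) containing uncompressed .fastq files; concat_barcodes is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: merge the fastq files of each barcode into a single file per barcode.
# Assumption: <in_dir>/barcodeXX/*.fastq layout with uncompressed fastq files.

concat_barcodes() {
  local in_dir="$1"
  local out_dir="$2"
  local bc_dir bc
  mkdir -p "$out_dir"
  for bc_dir in "$in_dir"/barcode*/; do
    [ -d "$bc_dir" ] || continue     # no barcode directories found
    bc=$(basename "$bc_dir")
    # Collect the fastq files of this barcode in the positional parameters
    set -- "$bc_dir"*.fastq
    [ -e "$1" ] || continue          # skip barcodes without any fastq file
    cat "$@" > "$out_dir/$bc.fastq"
  done
}
```

After `concat_barcodes /path/to/demultiplexed /path/to/merged`, the merged directory holds one fastq file per barcode, which can be passed to the pipeline through --fastq_files with a wildcard.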
To run ONTrack2 on a Windows laptop, you can install Ubuntu with WSL and run the pipeline by following this tutorial.
This pipeline was designed and implemented by Prof. Massimo Delledonne and Simone Maestri.
If this tool is useful for your work, please consider citing our manuscript.
Maestri S, Cosentino E, Paterno M, Freitag H, Garces JM, Marcolungo L, Alfano M, Njunjić I, Schilthuizen M, Slik F, Menegon M, Rossato M, Delledonne M. A Rapid and Accurate MinION-Based Workflow for Tracking Species Biodiversity in the Field. Genes. 2019; 10(6):468.
For further information and insights into pipeline development, please have a look at my doctoral thesis.
Maestri, S (2021). Development of novel bioinformatic pipelines for MinION-based DNA barcoding (Doctoral thesis, Università degli Studi di Verona, Verona, Italy). Retrieved from https://iris.univr.it/retrieve/handle/11562/1042782/205364/.
As a real-life Pokédex, the workflow described in our manuscript will facilitate tracking biodiversity in remote and biodiversity-rich areas. For instance, during a Taxon Expedition to Borneo, our analysis confirmed the novelty of a beetle species named after Leonardo DiCaprio.