Annotation Pipeline

Installation Guide (Ubuntu)

1- Install Docker (
2- Make Docker run without requiring sudo (

sudo groupadd docker # create the docker group
sudo usermod -aG docker $USER # add your user to the docker group
# Log out and log back in so that your group membership is re-evaluated
docker run hello-world # verify that you can run docker commands without sudo
# If it runs without errors, the setup works; otherwise, consult the link above for further steps
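
If `docker run hello-world` still asks for sudo, a common cause is that the new group membership is not yet active in the current session. The sketch below is one way to check this; `in_group` is a hypothetical helper written for this guide, not part of Docker.

```shell
#!/usr/bin/env bash
# Hypothetical helper: return 0 if the group named in $1 appears in the
# space-separated group list passed as $2.
in_group() {
  local wanted=$1 groups_list=$2 g
  for g in $groups_list; do
    [ "$g" = "$wanted" ] && return 0
  done
  return 1
}

# 'id -nG' prints the group names of the current login session.
if in_group docker "$(id -nG)"; then
  echo "docker group is active for this session"
else
  echo "docker group is not active yet: log out and log back in"
fi
```

The helper compares whole group names, so a group such as `dockerx` does not count as a match.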

3- Pull the Braker3 Docker image and test if it runs

docker pull teambraker/braker3
docker run --user 1000:100 --rm -it teambraker/braker3:latest bash 
# it should work without sudo if the previous steps worked

4- Install miniconda (

mkdir -p ~/miniconda3
wget -O ~/miniconda3/
bash ~/miniconda3/ -b -u -p ~/miniconda3
rm -rf ~/miniconda3/
# After installing, initialize your newly-installed Miniconda using the following commands for bash and zsh shells
~/miniconda3/bin/conda init bash
~/miniconda3/bin/conda init zsh
# Add the bioconda and conda-forge channels to the Miniconda installation
conda config --add channels bioconda
conda config --add channels conda-forge
conda config --set channel_priority strict

5- Create a conda environment called ‘annotation’ and install RepeatModeler

conda create -n annotation bioconda::repeatmodeler

6- Create a conda environment called ‘busco_env’ and install BUSCO

conda create -n busco_env -c conda-forge -c bioconda busco=5.7.1

7- Install InterProScan in the base conda environment. Make sure the appropriate Java version and other dependencies are installed:

mkdir ~/my_interproscan
cd ~/my_interproscan
# recommended checksum to confirm the download was successful: must return *interproscan-5.69-101.0-64-bit.tar.gz: OK*
md5sum -c interproscan-5.69-101.0-64-bit.tar.gz.md5
tar -pxvzf interproscan-5.69-101.0-*-bit.tar.gz
cd ~/my_interproscan/interproscan-5.69-101.0
python3 -f # index the hmm models to prepare them into a format used by hmmscan

If you run into any of the common issues, resolve them as indicated in the InterProScan docs. For example, the message "error while loading shared libraries:" can be resolved using sudo apt-get install -y libgomp1

8- Install ncbi-genome-download in the base conda environment

pip install ncbi-genome-download

9- Create an environment and install the dependencies for the remove_isoforms module:

mamba create -n remove_isoforms python # include Python so that pip installs into this environment
mamba activate remove_isoforms
pip install biopython

10- Download the appropriate reference protein file from here. Ensure that the reference is unzipped and has read permissions that allow Docker to open it:

chmod 664 <reference_fasta>

11- Make sure you have read and write permissions on your input and output directories
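
The reference checks from step 10 can be sketched as a small pre-flight function. `check_reference` is a hypothetical helper, not part of the pipeline. It only tests readability for the current user, so the user inside the Docker container may still need the `chmod` shown in step 10.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the reference file looks usable.
check_reference() {
  local ref=$1
  # The pipeline expects an unzipped .fa, .fna or .fasta file.
  case "$ref" in
    *.gz) echo "still compressed, unzip first: $ref" >&2; return 1 ;;
    *.fa|*.fna|*.fasta) ;;
    *) echo "unexpected extension: $ref" >&2; return 1 ;;
  esac
  [ -f "$ref" ] || { echo "not found: $ref" >&2; return 1; }
  [ -r "$ref" ] || { echo "not readable: $ref" >&2; return 1; }
  # A FASTA file starts with a '>' header line.
  [ "$(head -c 1 "$ref")" = ">" ] || { echo "not FASTA: $ref" >&2; return 1; }
}

# Usage: check_reference <reference_fasta> && echo "reference looks fine"
```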

Runtime Guide (Ubuntu)

# reference
bash -a <accession_number> -r <reference_fasta> -l <lineage> -o <output_directory> -t <threads>

-a   Specify the NCBI accession number for the species genome you want to annotate, must start with GCA_ (GenBank) or GCF_ (RefSeq)
-r   Specify the path of the reference fasta file (.fa, .fna, .fasta),
   usually obtained from (for our case Eukaryota.fa.gz)
-l   Specify the BUSCO lineage term to be used from this list (for our case euglenozoa)
-o   Specify the path of the output directory where the results will be saved, default is working directory
-t   Specify the number of threads to use, not recommended above 32, default is 8
-h   Display the help message
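
For readers who want to see how these options and defaults fit together, here is a minimal sketch using the bash `getopts` builtin. It is an illustration only and may differ from the pipeline's actual script: `parse_args` and the variable names are hypothetical, and treating -r and -l as required is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the option handling described above.
parse_args() {
  # Defaults from the option list: working directory and 8 threads.
  ACCESSION="" REFERENCE="" LINEAGE="" OUTDIR=$PWD THREADS=8
  local opt OPTIND=1
  while getopts "a:r:l:o:t:h" opt; do
    case "$opt" in
      a) ACCESSION=$OPTARG ;;
      r) REFERENCE=$OPTARG ;;
      l) LINEAGE=$OPTARG ;;
      o) OUTDIR=$OPTARG ;;
      t) THREADS=$OPTARG ;;
      h) echo "options: -a <accession_number> -r <reference_fasta> -l <lineage> -o <output_directory> -t <threads>"
         return 2 ;;
      *) return 1 ;;
    esac
  done
  # Accessions must start with GCA_ (GenBank) or GCF_ (RefSeq).
  case "$ACCESSION" in
    GCA_*|GCF_*) ;;
    *) echo "invalid accession: '$ACCESSION'" >&2; return 1 ;;
  esac
  # Assumption: -r and -l have no defaults, so they are treated as required.
  if [ -z "$REFERENCE" ] || [ -z "$LINEAGE" ]; then
    echo "-r and -l are required" >&2; return 1
  fi
  # The thread count must consist of digits only.
  case "$THREADS" in
    ''|*[!0-9]*) echo "threads must be a number" >&2; return 1 ;;
  esac
}
```

Calling `parse_args -a GCA_033953505.1 -r Eukaryota.fa -l euglenozoa` leaves `THREADS` at 8 and `OUTDIR` at the current working directory, matching the documented defaults.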

Remove Isoforms

After species annotation is done, ensure the remove_isoforms directory is present and run:

bash remove_isoforms/ <BASE_NAME>

where BASE_NAME is the name of the directory created by the annotation process.

Internal Use Only

# example for Leishmania tarentolae
bash -a GCA_033953505.1 -r Eukaryota.fa -l euglenozoa -o ~/tarentolae -t 32