SOCfinder is a bioinformatics tool for finding cooperative genes in bacterial genomes. SOCfinder combines information from several methods, considering whether a gene is likely to: (1) code for an extracellular protein; (2) have a cooperative functional annotation; or (3) be part of the biosynthesis of a cooperative secondary metabolite. SOCfinder uses information on the quality and significance of database matches and annotations.
You will need miniconda, which can be installed by following the instructions here. You can check that conda has installed correctly by running conda list (you may need to restart your terminal first).
For an introduction to conda, see here.
If you are on a Mac with an M1 or M2 chip, you may have to adjust your conda architecture. Instructions can be found here.
You can download SOCfinder from GitHub and set up its conda environment using the commands below.
git clone https://github.com/lauriebelch/SOCfinder.git
cd SOCfinder
conda env create -f environment_noversion.yml
# activate conda environment
conda activate SOCfinder
pip install gffutils
You will then need to download some files for KOFAMscan and antiSMASH. The easiest way to do this is to use the helper script.
chmod +x ./helper_script
./helper_script
When this script has finished running, it will tell you how to add the required programs to your path. For a simple explanation of the path, see here.
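Once the path has been updated, a quick check that the programs are visible can save a failed run later. The sketch below uses Python's shutil.which for this. The executable names in it are assumptions, not taken from the helper script, so replace them with the names the helper script reports.

```python
import shutil


def missing_tools(tool_names):
    """Return the names from tool_names that cannot be found on the PATH."""
    return [name for name in tool_names if shutil.which(name) is None]


if __name__ == "__main__":
    # Assumed executable names; use the ones printed by the helper script.
    tools = ["blastp", "makeblastdb", "antismash", "exec_annotation"]
    missing = missing_tools(tools)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All tools found on PATH.")
```

If anything is reported as missing, revisit the path instructions printed by the helper script and restart your terminal.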
You will need to build the databases that the BLAST search uses. You only need to do this once, and can use the script provided.
cd blast_files
unzip Archive.zip
cat psort_extracellular_gramN.fasta psort_extracellular_gramP.fasta | awk '/^>/ {if(!seen[$0]++) print; next} {print}' > psort_extracellular_gramBoth.fasta
cd ..
chmod +x ./SOC_MakeBlastDB.py
./SOC_MakeBlastDB.py
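The awk filter in the commands above merges the two PSORT fasta files and prints each header line only the first time it appears; sequence lines are always printed. If you would rather drop the whole repeated record (header and sequence), the following Python sketch is one way to do it. It is a variant of the awk step, not part of SOCfinder, and the function names are made up for illustration.

```python
def merge_fasta_unique(lines):
    """Yield FASTA lines, skipping every record whose header was already seen."""
    seen = set()
    keep = False  # lines before the first header are dropped
    for line in lines:
        if line.startswith(">"):
            header = line.rstrip("\r\n")
            keep = header not in seen
            seen.add(header)
        if keep:
            yield line


def merge_files(in_paths, out_path):
    """Concatenate the FASTA files in in_paths into out_path without duplicate records."""
    def all_lines():
        for path in in_paths:
            with open(path) as handle:
                for line in handle:
                    # Guard against a missing newline at the end of a file.
                    yield line if line.endswith("\n") else line + "\n"

    with open(out_path, "w") as out:
        out.writelines(merge_fasta_unique(all_lines()))
```

Calling merge_files(["psort_extracellular_gramN.fasta", "psort_extracellular_gramP.fasta"], "psort_extracellular_gramBoth.fasta") from inside blast_files would write the merged file under the same name the awk command uses.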
SOCfinder comes with two genomes that you can test the code with. For each genome, you need a protein fasta, a nucleotide fasta, and a gff. This is because the tools that SOCfinder uses require different inputs.
The folder test contains the files for a strain of Buchnera aphidicola.
The folder test2 contains the files for a strain of Piscirickettsia salmonis.
Part 1: Mine the Genome.
In this section, the three modules of SOCfinder are run. The output files are stored in the folder named with -O.
./SOC_mine.py -g test/B_aphidicola.faa -f test/B_aphidicola.fna -gff test/B_aphidicola.gff -O B_aphidicola -n
./SOC_mine.py -g test2/P_salmonis.faa -f test2/P_salmonis.fna -gff test2/P_salmonis.gff -O P_salmonis -n
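If you have many genomes, the commands above can be assembled programmatically. The sketch below builds the same SOC_mine.py command line from a folder and a file stem, and checks that the three input files exist before running. It assumes the three files share a stem and differ only in extension, as the test genomes do; build_mine_command and run_mine are hypothetical helpers, not part of SOCfinder.

```python
import subprocess
from pathlib import Path


def build_mine_command(folder, stem, output, gram_flag="-n"):
    """Return the SOC_mine.py command for <folder>/<stem>.faa, .fna and .gff."""
    folder = Path(folder)
    command = ["./SOC_mine.py"]
    for flag, extension in (("-g", ".faa"), ("-f", ".fna"), ("-gff", ".gff")):
        command += [flag, (folder / f"{stem}{extension}").as_posix()]
    command += ["-O", output, gram_flag]
    return command


def run_mine(folder, stem, output, gram_flag="-n"):
    """Check that the inputs exist, then run SOC_mine.py from the SOCfinder folder."""
    command = build_mine_command(folder, stem, output, gram_flag)
    inputs = [command[2], command[4], command[6]]
    missing = [path for path in inputs if not Path(path).is_file()]
    if missing:
        raise FileNotFoundError("Missing input files: " + ", ".join(missing))
    subprocess.run(command, check=True)
```

For example, run_mine("test", "B_aphidicola", "B_aphidicola") would issue the first command shown above.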
Part 2: Extract the Social Genes.
In this section, the outputs of each module are converted into lists of social genes.
./SOC_parse.py -i B_aphidicola/ -k inputs/SOCIAL_KO.csv -a inputs/antismash_types.csv
./SOC_parse.py -i P_salmonis/ -k inputs/SOCIAL_KO.csv -a inputs/antismash_types.csv
The final list of cooperative genes is stored as SOCKS.csv. Outputs for each module are stored as K_SOCK.csv for the functional annotation social genes, B_SOCK.csv for the extracellular genes, and A_SOCK_filtered.csv for the antismash social genes. There is also a summary file, summary.csv, that gives you the counts of cooperative genes for each module.
B. aphidicola has nine social genes, and P. salmonis has 64.
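After parsing, it can be useful to confirm that every output file listed above was written. The sketch below reports which of the five files are absent. It assumes the files are written directly into the folder passed to -i; adjust the path if your copy of SOCfinder places them elsewhere. missing_outputs is a hypothetical helper, not part of SOCfinder.

```python
from pathlib import Path

# File names as listed in this README.
EXPECTED_OUTPUTS = [
    "SOCKS.csv",
    "K_SOCK.csv",
    "B_SOCK.csv",
    "A_SOCK_filtered.csv",
    "summary.csv",
]


def missing_outputs(folder, expected=EXPECTED_OUTPUTS):
    """Return the expected output files that are not present in folder."""
    folder = Path(folder)  # assumed location: the folder given to SOC_parse.py -i
    return [name for name in expected if not (folder / name).is_file()]
```

An empty list from missing_outputs("B_aphidicola") would mean all five files were found.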
Command-line options for SOCfinder
SOC_mine.py
-g GENOMEinput - Path to GENOME protein (.faa)
-f FASTAinput - Path to GENOME nucleotide (.fna)
-gff GENOMEinput - Path to GENOME gff file (.gff)
-O outputfolder - Name of output folder
-p -n -both GramPositive | GramNegative | both - Gram stain (positive | negative | both)
SOC_parse.py
-i inputfolder - Path to input folder from SOC_mine
-k ko - Path to list of social KO terms
-a ANTISMASHtypes - Path to list of antismash types
The recommended way to download the genome files you need for SOCfinder is to use the NCBI Datasets command line tool to download RefSeq genomes. This ensures that gene IDs are the same in the protein fasta, nucleotide fasta, and gff.
datasets download genome accession GCF_003798305.1 --include gff3,genome,protein --filename GCF_003798305.1.zip
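The download arrives as a zip archive, so the three input files have to be pulled out before running SOC_mine.py. The sketch below copies the first .faa, .fna and .gff file it finds in the archive into a folder of your choice. It assumes the archive contains one file with each of those extensions, which is what the --include options above request; extract_genome_files is a hypothetical helper, not part of SOCfinder or NCBI Datasets.

```python
import zipfile
from pathlib import Path

SUFFIXES = (".faa", ".fna", ".gff")


def extract_genome_files(zip_path, out_dir):
    """Copy the first .faa, .fna and .gff member of the archive into out_dir.

    Returns a dict mapping each extension to the extracted file path.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    found = {}
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.namelist():
            suffix = Path(member).suffix
            if suffix in SUFFIXES and suffix not in found:
                target = out_dir / Path(member).name
                target.write_bytes(archive.read(member))
                found[suffix] = target
    missing = [suffix for suffix in SUFFIXES if suffix not in found]
    if missing:
        raise FileNotFoundError("Archive has no file ending in: " + ", ".join(missing))
    return found
```

For example, extract_genome_files("GCF_003798305.1.zip", "my_genome") would leave the three files in my_genome, ready to pass to -g, -f and -gff.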
Apple recently made the switch from Intel processors to their own Apple Silicon processors. This can cause package compatibility issues if your computer has one of the new M1 or M2 chips. Currently, the best solution is to create conda environments that still use the old architecture. You can do this by running the following command before creating the SOCfinder conda environment.
conda config --add subdirs osx-64
Further discussion of this issue can be found here.
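If you are unsure whether this applies to your machine, Python's platform module can tell you. The sketch below reports whether the running Python sees an Apple Silicon Mac. Note that a Python interpreter running under Rosetta reports x86_64, so treat the result as a hint rather than a definitive answer. is_apple_silicon is a hypothetical helper, not part of SOCfinder.

```python
import platform


def is_apple_silicon(system=None, machine=None):
    """Return True if the platform looks like macOS on an arm64 (M-series) chip."""
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    return system == "Darwin" and machine == "arm64"


if __name__ == "__main__":
    if is_apple_silicon():
        print("Apple Silicon detected: consider the osx-64 setting described above.")
    else:
        print("No Apple Silicon detected by this Python interpreter.")
```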
There are also some problems with diamond on newer Macs that can make antiSMASH fail. You can fix this by running:
conda install -c bioconda diamond=0.9.14
The manuscript "SOCfinder: a genomic tool for identifying cooperative genes in bacteria" is now published in Microbial Genomics.
Belcher, Dewar, Hao, Katz, Ghoul, & West (2023) Microbial Genomics. https://doi.org/10.1099/mgen.0.001171
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
SOCfinder code is open-source and free to use and distribute, but please cite the SOCfinder paper. antiSMASH is an open source tool available under the GNU Affero General Public License version 3.0 or greater. KOFAMscan is released under the MIT License.