Scripts
The pATLAS repository contains several Python scripts that compute the distance matrix using mash and fetch the taxonomic annotation for each sequence. Another script dumps abricate outputs to the psql database generated by the pATLAS-db-creation workflow (see detailed instructions here). The remaining scripts are legacy: they are no longer used to deploy the website, but they are still maintained in case they are needed, for example for debugging.
This is the main script, used to generate the pATLAS matrix of pairwise
distances. It collects the fastas given as input, merges them into a concatenated
fasta and then feeds it to a function that splits every entry into a single fasta.
Each of these is then given to mash dist so that several pairwise distance
calculations run in parallel. All distances are then written to a JSON file
called import_to_vivagraph.json, which is used to render the vivagraph nodes
and links.
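The flow described above can be sketched in a few lines. This is a minimal illustration and not the code in the repository: the function names, the `max_distance` cutoff and the layout of the resulting dictionary are assumptions, and the real structure of import_to_vivagraph.json may differ. The only external fact relied on is that `mash dist` prints tab-separated rows with reference, query, distance, p-value and shared hashes.

```python
import json


def split_fasta(text):
    """Split a concatenated FASTA string into (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def mash_rows_to_links(rows, max_distance=0.1):
    """Group `mash dist` rows by reference sequence.

    The cutoff and the output layout are hypothetical choices made
    for this sketch only.
    """
    links = {}
    for row in rows:
        ref, query, dist, _pvalue, _shared = row.rstrip("\n").split("\t")
        dist = float(dist)
        # Skip self-comparisons and distant pairs.
        if ref == query or dist > max_distance:
            continue
        links.setdefault(ref, []).append({"query": query, "distance": dist})
    return links


example_fasta = ">seq_A desc\nACGT\nACGT\n>seq_B\nGGCC\n"
records = split_fasta(example_fasta)

example_rows = [
    "seq_A\tseq_A\t0\t0\t1000/1000",
    "seq_A\tseq_B\t0.05\t1e-10\t300/1000",
    "seq_B\tseq_A\t0.5\t0.1\t5/1000",
]
links = mash_rows_to_links(example_rows)
print(json.dumps(links, indent=2))
```

In a real run each single-entry fasta would be compared against the full sketch by a separate `mash dist` process, and the rows collected from all processes would be merged before the JSON is written.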
Other outputs:
- a mash sketch for the entire plasmid database available in pATLAS, which should then be updated in the docker images used for mash-based analyses whose results can be imported into pATLAS. For now, these are the FlowCraft docker image and pATLASflow.
- the lengths of all plasmids in the database, exported to a JSON file. This file should also be added to the mapping docker images so that the scripts used for the import can calculate the coverage of a given plasmid in the sequencing results. For now, it is available in the FlowCraft docker image and pATLASflow.
- the bowtie2 and samtools indexes used in mapping approaches, which are available in the same docker images as the ones described above (FlowCraft docker image and pATLASflow).
These other outputs are made available in each pATLASflow release.
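As an illustration of why the lengths file matters for mapping, the sketch below computes sequence lengths from a FASTA string and uses one of them to turn per-position depths into a covered fraction. The function names, the JSON layout (accession mapped to length) and the definition of coverage as the fraction of positions with at least one read are all assumptions made for this example, not the behaviour of the pATLAS or pATLASflow scripts.

```python
import json


def fasta_lengths(text):
    """Return {sequence_id: length} for every entry in a FASTA string."""
    lengths = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            # Use the first word of the header as the identifier.
            current = line[1:].split()[0]
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


def covered_fraction(depths, plasmid_length):
    """Fraction of plasmid positions with depth > 0.

    `depths` maps a position to its read depth; positions missing
    from the mapping are treated as uncovered.
    """
    covered = sum(1 for depth in depths.values() if depth > 0)
    return covered / plasmid_length


lengths = fasta_lengths(">plasmid_1 example\nACGTAC\nGT\n>plasmid_2\nAAAA\n")
lengths_json = json.dumps(lengths)

# Hypothetical depths for plasmid_2: three of its four positions are covered.
fraction = covered_fraction({1: 3, 2: 5, 3: 0, 4: 1}, lengths["plasmid_2"])
print(lengths_json, fraction)
```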
Additional outputs:
- an SQL file with the current database is also made available in each pATLAS release, so that the service can easily be launched elsewhere using patlas-compose.
- a text file listing all entries from the NCBI refseq database that were removed from the pATLAS database, with the reason why each one was removed.
This script can run standalone, using argparse, or as a module; by default it
is executed inside MASHix.py. It crawls the NCBI taxonomy given a list of
species as input. In pATLAS it is used to dump the list of species to the
database and to generate a file called taxa_tree.json, which is then served in
the following view: /taxa. Therefore, every time a new database is created
this view needs to be updated as well. To do so, just copy the new
taxa_tree.json into this directory.
This file is then used to populate the dropdowns available in pATLAS for taxonomic
classifications.
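To make the idea of a taxonomy tree file concrete, here is a small sketch that nests species under order, family and genus and serialises the result. The nesting, the `build_taxa_tree` name and the hard-coded lineages are assumptions for illustration only: the actual script obtains lineages from the NCBI taxonomy, and the real layout of taxa_tree.json may be different.

```python
import json


def build_taxa_tree(lineages):
    """Nest species as order -> family -> genus -> [species].

    `lineages` maps a species name to an (order, family, genus) tuple.
    """
    tree = {}
    for species, (order, family, genus) in sorted(lineages.items()):
        genera = tree.setdefault(order, {}).setdefault(family, {})
        genera.setdefault(genus, []).append(species)
    return tree


# Hard-coded example lineages; the real script would look these up.
example_lineages = {
    "Escherichia coli": ("Enterobacterales", "Enterobacteriaceae", "Escherichia"),
    "Salmonella enterica": ("Enterobacterales", "Enterobacteriaceae", "Salmonella"),
    "Klebsiella pneumoniae": ("Enterobacterales", "Enterobacteriaceae", "Klebsiella"),
}

tree = build_taxa_tree(example_lineages)
print(json.dumps(tree, indent=2))
```

A structure like this lets a front end fill one dropdown per taxonomic level by reading the keys at the matching depth.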
This parameter is used to remove entries from the queries that refer to conflicting species, that is, cases in which a bacterium has exactly the same scientific name as an organism from a different kingdom.
This script is imported by MASHix.py and is used as a blacklist of accession
numbers that should not be added to the database because they aren't plasmids.
It is a simple dictionary that stores the accession numbers of the blacklisted
entries as keys and the reason why each should be excluded as values.
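A dictionary of this kind can be applied as in the sketch below. The accession numbers, the reasons and the `filter_blacklisted` helper are hypothetical placeholders and do not come from the repository; they only show how a blacklist keyed by accession can both filter the input and yield a report of removed entries.

```python
# Hypothetical blacklist: accession number -> reason for exclusion.
blacklist = {
    "ACC_0001.1": "sequence is a chromosome, not a plasmid",
    "ACC_0002.1": "sequence is a phage genome",
}


def filter_blacklisted(accessions, blacklist):
    """Split accessions into those kept and those removed (with reasons)."""
    kept = [acc for acc in accessions if acc not in blacklist]
    removed = {acc: blacklist[acc] for acc in accessions if acc in blacklist}
    return kept, removed


kept, removed = filter_blacklisted(["ACC_0001.1", "ACC_0003.1"], blacklist)

# One tab-separated line per removed entry, suitable for a text report.
report = "\n".join(f"{acc}\t{reason}" for acc, reason in removed.items())
print(kept)
print(report)
```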
This script outputs the results of a given abricate .tsv file to the desired database.
- database correspondence with the -db parameter:
  - resistance - used for "card" and "resfinder" and will output to the model Card (check models here).
  - plasmidfinder - used for the "plasmidfinder" database and will output to the model Database.
  - virulence - used for the "vfdb" database and will output to the model Positive.
Other outputs:
- resistance.json, plasmidfinder.json, virulence.json: These files are used to populate the dropdowns for resfinder, card, plasmidfinder and vfdb. They are served in different views as documented here.
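The sketch below shows one way a dropdown file could be derived from an abricate table: read the tab-separated rows and collect the unique gene names per database. It is an assumption-laden example, not the logic of the script itself. The sample is trimmed to four columns (real abricate output has more), the accessions and gene names are placeholders, and the resulting JSON layout is hypothetical.

```python
import csv
import io
import json

# Trimmed-down sample; column names follow the abricate header row.
SAMPLE_TSV = (
    "#FILE\tSEQUENCE\tGENE\tDATABASE\n"
    "plasmids.fas\tacc_1\tgene_b\tcard\n"
    "plasmids.fas\tacc_2\tgene_a\tcard\n"
    "plasmids.fas\tacc_3\tgene_a\tresfinder\n"
    "plasmids.fas\tacc_1\tgene_b\tcard\n"
)


def genes_by_database(handle):
    """Return {database: sorted unique gene names} from an abricate table."""
    reader = csv.DictReader(handle, delimiter="\t")
    genes = {}
    for row in reader:
        genes.setdefault(row["DATABASE"], set()).add(row["GENE"])
    return {db: sorted(names) for db, names in genes.items()}


dropdown = genes_by_database(io.StringIO(SAMPLE_TSV))
dropdown_json = json.dumps(dropdown, indent=2)
print(dropdown_json)
```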
Similarly to abricate2db.py, this script outputs the results from the diamond
tabular output format into the pATLAS metal resistance database table.
- database correspondence with the -db parameter:
  - metal - used for the "bacmet" database and will output to the model MetalDatabase. Note that this is the only database that uses a protein query and therefore abricate is expected to take longer.
Other outputs:
- metal.json: This file is used to populate the dropdowns for bacmet. It is served in a view as documented here.
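For reference, the default diamond tabular output has the twelve BLAST-style columns listed in the sketch below. The example parses such rows into dictionaries and keeps the hits above an identity cutoff. The `parse_diamond` name, the 90% cutoff and the sample identifiers are assumptions for illustration; the actual script may filter differently or not at all.

```python
# Column order of the default diamond tabular output (BLAST outfmt 6 style).
DIAMOND_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def parse_diamond(lines, min_identity=90.0):
    """Parse tabular rows and keep hits with pident >= min_identity.

    The cutoff is a hypothetical value chosen for this sketch.
    """
    hits = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = dict(zip(DIAMOND_COLUMNS, line.split("\t")))
        # Convert the numeric columns used below; the rest stay as strings.
        fields["pident"] = float(fields["pident"])
        fields["length"] = int(fields["length"])
        fields["evalue"] = float(fields["evalue"])
        fields["bitscore"] = float(fields["bitscore"])
        if fields["pident"] >= min_identity:
            hits.append(fields)
    return hits


sample = [
    "acc_1\tmetal_gene_x\t95.0\t200\t10\t0\t1\t600\t1\t200\t1e-50\t350.0",
    "acc_2\tmetal_gene_y\t60.5\t150\t59\t2\t10\t460\t5\t154\t1e-10\t90.1",
]
hits = parse_diamond(sample)

# Unique subject names, e.g. as input for a dropdown list.
genes = sorted({hit["sseqid"] for hit in hits})
print(genes)
```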