This project was made as part of the following curriculum:
Programming Lab II - Life Science Informatics, University of Bonn
Bioinformaticians rely on certain "gold standard" databases, such as UniProt and NCBI's RefSeq, for reliable protein information and up-to-date references.
The DBInspector lets users cross-reference UniProt and RefSeq entries for human
proteins and compare accession IDs, gene symbols, and amino acid sequences.
Users can pull up a summary of how well the metadata in the two databases agree, or look up a specific protein to compare
its metadata across both. Both options are available from the command line as well as from the user-friendly web interface.
To begin, clone the repository:
git clone https://github.com/laurendelong21/DBInspector.git
Then navigate to the DBInspector directory and install:
pip install dbi_pkg/
Here are the 3 options for queries:
- direct usage with Python, as illustrated in a Jupyter Notebook manual
- GUI: Graphical User Interface
- CLI: Command Line Interface
See the Jupyter Notebook tutorial for examples on how to use this package directly with Python.
For a user-friendly interface, navigate to the group04 directory and execute:
python frontend/run.py
This starts the GUI and displays the link from which it can be accessed. The GUI is organized into two main parts: a summary page and a comparison page.
On initial startup, the summary page shows a button that populates the backend with the necessary data. After this step, an overview table with overlap statistics (in percent) between the UniProt and RefSeq databases is visible.
After initialization, the comparison function can be used to look up a RefSeq accession ID, a UniProt ID, or a gene symbol. If the lookup is successful, a comparison table contrasts the results from both databases. This includes the number of results, sequence, sequence length, and identifier.
To use the command line interface, the user must first install the package.
Afterward, use dbi in the terminal followed by one of these commands:
command | description |
---|---|
parse | Parses the downloaded database data. |
compare | Searches the databases for matches of a given query, compares entries across databases, and prints the results. The query can be a UniProt ID, RefSeq ID, or gene symbol. Optionally saves results to a file. |
database-summary | Calculates summary statistics comparing the entries in the UniProt and RefSeq databases. Optionally saves results to a file. |
check-age | Use this to check the age of raw and/or parsed files. |
clear-cache | Use this to clear up the space used up by this program. |
Parsing is necessary to restructure the data from the databases and store it internally.
This step should be run first and only needs to be run once.
Retrieves the corresponding RefSeq and UniProt information for a given UniProt ID, RefSeq ID, or gene symbol and displays it in a table. The table can optionally be saved to a TSV file with -o.
option | description |
---|---|
-q / --query | Database accession identifier or symbol to be compared. |
-o / --outfile | Filepath for saving the results as a tsv. |
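
For scripted use, the compare command can also be called from Python. The sketch below relies only on the compare command and the -q / -o options listed above. The helper name `build_compare_command`, the example accession `P04637`, and the output file name `comparison.tsv` are assumptions chosen for illustration, not part of the package.

```python
import shutil
import subprocess
from typing import List, Optional


def build_compare_command(query: str, outfile: Optional[str] = None) -> List[str]:
    """Assemble the argument list for `dbi compare` (hypothetical helper)."""
    command = ["dbi", "compare", "-q", query]
    if outfile is not None:
        # -o is optional: it saves the comparison table as a TSV file.
        command += ["-o", outfile]
    return command


# Example query and file name are placeholders for illustration.
command = build_compare_command("P04637", outfile="comparison.tsv")
print(" ".join(command))

# Only run the command if the dbi executable is actually on the PATH.
if shutil.which("dbi") is not None:
    subprocess.run(command, check=False)
else:
    print("dbi is not installed; skipping the call.")
```

Passing the arguments as a list avoids shell quoting problems if a query or file path contains spaces.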
See a summary of the overall matches between RefSeq and UniProt entries for the categories:
symbol, RefSeq ID, UniProt ID, sequence, sequence length
option | description |
---|---|
-o / --outfile | Filepath for saving the results as a tsv. |
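
To illustrate what a per-category match summary means, the following minimal sketch computes the percentage of paired entries whose values agree in each category. It is not the DBInspector implementation: the function name `percent_agreement`, the record layout, and the toy records are assumptions made up for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical record layout: each entry maps a category name to its value.
Record = Dict[str, str]


def percent_agreement(
    pairs: List[Tuple[Record, Record]], categories: List[str]
) -> Dict[str, float]:
    """Return, per category, the percentage of pairs with identical values."""
    if not pairs:
        return {category: 0.0 for category in categories}
    result = {}
    for category in categories:
        matches = sum(1 for a, b in pairs if a.get(category) == b.get(category))
        result[category] = 100.0 * matches / len(pairs)
    return result


# Toy data, invented for illustration only (not real database content).
toy_pairs = [
    ({"symbol": "GENE1", "sequence": "MKT"}, {"symbol": "GENE1", "sequence": "MKT"}),
    ({"symbol": "GENE2", "sequence": "MAAL"}, {"symbol": "GENE2", "sequence": "MAAV"}),
]

summary = percent_agreement(toy_pairs, ["symbol", "sequence"])
print(summary)
```

In this toy example both pairs share the same symbol, but only one pair has an identical sequence, so the two categories yield different percentages.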
Use this to check the age of raw and/or parsed files. If no flag is specified, the age of the raw files is shown.
option | description |
---|---|
-p / --parsed | A flag indicating that user wants to see age of parsed data. |
-r / --raw | A flag indicating that user wants to see age of raw downloaded data. |
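
One generic way to determine the age of a file is to compare its last-modification time with the current time, as sketched below. This is an illustration of the idea, not the code behind check-age. The function name `file_age_days` and the temporary stand-in file are assumptions for this example.

```python
import os
import tempfile
import time

SECONDS_PER_DAY = 86400


def file_age_days(path: str) -> float:
    """Return the time since the file was last modified, in days."""
    age_seconds = time.time() - os.path.getmtime(path)
    return max(age_seconds, 0.0) / SECONDS_PER_DAY


# A temporary file stands in for a downloaded database file.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    demo_path = handle.name

# Set the modification time to two days ago to simulate an older download.
two_days_ago = time.time() - 2 * SECONDS_PER_DAY
os.utime(demo_path, (two_days_ago, two_days_ago))

age = file_age_days(demo_path)
print(f"{demo_path} is about {age:.1f} days old")

# Clean up the stand-in file.
os.remove(demo_path)
```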
Use this to clear the large files downloaded from NCBI RefSeq and UniProt after parsing is done.
Also clearing the parsed data with the -a flag frees up an additional ~50 MB of space but breaks the functionality of the DBInspector. To restore functionality, rerun the parsing.
option | description |
---|---|
-a / --all_data | Clear the entire cache: downloaded data as well as parsed files. Otherwise, only the downloaded data is cleared. |
Show options and help for any of the CLI commands, or an overview of all commands, using --help:
dbi command --help
dbi --help
- Lauren DeLong - GitHub
- Rebeca Figueiredo - GitLab
- Simon Müller - GitLab
- Maren Philipps - GitHub
- GitLab Pages
- README template
- Distributed under the MIT License. See LICENSE for more information.