SARS-CoV-2_Bioinformatics

1. genome/genomics (Guangyuan)

Data resources

  • GISAID (Global Initiative on Sharing All Influenza Data) International database of hCoV-19 genome sequences and related clinical and epidemiological data

    Y. Shu and J. McCauley, GISAID: global initiative on sharing all influenza data – from vision to reality, Eurosurveillance, vol. 22, iss. 13, 2017.

  • EMBL-EBI: Covid-19 Data Portal The Covid-19 Data Portal, developed and hosted by EMBL-EBI, brings together relevant datasets for sharing and analysis in an effort to accelerate coronavirus research. It enables researchers to upload, access and analyse COVID-19 related reference data and specialist datasets as part of the wider European COVID-19 Data Platform. It also includes some analysis tools.
  • ENA (European Nucleotide Archive) ENA lists data held at EMBL-EBI relating to the COVID-19 outbreak, including sequences of outbreak isolates and records relating to coronavirus biology. At the time of writing, these data were due to be included in EMBL-EBI’s new dedicated resource for COVID-19 data, the COVID-19 Portal.
  • China National Center for Bioinformation (2019nCoVR) 2019nCoVR integrates genomic and proteomic sequences and their metadata from GISAID, NCBI, NMDC and CNCB/NGDC. It also incorporates a wide range of related information, including scientific literature, news, and popular-science articles, and provides visualizations of genome variation analysis results based on all collected 2019-nCoV strains.

    Zhao WM, Song SH, Chen ML, et al. The 2019 novel coronavirus resource. Yi Chuan. 2020;42(2):212–221. doi:10.16288/j.yczz.20-030 [PMID: 32102777]

  • GenBank Nucleotide Sequences Provides rapid, open, and unrestricted access to virus nucleotide sequences and is the repository recommended by NIAID and CDC for investigator and public health submissions. Due to the scale of data indexing, there may be a delay before new submissions are indexed and retrievable with a term-based query.
  • GenBank Protein Sequences Provides rapid, open, and unrestricted access to conceptually translated virus protein sequences and is the repository recommended by NIAID and CDC for investigator and public health submissions. Due to the scale of data indexing, there may be a delay before new submissions are indexed and retrievable with a term-based query.
  • NCBI Virus: SARS-CoV-2 data hub SARS-CoV-2 focused content from NCBI Virus, including links to related resources. Search, filter, and download the most up-to-date nucleotide and protein sequences from GenBank and RefSeq (taxid 2697049). Generate multiple sequence alignments and phylogenetic trees for sequences of interest. Provides one-click access to the Betacoronavirus BLAST database and relevant literature in PubMed.
  • ViPR SARS-CoV-2 data portal | Virus Pathogen Resource The ViPR database integrates various types of data for multiple virus families. You can search the comprehensive database for sequences and strains, immune epitopes, 3D protein structures, host factor data, antiviral drugs, and plasmid data. You can further analyze the data online using sequence alignment, phylogenetic tree reconstruction, sequence variation (SNP) analysis, metadata-driven comparative analysis and BLAST. Visit the SARS-CoV-2 data portal in ViPR.

    B. E. Pickett, E. L. Sadat, Y. Zhang, J. M. Noronha, B. R. Squires, V. Hunt, M. Liu, S. Kumar, S. Zaremba, Z. Gu, L. Zhou, C. N. Larson, J. Dietrich, E. B. Klem, and R. H. Scheuermann, ViPR: an open bioinformatics database and analysis resource for virology research, Nucleic Acids Res, vol. 40, iss. D1, p. D593–D598, 2011.

  • INB/ELIXIR-ES and TransBioNet: COVID-19 research
  • Nextstrain COVID-19 genetic epidemiology Open-source SARS-CoV-2 genome data and analytic and visualization tools
  • Sequence Read Archive (SRA) Provides rapid, open, and unrestricted access to virus nucleotide or metagenomic sequence data and is the repository recommended by NIAID and CDC for investigator and public health submissions. Due to the scale of data indexing, there may be a delay before new submissions are indexed and retrievable with a term-based query.
  • COVID-19 Genome Tracker
  • ViralZone SARS-CoV-2 protein sequences available at ViralZone
  • CoV-GLUE an online resource for comparative genomic analysis
  • Twist Bioscience Twist is offering two fully-synthetic SARS-CoV-2 RNA controls, available for distinct reference sequences: Twist Synthetic SARS-CoV-2 RNA Control 1 (MT007544.1), Twist Synthetic SARS-CoV-2 RNA Control 2 (MN908947.3). The Twist synthetic controls are designed based on two specific SARS-CoV-2 variants, cover the full viral genome and are sequence-verified. In addition, Twist is able to create synthetic RNA controls from other strains or sequences of the virus, and can provide these custom controls within two weeks.
  • Materials and Methods from Labome
  • PubMed trending research papers (SARS-CoV-2)
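The GenBank and NCBI Virus entries above describe retrieval by accession or by term-based query. As a minimal sketch, assuming NCBI's public E-utilities endpoints (`efetch.fcgi`, `esearch.fcgi`) rather than the NCBI Virus web interface, the following builds URLs that download the MN908947.3 reference as FASTA and list nucleotide records for taxid 2697049. The helper names are illustrative; only `fetch_text` touches the network.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def efetch_fasta_url(accession: str) -> str:
    """Build an E-utilities efetch URL that returns one nucleotide record as FASTA."""
    params = {"db": "nuccore", "id": accession, "rettype": "fasta", "retmode": "text"}
    return f"{EUTILS}/efetch.fcgi?{urlencode(params)}"


def esearch_taxid_url(taxid: int, retmax: int = 20) -> str:
    """Build an esearch URL listing nucleotide IDs for one taxon (no subtree expansion)."""
    params = {"db": "nuccore", "term": f"txid{taxid}[Organism:noexp]", "retmax": retmax}
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"


def fetch_text(url: str) -> str:
    """Download a URL; NCBI asks for at most 3 requests/s without an API key."""
    with urlopen(url) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Wuhan-Hu-1 reference (also the basis of Twist control 2) and the SARS-CoV-2 taxon
    print(efetch_fasta_url("MN908947.3"))
    print(esearch_taxid_url(2697049))
```

Building URLs separately from fetching keeps the query logic testable offline; because of the indexing delay mentioned above, very recent submissions may not yet appear in `esearch` results.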

Tools - Detection, Reconstruction, Identification

  • PriSeT | Efficient De Novo Primer Discovery Suitable PCR primer pairs for DNA metabarcoding should match a broad evolutionary range of taxa, so that only a few pairs are needed to achieve high taxonomic coverage. At the same time, the DNA barcodes between the primers of a pair should differ across species, so that species can be distinguished and resolution improves. PriSeT finds a primer set P that balances both goals: high taxonomic coverage and high resolution. It can process large libraries and is robust against mislabeled or low-quality references. It tackles the computationally expensive steps with linear-runtime filters and efficient encodings. PriSeT has been applied to 19 SARS-CoV-2 genomes and computed 114 new primer pairs under the additional constraint that the sequences have no co-occurrences in other taxa. These primer sets would be suitable for empirical testing.

    M. Hoffmann, M. T. Monaghan, and K. Reinert, PriSeT: efficient de novo primer discovery, bioRxiv, 2020.

  • CoVPipe | reference-based reconstruction of SARS-CoV-2 genomes CoVPipe is a highly optimized and fully automated workflow for the reference-based reconstruction of SARS-CoV-2 genomes from next-generation amplicon sequencing data of swab samples generated with the CleanPlex SARS-CoV-2 Panel (Paragon Genomics, Hayward, CA, USA). The pipeline is designed for reproducibility and scalability to ensure reliable and fast analysis of SARS-CoV-2 data.
  • poreCov The nanopore workflow poreCov carries out all necessary steps from basecalling to assembly, depending on the user input, followed by lineage prediction of each genome using Pangolin. Read coverage plots are provided for each genome to assess the amplification quality of the multiplex PCR. In addition, poreCov includes a quick time-tree-based analysis of the inputs against reference sequences. poreCov is implemented in Nextflow for full parallelization of the workload and stable sample processing.
  • V-Pipe | Mining viral genomes and improving clinical diagnostics V-Pipe has released a new version specifically adapted to analyze high-throughput sequencing data of SARS-CoV-2. It allows the detection of within-host genetic variation of SARS-CoV-2 from viral NGS data.

    L. A. Carlisle, T. Turk, K. Kusejko, K. J. Metzner, C. Leemann, C. Schenkel, N. Bachmann, S. Posada, N. Beerenwinkel, J. Böni, S. Yerly, T. Klimkait, M. Perreau, D. L. Braun, A. Rauch, A. Calmy, M. Cavassini, M. Battegay, P. Vernazza, E. Bernasconi, H. F. Günthard, R. D. Kouyos, and Swiss HIV Cohort Study, Viral diversity from next-generation sequencing of HIV-1 samples provides precise estimates of infection recency and time since infection, J Infect Dis, 2019.

  • VIRify VIRify can be used for the identification of coronaviruses in clinical and environmental samples. VIRify is a recently developed, generic pipeline for the detection, annotation, and taxonomic classification of viral and phage contigs in metagenomic and metatranscriptomic assemblies. VIRify’s taxonomic classification relies on the detection of taxon-specific profile hidden Markov models (HMMs), built upon a set of 22,014 orthologous protein domains and referred to as ViPhOGs. Included in this profile HMM database are 139 models that serve as specific markers for taxa within the Coronaviridae family.
  • VBRC Tools for Coronaviruses The VBRC was developed for dsDNA viruses but has been adapted for coronaviruses. Only SARS-CoV-2 and closely related viruses will be added to this database. The VBRC provides unique tools that may be useful for the analysis of SARS-CoV-2.
  • VIRULIGN | Fast codon-correct alignment and annotation of viral genomes VIRULIGN is built for fast codon-correct alignments of large datasets, with standardized and formalized genome annotation and various alignment export formats. VIRULIGN has been adapted to SARS-CoV-2.

    P. J. K. Libin, K. Deforche, A. B. Abecasis, and K. Theys, VIRULIGN: fast codon-correct alignment and annotation of viral genomes, Bioinformatics, 2018.

  • VIGOR4 | Viral Genome ORF Reader VIGOR4 (Viral Genome ORF Reader) is a Java application that predicts protein sequences encoded in viral genomes. VIGOR4 determines the protein coding sequences by sequence similarity searching against curated viral protein databases. VIGOR4 uses the VIGOR_DB project, which currently has databases for the following viruses: Influenza (A and B for human, avian, and swine, and C for human), West Nile Virus, Zika Virus, Chikungunya Virus, Eastern Equine Encephalitis Virus, Respiratory Syncytial Virus, Rotavirus, Enterovirus and Lassa Mammarenavirus. A SARS-CoV-2 release was announced for May 1.

    S. Wang, J. P. Sundaram, and D. Spiro, VIGOR, an annotation program for small viral genomes, BMC Bioinf, vol. 11, iss. 1, 2010.

  • Rfam COVID-19 Resources In response to the SARS-CoV-2 outbreak, Rfam produced a special release 14.2 that includes 10 new and 4 revised families that can be used to annotate the SARS-CoV-2 and other Coronavirus genomes with RNA families.
  • Covidex | Alignment-free machine learning subtyping for viral species Covidex is an alignment-free machine learning subtyping tool for viral species, based on a random forest model trained on a k-mer database. It currently supports FMDV and SARS-CoV-2 sequences and allows fast classification into predefined clusters (from the Nextstrain database).
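The coverage/resolution trade-off described in the PriSeT entry can be made concrete with a toy scoring function. This is not PriSeT's algorithm (which relies on linear-runtime filters and efficient encodings over large reference libraries); it is a hypothetical sketch that assumes exact primer matches, a reverse binding site written on the forward strand, and made-up sequences.

```python
from collections import Counter
from typing import Dict, Optional, Tuple


def amplicon(seq: str, fwd: str, rev_site: str) -> Optional[str]:
    """Return the barcode between exact forward and reverse binding sites, or None.

    Simplification: rev_site is the reverse primer's binding site as it appears on
    the forward strand (the reverse complement of the primer itself).
    """
    i = seq.find(fwd)
    if i < 0:
        return None
    start = i + len(fwd)
    j = seq.find(rev_site, start)
    if j < 0:
        return None
    return seq[start:j]


def score_pair(refs: Dict[str, str], fwd: str, rev_site: str) -> Tuple[float, float]:
    """Coverage = share of taxa amplified; resolution = share of amplified taxa with a unique barcode."""
    barcodes = [amplicon(s, fwd, rev_site) for s in refs.values()]
    amplified = [b for b in barcodes if b is not None]
    if not amplified:
        return 0.0, 0.0
    counts = Counter(amplified)
    unique = sum(1 for b in amplified if counts[b] == 1)
    return len(amplified) / len(refs), unique / len(amplified)


# Hypothetical toy references: A and C share a barcode, D lacks the forward site
REFS = {
    "taxon_A": "ACGTTGCA" + "AAAT" + "GGCCTTAA",
    "taxon_B": "ACGTTGCA" + "AACT" + "GGCCTTAA",
    "taxon_C": "ACGTTGCA" + "AAAT" + "GGCCTTAA",
    "taxon_D": "TTTTTTTT" + "AAAT" + "GGCCTTAA",
}

if __name__ == "__main__":
    print(score_pair(REFS, "ACGTTGCA", "GGCCTTAA"))
```

Here the pair amplifies three of four taxa but only one of those three gets a distinguishing barcode, so coverage is high and resolution is low. A real search would score many candidate pairs and keep a small set that does well on both.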

Phylogenetic Analysis

  • Nextstrain | Genomic analysis of COVID-19 spread Nextstrain is an open-source project to harness the scientific and public health potential of pathogen genome data. They provide a continually-updated view of publicly available data with powerful analytics and visualizations showing pathogen evolution and epidemic spread.
  • Phylogenetic Network Analysis Used to Trace COVID-19 Infection Routes The early "evolutionary paths" of SARS-CoV-2 in humans were reconstructed using phylogenetic network analysis. By analyzing the first 160 complete virus genomes sequenced from human patients, the authors mapped some of the original spread of the new coronavirus through its mutations, which give rise to distinct viral lineages. A mathematical network algorithm was used to visualise all plausible trees simultaneously.

    Forster P, Forster L, Renfrew C, Forster M. Phylogenetic network analysis of SARS-CoV-2 genomes. Proc Natl Acad Sci U S A. 2020;117(17):9241–9243. doi:10.1073/pnas.2004999117

  • pangolin | Phylogenetic Assignment of Named Global Outbreak Lineages Pangolin assigns a global lineage to query SARS-CoV-2 genomes by estimating the most likely placement within a phylogenetic tree of representative sequences from all currently defined global SARS-CoV-2 lineages based on the lineage nomenclature.

    A. Rambaut, E. C. Holmes, V. Hill, Á. O’Toole, J. McCrone, C. Ruis, L. du Plessis, and O. G. Pybus, A dynamic nomenclature proposal for SARS-CoV-2 to assist genomic epidemiology, bioRxiv, 2020.

  • BEAST 2 | Bayesian evolutionary analysis by sampling trees BEAST 2 is a cross-platform program for Bayesian phylogenetic analysis of molecular sequences. It estimates rooted, time-measured phylogenies using strict or relaxed molecular clock models. It can be used to reconstruct phylogenies, but it is also a framework for testing evolutionary hypotheses without conditioning on a single tree topology. BEAST 2 uses Markov chain Monte Carlo (MCMC) to average over tree space, so that each tree is weighted in proportion to its posterior probability. BEAST 2 includes a graphical user interface for setting up standard analyses and a suite of programs for analysing the results.

    R. Bouckaert, T. G. Vaughan, J. Barido-Sottani, S. Duchêne, M. Fourment, A. Gavryushkina, J. Heled, G. Jones, D. Kühnert, N. De Maio, M. Matschiner, F. K. Mendes, N. F. Müller, H. A. Ogilvie, L. du Plessis, A. Popinga, A. Rambaut, D. Rasmussen, I. Siveroni, M. A. Suchard, C. Wu, D. Xie, C. Zhang, T. Stadler, and A. J. Drummond, BEAST 2.5: an advanced software platform for Bayesian evolutionary analysis, PLOS Comput Biol, vol. 15, iss. 4, p. e1006650, 2019.

  • Phylogeographic reconstruction using air transportation data Phylogeographic reconstruction using air transportation data can be used to study the global spread of the SARS-CoV-2 pandemic, especially in the early phases when air travel still contributed substantially to the spread of the virus. The method is currently being adapted to consider both air travel and local movement data within countries during inference, to reflect the changing worldwide movements in different phases of the pandemic.

    S. Reimering, S. Muñoz, and A. C. McHardy, Phylogeographic reconstruction using air transportation data and its application to the 2009 H1N1 influenza A pandemic, PLoS Comput Biol, vol. 16, iss. 2, p. e1007101, 2020.
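BEAST 2 infers time-measured trees under molecular clock models by MCMC. A much simpler and widely used check for clock-like signal, which is not what BEAST does, is root-to-tip regression: under a strict clock, a tip's distance from the root grows linearly with its sampling date, so the slope estimates the substitution rate and the x-intercept estimates the root date. The sketch below is an ordinary least-squares fit on hypothetical dates (decimal years) and root-to-tip distances (substitutions/site) that would come from an already rooted tree.

```python
from typing import Sequence, Tuple


def root_to_tip_regression(dates: Sequence[float], distances: Sequence[float]) -> Tuple[float, float]:
    """Fit distance = rate * (date - t_root) by least squares; return (rate, t_root)."""
    n = len(dates)
    if n < 2 or n != len(distances):
        raise ValueError("need at least two tips with one distance each")
    mean_t = sum(dates) / n
    mean_d = sum(distances) / n
    sxx = sum((t - mean_t) ** 2 for t in dates)
    if sxx == 0:
        raise ValueError("sampling dates must differ")
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in zip(dates, distances))
    rate = sxy / sxx  # substitutions/site per unit time
    if rate == 0:
        raise ValueError("zero slope: no temporal signal")
    intercept = mean_d - rate * mean_t
    t_root = -intercept / rate  # date at which the fitted distance reaches zero
    return rate, t_root


if __name__ == "__main__":
    # Hypothetical tips sampled over 2020; expect rate near 1e-3 and root near 2019.9
    rate, t_root = root_to_tip_regression(
        [2020.0, 2020.1, 2020.2, 2020.3], [0.0001, 0.0002, 0.0003, 0.0004]
    )
    print(f"rate={rate:.2e} subs/site/year, root={t_root:.2f}")
```

A weak or negative slope suggests too little temporal signal for clock-based dating; BEAST 2 goes further by integrating over tree topologies and clock models instead of fixing one tree.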

2. related DNases (Yongjing)

3. RNA (Youhuang)

Resources

tools & databases

RNA motifs

others

  • A widespread Xrn1-resistant RNA motif composed of two short hairpins The 3′ untranslated region of several beny- and cucumovirus RNAs harbors a so-called ‘coremin’ motif that is required for Xrn1 stalling. The minimal benyvirus stalling site consists of two hairpins of 3 and 4 base pairs, respectively. The 5′ proximal hairpin requires a YGAD (Y = U/C, D = G/A/U) consensus loop sequence, whereas the 3′ proximal hairpin loop sequence is variable. The sequence of the 9-nucleotide spacer that separates the hairpins is highly conserved and potentially involved in tertiary interactions. A role for Xrn1 and the host decay machinery has only been shown for the SARS coronavirus nsp1.
  • Severe acute respiratory syndrome coronavirus nsp1 protein suppresses host gene expression by promoting host mRNA degradation Expression of nsp1, the most N-terminal gene 1 protein, prevented Sendai virus-induced accumulation of endogenous IFN-beta mRNA without inhibiting dimerization of IFN regulatory factor 3, a protein that is essential for activation of the IFN-beta promoter.
  • A conserved BH3-like sequence SARS-CoV E and SARS-CoV-2 E share a C-terminal BH3-like motif, and a predicted interactome for E was identified.
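The coremin entry specifies the 5′ hairpin loop as the IUPAC consensus YGAD (Y = U/C, D = G/A/U). A minimal sketch of scanning a sequence for such a consensus, assuming plain matching on the primary sequence only: it does not check the two-hairpin secondary structure or the conserved 9-nucleotide spacer, so every hit is just a candidate. The function names are illustrative.

```python
import re
from typing import List, Tuple

# IUPAC nucleotide codes mapped to RNA regex character classes
IUPAC_RNA = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "[AG]", "Y": "[CU]", "S": "[CG]", "W": "[AU]",
    "K": "[GU]", "M": "[AC]", "B": "[CGU]", "D": "[AGU]",
    "H": "[ACU]", "V": "[ACG]", "N": "[ACGU]",
}


def consensus_to_regex(consensus: str) -> str:
    """Translate an IUPAC consensus such as 'YGAD' into a regular expression."""
    return "".join(IUPAC_RNA[c] for c in consensus.upper().replace("T", "U"))


def find_motif(seq: str, consensus: str) -> List[Tuple[int, str]]:
    """Return (0-based start, matched text) for every hit, including overlapping ones."""
    rna = seq.upper().replace("T", "U")  # accept DNA input by converting T to U
    pattern = re.compile(f"(?=({consensus_to_regex(consensus)}))")  # lookahead allows overlaps
    return [(m.start(), m.group(1)) for m in pattern.finditer(rna)]


if __name__ == "__main__":
    print(find_motif("AAUGAGCCGAUCC", "YGAD"))  # [(2, 'UGAG'), (7, 'CGAU')]
```

The zero-width lookahead is used so that overlapping candidates are all reported; a structure-aware search (for example with covariance models such as those behind Rfam) would be needed to confirm an actual stalling site.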

4. interactions (Asif)