Luke Thompson edited this page Mar 8, 2016 · 14 revisions

Welcome to the Wiki of the Earth Microbiome Project

EMP20k Workflow

| Processing step | Closed reference | Open reference | Swarm |
|---|---|---|---|
| OTU picking | SortMeRNA vs GG97 v13.8 | Sumaclust/SortMeRNA vs rep_set.fa | Swarm v2 |
| Taxonomy assignment | (comes with GG) | SortMeRNA | (same as open-ref) |
| Alignment | (comes with GG) | SSU-ALIGN(a) (not PyNAST) | (same as open-ref) |
| Tree building | (comes with GG) | FastTreeMP(b) | (same as open-ref) |
| Outputs (+Map) | BIOM, Taxonomy, Tree | BIOM, Taxonomy, Tree | BIOM, Taxonomy, Tree |

(a): Greg's preferred option. (b): With double precision.
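The closed-reference and open-reference columns differ mainly in what happens to reads that do not match the reference. The toy sketch below illustrates that difference. It is not SortMeRNA or Sumaclust: it assumes equal-length sequences, uses simple per-position identity with a 97% threshold, and uses a greedy first-match rule for de novo clustering. The function names and the `New.N` OTU labels are hypothetical.

```python
"""Toy illustration of closed- vs open-reference OTU picking.

Assumptions (hypothetical, not SortMeRNA/Sumaclust): reads and references
are equal-length strings, and identity is the fraction of matching positions.
"""


def identity(a: str, b: str) -> float:
    """Fraction of positions at which two equal-length sequences agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def closed_reference(reads, refs, threshold=0.97):
    """Assign each read to its best reference OTU; reads below threshold fail."""
    hits, failures = {}, []
    for read_id, seq in reads.items():
        best_id, best = max(
            ((ref_id, identity(seq, ref)) for ref_id, ref in refs.items()),
            key=lambda pair: pair[1],
        )
        if best >= threshold:
            hits.setdefault(best_id, []).append(read_id)
        else:
            failures.append(read_id)  # closed-reference simply drops these
    return hits, failures


def open_reference(reads, refs, threshold=0.97):
    """Closed-reference step, then greedy de novo clustering of the failures."""
    hits, failures = closed_reference(reads, refs, threshold)
    centroids = {}  # new OTU id -> centroid sequence
    for read_id in failures:
        seq = reads[read_id]
        for otu_id, centroid in centroids.items():
            if identity(seq, centroid) >= threshold:
                hits[otu_id].append(read_id)  # joins an existing new OTU
                break
        else:
            otu_id = f"New.{len(centroids)}"  # read seeds a new OTU
            centroids[otu_id] = seq
            hits[otu_id] = [read_id]
    return hits
```

With one reference sequence of 100 `A`s, a read with two mismatches is assigned to it (98% identity), while two poly-`C` reads fail closed-reference picking. In the open-reference step those two reads form a single new OTU together.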

| Analysis: core | Details and challenges |
|---|---|
| Alpha diversity | Calculated per sample; easily parallelizable |
| Beta diversity | Memory limitations with new algorithm (see Daniel's section below) |
| Principal coordinates | Now works with up to 50k samples using the conda installation of SciPy |
| Taxa summaries | Calculated per sample; easily parallelizable |
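The core table separates per-sample analyses, which parallelize easily, from pairwise analyses, which scale with the square of the number of samples. The sketch below shows both sides on a toy table. It assumes a dense samples x OTUs NumPy array and a list of one taxonomy string per OTU. Bray-Curtis is used only as an example beta-diversity metric, and classical PCoA is implemented directly with NumPy. This is not the QIIME or EMP implementation, and all function names are hypothetical.

```python
"""Sketch of per-sample and pairwise core analyses on a toy OTU table.

Assumptions (hypothetical): a dense samples x OTUs NumPy array of counts and
one taxonomy string per OTU; a real run would read BIOM files instead.
"""
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def shannon(counts: np.ndarray) -> float:
    """Shannon diversity (base 2) of one sample; needs no other sample."""
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum())


def alpha_diversity(table: np.ndarray, workers: int = 4) -> list:
    # Each row (sample) is independent, so samples can be mapped in parallel.
    # Threads keep the sketch portable; processes suit CPU-bound work better.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(shannon, table))


def taxa_summary(counts: np.ndarray, taxonomy: list) -> dict:
    """Relative abundance of each taxon in one sample (also per sample)."""
    totals = defaultdict(float)
    for taxon, n in zip(taxonomy, counts):
        totals[taxon] += n
    return {taxon: n / counts.sum() for taxon, n in totals.items()}


def distance_matrix_gib(n_samples: int, condensed: bool = False) -> float:
    """Memory needed for a float64 distance matrix, in GiB."""
    if condensed:  # upper triangle only, as in a condensed distance vector
        entries = n_samples * (n_samples - 1) // 2
    else:
        entries = n_samples ** 2
    return entries * 8 / 2 ** 30


def bray_curtis(table: np.ndarray) -> np.ndarray:
    """Square Bray-Curtis matrix: sum|x - y| / sum(x + y) for every pair.

    Broadcasting builds an n x n x OTUs intermediate, so this is for toy
    tables only; even the n x n result grows quadratically with n.
    """
    num = np.abs(table[:, None, :] - table[None, :, :]).sum(axis=2)
    sums = table.sum(axis=1)
    return num / (sums[:, None] + sums[None, :])


def pcoa(dm: np.ndarray, k: int = 2) -> np.ndarray:
    """Classical PCoA: double-centre -0.5 * D^2 and keep the top-k axes."""
    n = dm.shape[0]
    centre = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centre @ (dm ** 2) @ centre
    eigvals, eigvecs = np.linalg.eigh(b)  # returned in ascending order
    order = np.argsort(eigvals)[::-1][:k]
    return eigvecs[:, order] * np.sqrt(np.clip(eigvals[order], 0, None))
```

For 50,000 samples, `distance_matrix_gib(50000)` comes to roughly 18.6 GiB for the full square matrix and about 9.3 GiB for the condensed form. This quadratic growth is why beta diversity and PCoA, unlike the per-sample rows of the table, run into memory limits.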
| Analysis: extended | Details and challenges |
|---|---|
| GitHub issues (start here) | https://github.com/biocore/emp/issues: work on these and add more |
| Slides from old talks | https://github.com/biocore/emp/tree/master/presentations and Google Drive |
| IPython notebook for plots | Seaborn, Emperor, QIIME results |
| Group significance | Dependent on specific questions |
| Machine learning | Somewhat dependent on specific questions |
| Co-occurrence | Display on the VROOM with Juergen |
| Phylogenetic trees | ETE; display on the VROOM with Juergen |
| Other | Yoshiki and Jamie have code/ideas; Bobby Prill meta-analysis |