Adding ONT read support for ampliseq #741

ShareiFae · 2024-05-13T12:38:36Z

Description of feature

Would it be possible to add support for Oxford Nanopore Technology reads? I've found two projects (the second integrating the first) that might allow a quick implementation.

https://github.com/treangenlab/emu
https://github.com/genomic-medicine-sweden/gms_16S

d4straub · 2024-06-18T11:56:29Z

Hi @ShareiFae ,
emu does seem interesting, and it's also available as a container: https://bioconda.github.io/recipes/emu/README.html.
However, if I understood correctly, it compares sequences to a database and makes count tables from that. It is not a reference-free (de-novo) representative sequence (OTU/ASV/whatever) calculation like DADA2 or Deblur. Therefore I have the impression the tool is misplaced in this pipeline. Let me know if you have a different opinion.

ShareiFae · 2024-06-18T13:32:07Z

A shame. The downstream analysis and the report of this pipeline are really good, and I would like to have them for nanopore data even if they cannot be done in exactly the same way.

With EMU, as demonstrated by this other pipeline, you can create a phyloseq object that functions as a pseudo-OTU table:
https://github.com/josephpetrone/RESCUE/blob/master/detailed_steps.md#rstudio-and-phyloseq-handoff
https://github.com/josephpetrone/RESCUE
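
To illustrate what a pseudo-OTU table means here, below is a minimal sketch of merging per-sample, per-taxon abundance estimates into a single taxon-by-sample matrix. It is not taken from EMU or RESCUE: the function name `build_pseudo_otu_table`, the input layout (a dict of sample name to a dict of taxon to estimated count), and the example values are all assumptions made for illustration. Note that the rows are reference taxa rather than de novo sequences.

```python
def build_pseudo_otu_table(per_sample_counts):
    """Merge per-sample taxon counts into one taxon-by-sample matrix.

    per_sample_counts: {sample_name: {taxon_name: estimated_count}}
    (hypothetical layout, not an EMU file format).
    Returns (taxa, samples, matrix), where matrix[i][j] is the count of
    taxa[i] in samples[j]; taxa absent from a sample get 0.
    """
    samples = sorted(per_sample_counts)
    taxa = sorted({t for counts in per_sample_counts.values() for t in counts})
    matrix = [
        [per_sample_counts[s].get(t, 0) for s in samples]
        for t in taxa
    ]
    return taxa, samples, matrix


# Made-up example values, for illustration only.
example = {
    "sample1": {"Taxon A": 10, "Taxon B": 5},
    "sample2": {"Taxon B": 3, "Taxon C": 7},
}
taxa, samples, matrix = build_pseudo_otu_table(example)
for taxon, row in zip(taxa, matrix):
    print(taxon, row)
```

A table like this can stand in for an OTU count table in downstream steps, but it carries no representative sequence per row.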

I've also found a pipeline that uses QIIME:
https://github.com/MaestSi/MetONTIIME

samuell · 2024-12-12T12:48:38Z

Hi @d4straub , I'm interested in hearing more about this. Is it the fact that it brings a database at all that you see as a problem here, or the size of the database?

EMU ships with a bundled database (an 84 MB FASTA file in the repo) but also allows customizing it, if that clarifies anything.

d4straub · 2024-12-12T13:07:58Z

Hi there,
Databases are no problem. The "problem" is that many parts of the pipeline use ASV (/OTU) sequences, and EMU doesn't provide those. Is that correct?
There was/is also a discussion of nanopore analysis in nf-core slack: https://nfcore.slack.com/archives/CEA7TBJGJ/p1707383491473489
Essentially, newer nanopore data (from recent flowcells with high quality) might be possible to process with the pipeline and DADA2 with minor modifications (see benjjneb/dada2#759). Also possible is clustering with VSEARCH (that would be relatively easy to add into the pipeline).
Generally, it is necessary not only to identify a solution but also to implement it. I have neither the need nor the time to do so. But if someone wants to tackle that, I'm supportive.

ShareiFae added the enhancement New feature or request label May 13, 2024
