MSA workflow takes so much time. Any other ways to get an accurate species tree? #950

Samadhi9 · 2024-12-09T19:23:37Z

Hi David,

Thank you for this incredible tool!
I am trying to build orthogroups for 235 plant genomes, and I did it successfully using the standard workflow. But the species tree was wrong (wrong outgroup). So I tried to use the MSA workflow (mafft and fasttree) with -t 64 and -a 16, hoping to run iqtree later (I cannot run the entire pipeline in one go because of the time limit on the HPC system I am using). But it has been stuck indefinitely, as in issue #921.

Is there any other way I can increase the speed?
If not, are there any parameter changes I could make in the standard workflow to get an accurate species tree?

Thank you in advance!
Samadhi

Jonathan-Holmes-Bioinformatics · 2024-12-17T11:24:05Z

Hi Samadhi9,

To speed up the MSA workflow, you can use the new --core --assign function: generate a core orthogroup set, then add further proteomes to that pre-computed set of orthogroups, which gives a linear runtime (see the GitHub page). This will speed up the OrthoFinder workflow, although it may cost some accuracy in orthogroup assignment.

Alternatively, you may be able to build a species tree by aligning a set of single-copy orthologs and building a tree from the concatenated alignments.
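
As a rough illustration of the concatenation step, here is a minimal Python sketch. It assumes each single-copy orthogroup has already been aligned separately, and that each FASTA header is just the species name (in real OrthoFinder output the headers are gene IDs, so you would need to map them to species first). The function names and the toy data are hypothetical, not part of OrthoFinder.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict of {header: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Assumption: the first word of the header is the species name.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


def concatenate_alignments(alignments, species):
    """Concatenate per-orthogroup alignments into one supermatrix.

    alignments: list of dicts mapping species name -> aligned sequence.
    species: list of all species names; a species missing from an
             alignment is padded with gaps for that block.
    """
    supermatrix = {sp: [] for sp in species}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences in an alignment must have equal length")
        width = lengths.pop()
        for sp in species:
            supermatrix[sp].append(aln.get(sp, "-" * width))
    return {sp: "".join(parts) for sp, parts in supermatrix.items()}


# Toy example with two tiny "orthogroup" alignments.
og1 = parse_fasta(">spA\nMK-L\n>spB\nMKAL\n>spC\nMR-L\n")
og2 = parse_fasta(">spA\nGG\n>spC\nGA\n")
matrix = concatenate_alignments([og1, og2], ["spA", "spB", "spC"])
for sp, seq in matrix.items():
    print(f">{sp}\n{seq}")
```

The resulting concatenated FASTA could then be passed to a tree builder such as fasttree or iqtree. Padding missing species with gaps is one simple choice; restricting to orthogroups present in every species is another.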

Samadhi9 · 2024-12-20T05:28:51Z

Hello Holmes,

Thank you for your reply. I tried the --core --assign method. I have DNA data, and it gave me the following error.

Command: diamond makedb --in Orthofinder/233_CDS_core/CORE_CDS/OrthoFinder/Results_correct/WorkingDirectory/profile_sequences..10_km.fa -d Orthofinder/233_CDS_core/CORE_CDS/OrthoFinder/Results_correct/WorkingDirectory/profile_sequences..10_kmeans.fa.dmnd

Error: The sequences are expected to be proteins but only contain DNA letters. Use the option --ignore-warnings to proceed

diamond blastp -d Orthofinder/233_CDS_core/CORE_CDS/OrthoFinder/Results_correct/WorkingDirectory/profile_sequences..10_kmeans.fa.dmnd -q Orthofinder/233_CDS_core/CORE_CDS/OrthoFinder/Results_correct/../Results_Dec19/WorkingDirectory/Species62.fa -o Orthofinder/233_CDS_core/CORE_CDS/OrthoFinder/Results_correct/../Results_Dec19/WorkingDirectory/Blast62_-1.txt --more-sensitive -p 1 --quiet -e 0.001 --compress 1

Error opening file Orthofinder/233_CDS_core/CORE_CDS/OrthoFinder/Results_correct/WorkingDirectory/profile_sequences..10_kmeans.fa.dmnd: No such file or directory

I would sincerely appreciate it if you could help me fix this error. Thank you so much in advance.

Best,
Samadhi

PS: The above error was fixed as mentioned in #603

Jonathan-Holmes-Bioinformatics · 2025-01-06T15:02:37Z

Hi Samadhi,

I'm not sure why --core --assign is not working in this case. The method is relatively new and may not be set up for DNA sequences. I will attempt to re-create the problem locally on my end.

As you mentioned in #603, have you tried installing DIAMOND v2.0.9 and re-running?
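
In the meantime, a quick way to confirm which input files trigger the "only contain DNA letters" message is to check the alphabet of each sequence. The sketch below is only a heuristic I am assuming for illustration (the function name and the 0.9 threshold are hypothetical, and this is not how DIAMOND itself decides):

```python
def looks_like_nucleotide(sequence, threshold=0.9):
    """Return True if at least `threshold` of the letters are A, C, G, T, U or N.

    The threshold is an arbitrary assumption, not a DIAMOND setting.
    """
    letters = [c for c in sequence.upper() if c.isalpha()]
    if not letters:
        return False
    nucleotide = sum(c in "ACGTUN" for c in letters)
    return nucleotide / len(letters) >= threshold


print(looks_like_nucleotide("ATGGCCAAGTTGA"))  # CDS-like input
print(looks_like_nucleotide("MAKLVSTPQRW"))    # protein-like input
```

Short or low-complexity protein sequences can still be misclassified, so treat the result as a hint only.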
