MSA workflow takes so much time. Any other ways to get an accurate species tree? #950

Open
Samadhi9 opened this issue Dec 9, 2024 · 3 comments

Comments

@Samadhi9

Samadhi9 commented Dec 9, 2024

Hi David,

Thank you for this incredible tool!
I am building orthogroups for 235 plant genomes, and the run completed successfully with the standard workflow. However, the species tree was wrong (it had the wrong outgroup). So I tried the MSA workflow (MAFFT and FastTree) with -t 64 and -a 16, hoping to run IQ-TREE later (I cannot run the entire pipeline in one go because of the time limit on the HPC system I am using). But it has been stuck for a very long time, as in issue #921.

  1. Is there any other way I can increase the speed?
  2. If not, are there any parameter changes I could make in the standard workflow to get an accurate species tree?

Thank you in advance!
Samadhi

@Jonathan-Holmes-Bioinformatics

Hi Samadhi9,

To speed up the MSA workflow, you can use the new --core --assign function: it generates a core orthogroup set and then adds further proteomes to that pre-computed set of orthogroups, which gives a linear runtime (see the GitHub page). This will speed up the OrthoFinder workflow; however, it may cost some accuracy in orthogroup assignment.

Alternatively, you may be able to build a species tree by aligning a set of single-copy orthologs and building the tree from the concatenated alignments.
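As a rough illustration of the concatenation step, here is a minimal Python sketch. It is not part of OrthoFinder; the function names `parse_fasta` and `concatenate_alignments` are hypothetical. It assumes each per-orthogroup alignment holds at most one sequence per species and that each FASTA header starts with the species name (real headers usually carry gene IDs, so they would need mapping to species first). Species missing from an orthogroup are padded with gaps.

```python
from typing import Dict, List


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {name: sequence}; name is the first word after '>'."""
    records: Dict[str, List[str]] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(chunks) for key, chunks in records.items()}


def concatenate_alignments(alignments: List[Dict[str, str]]) -> Dict[str, str]:
    """Join per-orthogroup alignments into one supermatrix keyed by species.

    Assumption: keys are species names, one sequence per species per alignment.
    """
    species = sorted({sp for aln in alignments for sp in aln})
    parts: Dict[str, List[str]] = {sp: [] for sp in species}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("all sequences in one alignment must have equal length")
        width = lengths.pop()
        for sp in species:
            # Pad with gaps when a species has no gene in this orthogroup.
            parts[sp].append(aln.get(sp, "-" * width))
    return {sp: "".join(chunks) for sp, chunks in parts.items()}


# Toy example: two tiny alignments, the second one lacks sp_B.
og1 = parse_fasta(">sp_A\nMK-L\n>sp_B\nMKAL\n>sp_C\nMRAL\n")
og2 = parse_fasta(">sp_A\nGG\n>sp_C\nGA\n")
supermatrix = concatenate_alignments([og1, og2])
for sp, seq in supermatrix.items():
    print(f">{sp}\n{seq}")
```

The printed FASTA could then be passed to a tree builder such as FastTree or IQ-TREE. With 235 genomes, orthogroups that are single-copy in every species may be scarce, so allowing some missing species (as the gap padding does here) is one possible compromise.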

@Samadhi9

Samadhi9 commented Dec 20, 2024

Hello Holmes,

Thank you for your reply. I tried the --core --assign method. I have DNA data, and it gave me the following error.

Command: diamond makedb --in Orthofinder/233_CDS_core/CORE_CDS/OrthoFinder/Results_correct/WorkingDirectory/profile_sequences..10_km.fa -d Orthofinder/233_CDS_core/CORE_CDS/OrthoFinder/Results_correct/WorkingDirectory/profile_sequences..10_kmeans.fa.dmnd

Error: The sequences are expected to be proteins but only contain DNA letters. Use the option --ignore-warnings to proceed

diamond blastp -d Orthofinder/233_CDS_core/CORE_CDS/OrthoFinder/Results_correct/WorkingDirectory/profile_sequences..10_kmeans.fa.dmnd -q Orthofinder/233_CDS_core/CORE_CDS/OrthoFinder/Results_correct/../Results_Dec19/WorkingDirectory/Species62.fa -o Orthofinder/233_CDS_core/CORE_CDS/OrthoFinder/Results_correct/../Results_Dec19/WorkingDirectory/Blast62_-1.txt --more-sensitive -p 1 --quiet -e 0.001 --compress 1

Error opening file Orthofinder/233_CDS_core/CORE_CDS/OrthoFinder/Results_correct/WorkingDirectory/profile_sequences..10_kmeans.fa.dmnd: No such file or directory

I would sincerely appreciate it if you could help me fix this error. Thank you so much in advance.

Best,
Samadhi

PS: The above error was fixed as mentioned in #603

@Jonathan-Holmes-Bioinformatics

Hi Samadhi,

I'm not sure why --core --assign is not working in this case. The method is relatively new and may not be set up for DNA sequences. I will attempt to re-create the problem locally on my end.

As you mentioned in #603, have you tried installing DIAMOND v2.0.9 and re-running?
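If the method does turn out not to handle DNA input, one possible workaround (an assumption, not something confirmed in this thread) is to translate the CDS sequences to protein before running. The Python sketch below uses the standard genetic code; `looks_like_dna` and `translate_cds` are hypothetical helper names, and the DNA check is only a rough heuristic, not DIAMOND's actual test.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered by T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def looks_like_dna(seq: str, threshold: float = 0.9) -> bool:
    """Heuristic: True if at least `threshold` of the letters are A, C, G, T, U or N."""
    letters = [c for c in seq.upper() if c.isalpha()]
    if not letters:
        return False
    return sum(c in "ACGTUN" for c in letters) / len(letters) >= threshold


def translate_cds(cds: str) -> str:
    """Translate a CDS in frame 0, stopping at the first stop codon.

    Codons with ambiguous bases become 'X'; trailing bases that do not
    fill a whole codon are ignored.
    """
    cds = cds.upper().replace("U", "T")
    protein = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


cds = "ATGGCTAAATGGTAA"
if looks_like_dna(cds):
    print(translate_cds(cds))
```

This assumes every input sequence is a complete coding sequence starting in frame; partial or frame-shifted CDS would need separate handling.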
