Trouble running the example #1

josruirod · 2021-02-11T12:36:25Z

Hi, and congratulations on the software. I want to give it a try, and I managed to install it successfully (the conda environment and the step-by-step process are much appreciated, but there are a few inaccuracies in the readme, such as FIND instead of find when running, the location of install.sh, and the folder where the compressed files for the external software have to be downloaded).
When running the example as written in the readme, I get the following errors, as if some files are missing:

mv: cannot stat '/bin/finder/example/FINDER_test_ARATH/assemblies_psiclass_modified/combined/psiclass_output_sample_0.gtf': No such file or directory
mv: cannot stat '/bin/finder/example/FINDER_test_ARATH/assemblies_psiclass_modified/combined/psiclass_output_sample_1.gtf': No such file or directory
mv: cannot stat '/bin/finder/example/FINDER_test_ARATH/assemblies_psiclass_modified/combined/psiclass_output_sample_2.gtf': No such file or directory
mv: cannot stat '/bin/finder/example/FINDER_test_ARATH/assemblies_psiclass_modified/combined/psiclass_output_sample_3.gtf': No such file or directory
mv: cannot stat '/bin/finder/example/FINDER_test_ARATH/assemblies_psiclass_modified/combined/psiclass_output_sample_4.gtf': No such file or directory
mv: cannot stat '/bin/finder/example/FINDER_test_ARATH/assemblies_psiclass_modified/combined/psiclass_output_sample_5.gtf': No such file or directory
mv: cannot stat '/bin/finder/example/FINDER_test_ARATH/assemblies_psiclass_modified/combined/psiclass_output_sample_6.gtf': No such file or directory
mv: cannot stat '/bin/finder/example/FINDER_test_ARATH/assemblies_psiclass_modified/combined/psiclass_output_sample_7.gtf': No such file or directory
mv: cannot stat '/bin/finder/example/FINDER_test_ARATH/assemblies_psiclass_modified/combined/psiclass_output_sample_8.gtf': No such file or directory
mv: cannot stat '/bin/finder/example/FINDER_test_ARATH/assemblies_psiclass_modified/combined/psiclass_output_vote.gtf': No such file or directory
Traceback (most recent call last):
File "/bin/finder/finder", line 626, in <module>
main()
File "/bin/finder/finder", line 587, in main
orchestrateGeneModelPrediction(options,logger_proxy,logging_mutex)
File "/bin/finder/finder", line 411, in orchestrateGeneModelPrediction
findTranscriptsInEachSampleNotReportedInCombinedAnnotations(options,logger_proxy,logging_mutex)
File "/bin/finder/scripts/findTranscriptsInEachSampleNotReportedInCombinedAnnotations.py", line 17, in findTranscriptsInEachSampleNotReportedInCombinedAnnotations
combined_transcript_info=readAllTranscriptsFromGTFFileInParallel([combined_gtf_filename,"combined","combined"])[0]
File "/bin/finder/scripts/fileReadWriteOperations.py", line 202, in readAllTranscriptsFromGTFFileInParallel
fhr=open(gtf_filename,"r")
FileNotFoundError: [Errno 2] No such file or directory: '/bin/finder/example/FINDER_test_ARATH/assemblies_psiclass_modified/combined/combined.gtf'

Can you please provide some support?
Thanks

sagnikbanerjee15 · 2021-02-11T13:06:44Z

Thank you so much for trying out the software. I will look into this and get back to you.

Thanks

sagnikbanerjee15 · 2021-02-11T22:44:28Z

Hello @josruirod,

I have made some changes to the scripts and updated the install.sh file. I tested on my end, and it works now. Please give it a try and let me know if you face any issues. Also, please send the progress.log file. The install.sh file is under the main directory, and the companion software needs to be downloaded inside the dep directory.

Thank you.

josruirod · 2021-02-12T08:31:31Z

Hello again, thanks for taking the time.
The step-by-step guide is awesome, thank you very much for that (in the readme you may want to use "cd finder" instead of "cd Finder" after git clone, and steps 11 and 14 are in the dep sub-folder, not src, right?)
The installation ran without error, as did the STAR and olego index builds. However, the example is not working. I'm attaching the progress.log and the .error file (extension changed so GitHub accepts it). The .output is empty. Any ideas? Thanks

FINDER_test_ARATH.error.txt
progress.log

sagnikbanerjee15 · 2021-02-12T18:02:58Z

Hello @josruirod,

Thanks for checking out the updates. It really helps to improve the software on our end.

I decided to keep it as cd Finder and not cd finder because the name of the head directory is Finder. The name of the program is finder. I know this is a bit confusing, and I wanted to create a bioconda package, but Finder depends on software that is not allowed to be redistributed. Let me know if the head directory name is in fact Finder when you download it. On my system it downloads as Finder and not finder.

I checked the error file and I think that, for some reason, finder does not have execution permissions. File permissions could have been altered when you download from GitHub. I have updated the install.sh script so that it explicitly provides execution rights to all files under the main directory.

I went through the log file and I realized where the problem is. You probably executed the install.sh file a second time. This created another entry in the ~/.bashrc file. Hence, finder was reading two locations of the program instead of just one. The best practice would be to remove all previous installations from the .bashrc file, since they could mess things up. Hence, I have replaced the install.sh script with an install.py script. Now the installation will attempt to locate previous installations of finder and remove them before installing it anew. It will inspect previous installations and remove them only if they are finder installations. So if you have a different software called finder installed on your system, it will be left untouched.
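The cleanup described above can be sketched roughly as follows. This is only an illustration, not the actual install.py: it assumes each installation adds a line of the form `export PATH=$PATH:<dir>` to the bashrc file, and the helper names `appended_dir` and `set_single_entry` are hypothetical.

```python
def appended_dir(line):
    """Return the directory appended by `export PATH=$PATH:<dir>`, else None.

    Assumption: installations write exactly this form of line.
    """
    stripped = line.strip()
    prefix = "export PATH=$PATH:"
    if stripped.startswith(prefix):
        return stripped[len(prefix):].rstrip("/")
    return None


def set_single_entry(bashrc_lines, install_dir, old_dirs=()):
    """Drop entries for known install directories, then add exactly one.

    Entries pointing at any other directory are kept, so an unrelated
    tool that happens to be called finder is not affected.
    """
    targets = {d.rstrip("/") for d in (install_dir, *old_dirs)}
    kept = [line for line in bashrc_lines if appended_dir(line) not in targets]
    kept.append("export PATH=$PATH:" + install_dir.rstrip("/"))
    return kept
```

Running it twice on the same lines gives the same result, which is the property that makes repeated installations harmless.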

Could you please send me the contents of the ~/.bashrc file?

Please give this a try. And I would recommend that you remove the previous download and start afresh. Also, since you had previous installations of finder it might be a good idea to log out of the system and log back in after you have completed the new installation. Let me know if you run into any other problems.

Thank you.

josruirod · 2021-02-13T12:18:10Z

Hi, thanks to you for the software and the support.

So, on my system, Ubuntu 16.04, git clone is indeed creating the folder "finder". But if that's variable, I guess it's not important then.

Regarding bashrc, when I reran, I did start fresh and removed the previous installation. However, you are right that bashrc had the same entry twice (not different locations). Not sure if that could have an effect, but I have restored it.

So with a new installation, it now works until it fails to find some psiclass output, I think. I attach the progress and error files again. According to the progress file, psiclass ended, but the folder where finder is looking for some files is empty. I also attach a file I've seen inside the psiclass folder, maybe an error with my perl version? Thanks

FINDER_test_ARATH.error.txt
progress.log
combined.error.txt

sagnikbanerjee15 · 2021-02-13T13:14:40Z

Hello,

Thanks for rerunning finder. It is weird that ubuntu would download the folder as finder and not Finder. I will update the README about this issue.

About the bashrc, now it should not matter at all. You can install finder as many times as you want. The current installation will remove all the previous installations and preserve the latest one. finder runs some dependent software for which the path needs to be provided. Hence, it looks for the installation directory. If there are multiple installations (or multiple entries of the same installation) then it can mess things up. Actually, I am glad you reported this issue. Thanks!

I checked the progress.log file and combined.error.txt. I do think it is a perl issue. It is weird that your system is having trouble with multi-threaded support, since the environment explicitly installs a library for thread support in perl (line 147 of environment.yml). Could you please run which perl and perl -V|grep threads? In the output you should see useithreads=define. In fact, psiclass was unable to run without multi-threaded support. I will get in touch with the developers of psiclass to see if they can help us.
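The same check can be scripted. The sketch below is not part of finder; the function names are hypothetical, and it assumes the perl found first on PATH is the one psiclass would use.

```python
import re
import subprocess


def perl_has_ithreads(perl_config_text):
    """True if text from `perl -V` reports useithreads=define."""
    return re.search(r"\buseithreads=define\b", perl_config_text) is not None


def check_local_perl(perl="perl"):
    """Run `perl -V` and report whether that perl was built with ithreads."""
    result = subprocess.run(
        [perl, "-V"], capture_output=True, text=True, check=True
    )
    return perl_has_ithreads(result.stdout)
```

Calling `check_local_perl()` inside and outside the conda environment shows quickly whether the two perls differ.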

Thank you.

sagnikbanerjee15 · 2021-02-13T17:44:57Z

Could you also post the output of perl -v?

Thank you.

josruirod · 2021-02-14T19:11:35Z

Hi, so I decided to reboot and git clone/install everything fresh again. During install.sh I got many errors, "operation not permitted" for chmod (I tried to execute install.py with sudo but it didn't work). I reran the example and I'd say it now failed at another step, in braker?
I attach again the progress and the error files. Sorry for the trouble, let's see if we get it solved. Thanks
progress.log
FINDER_test_ARATH.error.txt

sagnikbanerjee15 · 2021-02-14T19:29:25Z

Hello @josruirod,

It's really great that you are testing finder out on your system. It gives us a chance to fix things that are system dependent.

Do not run install.sh directly. Execute the install.py file instead. Could you elaborate on the errors you encountered during this step? chmod should not have encountered permission errors, since you are the owner of the files whose permissions you wish to change. I will try to install this on different systems to check if I can reproduce the error.

From the progress.log file I can see that all the steps have finished properly. In fact, the problem with perl too seems to have been resolved. Could you please attach the /mnt/data/i7_HDD_4TB/Temp/example/FINDER_test_ARATH/braker.output and /mnt/data/i7_HDD_4TB/Temp/example/FINDER_test_ARATH/braker.error file?

From now on you do not have to remove the output directory. Just run the same command. finder will go through the files and determine the last point of failure and will launch the execution from that point onwards.

Thank you.

josruirod · 2021-02-15T09:18:07Z

Hi, happy to help.

Sorry for the typo, I meant I executed the install.py file. So, I attach the braker.error. Braker.output is empty. I removed the finder directory, git cloned and installed again, and I attach a log of the installation (this log is 400mb, so I attach a link to a cloud).
Nice to know I don't have to remove the output directory

braker.error.txt
https://saco.csic.es/index.php/s/x75wDFHdLWTEpNX

sagnikbanerjee15 · 2021-02-15T17:38:52Z

Hi @josruirod,

Sorry for the late reply, and thanks for your feedback. I had made some changes to the dep directory and I forgot to make the required alterations in the accompanying install.sh file. Sorry for that!! I have removed install.sh and have migrated all the installation commands to install.py. I realized that having two different installation files could be a source of confusion.

The reason you were getting 400 MB worth of error messages is that python was trying to change the permissions of all the files and folders on your system. That should not happen anymore. There was a single line that was failing in install.sh, which led to the error cascade.

Since there were errors in the installation process, GeneMark was not properly installed. Please repeat the installation process and you should be all set.

I have included another option in finder to reduce the size of the downloaded fastq files. This will make the test run finish quicker. Please re-run finder with the --run_tests option.

Thank you.

josruirod · 2021-02-16T12:01:33Z

Hi, and thanks again for the support.
So I executed a fresh installation, no errors, and it ran for a longer time. I now see files in the "final_GTF_files" folder. But I think braker is still failing. I'm attaching the logs, could you check what's wrong please? Braker.gtf is empty.

Thank you

progress.log
FINDER_test_ARATH.output.txt
braker.error.txt
braker.log
braker.output.txt
FINDER_test_ARATH.error.txt

sagnikbanerjee15 · 2021-02-16T12:40:30Z

Hello,

Thanks for running the pipeline again. I am glad to note that it completed running this time. I too was running finder on my end and I was able to reproduce the exact same error as you encountered with braker2. I figured that it is a dependency issue caused by the deprecation of some functions in BioPython. The error seems to have occurred in a python script from Augustus. There might be more such cases. I am attending to those now. I will let you know when everything runs smoothly so that you can give it a try.

Thank you.

josruirod · 2021-02-16T19:42:31Z

Thanks for the explanation! Looking forward to the refined version then!

AnnabelWhibley · 2021-02-16T20:22:06Z

Hello. Thank you both for raising and troubleshooting these issues. I am keen to run Finder and have been running into the same stumbling blocks you report (i.e. now with final gtf files written but an empty braker gtf). I'm looking forward to the next iteration. Thank you for addressing this so quickly.

sagnikbanerjee15 · 2021-02-16T20:35:52Z

Hello @AnnabelWhibley and @josruirod,

Thank you very much for trying out finder. The braker2 run has completed and is now producing the .gtf files. Luckily, the deprecation issue was confined to a single python script in Augustus which I have now replaced with the latest one. Please go ahead and give it a try.

Thank you.

josruirod · 2021-02-17T09:44:24Z

Hi there, happy to hear that. I'd say I've reinstalled and rerun as on previous occasions, but I'm getting the following error, even when just trying to execute finder help:

File "~/bin/finder/finder", line 263
cmd=f"touch {options.temp_dir}/dummy_protein.fasta"
^
SyntaxError: invalid syntax

Nothing is done due to this error, any input? I may have done something wrong this time

Thanks again

AnnabelWhibley · 2021-02-17T10:23:51Z

My re-install was running OK but crashed out with a memory error during the braker2 phase so I have relaunched it with more resources. Fingers crossed. As an aside, I did find that I needed to manually install the ruffus module into the conda environment.

josruirod · 2021-02-17T12:53:57Z

My latest problem was a silly mistake, I had not activated the conda environment. It's running now and I'll confirm when it finishes the example and also a test run with my own data. Thanks
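For context, a SyntaxError pointing at an f-string is what an interpreter older than Python 3.6 reports, which is consistent with the system python being used while the conda environment is inactive. A guard along the following lines would turn that into a clearer message. It is only a sketch, not code from finder, and it deliberately avoids f-strings so that old interpreters can still parse it.

```python
import sys

MIN_VERSION = (3, 6)  # f-strings were introduced in Python 3.6


def supports_fstrings(version_info=None):
    """True if the given (or current) interpreter can parse f-strings."""
    version = tuple(version_info or sys.version_info)[:2]
    return version >= MIN_VERSION


if not supports_fstrings():
    # %-formatting keeps this file parseable by older interpreters.
    sys.exit(
        "Python %d.%d is too old; activate the conda environment first."
        % tuple(sys.version_info)[:2]
    )
```

The guard only helps if it sits in a file that contains no f-strings itself, for example a small launcher that then imports the main program.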

sagnikbanerjee15 · 2021-02-17T13:14:34Z

Hi @josruirod,

Happens to the best of people!! I am glad you figured it out. Let me know if you run into any issues.

Thank you

sagnikbanerjee15 · 2021-02-17T13:16:58Z

Hi @AnnabelWhibley,

Yes, braker can take quite a bit of memory to operate. I did not encounter any issues with the ruffus package while setting up finder in conda. It could possibly be a system issue. Did the installation complete? Or did it give any errors? Could you also send the configuration of the system where you are running finder? I will try to replicate the issue on my end to troubleshoot.

Thank you.

AnnabelWhibley · 2021-02-17T21:21:50Z

Thank you @sagnikbanerjee15. Re: ruffus conda package issue, I can't replicate this and it was easily fixed. I see no errors on installation either time, and the package is clearly there in the environment list. I don't think it is worth your time troubleshooting, it might be some strange SLURM incompatibility at my end.
Run completed (with bug reported in issue#2). Run with 16 threads, max mem usage ~30Gb.

sagnikbanerjee15 · 2021-02-17T21:26:44Z

Hello @AnnabelWhibley,

I am glad to know that the run has completed. I have fixed the bug reported in issue#2.

Thanks.

AnnabelWhibley · 2021-02-17T21:33:21Z

@sagnikbanerjee15. Thank you again for all your support and for developing this tool.

sagnikbanerjee15 · 2021-02-17T21:37:11Z

@AnnabelWhibley It's my pleasure.

josruirod · 2021-02-19T07:30:03Z

Hi, I can confirm the example is now running fine, including the issue #2. Thanks for the work. The gff files from braker seem to be now fine. I'm now proceeding to apply this to my own data. A small side note, in the readme it is said that the columns Description, Read Length (bp), Date, are not necessary nor used, but if they are removed the program fails. It seems they are necessary, even if they are empty or with dummy values. You may want to specify or fix that.
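Until the documentation is settled, a quick header check before launching a run may save time. This is a minimal sketch, not part of finder: it only knows the three column names mentioned in this comment, the README remains the reference for the full set, and the function name is hypothetical.

```python
import csv
import io

# Only the columns named in this thread; the README lists the full set.
COLUMNS_REPORTED_AS_NEEDED = ["Description", "Read Length (bp)", "Date"]


def missing_columns(csv_text, required=COLUMNS_REPORTED_AS_NEEDED):
    """Return the required column names absent from the CSV header row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = [name.strip() for name in next(reader, [])]
    return [name for name in required if name not in header]
```

For a file on disk, pass `open(path).read()` as `csv_text`; an empty list means all three columns are present, even if their cells are empty.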

Thanks

sagnikbanerjee15 · 2021-02-19T14:18:12Z

Hi @josruirod,

That's awesome! I am glad that the example is up and running now. Thanks for pointing out the issue with metadata. I have updated the README.

Thank you.

josruirod · 2021-03-03T08:53:56Z

Hi, I can confirm finder has now worked with my own datasets and organism. Thanks. Just a side note: it seems to be important that all the columns in the metadata are kept, and not just any dummy value can be used, or the program will fail.
Thanks for the support

sagnikbanerjee15 · 2021-03-03T11:05:57Z

Hi @josruirod,

Thank you so much for trying finder out on your own dataset. I will test finder with the metadata issue you mentioned. Could you please tell me on which step finder is failing when you leave fields vacant in the metadata file?

Thank you.

josruirod · 2021-03-03T12:06:54Z

Hi, I have not kept those runs, but just try removing the columns that are said to be non-essential, such as Description; try a dummy name in Project name that does not follow the ID format (PRBXXX...); try using "-" in Description... I think that documentation needs improving, because I had to spend some time tuning the metadata.

Thanks again for the support

josruirod · 2021-03-08T12:43:23Z

Hi, just FYI. I've found that on another system I had to manually change the file limit with ulimit. Otherwise I think STAR was failing, and the error message was not very informative.

Regards

sagnikbanerjee15 · 2021-03-08T14:43:14Z

Hello @josruirod,

Thank you for reporting this issue. Yes, it is an issue with STAR. I will add a check in finder to verify the ulimit and report warnings.
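Such a check might look like the sketch below. It is an illustration rather than the code that went into finder: it assumes the limit in question is the open-file limit (`ulimit -n`), and the threshold of 4096 is a placeholder, not a value taken from STAR or finder.

```python
import resource


def open_file_limit_warning(soft_limit, needed):
    """Return a warning string if the soft limit is below `needed`, else None."""
    if soft_limit != resource.RLIM_INFINITY and soft_limit < needed:
        return (
            f"Open-file limit is {soft_limit}, below the {needed} assumed "
            f"to be needed; raise it with `ulimit -n {needed}`."
        )
    return None


def check_open_file_limit(needed=4096):
    """Read the current soft limit (Unix only) and build a warning if low."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return open_file_limit_warning(soft, needed)
```

Printing the result of `check_open_file_limit()` at start-up gives a readable hint before the aligner is launched.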

Thank you.

heri-v · 2021-05-11T15:16:05Z

Hello Sagnik
Thank you for the good Finder tool.
I followed all installation instructions for finder and everything worked fine except for the command below

(finder_conda_env) -bash-4.2$ finder -h
-bash: finder: command not found

sagnikbanerjee15 · 2021-05-11T15:18:53Z

Thank you for trying out finder. Did the ./install.py command execute properly? Also, you will need to modify the contents of the bashrc file by issuing the commands listed after ./install.py. Could you please send me the contents of the bashrc file? I can then verify whether the installation worked properly. You can view its contents via the command cat ~/.bashrc

sagnikbanerjee15 · 2021-05-11T15:19:07Z

Hello Herieth,

Thank you for sending me the contents of the bashrc file. It does seem like finder has been properly installed. Did you execute this command "export PATH=$PATH:$(pwd)"? If it does not work, could you try to log out and log back in again?
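A "command not found" message right after installation usually means the current shell has not picked up the new PATH entry yet. The sketch below shows one way to check both things from Python. It is not part of finder, and the helper names are hypothetical.

```python
import os
import shutil


def find_command(name, search_path=None):
    """Return the full path of an executable on PATH, or None if absent."""
    return shutil.which(name, path=search_path)


def path_contains(directory, search_path=None):
    """True if `directory` is one of the entries of PATH."""
    if search_path is None:
        search_path = os.environ.get("PATH", "")
    target = os.path.normpath(directory)
    entries = [entry for entry in search_path.split(os.pathsep) if entry]
    return any(os.path.normpath(entry) == target for entry in entries)
```

If `path_contains` is false for the installation directory in the current session but the bashrc entry exists, logging out and back in (or running `source ~/.bashrc`) reloads it.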

As for the metadata file, you have the option of constructing it either on the server or as a csv file on your own system. Usually, if you are working with a large number of files, it is easier to create the file locally and then transfer it to the server. If you can tell me the accession numbers and the endedness (paired or single) of the data, I can create the metadata file and send it to you. I did have some other users report issues with generating this file. I am working on a way to make this as easy as possible.

heri-v · 2021-05-11T15:45:37Z

Hey Sagnik,

Thank you so much for the inputs, I exited the server and re-logged in and it worked!

Your paper is awesome!!!

Thank you once again!

sagnikbanerjee15 · 2021-05-11T15:47:23Z

Hello @heri-v,

Thank you for your kind words. I will be happy to assist you in the process. First, let us concentrate on getting the gene annotation, and then we will move forward with differential gene analysis. The accession that you are referring to is the genome. In addition to that, we will require some RNA-Seq samples. Do you have RNA-Seq samples for this organism? If you could share the link of the genome I can check to see if there is any RNA-Seq data on NCBI that you can use. Also, which organism are you working with?

Thank you.

sagnikbanerjee15 · 2021-05-11T16:02:29Z

Hi @heri-v,

Thanks for sending me the genome. I checked NCBI and found that there are several RNA-Seq samples. Let me know if you would like me to prepare the metadata file for you. Please go through the dataset and let me know if there are specific projects you wish to retain or remove.

https://www.ncbi.nlm.nih.gov/sra?term=((Manihot%20esculenta%5BOrganism%5D)%20AND%20"transcriptomic"%5BSource%5D)%20AND%20"illumina"%5BPlatform%5D

Thank you.

heri-v · 2021-05-11T16:08:41Z

Dear Sagnik, I think combining all 39 hits will increase the chances of getting novel transcripts

sagnikbanerjee15 · 2021-05-11T16:22:53Z

Hello @heri-v,

Yes, I agree that having more RNA-Seq samples will enrich your annotation. Actually, there are 763 RNA-Seq samples in total. While finder is perfectly capable of handling such a large sample, it will take a long time to process all the data. Hence, I would suggest that you go through the RNA-Seq samples and select samples that will create a diverse dataset. You can do that by selecting RNA-Seq samples from a variety of different tissue types and conditions.

Thank you.

heri-v · 2021-05-11T20:30:26Z

There are different olego_index.* files [ olego_index.amb, olego_index.amb, olego_index.ann etc.]. Which one needs to be specified in the command below?

The command ran smoothly without errors, but the output folder was nowhere to be seen.

finder -no_cleanup -mf Arabidopsis_thaliana_metadata.csv -n $CPU -gdir_star $PWD/star_index_without_transcriptome -out_dir $PWD/FINDER_test_ARATH -g $PWD/Arabidopsis_thaliana.TAIR10.dna_sm.toplevel.fa -p $PWD/uniprot_ARATH.fasta -gdir_olego olego_index -preserve 1> $PWD/FINDER_test_ARATH.output 2> $PWD/FINDER_test_ARATH.error

sagnikbanerjee15 · 2021-05-11T21:31:32Z

Hello @heri-v,

Please specify just olego_index. finder will obtain the rest of the files automatically.

If the output folder was not created, then finder must have run into an error. Could you please check the error file?

Thanks.

heri-v · 2021-05-11T21:36:44Z

cd ../example

(finder_conda_env) -bash-4.2$ cat FINDER_test_ARATH.error
usage: finder [-h] --metadatafile METADATAFILE --output_directory
OUTPUT_DIRECTORY --genome GENOME [--cpu CPU]
[--genome_dir_star GENOME_DIR_STAR]
[--genome_dir_olego GENOME_DIR_OLEGO] [--verbose VERBOSE]
[--protein PROTEIN] [--no_cleanup] [--preserve_raw_input_data]
[--checkpoint CHECKPOINT]
[--perform_post_completion_data_cleanup] [--run_tests]
[--addUTR] [--skip_cpd] [--exonerate_gff3 EXONERATE_GFF3]
finder: error: argument --cpu/-n: expected one argument

sagnikbanerjee15 · 2021-05-11T21:38:37Z

You need to set CPU=30 (or however many cores you have access to) before running the command, so that $CPU expands to that number.

Thanks

heri-v · 2021-05-11T21:59:44Z

Thanks, it worked.

sagnikbanerjee15 · 2021-05-11T22:19:38Z

Great! I am glad it worked out.

I will try my best to improve the tutorial.

Thank you.

heri-v · 2021-05-11T22:22:09Z

Thank you, I will get back to you.

heri-v · 2021-05-11T23:09:40Z

After running finder, I checked results in the output directory

finder -no_cleanup -mf Arabidopsis_thaliana_metadata.csv -n $CPU=30 -gdir_star $PWD/star_index_without_transcriptome -out_dir $PWD/FINDER_test_ARATH -g $PWD/Arabidopsis_thaliana.TAIR10.dna_sm.toplevel.fa -p $PWD/uniprot_ARATH.fasta -gdir_olego olego_index -preserve 1> $PWD/FINDER_test_ARATH.output 2> $PWD/FINDER_test_ARATH.error

cd ....................../FINDER_test_ARATH/final_GTF_files

ls

#Unfortunately the output file was empty.

After checking for errors I got the results below

cd /home/herieth/FINDER/Finder/example

cat FINDER_test_ARATH.error

EXITING: fatal input ERROR: runThreadN must be >0, user-defined
runThreadN=0
May 12 00:39:38 ...... FATAL ERROR, exiting
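In bash a variable is assigned as `CPU=30` and read back as `$CPU`. Written as `-n $CPU=30` with `CPU` unset, the shell passes the literal text `=30` to finder, which would be consistent with the `runThreadN=0` error above, although how finder handles that value is not shown in this thread. A stricter argument type would reject such input before any aligner is started. The parser below is a stand-alone demo that borrows the `--cpu`/`-n` flag names from the usage message; it is not finder's own parser, and `positive_int` is a hypothetical helper.

```python
import argparse


def positive_int(text):
    """argparse type: accept only whole numbers that are at least 1."""
    try:
        value = int(text)
    except ValueError:
        raise argparse.ArgumentTypeError(
            "expected a whole number of threads, got %r" % text
        )
    if value < 1:
        raise argparse.ArgumentTypeError(
            "thread count must be at least 1, got %d" % value
        )
    return value


def build_parser():
    """Demo parser with a validated thread-count option."""
    parser = argparse.ArgumentParser(prog="thread_check_demo")
    parser.add_argument("--cpu", "-n", type=positive_int, default=1)
    return parser
```

With this type in place, `build_parser().parse_args(["-n", "=30"])` exits with a usage error that names the offending value, while `["-n", "30"]` parses to 30.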