Running workflow using RefSeq genomes instead of Ensembl #10

Open
MauriAndresMU1313 opened this issue Nov 27, 2024 · 0 comments


@MauriAndresMU1313

Great work on your workflow; it's a really useful guide to performing genome annotation.
Unfortunately, I got stuck at step 4c (Create CDS-only annotation bed file) because I only get empty files. I think I know the reason, so here is my hypothesis.

I'm using genomes from RefSeq, but most, if not all, of the examples I have found run the genome annotation on Ensembl genomes, which of course use a different naming convention for sequence IDs.
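To illustrate the naming difference: RefSeq identifies sequences by accession (in GRCh38, chromosome 1 is `NC_000001.11`), whereas Ensembl uses plain names such as `1`. The minimal sketch below is not part of the workflow; `describe_seqid` is a hypothetical helper, and the prefix table is an illustrative subset of RefSeq accession prefixes, not a complete list.

```python
import re

# Illustrative subset of RefSeq accession prefixes (not exhaustive).
REFSEQ_PREFIXES = {
    "NC": "complete genomic molecule (e.g. a chromosome)",
    "NT": "genomic contig or scaffold",
    "NW": "genomic contig or scaffold (WGS)",
    "NM": "mRNA",
    "NR": "non-coding RNA",
}

# Two capital letters, an underscore, digits, and an optional ".version".
_ACCESSION = re.compile(r"^([A-Z]{2})_\d+(?:\.\d+)?$")


def describe_seqid(seqid: str) -> str:
    """Return a rough description of a sequence ID's naming style."""
    match = _ACCESSION.match(seqid)
    if match is None:
        # Ensembl-style names such as "1" or "X" land here,
        # as do UCSC-style names such as "chr1".
        return "not a RefSeq accession"
    return REFSEQ_PREFIXES.get(match.group(1), "other RefSeq accession")


for seqid in ["NC_000001.11", "NT_187361.1", "1", "chr1"]:
    print(seqid, "->", describe_seqid(seqid))
```

Any file that names chromosome 1 as `1` (or `chr1`) will therefore share no IDs with a file that uses `NC_000001.11`.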

So I started by running the script WriteChromLengthBedFromFasta.py on a *_genomic.fna file from the following link. That file, for example, contains only the FASTA sequence of chromosome 1. After the conversion, my new BED file looks like this:

NC_000001.11	0	248956422
NT_187361.1	0	175055
NT_187362.1	0	32032
NT_187363.1	0	127682
NT_187364.1	0	66860
NT_187365.1	0	40176
NT_187366.1	0	42210
NT_187367.1	0	176043
NT_187368.1	0	40745
NT_187369.1	0	41717
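For reference, a chromosome-length BED file like the one above can be produced with a few lines of Python. This is a minimal sketch that assumes WriteChromLengthBedFromFasta.py does something similar (I have not checked its source); `fasta_lengths` and `chrom_length_bed` are hypothetical names. BED coordinates are 0-based and half-open, so an interval covering a whole sequence starts at 0 and ends at the sequence length.

```python
import io
from typing import Iterator, TextIO, Tuple


def fasta_lengths(handle: TextIO) -> Iterator[Tuple[str, int]]:
    """Yield (sequence ID, length) for each record in a FASTA stream."""
    seqid, length = None, 0
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if seqid is not None:
                yield seqid, length
            # The ID is the first whitespace-delimited token of the header.
            header = line[1:].split()
            seqid, length = (header[0] if header else ""), 0
        elif line:
            length += len(line)
    if seqid is not None:
        yield seqid, length


def chrom_length_bed(handle: TextIO) -> str:
    """Return one 'ID<TAB>0<TAB>length' BED line per FASTA record."""
    return "".join(f"{seqid}\t0\t{length}\n" for seqid, length in fasta_lengths(handle))


example = io.StringIO(">NC_000001.11 Homo sapiens chromosome 1\nACGTN\nACG\n>NT_187361.1 scaffold\nAC\n")
print(chrom_length_bed(example), end="")
```

On the toy input this prints `NC_000001.11	0	8` and `NT_187361.1	0	2`: the start column is 0 for every record and only the end column varies.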

The second column contains only 0s. This output made me think that the information in the file is incorrect and that this is why I later end up with empty files, because other examples I have seen contain both 0 and other numbers. So my questions are:

  • What do you think is happening?
  • Has anyone tried to run this with RefSeq genomes too?
  • Do I need to use another .fna file?
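One way to test the ID hypothesis before switching genomes is to compare the sequence names in the BED file with those in column 1 of the annotation used in step 4c. I am assuming that step 4c matches intervals by sequence name (I have not checked the workflow code); if so, two files that share no names would give empty output. This is a hypothetical diagnostic: the function names are made up, and it assumes tab-separated BED and GTF/GFF files, optionally gzipped.

```python
import gzip
from typing import Iterable, Set, Tuple


def first_column_ids(lines: Iterable[str]) -> Set[str]:
    """Collect column 1 of tab-separated lines, skipping blanks and header lines."""
    ids = set()
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        ids.add(line.split("\t", 1)[0].strip())
    return ids


def ids_in_file(path: str) -> Set[str]:
    """Read sequence IDs from a plain or .gz BED/GTF/GFF file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return first_column_ids(handle)


def compare_ids(bed_ids: Set[str], annotation_ids: Set[str]) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (shared IDs, IDs only in the BED, IDs only in the annotation)."""
    return bed_ids & annotation_ids, bed_ids - annotation_ids, annotation_ids - bed_ids
```

For example, `compare_ids(ids_in_file("chrom_lengths.bed"), ids_in_file("annotation.gtf"))` (placeholder file names); an empty first set would support the ID-mismatch hypothesis.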

I really want to avoid switching to Ensembl genomes because I have already done a lot of work with RefSeq data.
Any comments on what could be causing the issue are more than welcome!
