chromosome order #146
Comments
genomepy does not (currently) support this. This snippet should do the trick:
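The snippet originally posted with this comment is not preserved in this copy of the thread. As a stand-in, here is a minimal sketch of one way to reorder a FASTA file with natural sorting, using plain Python only. It is not the maintainers' code and does not use genomepy; the names `natural_key`, `read_fasta` and `write_sorted_fasta` are hypothetical, and the whole genome is held in memory, which may be too much for large assemblies.

```python
import re


def natural_key(name):
    """Split a name into text and integer chunks so 'chr2' sorts before 'chr10'."""
    return [int(tok) if tok.isdigit() else tok.lower()
            for tok in re.split(r"(\d+)", name)]


def read_fasta(path):
    """Map each sequence name to its raw lines (header line included)."""
    records, name = {}, None
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                name = line[1:].split()[0]  # name is the first word of the header
                records[name] = [line]
            elif name is not None:
                records[name].append(line)
    return records


def write_sorted_fasta(old_fa, new_fa, key=natural_key):
    """Write the records of old_fa to new_fa, ordered by `key`."""
    records = read_fasta(old_fa)  # fully loaded first, so old_fa may equal new_fa
    with open(new_fa, "w") as handle:
        for name in sorted(records, key=key):
            handle.writelines(records[name])
```

Note that this key places `chrM` before `chrX` and `chrY`, because the non-numeric names are compared alphabetically. If a different order is needed, a custom `key` can be passed instead.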
Thank you. Is there a way to do this but still use genomepy to manage the newly sorted genome?
Assuming that by "manage" you mean also applying this sorting to the support files (fa.sizes, gaps.bed & fa.fai): if so, you can run this after the previous code:
If oldfa and newfa are the same, it should just overwrite the old files (check that this does what you intended, though).
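The code for this step is also missing from this copy of the thread. As a rough illustration of what regenerating one support file involves, the sketch below rebuilds a sizes file (name, tab, length) in the same order as the FASTA. It is plain Python under the assumption that the sizes file lives next to the FASTA with a `.sizes` suffix; it is not genomepy's own routine, and the function names are hypothetical. The gaps.bed file is not handled here, and the `.fai` index would need to be rebuilt separately, for example with `samtools faidx`.

```python
def fasta_sizes(fa_path):
    """Return (name, length) pairs in the order the sequences appear in the FASTA."""
    sizes = []
    name, length = None, 0
    with open(fa_path) as handle:
        for line in handle:
            if line.startswith(">"):
                if name is not None:
                    sizes.append((name, length))
                name, length = line[1:].split()[0], 0
            elif name is not None:
                length += len(line.strip())  # ignore newlines and padding
    if name is not None:
        sizes.append((name, length))  # flush the last record
    return sizes


def write_sizes(fa_path, sizes_path=None):
    """Write a tab-separated sizes file; defaults to '<fa_path>.sizes' (an assumption)."""
    sizes_path = sizes_path or fa_path + ".sizes"
    with open(sizes_path, "w") as out:
        for name, length in fasta_sizes(fa_path):
            out.write(f"{name}\t{length}\n")
    return sizes_path
```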
OK, thank you!
Thank you for the offer, but I don't think this is a desired feature. Chromosomes are sorted by size, which all tools I know of can use, and some tools might not work otherwise (due to assumptions). Natural sorting would also fail for certain genomes (such as those that use Roman numerals).
Are chromosomes sorted by size? That's not what I experienced; for me they were sorted alphabetically, as I indicated in the first post.
My bad, you're right about the default order.
It doesn't need to be a function; just the actual list of chromosome names in the right order would work.
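To make the "explicit list" idea concrete, here is a small sketch of ordering names by a user-supplied list. It is only an illustration of the request, not an existing genomepy option; `order_by_list` and `human_order` are hypothetical names. Names missing from the list are placed at the end in alphabetical order, which is one possible choice among several.

```python
def order_by_list(names, preferred):
    """Order `names` by their position in `preferred`; unlisted names go last."""
    rank = {name: i for i, name in enumerate(preferred)}
    return sorted(names, key=lambda n: (rank.get(n, len(rank)), n))


# The "normal" human order from the original question, written out explicitly.
human_order = [f"chr{i}" for i in range(1, 23)] + ["chrX", "chrY", "chrM"]
```

Because the order is given as data, this also covers genomes where natural sorting breaks down, such as those with Roman-numeral chromosome names.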
Hi @Phlya, I think we're hesitant to add functionality to genomepy where the use case is not clear, as this is functionality that needs to be tested and maintained down the line and adds complexity to the command line tool.
No worries, I understand if this functionality might not be the most important. My main use case is that I want to use genomepy in a pipeline to manage the genome reference. Different tools and steps rely on a chromsizes file, and that needs to align with how the data is processed, including the order of chromosomes, and some tools also need the genome sequence. I need to investigate whether the order of chromosomes in the fasta file needs to match the order in the chromsizes file, but even if it's not strictly required for the tools I am using now, I am not keen on the idea of having a chromsizes file with a different order than the fasta. I feel that's a dangerous combination that can lead to errors in some cases. And of course I'd rather have my chromosomes sorted in the logical order (which matches the natural sorting for human, but could be arbitrarily different in other organisms). I hope this explains my use case and gives you some context. Thanks!
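The concern about a chromsizes file drifting out of step with the fasta can be checked mechanically in a pipeline. A minimal sketch, assuming a tab- or whitespace-separated sizes file with the name in the first column (function names are hypothetical, and only the order of names is compared, not the lengths):

```python
def fasta_names(fa_path):
    """Sequence names in FASTA order (first word of each header line)."""
    with open(fa_path) as handle:
        return [line[1:].split()[0] for line in handle if line.startswith(">")]


def sizes_names(sizes_path):
    """Names in sizes-file order (first column of each non-empty line)."""
    with open(sizes_path) as handle:
        return [line.split()[0] for line in handle if line.strip()]


def same_order(fa_path, sizes_path):
    """True if both files list the same names in the same order."""
    return fasta_names(fa_path) == sizes_names(sizes_path)
```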
Is there a way to specify chromosome order when installing a genome? E.g., when downloading a human genome I want the chromosomes to be sorted in the "normal" way, as chr1, chr2, ... chr10, ... chr22, chrX, chrY, chrM. But currently they are sorted as chr1, chr10, ..., chr2, ..., chr22, chr3, ... chr9, chrM, chrX, chrY. Is it just the same order as in UCSC (I downloaded from there)?