How to interpret failed reads during modkit extract #220
Hello @lkwhite, could you run …
Looks like we have a whole bunch of … These are tRNA sequencing data, so those lengths are 85-135 nt, and we use BWA-MEM instead of minimap2 (mm2) for alignment.
The log was a little too big to attach, so I've split it into two parts.
Hello @lkwhite, it is possible that Remora reference anchoring doesn't emit the correct MN tag, or doesn't update it when the sequence length changes. Could you try removing the MN tags and seeing if that helps? For example: `samtools view -bhx MN ${bam} | modkit extract - extract.tsv --log-filepath test_tags.log`. If the base modification tags are actually incorrect, you'll get different errors. If this works and you want the … Let me know, A
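If the suspicion above is right (an MN tag that no longer matches the sequence after reference anchoring), one way to see how many records are affected, before stripping anything, is to compare each record's MN:i value with the length of its SEQ field. The function below is a hypothetical helper, not part of modkit or samtools. It assumes MN:i holds the sequence length at the time the MM/ML tags were written, and it reads SAM text such as the output of `samtools view ${bam}`.

```shell
# Hypothetical helper (not a modkit/samtools command).
# Assumption: MN:i stores the read length at the time MM/ML were written.
# Reads SAM text on stdin; prints: read name, MN value, actual SEQ length
# for every record where the two disagree.
find_mn_mismatches() {
  awk -F'\t' '
    # Skip header lines and records with no stored sequence.
    /^@/ || $10 == "*" { next }
    {
      # Optional SAM tags start at column 12.
      for (i = 12; i <= NF; i++) {
        if ($i ~ /^MN:i:/) {
          mn = substr($i, 6) + 0   # value after the "MN:i:" prefix
          if (mn != length($10)) {
            print $1 "\t" mn "\t" length($10)
          }
        }
      }
    }
  '
}
```

Usage: `samtools view "${bam}" | find_mn_mismatches > mn_mismatches.tsv`. A non-empty output would be consistent with the MN tag not being updated when the sequence changed; an empty one points away from MN as the cause.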
That reduces the percentage of reads failing, and the ones that fail now say … I couldn't find MN in the SAM spec; how is Remora using this tag?
Hello @lkwhite, the MN tag isn't in the spec yet, and Remora doesn't use it. But Dorado does, so we need to update the recommendation to remove the tags when using … Looks like quite a few reads are failing with the "improper data" error. I've extracted the read IDs with `grep -Ei 'record [0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12} has improper data' ${fp} | grep -oEi '[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}' > malformed_read_ids.txt` and attached them to this thread. Could you send me a few of these BAM records, preferably both before and after reference-anchored base modification inference? If the files are too large (or you don't want them on GitHub), you can email me at art.rand[at]nanoporetech.com and we can work out a way to share them. Thanks.
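The grep pipeline above can be wrapped into a small reusable function that also de-duplicates the IDs, which makes it easier to pick a handful of records to share. The function name `extract_malformed_ids` is hypothetical; the regular expressions are the ones from the command above, and the log path is whatever was passed to `--log-filepath`. The commented lines at the end use samtools view's `-N`/`--qname-file` option (available in recent samtools releases) to pull only the listed reads out of a BAM.

```shell
# Hypothetical helper wrapping the grep pipeline from the thread.
# $1 = modkit log file (the ${fp} above), $2 = output file for read IDs.
extract_malformed_ids() {
  local log_fp="$1" out_fp="$2"
  # UUID pattern used for read IDs in the log.
  local uuid='[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}'
  grep -Ei "record ${uuid} has improper data" "${log_fp}" \
    | grep -oEi "${uuid}" \
    | sort -u > "${out_fp}"   # one ID per line, duplicates removed
}

# Example: keep a few IDs and extract just those records for sharing.
#   extract_malformed_ids test_tags.log malformed_read_ids.txt
#   head -n 5 malformed_read_ids.txt > few_ids.txt
#   samtools view -b -N few_ids.txt "${bam}" > malformed_examples.bam
```

Running the last step on both the pre- and post-anchoring BAMs would give the before/after pair requested above.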
Just in case anyone else encounters a large number of skipped or errored reads following "reference-anchored" Remora base modification calling: if you have previously used base modification calling with … The correct workaround is to either not use base modification calling in the original …
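Since the problem arises when reads have already been through base modification calling, it can help to check whether an input BAM already carries modification tags before running reference-anchored calling on it. The sketch below is a hypothetical pre-flight check, not a documented step of the workaround. It assumes the modification calls are stored in the standard MM tag (older files may use other tag names, which it does not detect) and reads SAM text from `samtools view ${bam}`.

```shell
# Hypothetical pre-flight check (not a modkit/Remora command).
# Reads SAM text on stdin and prints "<records with MM tag>\t<total records>".
count_modtagged_records() {
  awk -F'\t' '
    /^@/ { next }                      # skip header lines
    {
      total++
      for (i = 12; i <= NF; i++) {     # optional tags start at column 12
        if ($i ~ /^MM:Z:/) { tagged++; break }
      }
    }
    END { printf "%d\t%d\n", tagged, total }
  '
}
```

Usage: `samtools view "${bam}" | count_modtagged_records`. A non-zero first column means at least some reads already carry base modification calls from an earlier pass.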
I have two datasets: …
When I run `modkit extract` on these, I get two very different results: …
Is this expected behavior? How should I interpret this?