WARNING:Fast5Filter:xxx reads not found! #73
Comments
Hi @markme123 -- thanks for getting in touch. Can you please tell us:
`fast5_subset_bin = "/opt/miniconda3/bin/fast5_subset"`
It's hard to tell whether the problem lies with particular FAST5 files, because there are so many of them.
Hi @markme123 -- how are you generating your reads list? Where did you get your input files from? Is there a chance you only have the reads from MinKNOW's "pass" output folder, so the filtering has already been completed?
    #!/usr/bin/env python
    # -*- encoding: utf-8 -*-
    import argparse

    parser = argparse.ArgumentParser(description="fast5 split; this program calls seqkit")
    fast5_subset_bin = "/opt/miniconda3/bin/fast5_subset"

    def reads_list_out(reads_list, out_name):
        [...]

    path = os.path.realpath(args.save_path)
    if args.barcode:
        [...]
That is all of my code. The pass and fail reads are extracted separately.
Hi @markme123 -- how about the other questions? Where did you get your input files from? Is there a chance you only have the reads from MinKNOW's "pass" output folder, so the filtering has already been completed?
I have confirmed that FAST5 is all |
Hi @markme123 -- thanks very much for the extra information. I suspect the issue is down to Guppy splitting some of your reads into new ones -- this means that the [...]

For example, in your code, change these lines:

    if args.summary:
        data = pd.read_csv(args.summary, sep="\t")
        barcode = set(list(data['barcode_arrangement']))
        for i in barcode:
            failed = data[(data.passes_filtering == 0) & (data.barcode_arrangement == i)]['read_id']
            passed = data[(data.passes_filtering > 0) & (data.barcode_arrangement == i)]['read_id']
    [...]
    if args.summary:
        data = pd.read_csv(args.summary, sep="\t")
        failed = data[data.passes_filtering == 0]['read_id']
        reads_list_out(failed, f"{path}/fail_reads_id_list")
        passed = data[data.passes_filtering > 0]['read_id']
        reads_list_out(passed, f"{path}/pass_reads_id_list")

To this:

    if args.summary:
        data = pd.read_csv(args.summary, sep="\t")
        barcode = set(list(data['barcode_arrangement']))
        for i in barcode:
            failed = data[(data.passes_filtering == 0) & (data.barcode_arrangement == i)]['parent_read_id']  # <== changed to parent_read_id
            passed = data[(data.passes_filtering > 0) & (data.barcode_arrangement == i)]['parent_read_id']  # <== changed to parent_read_id
    [...]
    if args.summary:
        data = pd.read_csv(args.summary, sep="\t")
        failed = data[data.passes_filtering == 0]['parent_read_id']  # <== changed to parent_read_id
        reads_list_out(failed, f"{path}/fail_reads_id_list")
        passed = data[data.passes_filtering > 0]['parent_read_id']  # <== changed to parent_read_id
        reads_list_out(passed, f"{path}/pass_reads_id_list")

Can you try that and see if it works? Note that this method only works with summary files -- it won't work with your
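One side effect of switching to `parent_read_id` is that several split reads can share the same parent, so the ID lists may contain duplicates. Below is a minimal sketch of building de-duplicated pass and fail lists using only the standard library. The function name `parent_id_lists`, the fallback to `read_id`, and the accepted spellings of `passes_filtering` are assumptions for illustration, not part of the original script.

```python
import csv


def parent_id_lists(summary_path):
    """Return (passed, failed) lists of unique parent read IDs.

    Hypothetical helper: reads a tab-separated sequencing summary and
    groups IDs by the passes_filtering column.
    """
    # Dicts keep insertion order and silently drop duplicate keys.
    passed, failed = {}, {}
    with open(summary_path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            # Assumption: fall back to read_id if parent_read_id is
            # missing or empty (e.g. an older summary file).
            read_id = row.get("parent_read_id") or row["read_id"]
            # Assumption: the flag is written as TRUE/FALSE or 1/0.
            ok = row["passes_filtering"].strip().lower() in ("true", "1")
            (passed if ok else failed)[read_id] = None
    return list(passed), list(failed)
```

Note that a parent read whose split children fall on both sides of the filter would appear in both lists; whether that matters depends on how the two subsets are used afterwards.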
When using fast5_subset to separate pass and fail reads, I found that many reads cannot be extracted from the latest R10 FAST5 files.
I checked that none of the FAST5 files are incomplete. This happens with many R10 versions, but not with R9.
About a third of the reads were not extracted.
| 3108059 of 5017321|######################## | 61% ETA: 0:24:25
| 3178341 of 5017321|######################## | 63% ETA: 0:23:27
| 3238381 of 5017321|######################### | 64% ETA: 0:22:36
| 3303857 of 5017321|######################### | 65% ETA: 0:21:44
\ 3364211 of 5017321|########################## | 67% ETA: 0:20:59
/ 3378623 of 5017321|########################## | 67% ETA: 0:20:53
| 5017321 of 5017321|#######################################|100% Time: 0:43:08
INFO:Fast5Filter:3377354 reads extracted
WARNING:Fast5Filter:1638696 reads not found!
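A quick way to check whether split reads could explain a warning like this is to count the summary rows whose `read_id` differs from `parent_read_id`. This is a rough diagnostic sketch, assuming a tab-separated summary file with both columns; the function name `count_split_reads` is hypothetical.

```python
import csv


def count_split_reads(summary_path):
    """Return (split, total): rows whose read_id differs from
    parent_read_id, and the total number of rows."""
    total = split = 0
    with open(summary_path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            total += 1
            parent = row.get("parent_read_id")
            # Rows without a parent_read_id value are not counted as split.
            if parent and parent != row["read_id"]:
                split += 1
    return split, total
```

A large `split` count would be consistent with the read IDs in the list not matching those stored in the FAST5 files, though it does not rule out other causes.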