Workflow for identifying long-read novel sequences
Make sure you have installed all of the following prerequisites on your machine:
• python2
• minimap2
• NUCmer
• porechop
• NanoFilt
• bedtools
• kraken2
• RepeatMasker
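The prerequisite list above can be checked before running the workflow. The following is a minimal sketch, not part of AF-NS itself. It assumes each tool is invoked under its usual executable name (for example `nucmer` for NUCmer, and `python2` for the interpreter, which on some systems is simply `python`), and it needs Python 3 because `shutil.which` is not available in Python 2.

```python
import shutil

# Executable names are assumptions based on each tool's usual command name;
# adjust them if your installation uses different names.
REQUIRED_TOOLS = [
    "python2",
    "minimap2",
    "nucmer",
    "porechop",
    "NanoFilt",
    "bedtools",
    "kraken2",
    "RepeatMasker",
]


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


missing = missing_tools()
if missing:
    print("Not found on PATH: " + ", ".join(missing))
else:
    print("All prerequisites were found on PATH.")
```

A tool reported as missing may still be installed under another name or inside an environment that is not active, so treat the output as a hint rather than a verdict.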
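Run the workflow as follows, given a kraken2 database folder (db_folder), the input reads (input.fq), a reference (ref.fa) and an output folder:
python AF-NS.py -kraken_db db_folder -i input.fq -r ref.fa -o output_folder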
We built a kraken2 database that includes the archaea, bacteria, fungi, plasmid, viral and UniVec datasets. It can be downloaded from the following link:
http://www.bio8.cs.hku.hk/novel/kraken2_db.tar
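Fetching and unpacking the database can be scripted. The sketch below uses only the Python 3 standard library. The helper name `fetch_and_extract` and the destination folder are hypothetical, and the layout inside the archive is not documented here, so inspect the extracted files before deciding which folder to pass to `-kraken_db`.

```python
import os
import tarfile
import urllib.request

DB_URL = "http://www.bio8.cs.hku.hk/novel/kraken2_db.tar"


def fetch_and_extract(url, dest_dir):
    """Download the tar archive at `url` and unpack it into `dest_dir`.

    Returns the path of the downloaded archive.
    """
    os.makedirs(dest_dir, exist_ok=True)
    archive_path = os.path.join(dest_dir, os.path.basename(url))
    urllib.request.urlretrieve(url, archive_path)
    with tarfile.open(archive_path) as archive:
        archive.extractall(dest_dir)
    return archive_path
```

For example, `fetch_and_extract(DB_URL, "db_download")` would download the archive into a (hypothetical) `db_download` folder and unpack it there. The archive may be large, so make sure enough disk space is available.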
The identified novel sequences are written to output_folder/novel.fa
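To get a quick overview of the result, the output file can be summarized with a few lines of Python. This is a minimal sketch that assumes novel.fa is a standard FASTA file (header lines starting with `>`, sequence possibly wrapped over several lines). The function names `read_fasta` and `summarize` are illustrative and not part of AF-NS.

```python
def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                # A new record starts: emit the previous one, if any.
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif header is not None:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def summarize(path):
    """Return (number of sequences, total bases) for a FASTA file."""
    lengths = [len(seq) for _, seq in read_fasta(path)]
    return len(lengths), sum(lengths)
```

Calling `summarize("output_folder/novel.fa")` after a run returns the number of novel sequences and their combined length.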