Mutect2 Matched samples purity estimates fails #120
Comments
It looks like you filtered out known SNPs. We need the SNPs for allele-specific copy number calling (and PSCBS requires them for annotation). M2 requires the --genotype-germline-sites flag to call germline variants. There is also something wrong with your off-target reads. For WES, they don't add much. If you cannot figure out why the log-ratio standard deviation is so high, you will probably get better results without them.
Thanks for your quick reply. I do use "--genotype-germline-sites" to detect germline variants with Mutect2, and I am assuming that the downstream M2 filtering only sets a flag in the FILTER field and does not remove the actual variants.
How would I run the analysis without using the "log-ratio standard deviation"? Is "--undosd" an appropriate choice here? Just a note: I am using the xGen IDT bait file for all the analyses. Best,
I would re-run CNVkit without the off-target feature. The "log-ratio standard deviation" lines in the log report the noise of the tumor vs. normal coverage log2-ratio. In your case, the off-target ratios are very noisy, which is likely why you get the crash. You can also try running our normalization instead of CNVkit. I'm not sure I understand your comment that you don't have normal samples to build the NormalDB. You have the normals to run Mutect, don't you?
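To illustrate what a log-ratio standard deviation measures, here is a minimal Python sketch. It is not PureCN's or CNVkit's actual calculation: the function name, the pseudocount, and the coverage values are all assumptions made up for the example. It only shows why sparse off-target bins tend to give a much noisier tumor vs. normal log2-ratio than well-covered on-target bins.

```python
import math
import statistics


def log2_ratio_sd(tumor_cov, normal_cov, pseudocount=1.0):
    """Standard deviation of per-bin log2(tumor / normal) coverage ratios.

    The pseudocount is an assumption of this sketch; it avoids log2(0)
    for bins without any reads.
    """
    ratios = [
        math.log2((t + pseudocount) / (n + pseudocount))
        for t, n in zip(tumor_cov, normal_cov)
    ]
    return statistics.stdev(ratios)


# Made-up coverage values, for illustration only.
on_target_sd = log2_ratio_sd([100, 110, 90, 105], [100, 100, 100, 100])
off_target_sd = log2_ratio_sd([1, 20, 0, 9], [5, 5, 5, 5])

print(f"on-target SD:  {on_target_sd:.2f}")
print(f"off-target SD: {off_target_sd:.2f}")
```

With low read counts per bin, a difference of a few reads changes the ratio substantially, which is one plausible reason off-target bins can dominate the noise in exome or panel data.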
Thanks again, understood. I will now try the PureCN normalization.
Hello Markus, Thanks for all of your suggestions. This time the SD of the log-ratios seems OK. However, after running almost to the end, I got the error below, for which I can't find any related post. One thing I came across was the SOMATIC flag from issue #108, which I also noticed in section 2.1 of this tutorial. For M2, do I need to run VariantAnnotation to add the SOMATIC flag? Is that what is causing all the issues? The log file is attached here.
Thanks again for your time,
Hi Nihir, it looks like there is an issue where the mapping bias is NaN. This is likely causing the crash. I'll investigate, thanks for reporting. You should not need the SOMATIC flag for M2; it should be generated automatically. The log file does indeed look better now. Markus
Can you try again with the M2 VCF as it is generated, i.e. without adding the COMMON flag etc.?
Hello Markus, Here's the error. When I don't annotate with the DB, it seems to predict all variants as somatic. How does it identify whether a variant is somatic or not? I don't see any mention of the somatic flag in the recent M2 documentation.
Thanks for your continuous support on these issues. Best.
Did you run Mutect with --genotype-germline-sites? Otherwise it will only call somatic variants.
Yes, I enabled the flag. Here is my command line for M2
Hmmm. 4000 variants is a weird number if this is whole exome. It's too high for only somatic, but far too low if it includes germline. Do you see something weird when you look at the P_GERMLINE and POP_AF fields? I.e. do they cover the whole expected range from 0 to 0.5-1?
Sorry for not clarifying: the latest data is from panel sequencing of about 400 genes. For my WES run, I was getting between 70k and 80k variants.
Oh, OK. I'll add code to check that POPAF is on log-scale and add support for GERMQ. Thanks for reporting!
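For context on the log-scale remark: recent Mutect2 versions write POPAF as the negative log10 of the population allele frequency, so a value of 3 corresponds to a frequency of 0.001. The Python sketch below shows the conversion together with a crude heuristic for telling the two scales apart. Both function names and the heuristic are assumptions for illustration; this is not the check that was added to PureCN.

```python
def popaf_to_frequency(popaf):
    """Convert a -log10-scaled POPAF value to a plain allele frequency."""
    return 10.0 ** (-popaf)


def looks_log_scaled(popaf_values):
    """Heuristic (assumption of this sketch): plain frequencies never exceed 1,
    so any value above 1 suggests the field is on the -log10 scale."""
    return any(value > 1.0 for value in popaf_values)


# Made-up POPAF values, for illustration only.
values = [0.3, 2.5, 6.0]
if looks_log_scaled(values):
    frequencies = [popaf_to_frequency(v) for v in values]
else:
    frequencies = list(values)
print(frequencies)
```

The heuristic can miss a log-scaled file in which every variant is common (all values below 1), so reading the field description in the VCF header is the more reliable way to determine the scale.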
Hi Nihir, have a look when you have a chance. I don't have matched tumor/normal test data readily available right now, but the latest commit should work.
Hello Markus, Wow! That was fast. Thank you very much for the quick response. I just tested the code from the latest commit and it worked perfectly fine without any additional flags (i.e. dbinfoflag or popinfoflag). It successfully detects the somatic mutations and runs without any error. It seems that no additional annotations are required for matched samples run with the latest M2. Here are the lines from the log file, just for information
Best,
Great. Closing now. Have a look at the Sampleid.pdf output, the SNPs should cleanly follow the provided segmentations. Also check the Sampleid_variants.csv file (or vcf if you provided --vcf). There might be improvements possible, for example variants that should be filtered based on their flags. Please open new issues with those suggestions.
I just found a bug that essentially turned off all flag filtering for M2 VCFs.
No worries, thanks for the heads up. Let me know whenever you have an update. Meanwhile, I will take a closer look at the results as you suggested.
I pushed a commit already. It should remove most variants not labeled as PASS, except germline and PoN.
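The filtering rule described in that last comment can be sketched in a few lines of Python. This is not the code from the commit: the function name is hypothetical, and how a variant carrying both an allowed flag and another flag is handled is an assumption of this sketch (it is dropped here). The flag names `germline` and `panel_of_normals` are the ones FilterMutectCalls writes to the FILTER column.

```python
# Flags that should not cause a variant to be removed (per the comment above).
KEEP_DESPITE = {"germline", "panel_of_normals"}


def keep_variant(filter_field):
    """Return True if a variant should be retained based on its FILTER column.

    Assumption of this sketch: a variant is kept when it is PASS, or when
    every flag it carries is in KEEP_DESPITE.
    """
    if filter_field == "PASS":
        return True
    flags = set(filter_field.split(";"))
    return flags <= KEEP_DESPITE


# Made-up FILTER values, for illustration only.
for value in ["PASS", "germline", "germline;panel_of_normals",
              "weak_evidence", "germline;strand_bias"]:
    print(value, keep_variant(value))
```

Keeping germline-flagged variants matters here because, as noted earlier in the thread, the germline SNPs are needed for allele-specific copy number calling.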
Hi, Thanks a lot!
Just follow the somatic best practices as closely as possible: https://gatk.broadinstitute.org/hc/en-us/articles/360035531132 . So yes, include all filtering steps. To turn the GenomicsDB directory into a PureCN mapping bias RDS file, you can use the Docker image, which comes with the genomicsdb R package. If you have matched normals, be sure to provide --genotype-germline-sites true so that high-quality germline SNPs are not removed. Also see #320.
@lima1 Thank you so much for the prompt reply. Again, thanks for providing this wonderful tool!
You just provide the VCF as generated by Mutect2 with --genotype-germline-sites. It needs the tumor SNP allelic fractions, but will use the normal allelic fractions for bias estimation when available.
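As a rough illustration of what an allelic fraction is, the Python sketch below derives it from a VCF `AD` (allelic depths) value, which lists the reference read count followed by the alternate read count. The function name is hypothetical and only the first alternate allele is considered; this is not how PureCN reads the VCF internally.

```python
def allelic_fraction(ad_field):
    """Alternate allele fraction from a VCF AD value such as "30,28".

    Returns None when there are no supporting reads, so that a zero-depth
    site does not produce a division by zero.
    """
    depths = [int(x) for x in ad_field.split(",")]
    ref_depth, alt_depth = depths[0], depths[1]
    total = ref_depth + alt_depth
    if total == 0:
        return None
    return alt_depth / total


# Made-up AD values, for illustration only.
print(allelic_fraction("30,30"))  # heterozygous-looking SNP
print(allelic_fraction("45,15"))
print(allelic_fraction("0,0"))
```

For a heterozygous germline SNP in a normal sample the fraction is expected to be close to 0.5, which is why the normal sample is useful as a reference for bias estimation.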
Hello Team,
I am trying to use PureCN with matched tumor and normal samples; however, it's throwing an error. I am attaching the log file here. I have used CNVkit for segmentation and Mutect2 (GATK 4.1.7.0) for variant detection.
Tumor01.PureCN.log
I tried a few things to troubleshoot, but none of them seems to work.
Any suggestion to resolve the error is appreciated. If there is an issue with the samples themselves, as stated in the log file, what would be an alternative way to confirm that?
Note that I don't have normal samples to build a NormalDB.
Best,
nihir