Question regarding version 2.4 PureCN #311

Closed
maxanes opened this issue Jul 26, 2023 · 10 comments

Comments


maxanes commented Jul 26, 2023

Hi,

We have been using bioconductor-purecn 2.0.2 to select clonal mutations. We recently tried 2.4.0 and noticed that many mutations are filtered out, specifically those with BQ < 25. Our VCF file only has MBQ, though. Does that mean MBQ is used for this filter?
The problem is that these include mutations that are relevant to us, such as KRAS and TP53. We tried setting min.supporting.reads = 0, but that did not turn off this filter. Do you have any comments or recommendations on which version to use?
Attached are log files for the same sample run with both versions
C14_ffpe.purecn_V2.0.2.log
C14_ffpe.purecn_V2.4.0.log
Thank you in advance.

Owner

lima1 commented Jul 26, 2023

I can add an option to change the 25 today. The default is pretty low though, so you might want to try to remove the artifacts among them upstream.
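
For checking upstream how many calls a base-quality cutoff of 25 would remove, a minimal sketch follows. It assumes the cutoff is compared against the ALT-allele value of the Mutect2 `MBQ` INFO field (one value per allele, reference first). It is not PureCN's own code; the function names and the example records are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple


def alt_mbq(info: str) -> Optional[int]:
    """Return the smallest ALT-allele MBQ in a VCF INFO string, or None if absent."""
    for field in info.split(";"):
        if field.startswith("MBQ="):
            values = field[4:].split(",")
            # Assumed layout: reference allele first, then one value per ALT allele.
            alts = [int(v) for v in values[1:] if v.isdigit()]
            return min(alts) if alts else None
    return None


def split_by_mbq(vcf_lines: Iterable[str], min_mbq: int = 25) -> Tuple[List[str], List[str]]:
    """Split variant records into (kept, dropped) by ALT-allele MBQ."""
    kept, dropped = [], []
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        cols = line.rstrip("\n").split("\t")
        key = f"{cols[0]}:{cols[1]}"
        mbq = alt_mbq(cols[7])
        # Records without an MBQ annotation are kept, since nothing can be tested.
        if mbq is not None and mbq < min_mbq:
            dropped.append(key)
        else:
            kept.append(key)
    return kept, dropped


# Made-up records for illustration only; positions are not real variant sites.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr12\t100\t.\tC\tT\t.\tPASS\tDP=120;MBQ=30,20",
    "chr17\t200\t.\tG\tA\t.\tPASS\tDP=95;MBQ=32,31",
]
kept, dropped = split_by_mbq(example)
print("kept:", kept)
print("dropped:", dropped)
```

Running the same function with `min_mbq=20` on a real VCF would show how many calls sit between the two thresholds discussed here.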

Author

maxanes commented Jul 26, 2023

I am not sure about this threshold of 20 or 25. Does that mean that variants such as the KRAS mutation, which the previous version 2.0.2 called clonal with MBQ 20, are not real?
It seems that 2.0.2 ends up with 19340 variants to use, while 2.4.0 uses only 5002 variants.

Owner

lima1 commented Jul 26, 2023

5000 variants is too few for a WES panel; there should be at least 20000 heterozygous SNPs. Are you using Mutect2 in a recent GATK4? Any steps that deviate from their best practices? Do you use the baits file (location of baits, not exons) in IntervalFile.R? Do you run Mutect2 with interval padding to get SNPs in introns? With your coverage, you can use 100bp padding.

Author

maxanes commented Jul 26, 2023

I also think that 5000 is too few. Yes, we use Mutect2 in a recent GATK4. I use the same baits file for both versions (I am not sure which kind it is; it might be exons) and run them in the same way, with Mutect2 interval padding of 50.

Owner

lima1 commented Jul 26, 2023

Any deviation from here: https://gatk.broadinstitute.org/hc/en-us/articles/360035531132--How-to-Call-somatic-mutations-using-GATK4-Mutect2 ?

The contamination step is not critical, but in general worth it. Same with the baits file. The baits locations give you the cleanest signal.

Author

maxanes commented Jul 26, 2023

The only differences are that we use a normal sample instead of a PoN for Mutect2, and that we do the DB annotation afterward.

Owner

lima1 commented Jul 26, 2023

The PoN is a powerful way of removing artifacts, so I would consider it. Your input VCF contains a lot of artifacts. I can't remember whether Mutect2 uses gnomAD in the likelihood model; it might.

Owner

lima1 commented Jul 26, 2023

And don't forget to use --genotype-germline-sites with matched normals (I'm sure you do).
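
The Mutect2 settings raised in this thread (matched normal, interval padding, `--genotype-germline-sites`, optional PoN) can be collected in one place. The sketch below only assembles an argument list; all file and sample names are hypothetical placeholders, and other best-practice arguments (for example a germline resource) are left out.

```python
import shlex
from typing import List, Optional


def mutect2_command(
    tumor_bam: str,
    normal_bam: str,
    normal_sample: str,
    reference: str,
    intervals: str,
    output_vcf: str,
    padding: int = 100,
    panel_of_normals: Optional[str] = None,
) -> List[str]:
    """Assemble a GATK4 Mutect2 call for a tumor with a matched normal."""
    cmd = [
        "gatk", "Mutect2",
        "-R", reference,
        "-I", tumor_bam,
        "-I", normal_bam,
        "-normal", normal_sample,
        "-L", intervals,                       # baits file rather than exons
        "--interval-padding", str(padding),    # pick up SNPs next to the baits
        "--genotype-germline-sites", "true",   # keep germline SNPs in the VCF
        "-O", output_vcf,
    ]
    if panel_of_normals:
        # A PoN is optional here, but it helps remove recurrent artifacts.
        cmd += ["--panel-of-normals", panel_of_normals]
    return cmd


# Hypothetical file and sample names, for illustration only.
cmd = mutect2_command(
    tumor_bam="tumor.bam",
    normal_bam="normal.bam",
    normal_sample="NORMAL_SAMPLE",
    reference="reference.fa",
    intervals="baits.interval_list",
    output_vcf="tumor_mutect2.vcf.gz",
    panel_of_normals="pon.vcf.gz",
)
print(shlex.join(cmd))
```

Building the command as a list keeps the padding value and the PoN path easy to change between runs, and the list can be passed to `subprocess.run` without shell quoting issues.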

Owner

lima1 commented Sep 9, 2023

Also likely hit by #320.

@lima1 lima1 closed this as completed Sep 9, 2023
Author

maxanes commented Sep 11, 2023 via email
