From 4e4bae4cc7a2889b079688170c777b1ff5401e39 Mon Sep 17 00:00:00 2001
From: Yossi Farjoun
Date: Mon, 7 Oct 2019 14:02:36 -0400
Subject: [PATCH] added example

---
 VCFv4.4.tex | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/VCFv4.4.tex b/VCFv4.4.tex
index 675b7cf7..d263a617 100644
--- a/VCFv4.4.tex
+++ b/VCFv4.4.tex
@@ -582,7 +582,7 @@ \subsubsection{Genotype fields}
  \end{itemize}
  \item HQ (Integer): Haplotype qualities, two comma separated phred qualities.
- \item LAA
+ \item LAA is a sorted list of $n$ distinct integers, where $1 \le n \le \left|\mathrm{ALT}\right|$, giving the (1-based) indices within ALT of the alleles that are observed in the sample.
  In callsets with many samples, sites may grow to include numerous alternate alleles at the same POS.
  Usually, few of these alleles are actually observed in any one sample, but each genotype must supply fields like PL and AD for all of the alleles---a very inefficient representation as PL's size is quadratic in the allele count.
  Similarly, in rare sites, which can be the bulk of the sites, the vast majority of the samples are reference.
@@ -611,9 +611,11 @@ \subsubsection{Genotype fields}
  4&G&A,T,\textless*\textgreater& LAA:LGT:LAD:LPL& :0/0:30:0\\
  4&G&A,T,\textless*\textgreater& GT:AD:PL& 0/0:30,.,..:0,.,.,.,.,.,.,.,.,.,.,.,.,.,.\\
  \end{tabular}
- \item LAD: See LAA
- \item LGT: See LAA
- \item LPL: See LAA
+ \item LAD: is a list of $n+1$ integers giving read depths (as per AD) for the REF allele and each of the local alleles listed in LAA.
+ \item LGT: is the genotype, encoded as allele indexes separated by either $/$ or $\mid$, as with GT; however, the indexes refer to the list consisting of REF and the ALT alleles referenced by LAA.
+ For example, if LAA is 2,3, then LGT=0/2 is equivalent to GT=0/3 and LGT=1/2 is equivalent to GT=2/3 (see example above).
+ \item LPL: is a list of ${n + \mathrm{Ploidy} \choose \mathrm{Ploidy}}$ integers giving phred-scaled genotype likelihoods (rounded to the closest integer; as per PL) for all possible genotypes given the set of alleles consisting of REF and the local alleles listed in LAA.
+ The precise ordering is defined in the GL paragraph.
  \item MQ (Integer): RMS mapping quality, similar to the version in the INFO field.
  \item PL (Integer): The phred-scaled genotype likelihoods rounded to the closest integer, and otherwise defined in the same way as the GL field.
  \item PP (Integer): The phred-scaled genotype posterior probabilities rounded to the closest integer, and otherwise defined in the same way as the GP field.
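
To illustrate the local-allele encoding described in the added items, here is a minimal Python sketch; it is not part of the patch. The function names `lgt_to_gt`, `lad_length` and `lpl_length` are hypothetical. The sketch assumes that local index 0 always denotes REF, that local index i (for i >= 1) denotes the i-th entry of LAA, and that LPL holds one value per unordered genotype over the n+1 local alleles, in the same way that PL does over all alleles.

```python
import re
from math import comb


def lgt_to_gt(lgt: str, laa: list[int]) -> str:
    """Translate a local genotype (LGT) into a site-level genotype (GT).

    `laa` holds the 1-based ALT indices listed in LAA. Local index 0 is
    REF; local index i >= 1 maps to laa[i - 1] (assumption stated above).
    """
    local_to_site = [0] + list(laa)
    # Split on the separators '/' and '|' while keeping them in the result.
    tokens = re.split(r"([/|])", lgt)
    out = []
    for tok in tokens:
        if tok in ("/", "|", "."):
            out.append(tok)  # separators and missing alleles pass through
        else:
            out.append(str(local_to_site[int(tok)]))
    return "".join(out)


def lad_length(n_local: int) -> int:
    """LAD carries one depth for REF plus one per local allele."""
    return n_local + 1


def lpl_length(n_local: int, ploidy: int = 2) -> int:
    """Count the unordered genotypes over REF plus n_local local alleles."""
    return comb(n_local + ploidy, ploidy)
```

With LAA=2,3, `lgt_to_gt("0/2", [2, 3])` yields `0/3` and `lgt_to_gt("1/2", [2, 3])` yields `2/3`, matching the equivalences given in the LGT item. For a diploid sample with two local alleles, `lpl_length(2)` gives 6. With an empty LAA it gives 1, which agrees with the single LPL value in the example row above.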