Fix on-the-fly indexing of VCF w.r.t virtual offsets.
When using bcftools view --write-index -o out.vcf.gz, the virtual file
offsets can differ depending on whether bgzf_tell is called before or
after a flush.  Specifically, the offset could point to the end of the
current BGZF block or to the first byte of the next BGZF block.
Ultimately both of these resolve to the same physical location, but in
some situations the former may mean attempting to read zero bytes (the
remainder of the BGZF block).  This has been known in the past to be
misinterpreted as an EOF.  (See samtools/samtools#1861)
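
To make the two equivalent offsets concrete, here is a minimal sketch of how a BGZF virtual offset packs the compressed block start (upper 48 bits) and the position within the uncompressed block (lower 16 bits). The helper names and the block sizes in the note below are hypothetical and chosen for illustration; they are not htslib functions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers (not htslib API): pack and unpack a BGZF virtual
 * offset.  The upper 48 bits hold the file offset of the start of the
 * compressed block; the lower 16 bits hold the offset within that
 * block's uncompressed data. */
static uint64_t make_voffset(uint64_t block_start, uint16_t within)
{
    assert(block_start < (1ULL << 48));   /* must fit in 48 bits */
    return (block_start << 16) | within;
}

static uint64_t voffset_block(uint64_t v)
{
    return v >> 16;
}

static unsigned voffset_within(uint64_t v)
{
    return (unsigned)(v & 0xffff);
}

/* Uncompressed bytes still readable in the block that v points into,
 * given that block's uncompressed length. */
static unsigned bytes_left_in_block(uint64_t v, unsigned block_len)
{
    return block_len - voffset_within(v);
}
```

For an assumed block starting at compressed offset 0 that holds 5000 uncompressed bytes and occupies 1000 compressed bytes, make_voffset(0, 5000) (tell before the flush) and make_voffset(1000, 0) (tell after the flush) are different numbers for the same logical position. Only the first leaves zero bytes to read in its block, which is the case a reader may mistake for EOF.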

It also means the contents of the index produced by --write-index and
a separate bcftools index command can yield different results, albeit
both representing the same data.

The fix for the samtools / bcftools issue above (#1672), which
addressed the multi-threaded case, inadvertently recreated the bug in
the non-multi-threaded case.

Fixes samtools/bcftools#2267
jkbonfield committed Sep 12, 2024
1 parent bf11805 commit b66c6d2
Showing 2 changed files with 6 additions and 0 deletions.
4 changes: 4 additions & 0 deletions NEWS
@@ -89,6 +89,10 @@ Bug fixes
* Fix small OSS-Fuzz reported issues with CRAM encoding and long
CIGARS and/or illegal positions. (PR #1775, PR #1801, PR #1817)

* Fix issues with on-the-fly indexing of VCF/BCF (bcftools --write-index)
when not using multiple threads. (PR #1837. Fixes samtools/bcftools#2267,
reported by Giulio Genovese)

* Stricter limits on POS / MPOS / TLEN in sam_parse1(). This fixes
a signed overflow reported by OSS-Fuzz and should help prevent other
as-yet undetected bugs. (PR #1812)
2 changes: 2 additions & 0 deletions vcf.c
@@ -4238,6 +4238,8 @@ int vcf_write(htsFile *fp, const bcf_hdr_t *h, bcf1_t *v)
    if ( fp->format.compression!=no_compression ) {
        if (bgzf_flush_try(fp->fp.bgzf, fp->line.l) < 0)
            return -1;
        if (fp->idx && !fp->fp.bgzf->mt)
            hts_idx_amend_last(fp->idx, bgzf_tell(fp->fp.bgzf));
        ret = bgzf_write(fp->fp.bgzf, fp->line.s, fp->line.l);
    } else {
        ret = hwrite(fp->fp.hfile, fp->line.s, fp->line.l);