v0.6.0

@github-actions github-actions released this 01 Jun 04:38
45d3d6d

alevin-fry 0.6.0 release notes

The majority of changes in this release are new features. There are also changes to default behavior.

New features

  • Starting with alevin-fry v0.6.0, both the parsimony and parsimony-em/full resolution approaches can be used in USA mode. While this support has been tested, it will remain marked as experimental for at least one release.

  • This release introduces two new resolution modes, parsimony-gene and parsimony-gene-em. Both work with and without USA mode. They work as follows:

    • parsimony-gene is analogous to the parsimony resolution. That is, it builds a parsimonious UMI graph (PUG), which is then resolved to find the most parsimonious cover of the observed UMIs. The main difference is that parsimony finds a transcript-level cover and then projects each transcript to its parent gene when aggregating counts. In contrast, parsimony-gene first projects the equivalence class labels in the PUG to the gene level, and then resolves the PUG components. This means that a given gene can be considered to cover related UMIs even if the transcript sets describing these UMIs are disjoint. This can potentially be less specific than the behavior of parsimony. On the other hand, this mode is expected to be more robust to incomplete annotation. For example, elements like unannotated UTRs may mean that UMIs otherwise inferred to come from distinct molecules actually arise from the same initial pre-PCR molecule. The parsimony-gene method is more likely to resolve this situation correctly.

    • parsimony-gene-em is just like the parsimony-gene method described above, except that instead of discarding multi-gene UMIs, those UMIs will be probabilistically resolved using an EM algorithm.

  • A new (hidden) option called --umi-edit-dist has been added. This option takes an integer argument that defines the Hamming distance at which a pair of potentially duplicate UMIs is considered collapsible. While this general argument has been added as a forward-looking feature, support for non-standard distances among existing resolution methods is limited. Currently, parsimony, parsimony-em, parsimony-gene and parsimony-gene-em may use a distance of either 0 or 1 (the default is 1), while cr-like, cr-like-em, and trivial can only use 0 (the default is 0). If a user provides an incompatible combination of Hamming distance and resolution mode, alevin-fry will report the error and exit.

  • A new (hidden) option called --large-graph-thresh has been added. This option is only relevant in the parsimony, parsimony-em, parsimony-gene and parsimony-gene-em resolution modes (where it defaults to 1,000). It takes an integer argument that sets the order (number of vertices) of the parsimonious UMI graph (PUG) beyond which an alternative, simplified algorithm is applied to resolve UMIs. For graphs having this many nodes or fewer, the parsimony resolution algorithm is used; for graphs having more than this many nodes, a simpler heuristic is used. The default value corresponds to the hard-coded value used in prior versions of alevin-fry.
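
To illustrate the difference between parsimony and parsimony-gene described above, here is a minimal Python sketch of the label projection step only. It is not alevin-fry's implementation: the transcript and gene names, the `tx2gene` map, and the `project_to_genes` helper are all hypothetical, and no PUG is actually built or resolved.

```python
from typing import Dict, FrozenSet, Iterable


def project_to_genes(transcripts: Iterable[str], tx2gene: Dict[str, str]) -> FrozenSet[str]:
    """Project a transcript-level equivalence class label to the gene level."""
    return frozenset(tx2gene[t] for t in transcripts)


# Hypothetical annotation: t1 and t2 are isoforms of geneA, t3 belongs to geneB.
tx2gene = {"t1": "geneA", "t2": "geneA", "t3": "geneB"}

# Hypothetical labels for two related UMIs. Their transcript sets are disjoint.
umi_x_label = {"t1"}
umi_y_label = {"t2"}

# At the transcript level no single label can cover both UMIs.
shared_transcripts = umi_x_label & umi_y_label

# After projection, the same gene labels both UMIs, so one gene can cover them.
shared_genes = project_to_genes(umi_x_label, tx2gene) & project_to_genes(umi_y_label, tx2gene)

print(shared_transcripts)  # empty set
print(shared_genes)        # contains only geneA
```

In this toy case the two UMIs cannot be covered by one transcript, but they can be covered by one gene, which is the situation the parsimony-gene mode is meant to handle.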
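
The constraints on the two hidden options above can be summarised in a short Python sketch. The allowed distances, defaults, and the 1,000-vertex threshold come from these notes; the function names are hypothetical, and reading "the distance at which UMIs are collapsible" as "within that distance" is an assumption. alevin-fry's own checks may be structured differently.

```python
from typing import Optional

PARSIMONY_MODES = ("parsimony", "parsimony-em", "parsimony-gene", "parsimony-gene-em")
EXACT_MODES = ("cr-like", "cr-like-em", "trivial")

# Allowed and default --umi-edit-dist values per resolution mode (from the notes).
ALLOWED_DIST = {**{m: {0, 1} for m in PARSIMONY_MODES}, **{m: {0} for m in EXACT_MODES}}
DEFAULT_DIST = {**{m: 1 for m in PARSIMONY_MODES}, **{m: 0 for m in EXACT_MODES}}

# Default --large-graph-thresh (from the notes).
DEFAULT_LARGE_GRAPH_THRESH = 1000


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length UMIs."""
    if len(a) != len(b):
        raise ValueError("UMIs must have equal length")
    return sum(x != y for x, y in zip(a, b))


def resolve_edit_dist(mode: str, requested: Optional[int] = None) -> int:
    """Return the edit distance to use, or raise on an incompatible pair."""
    if mode not in ALLOWED_DIST:
        raise ValueError(f"unknown resolution mode: {mode}")
    dist = DEFAULT_DIST[mode] if requested is None else requested
    if dist not in ALLOWED_DIST[mode]:
        raise ValueError(f"edit distance {dist} is not supported by mode {mode}")
    return dist


def collapsible(umi_a: str, umi_b: str, edit_dist: int) -> bool:
    """Assumption: UMIs within `edit_dist` mismatches are collapsible."""
    return hamming(umi_a, umi_b) <= edit_dist


def uses_parsimony_algorithm(num_vertices: int, thresh: int = DEFAULT_LARGE_GRAPH_THRESH) -> bool:
    """Graphs with `thresh` vertices or fewer use the parsimony algorithm."""
    return num_vertices <= thresh
```

For example, `resolve_edit_dist("trivial", 1)` raises an error, mirroring the report-and-exit behavior described above, while `uses_parsimony_algorithm(1001)` is false, so such a graph would fall through to the simpler heuristic.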

Changes to default behavior

  • Starting with alevin-fry v0.6.0, mtx is the default output format. That is, even without the --use-mtx flag, the output will be written in mtx format. This is a breaking change with respect to prior versions. Additionally, a --use-eds flag has been added. If you want the output in eds format (the prior default, which you would get by running quant or infer without the --use-mtx flag), pass the --use-eds flag.
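
For pipelines that build the quant command programmatically, the change above might be handled as in the following Python sketch. Only --use-eds and the mtx default come from these notes. The `quant_command` helper and all paths are hypothetical, and the short options (-i, -m, -o, -r, -t) are assumed from typical alevin-fry usage, so check them against `alevin-fry quant --help` before relying on them.

```python
from typing import List


def quant_command(
    input_dir: str,
    tg_map: str,
    output_dir: str,
    resolution: str = "cr-like",
    threads: int = 8,
    eds: bool = False,
) -> List[str]:
    """Build an argument list for `alevin-fry quant` (sketch, v0.6.0 defaults)."""
    cmd = [
        "alevin-fry", "quant",
        "-i", input_dir,    # assumed short option for the input directory
        "-m", tg_map,       # assumed short option for the transcript-to-gene map
        "-o", output_dir,   # assumed short option for the output directory
        "-r", resolution,   # assumed short option for the resolution mode
        "-t", str(threads),  # assumed short option for the thread count
    ]
    if eds:
        # From v0.6.0 on, mtx is the default, so eds must be requested explicitly.
        cmd.append("--use-eds")
    return cmd


# Hypothetical paths; mtx output needs no extra flag in v0.6.0.
print(" ".join(quant_command("rad_dir", "t2g.tsv", "quant_out")))
# Request the previous default format explicitly.
print(" ".join(quant_command("rad_dir", "t2g.tsv", "quant_out", eds=True)))
```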

Other changes

  • Starting with alevin-fry v0.6.0, the output of the parsimony and parsimony-em (and the new parsimony-gene and parsimony-gene-em) resolution modes is deterministic (apart from the order of the reported cell barcodes). In previous versions, the use of a hash state that wasn't explicitly seeded led to the default Rust behavior of pseudo-randomly seeding this state per run. This led to a different order of evaluation for the parsimonious covering, which, in turn, could lead to small differences in the count matrices returned. In v0.6.0, all such hash states are explicitly seeded, leading to deterministic PUG resolution. To avoid bias toward the same resolution across cells based on hash order, the hash is seeded for each cell from a deterministic pattern combined with the barcode ID of that cell.
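
The per-cell seeding idea above can be illustrated with a small Python sketch. It is an analogy, not the Rust code in alevin-fry: the `BASE_PATTERN` constant, the way it is combined with the barcode ID, and the use of keyed BLAKE2b in place of a seeded Rust hasher are all assumptions made for illustration.

```python
import hashlib
from typing import Iterable, List

# Hypothetical fixed pattern shared by all cells and all runs.
BASE_PATTERN = 0x5EED


def cell_key(barcode_id: int) -> bytes:
    """Derive a per-cell hash key from the fixed pattern and the barcode ID."""
    return BASE_PATTERN.to_bytes(8, "little") + barcode_id.to_bytes(8, "little")


def seeded_order(items: Iterable[str], barcode_id: int) -> List[str]:
    """Order items by a keyed hash so the order depends only on the cell."""
    key = cell_key(barcode_id)

    def keyed_hash(item: str) -> bytes:
        return hashlib.blake2b(item.encode(), digest_size=8, key=key).digest()

    return sorted(items, key=keyed_hash)


umis = ["ACGT", "ACGA", "TTGA", "GGCA"]

# The same cell always yields the same order, run after run.
first = seeded_order(umis, barcode_id=7)
second = seeded_order(umis, barcode_id=7)
print(first == second)

# A different cell uses a different key, so its order is derived independently.
print(cell_key(7) != cell_key(8))
```

Because the key depends on the barcode ID, the traversal order is fixed for a given cell but is not forced to be the same across cells.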

Full Changelog: v0.5.1...v0.6.0