v0.6.0
alevin-fry 0.6.0 release notes
The majority of changes in this release are new features. There are also changes to default behavior.
New features
- Starting with alevin-fry v0.6.0, it is now possible to use both the parsimony and parsimony-em/full resolution approaches in USA mode. While this support has been tested, it will remain marked as experimental for at least one release.
- This release introduces two new resolution modes, parsimony-gene and parsimony-gene-em; both work with and without USA mode. They work as follows:
  - parsimony-gene is analogous to the parsimony resolution. That is, it builds a parsimonious UMI graph (PUG), which is then resolved to find the most parsimonious cover of the observed UMIs. The main difference is that parsimony finds a transcript-level cover and then projects each transcript to its parent gene when aggregating counts, whereas parsimony-gene first projects the equivalence class labels in the PUG to the gene level and then resolves the PUG components. This means that a given gene can be considered to cover related UMIs even if the transcript sets describing these UMIs are disjoint. This can potentially be less specific than the behavior of parsimony. On the other hand, this mode is expected to be more robust to incomplete annotation. For example, elements like unannotated UTRs may mean that UMIs otherwise inferred to come from distinct molecules actually derive from the same initial pre-PCR molecule. The parsimony-gene method is more likely to resolve this situation correctly.
  - parsimony-gene-em is just like the parsimony-gene method described above, except that instead of discarding multi-gene UMIs, those UMIs are probabilistically resolved using an EM algorithm.
- A new (hidden) option called --umi-edit-dist has been added. This option takes an integer argument that defines the Hamming distance at which a pair of potentially duplicate UMIs is considered collapsible. While this general argument has been added as a forward-looking feature, support for non-standard distances among the existing resolution methods is limited. Currently, parsimony, parsimony-em, parsimony-gene and parsimony-gene-em may use a distance of either 0 or 1 (the default is 1), while cr-like, cr-like-em, and trivial can only use 0 (the default is 0). If a user provides an incompatible combination of Hamming distance and resolution mode, alevin-fry will report the error and exit.
- A new (hidden) option called --large-graph-thresh has been added. This option is only relevant in the parsimony, parsimony-em, parsimony-gene and parsimony-gene-em resolution modes (where it defaults to 1,000). It takes an integer argument that determines the order (number of vertices) of the parsimonious UMI graph (PUG) beyond which an alternative, simplified algorithm is applied to resolve UMIs. For graphs having this many nodes or fewer, the parsimony resolution algorithm is used; for graphs having more than this many nodes, a simpler heuristic is used. The default value corresponds to the hard-coded value used in prior versions of alevin-fry.
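The constraints on --umi-edit-dist and --large-graph-thresh described above can be summarized in a short Python sketch. It is illustrative only and is not alevin-fry's implementation: the names ALLOWED_EDIT_DIST, check_edit_dist, collapsible and uses_parsimony_algorithm are hypothetical, and treating two UMIs as collapsible when their Hamming distance is at most the configured value is an assumed reading of the option.

```python
# Illustrative sketch only; all names here are hypothetical, not alevin-fry code.

# Permitted --umi-edit-dist values per resolution mode, as listed in the notes.
ALLOWED_EDIT_DIST = {
    "parsimony": {0, 1},
    "parsimony-em": {0, 1},
    "parsimony-gene": {0, 1},
    "parsimony-gene-em": {0, 1},
    "cr-like": {0},
    "cr-like-em": {0},
    "trivial": {0},
}

# Defaults stated in the notes: 1 for the parsimony family, 0 otherwise.
DEFAULT_EDIT_DIST = {
    "parsimony": 1,
    "parsimony-em": 1,
    "parsimony-gene": 1,
    "parsimony-gene-em": 1,
    "cr-like": 0,
    "cr-like-em": 0,
    "trivial": 0,
}

# Default for --large-graph-thresh in the parsimony family of modes.
DEFAULT_LARGE_GRAPH_THRESH = 1000


def check_edit_dist(resolution, edit_dist=None):
    """Return the distance to use, or raise if the pair is incompatible."""
    if resolution not in ALLOWED_EDIT_DIST:
        raise ValueError(f"unknown resolution mode: {resolution}")
    if edit_dist is None:
        return DEFAULT_EDIT_DIST[resolution]
    if edit_dist not in ALLOWED_EDIT_DIST[resolution]:
        raise ValueError(
            f"distance {edit_dist} is not supported by resolution mode {resolution}"
        )
    return edit_dist


def hamming(a, b):
    """Number of mismatching positions between two equal-length UMIs."""
    if len(a) != len(b):
        raise ValueError("UMIs must have the same length")
    return sum(x != y for x, y in zip(a, b))


def collapsible(umi_a, umi_b, edit_dist):
    """Distance criterion only (assumption: 'at most edit_dist' mismatches).

    The real tool applies this to *potentially duplicate* UMIs, so other
    criteria decide which pairs are compared in the first place.
    """
    return hamming(umi_a, umi_b) <= edit_dist


def uses_parsimony_algorithm(num_vertices, thresh=DEFAULT_LARGE_GRAPH_THRESH):
    """True for graphs with `thresh` vertices or fewer; larger graphs
    fall through to the simpler heuristic."""
    return num_vertices <= thresh
```

With these definitions, check_edit_dist("cr-like", 1) raises an error, mirroring the report-and-exit behavior described above, while a PUG component with exactly 1,000 vertices is still handled by the parsimony algorithm under the default threshold.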
Changes to default behavior
- Starting with alevin-fry v0.6.0, mtx is the default output format. That is, even without the --use-mtx flag, the output will be written in mtx format. This is a breaking change with respect to prior versions. Additionally, a --use-eds flag has been added. If you want the output in eds format (the prior default, which you would get by running quant or infer without the --use-mtx flag), pass the --use-eds flag.
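For wrapper scripts that must support releases on both sides of this change, the flag selection can be sketched as follows. This is a minimal sketch based only on the behavior described above; the helper name output_format_flags and the tuple representation of the version are hypothetical and not part of alevin-fry.

```python
from typing import List, Tuple


def output_format_flags(version: Tuple[int, int, int], fmt: str) -> List[str]:
    """Return the extra flag (if any) needed to get `fmt` output from
    quant or infer for the given alevin-fry version.

    Hypothetical helper: it encodes only what the release notes state.
    """
    if fmt not in ("mtx", "eds"):
        raise ValueError(f"unsupported output format: {fmt}")
    if version >= (0, 6, 0):
        # mtx is the default from v0.6.0 onward; eds must be requested.
        return ["--use-eds"] if fmt == "eds" else []
    # Before v0.6.0, eds was the default; mtx had to be requested.
    return ["--use-mtx"] if fmt == "mtx" else []
```

The returned list can be appended to the rest of the command line that the script already builds.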
Other changes
- Starting with alevin-fry v0.6.0, the output of the parsimony and parsimony-em (and the new parsimony-gene and parsimony-gene-em) resolution modes is deterministic (apart from the order of the reported cell barcodes). In previous versions, the use of a hash state that wasn't explicitly seeded led to the default Rust behavior of pseudo-randomly seeding this state per run. This led to a different order of evaluation for the parsimonious covering, which, in turn, could lead to small differences in the returned count matrices. In v0.6.0, all such hash states are explicitly seeded, leading to deterministic PUG resolution. To avoid bias toward the same resolution across cells based on hash order, the hash for each cell is seeded from a deterministic pattern combined with the barcode ID of that cell.
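The idea of per-cell deterministic seeding can be illustrated with a small Python sketch. It is an analogy and not the Rust code used by alevin-fry: BASE_SEED, the mixing formula in cell_seed, and the use of a seeded shuffle to stand in for hash-dependent iteration order are all assumptions made for illustration.

```python
import random

# Hypothetical fixed constant standing in for the "deterministic pattern".
BASE_SEED = 0x5EED0006


def cell_seed(barcode_id):
    """Combine the fixed constant with the cell's barcode ID.

    The mixing formula is an illustrative choice, not the one used by
    alevin-fry. Distinct barcode IDs below 2**64 give distinct seeds.
    """
    return (BASE_SEED * 0x9E3779B97F4A7C15 + barcode_id) % 2**64


def traversal_order(vertices, barcode_id):
    """Return a visiting order that is fixed for a given cell across runs
    but is not tied to one global order shared by every cell."""
    rng = random.Random(cell_seed(barcode_id))
    order = sorted(vertices)  # start from a canonical order
    rng.shuffle(order)        # reproducible, cell-specific permutation
    return order
```

Calling traversal_order twice with the same vertices and barcode ID yields the same order in every run, which is the property that makes the resolution reproducible, while different cells draw from differently seeded generators.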
Full Changelog: v0.5.1...v0.6.0