Generalizing 'collapse_mutationless_edges' and small fitchcount bugfix #181

Open
wants to merge 2 commits into
base: master
26 changes: 16 additions & 10 deletions cassiopeia/data/CassiopeiaTree.py
@@ -1646,22 +1646,28 @@ def collapse_unifurcations(self, source: Optional[int] = None) -> None:
         self.__cache = {}
 
     def collapse_mutationless_edges(
-        self, infer_ancestral_characters: bool
+        self,
+        infer_ancestral_characters: bool,
+        distance_function: Callable[[List[int], List[int]], float] = lambda x, y: 0 if x == y else 1
     ) -> None:
         """Collapses mutationless edges in the tree in-place.
 
-        Uses the internal node annotations of a tree to collapse edges with no
-        mutations. The introduction of a missing data event is considered a
-        mutation in this context. Either takes the existing character states on
-        the tree or infers the annotations bottom-up from the samples obeying
+        An edge (u, v) is considered "mutationless" if the distance between the
+        character states of u and v is 0. This distance is defined by the
+        `distance_function`. Either uses the existing character states on the
+        tree or infers the annotations bottom-up from the samples obeying
         Camin-Sokal Parsimony. Preserves the times of nodes that are not removed
         by connecting the parent and children of removed nodes by branches with
-        lengths equal to the total time elapsed from parent to each child.
+        lengths equal to the total time elapsed from parent to each child. Only
+        collapses internal edges and will not remove leaf edges.
 
         Args:
             tree: A networkx DiGraph object representing the tree
-            infer_ancestral_characters: Whether to infer the ancestral characters
+            infer_ancestral_characters: Whether to infer the ancestral character
                 states of the tree
+            distance_function: The function defining the distance between the
+                two sets of character states. Defaults to considering the
+                distance to be 0 only if the character states exactly match
 
         Raises:
             CassiopeiaTreeError if the tree has not been initialized or if
@@ -1693,9 +1699,9 @@ def collapse_mutationless_edges(
             for child in self.children(n):
                 if not self.is_leaf(child):
                     t = self.get_branch_length(n, child)
-                    if self.get_character_states(
-                        n
-                    ) == self.get_character_states(child):
+                    n_states = self.get_character_states(n)
+                    child_states = self.get_character_states(child)
+                    if distance_function(n_states, child_states) == 0:
                         for grandchild in self.children(child):
                             t_ = self.get_branch_length(child, grandchild)
                             self.__add_edge(n, grandchild)
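To illustrate what the new `distance_function` parameter allows, here is a minimal sketch on plain dictionaries. It is not the `CassiopeiaTree` implementation: `collapse_internal_edges`, `ignore_missing_distance`, `make_toy_tree`, and the `MISSING = -1` indicator are hypothetical names and values chosen for the example. Only the exact-match default mirrors the lambda in the diff.

```python
from typing import Callable, Dict, List, Tuple

MISSING = -1  # assumed missing-data indicator, for illustration only


def exact_match_distance(x: List[int], y: List[int]) -> float:
    """Mirrors the default in the diff: 0 only if the states match exactly."""
    return 0 if x == y else 1


def ignore_missing_distance(x: List[int], y: List[int]) -> float:
    """Hypothetical alternative: count differing sites, skipping missing ones."""
    return sum(1 for a, b in zip(x, y) if a != b and MISSING not in (a, b))


def collapse_internal_edges(
    children: Dict[str, List[str]],
    states: Dict[str, List[int]],
    lengths: Dict[Tuple[str, str], float],
    root: str,
    distance_function: Callable[[List[int], List[int]], float] = exact_match_distance,
) -> None:
    """Toy post-order collapse of internal edges whose distance is 0 (in-place)."""

    def visit(n: str) -> None:
        # Process the subtrees first so that collapsing proceeds bottom-up.
        for child in list(children[n]):
            visit(child)
        for child in list(children[n]):
            if not children[child]:
                continue  # leaf edges are never removed
            if distance_function(states[n], states[child]) == 0:
                t = lengths.pop((n, child))
                for grandchild in children[child]:
                    t_ = lengths.pop((child, grandchild))
                    children[n].append(grandchild)
                    # Keep node times: new branch spans both old branches.
                    lengths[(n, grandchild)] = t + t_
                children[n].remove(child)
                del children[child]

    visit(root)


def make_toy_tree():
    """Builds a small example tree; every branch has length 1.0."""
    children = {"r": ["a", "b"], "a": ["x", "y"], "b": ["c", "v"],
                "c": ["z", "w"], "x": [], "y": [], "v": [], "z": [], "w": []}
    states = {"r": [0, 0], "a": [0, 0], "b": [1, 0], "c": [1, MISSING],
              "x": [0, 2], "y": [0, 0], "v": [1, 0],
              "z": [1, MISSING], "w": [1, MISSING]}
    lengths = {(p, c): 1.0 for p, cs in children.items() for c in cs}
    return children, states, lengths


children, states, lengths = make_toy_tree()
collapse_internal_edges(children, states, lengths, "r")
print(children["r"], "c" in children)
```

With the exact-match default, the edge (r, a) is collapsed while (b, c) is kept, because the missing state on `c` counts as a difference. Passing `ignore_missing_distance` instead would also collapse (b, c). The leaf `y` stays attached even though its states equal those of `r`, matching the "will not remove leaf edges" note in the docstring.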
4 changes: 2 additions & 2 deletions cassiopeia/tools/small_parsimony.py
@@ -257,7 +257,7 @@ def fitch_count(
     infer_ancestral_states: bool = True,
     state_key: str = "S1",
     unique_states: Optional[List[str]] = None,
-):
+) -> pd.DataFrame:
     """Runs the FitchCount algorithm.
 
     Performs the FitchCount algorithm for inferring the number of times that
@@ -308,7 +308,7 @@ def fitch_count(
             " of states that appear in the meta data."
         )
 
-    if root != cassiopeia_tree.root:
+    if root is not None and root != cassiopeia_tree.root:
         cassiopeia_tree.subset_clade(root)
 
     if infer_ancestral_states:
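The effect of the added `root is not None` guard can be sketched with a stand-in object. This assumes that `root` is optional and may be left as `None` by the caller, which the diff suggests but does not show. `ToyTree` and `subset_if_requested` are hypothetical names and not part of Cassiopeia.

```python
from typing import List, Optional


class ToyTree:
    """Hypothetical stand-in for a tree; it only tracks the root and calls."""

    def __init__(self, root: str) -> None:
        self.root = root
        self.subset_calls: List[Optional[str]] = []

    def subset_clade(self, node: Optional[str]) -> None:
        # Records the request instead of modifying a real tree.
        self.subset_calls.append(node)
        self.root = node


def subset_if_requested(tree: ToyTree, root: Optional[str] = None) -> None:
    """Same condition as the fixed line in the diff."""
    if root is not None and root != tree.root:
        tree.subset_clade(root)


tree = ToyTree("root")
subset_if_requested(tree)          # root left unset: nothing happens
subset_if_requested(tree, "root")  # already the root: nothing happens
subset_if_requested(tree, "n1")    # a different node: subset is requested
print(tree.subset_calls)
```

Under the old condition, `None != tree.root` evaluates to `True`, so an unset `root` would have reached `subset_clade(None)`.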