Rework how blame is passed to parents #1751

cruessler · 2025-01-09T15:29:57Z

Before this PR, blame would have been passed to the first parent in some of the cases where there was more than one.

This PR changes that by only removing a particular suspect from further consideration once it has been compared to all of its parents. For hunks where blame cannot be passed to any parent, we can safely assume that they were introduced by a particular suspect, so we remove those hunks from hunks_to_blame and create a BlameEntry out of them.

We can illustrate the change using the following small example history:

---1----2
    \    \
     3----4---

Let’s now say that we’re blaming a file that has the following content:

line 1 # introduced by (1), then changed by (3)
line 2 # introduced by (1)
line 3 # introduced by (1), then changed by (2)

The resulting blame should look like this:

(3) line 1
(1) line 2
(2) line 3

The previous version of the algorithm would have passed blame to just (2) or (3), depending on which one came first in the list of parents.

I discovered this discrepancy while working on creating a test for #1743.

Performance-wise there doesn’t seem to be a significant difference to main. Where there is, I think it is due to the implementation now being more correct. I haven’t dug really deep into benchmarking the changes, though.

It is possible that there’s still cases that this version of the algorithm does not get right, but I think it is a step in the right direction. Let me know what you think!

Open question

Do we prefer new_unblamed_hunk(…) or UnblamedHunk { … } in tests?

Related to that: there’s a couple of comments in tests that mention range_in_destination although, at this point, this is just the name of a local variable in new_unblamed_hunk (it used to be the name of one of UnblamedHunk’s fields).

I can certainly make this more consistent, but am slightly undecided as to whether it is worth the effort.

Before this commit, blame would have been passed to the first parent in cases where there was more than one. This commit changes that by only removing a particular suspect from further consideration once it has been compared to all of its parents. For hunks where blame cannot be passed to any parent, we can safely assume that they were introduced by a particular suspect, so we remove those hunks from `hunks_to_blame` and create a `BlameEntry` out of them. We can illustrate the change using the following small example history: ```text ---1----2 \ \ 3----4--- ``` Let’s now say that we’re blaming a file that has the following content: ```text line 1 # introduced by (1), then changed by (3) line 2 # introduced by (1) line 3 # introduced by (1), then changed by (2) ``` The resulting blame should look like this: ```text (3) line 1 (1) line 2 (2) line 3 ``` The previous version of the algorithm would have passed blame to just (2) or (3), depending on which one came first in the list of parents.

Byron

It is possible that there’s still cases that this version of the algorithm does not get right, but I think it is a step in the right direction.

I absolutely think so too! This is a great finding and I'd hope that this maybe already fixes blame for all multi-parent cases. And I say that clearly as someone who didn't fully understand what's going on with some parts of the algorithm, something I'd hope to correct at some point.

Do we prefer new_unblamed_hunk(…) or UnblamedHunk { … } in tests?
Related to that: there’s a couple of comments in tests that mention range_in_destination although, at this point, this is just the name of a local variable in new_unblamed_hunk (it used to be the name of one of UnblamedHunk’s fields).
I can certainly make this more consistent, but am slightly undecided as to whether it is worth the effort.

I don't think I have a preference, but vaguely remember having introduced new_unblamed_hunk() for … some reasons I don't entirely recall.

Regarding everything else, I definitely didn't setup my review to the point where I can say that I follow each and every change, but assume it must be right as you are the expert. What I could do is to remove one TODO, which goes along with the general suggestion to try and avoid unnecessary collect() calls.

Byron · 2025-01-10T07:26:36Z

gix-blame/src/file/tests.rs

-        }];
-        let hunks_to_blame = vec![new_unblamed_hunk(1..3, suspect_2, Offset::Added(1))];
+        let suspect = ObjectId::from_hex(b"0000000000000000000000000000000000000000").unwrap();
+        let parent = ObjectId::from_str("1111111111111111111111111111111111111111").unwrap();


Here I thought that a constant would be much less noisy and not harder to read, but I leave it to your preference and maybe a next PR can adjust this if you agree.

Byron · 2025-01-10T07:28:20Z

Argh, I had couple of fixes that I forgot to push, have to adjust this in a follow-up.

cruessler and others added 2 commits January 9, 2025 16:16

Follow clippy suggestions

cbf7f51

Byron approved these changes Jan 10, 2025

View reviewed changes

Byron merged commit dceb47d into GitoxideLabs:main Jan 10, 2025
20 checks passed

Byron mentioned this pull request Jan 10, 2025

wip changes against more than one parent #1753

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Rework how blame is passed to parents #1751

Rework how blame is passed to parents #1751

cruessler commented Jan 9, 2025

Byron left a comment

Byron Jan 10, 2025

Byron commented Jan 10, 2025

Rework how blame is passed to parents #1751

Rework how blame is passed to parents #1751

Conversation

cruessler commented Jan 9, 2025

Open question

Byron left a comment

Choose a reason for hiding this comment

Byron Jan 10, 2025

Choose a reason for hiding this comment

Byron commented Jan 10, 2025