Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework how blame is passed to parents #1751

Merged

Conversation

cruessler
Copy link
Contributor

Before this PR, blame would have been passed to the first parent in some of the cases where there was more than one.

This PR changes that by only removing a particular suspect from further consideration once it has been compared to all of its parents. For hunks where blame cannot be passed to any parent, we can safely assume that they were introduced by a particular suspect, so we remove those hunks from hunks_to_blame and create a BlameEntry out of them.

We can illustrate the change using the following small example history:

---1----2
    \    \
     3----4---

Let’s now say that we’re blaming a file that has the following content:

line 1 # introduced by (1), then changed by (3)
line 2 # introduced by (1)
line 3 # introduced by (1), then changed by (2)

The resulting blame should look like this:

(3) line 1
(1) line 2
(2) line 3

The previous version of the algorithm would have passed blame to just (2) or (3), depending on which one came first in the list of parents.

I discovered this discrepancy while working on creating a test for #1743.

Performance-wise there doesn’t seem to be a significant difference to main. Where there is, I think it is due to the implementation now being more correct. I haven’t dug really deep into benchmarking the changes, though.

It is possible that there’s still cases that this version of the algorithm does not get right, but I think it is a step in the right direction. Let me know what you think!

Open question

  • Do we prefer new_unblamed_hunk(…) or UnblamedHunk { … } in tests?

    Related to that: there’s a couple of comments in tests that mention range_in_destination although, at this point, this is just the name of a local variable in new_unblamed_hunk (it used to be the name of one of UnblamedHunk’s fields).

    I can certainly make this more consistent, but am slightly undecided as to whether it is worth the effort.

cruessler and others added 2 commits January 9, 2025 16:16
Before this commit, blame would have been passed to the first parent in
cases where there was more than one.

This commit changes that by only removing a particular suspect from
further consideration once it has been compared to all of its parents.
For hunks where blame cannot be passed to any parent, we can safely
assume that they were introduced by a particular suspect, so we remove
those hunks from `hunks_to_blame` and create a `BlameEntry` out of them.

We can illustrate the change using the following small example history:

```text
---1----2
    \    \
     3----4---
```

Let’s now say that we’re blaming a file that has the following content:

```text
line 1 # introduced by (1), then changed by (3)
line 2 # introduced by (1)
line 3 # introduced by (1), then changed by (2)
```

The resulting blame should look like this:

```text
(3) line 1
(1) line 2
(2) line 3
```

The previous version of the algorithm would have passed blame to just
(2) or (3), depending on which one came first in the list of parents.
Copy link
Member

@Byron Byron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is possible that there’s still cases that this version of the algorithm does not get right, but I think it is a step in the right direction.

I absolutely think so too! This is a great finding and I'd hope that this maybe already fixes blame for all multi-parent cases. And I say that clearly as someone who didn't fully understand what's going on with some parts of the algorithm, something I'd hope to correct at some point.

  • Do we prefer new_unblamed_hunk(…) or UnblamedHunk { … } in tests?
    Related to that: there’s a couple of comments in tests that mention range_in_destination although, at this point, this is just the name of a local variable in new_unblamed_hunk (it used to be the name of one of UnblamedHunk’s fields).
    I can certainly make this more consistent, but am slightly undecided as to whether it is worth the effort.

I don't think I have a preference, but vaguely remember having introduced new_unblamed_hunk() for … some reasons I don't entirely recall.

Regarding everything else, I definitely didn't setup my review to the point where I can say that I follow each and every change, but assume it must be right as you are the expert. What I could do is to remove one TODO, which goes along with the general suggestion to try and avoid unnecessary collect() calls.

}];
let hunks_to_blame = vec![new_unblamed_hunk(1..3, suspect_2, Offset::Added(1))];
let suspect = ObjectId::from_hex(b"0000000000000000000000000000000000000000").unwrap();
let parent = ObjectId::from_str("1111111111111111111111111111111111111111").unwrap();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I thought that a constant would be much less noisy and not harder to read, but I leave it to your preference and maybe a next PR can adjust this if you agree.

@Byron Byron merged commit dceb47d into GitoxideLabs:main Jan 10, 2025
20 checks passed
@Byron
Copy link
Member

Byron commented Jan 10, 2025

Argh, I had couple of fixes that I forgot to push, have to adjust this in a follow-up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants