-
-
Notifications
You must be signed in to change notification settings - Fork 320
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rework how blame is passed to parents #1751
Rework how blame is passed to parents #1751
Conversation
Before this commit, blame would have been passed to the first parent in cases where there was more than one. This commit changes that by only removing a particular suspect from further consideration once it has been compared to all of its parents. For hunks where blame cannot be passed to any parent, we can safely assume that they were introduced by a particular suspect, so we remove those hunks from `hunks_to_blame` and create a `BlameEntry` out of them. We can illustrate the change using the following small example history: ```text ---1----2 \ \ 3----4--- ``` Let’s now say that we’re blaming a file that has the following content: ```text line 1 # introduced by (1), then changed by (3) line 2 # introduced by (1) line 3 # introduced by (1), then changed by (2) ``` The resulting blame should look like this: ```text (3) line 1 (1) line 2 (2) line 3 ``` The previous version of the algorithm would have passed blame to just (2) or (3), depending on which one came first in the list of parents.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is possible that there’s still cases that this version of the algorithm does not get right, but I think it is a step in the right direction.
I absolutely think so too! This is a great finding and I'd hope that this maybe already fixes blame for all multi-parent cases. And I say that clearly as someone who didn't fully understand what's going on with some parts of the algorithm, something I'd hope to correct at some point.
- Do we prefer
new_unblamed_hunk(…)
orUnblamedHunk { … }
in tests?
Related to that: there’s a couple of comments in tests that mentionrange_in_destination
although, at this point, this is just the name of a local variable innew_unblamed_hunk
(it used to be the name of one ofUnblamedHunk
’s fields).
I can certainly make this more consistent, but am slightly undecided as to whether it is worth the effort.
I don't think I have a preference, but vaguely remember having introduced new_unblamed_hunk()
for … some reasons I don't entirely recall.
Regarding everything else, I definitely didn't setup my review to the point where I can say that I follow each and every change, but assume it must be right as you are the expert. What I could do is to remove one TODO, which goes along with the general suggestion to try and avoid unnecessary collect()
calls.
}]; | ||
let hunks_to_blame = vec![new_unblamed_hunk(1..3, suspect_2, Offset::Added(1))]; | ||
let suspect = ObjectId::from_hex(b"0000000000000000000000000000000000000000").unwrap(); | ||
let parent = ObjectId::from_str("1111111111111111111111111111111111111111").unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here I thought that a constant would be much less noisy and not harder to read, but I leave it to your preference and maybe a next PR can adjust this if you agree.
Argh, I had couple of fixes that I forgot to push, have to adjust this in a follow-up. |
Before this PR, blame would have been passed to the first parent in some of the cases where there was more than one.
This PR changes that by only removing a particular suspect from further consideration once it has been compared to all of its parents. For hunks where blame cannot be passed to any parent, we can safely assume that they were introduced by a particular suspect, so we remove those hunks from
hunks_to_blame
and create aBlameEntry
out of them.We can illustrate the change using the following small example history:
Let’s now say that we’re blaming a file that has the following content:
The resulting blame should look like this:
The previous version of the algorithm would have passed blame to just (2) or (3), depending on which one came first in the list of parents.
I discovered this discrepancy while working on creating a test for #1743.
Performance-wise there doesn’t seem to be a significant difference to
main
. Where there is, I think it is due to the implementation now being more correct. I haven’t dug really deep into benchmarking the changes, though.It is possible that there’s still cases that this version of the algorithm does not get right, but I think it is a step in the right direction. Let me know what you think!
Open question
Do we prefer
new_unblamed_hunk(…)
orUnblamedHunk { … }
in tests?Related to that: there’s a couple of comments in tests that mention
range_in_destination
although, at this point, this is just the name of a local variable innew_unblamed_hunk
(it used to be the name of one ofUnblamedHunk
’s fields).I can certainly make this more consistent, but am slightly undecided as to whether it is worth the effort.