New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 742454 link

Starred by 1 user

Issue metadata

Status: Assigned
Owner:
Long OOO (go/where-is-mgiuca)
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug



Sign in to add a comment

git hyper-blame: many lines are blamed on apparently-unrelated commit

Project Member Reported by avayvod@chromium.org, Jul 13 2017

Issue description

I'm trying to find the author of https://cs.chromium.org/chromium/src/third_party/WebKit/Source/core/html/HTMLMediaElement.cpp?rcl=ae67f19c066747ebc8f5cfe2e705b8169b508c21&l=603.

git blame points to Blink reformat so I used git hyper-blame.

hyper-blame however points to https://codereview.chromium.org/1850903002 which however didn't change the lines or even the method I'm looking at.

Using "View in git blame" in Code Search and clicking numerous times through "blame" and "blame^" I got to https://chromium.googlesource.com/chromium/src/+/4563c7c2c70ab5f09a017cad834d995485f75ec0 instead.
 

Comment 1 by mgiuca@chromium.org, Jul 14 2017

Status: Assigned (was: Untriaged)
Summary: git hyper-blame: many lines are blamed on apparently-unrelated commit (was: git hyper-blame doesn't work)
Interesting. I can reproduce and will investigate.

In the future, it would be helpful if you describe the issue in the title (instead of just "it doesn't work") and provide the exact command-line you are using as well as the actual and expected behaviour.

The command line is:

$ git hyper-blame ae67f19c -- third_party/WebKit/Source/core/html/HTMLMediaElement.cpp 

git blame shows this for line 603:
1c4d759e44259 third_party/WebKit/Source/core/html/HTMLMediaElement.cpp    (Blink Reformat            2017-04-09 09:34:54 -0700  603)   InvokeLoadAlgorithm();

git hyper-blame shows:
9e93489f third_party/WebKit/Source/core/html/HTMLMediaElement.cpp    (liberato                    2016-05-03 12:00:27 -0700  603*)   InvokeLoadAlgorithm();

9e93489f (r391318) is also blamed for many of the surrounding lines in HTMLMediaElement::DidMoveToNewDocument, despite the fact that it only touches 5 lines in this file, in HTMLMediaElement::HTMLMediaElement.

There are 3 ignored commits being skipped in this file:
- r463139 (1c4d759e; The Blink Rename)
- r423409 (ab33ca94; reflow comments in core/html/*.{cpp,h},core/html/imports)
- r422256 (1c8e1a77; Reformat blink)

You can see if we just ignore the first two of those, everything is still fine:
$ git hyper-blame --no-default-ignores -i 1c4d759e -i ab33ca94 ae67f19c -- third_party/WebKit/Source/core/html/HTMLMediaElement.cpp 

All lines from 590 to 605 are blamed on r422256 (Reformat blink).

The problem happens when we also ignore r422256:
$ git hyper-blame --no-default-ignores -i 1c4d759e -i ab33ca94 -i 1c8e1a77 ae67f19c -- third_party/WebKit/Source/core/html/HTMLMediaElement.cpp 

All lines from 590 to 605 are blamed on r391318. Every line that was previously blamed on r422256 has switched to r391318.

Let's simplify and look at hyper-blame output from just after the Blink reformat:

$ git hyper-blame --no-default-ignores -i 1c8e1a77 1c8e1a77 -- third_party/WebKit/Source/core/html/HTMLMediaElement.cpp

(Today's line 605 is line 533 at that old date.)

Well well... all lines from 458 to 545 are blamed on r391318. That's because every single one of those lines was touched by the Blink reformat. I think hyper-blame just looks at them all in one big chunk and says "OK, what commit touched these lines previously? ... r391318" so it assigns blame to all of those lines to that revision, even though that revision only touched the beginning of that chunk.

This may require a rethink of the algorithm, or it may just be an edge case we're happy to live with.

Comment 2 by mgiuca@chromium.org, Jul 14 2017

Investigating the behaviour of this specific (problematic) invocation:

$ git hyper-blame --no-default-ignores -i 1c8e1a77 1c8e1a77 -- third_party/WebKit/Source/core/html/HTMLMediaElement.cpp

Internally, this comes down to an approx_lineno_across_revs (on line 533) grossly miscalculating the line motion.

This calls the following diff (note that it's reversed! the - lines are before the Blink reformat and the + lines are after):

git diff -U0 1c8e1a77 1c8e1a77^ -- third_party/WebKit/Source/core/html/HTMLMediaElement.cpp

The hunk of interest:

@@ -379,167 +376,80 @@ URLRegistry* HTMLMediaElement::s_mediaStreamRegistry = 0;

That is: "After the Blink reformat, this 167-line hunk started on line 379. Before the reformat, the corresponding hunk started on line 376 and was only 80 lines." git diff thinks that all the lines from 464 to 545 were written fresh in 1c8e1a77.

Basically this is a very bad diff; since almost all the lines in the file have changed, it's not matching anything logically, it's just finding the odd stray brace that's the same before and after.

So what approx_lineno_across_revs does is, it takes line 533 in the "after" hunk (line 154 of the 167-line hunk that starts on line 379) and tries to find the corresponding line in the "before" hunk. Because lines 464 to 545 were added by 1c8e1a77, it assumes that all of those lines are an extension of the last line of the old hunk, line 455 in 1c8e1a77^. Line 455 in 1c8e1a77^ was last touched by ... r391318!

VERY CURIOUS: if you run the diff in reverse (i.e., in the correct order), you get @@ -376,162 +379,167 @@, which ACTUALLY GIVES THE RIGHT ANSWER HERE. I didn't realise that git diff can give totally different results if you run it backwards (I assumed it would just swap the - and + lines). I'm not sure what the reason is why hyper-blame runs git diff backwards. We could swap it around to fix this case, but I'm not sure if that would just push the problem around (and create new bad cases elsewhere).

Basically, hyper-blame unfortunately is necessarily an approximation. This is a pathological case because git diff has essentially given up making any sense so hyper-blame suffers. But it's kind of an important case because the Blink reformat CL touched every file in WebKit and we want to show correct results for it.

Comment 3 by mgiuca@chromium.org, Jul 14 2017

If we swap it around to use the forward-diff instead of the reverse diff, we get a much more accurate blame on that line. Line 533 after the rename is blamed on Line 530 before the rename (which is off by 6 lines, due to a 6-line growth earlier in the hunk). So we still get the wrong blame:

b919190c third_party/WebKit/Source/core/html/HTMLMediaElement.cpp    (scherkus@chromium.org       2014-02-26 22:21:08 +0000  603*)   InvokeLoadAlgorithm();

But it's close. If you look around near there, you can find lines blamed on 4563c7c2 which is the correct blame for Line 603. And this could then be further improved by actually doing string matching.

I might make this change then, but I don't know how to verify that this is an across-the-board improvement (rather than just fixing this case at the expense of others). I really don't know why diff isn't properly reversible...

Sign in to add a comment