git hyper-blame: many lines are blamed on apparently-unrelated commit |
|
Issue descriptionI'm trying to find the author of https://cs.chromium.org/chromium/src/third_party/WebKit/Source/core/html/HTMLMediaElement.cpp?rcl=ae67f19c066747ebc8f5cfe2e705b8169b508c21&l=603. git blame points to Blink reformat so I used git hyper-blame. hyper-blame however points to https://codereview.chromium.org/1850903002 which however didn't change the lines or even the method I'm looking at. Using "View in git blame" in Code Search and clicking numerous times through "blame" and "blame^" I got to https://chromium.googlesource.com/chromium/src/+/4563c7c2c70ab5f09a017cad834d995485f75ec0 instead.
,
Jul 14 2017
Investigating the behaviour of this specific (problematic) invocation: $ git hyper-blame --no-default-ignores -i 1c8e1a77 1c8e1a77 -- third_party/WebKit/Source/core/html/HTMLMediaElement.cpp Internally, this comes down to an approx_lineno_across_revs (on line 533) grossly miscalculating the line motion. This calls the following diff (note that it's reversed! the - lines are before the Blink reformat and the + lines are after): git diff -U0 1c8e1a77 1c8e1a77^ -- third_party/WebKit/Source/core/html/HTMLMediaElement.cpp The hunk of interest: @@ -379,167 +376,80 @@ URLRegistry* HTMLMediaElement::s_mediaStreamRegistry = 0; That is: "After the Blink reformat, this 167-line hunk started on line 379. Before the reformat, the corresponding hunk started on line 376 and was only 80 lines." git diff thinks that all the lines from 464 to 545 were written fresh in 1c8e1a77. Basically this is a very bad diff; since almost all the lines in the file have changed, it's not matching anything logically, it's just finding the odd stray brace that's the same before and after. So what approx_lineno_across_revs does is, it takes line 533 in the "after" hunk (line 154 of the 167-line hunk that starts on line 379) and tries to find the corresponding line in the "before" hunk. Because lines 464 to 545 were added by 1c8e1a77, it assumes that all of those lines are an extension of the last line of the old hunk, line 455 in 1c8e1a77^. Line 455 in 1c8e1a77^ was last touched by ... r391318! VERY CURIOUS: if you run the diff in reverse (i.e., in the correct order), you get @@ -376,162 +379,167 @@, which ACTUALLY GIVES THE RIGHT ANSWER HERE. I didn't realise that git diff can give totally different results if you run it backwards (I assumed it would just swap the - and + lines). I'm not sure what the reason is why hyper-blame runs git diff backwards. We could swap it around to fix this case, but I'm not sure if that would just push the problem around (and create new bad cases elsewhere). Basically, hyper-blame unfortunately is necessarily an approximation. This is a pathological case because git diff has essentially given up making any sense so hyper-blame suffers. But it's kind of an important case because the Blink reformat CL touched every file in WebKit and we want to show correct results for it.
,
Jul 14 2017
If we swap it around to use the forward-diff instead of the reverse diff, we get a much more accurate blame on that line. Line 533 after the rename is blamed on Line 530 before the rename (which is off by 6 lines, due to a 6-line growth earlier in the hunk). So we still get the wrong blame: b919190c third_party/WebKit/Source/core/html/HTMLMediaElement.cpp (scherkus@chromium.org 2014-02-26 22:21:08 +0000 603*) InvokeLoadAlgorithm(); But it's close. If you look around near there, you can find lines blamed on 4563c7c2 which is the correct blame for Line 603. And this could then be further improved by actually doing string matching. I might make this change then, but I don't know how to verify that this is an across-the-board improvement (rather than just fixing this case at the expense of others). I really don't know why diff isn't properly reversible... |
|
►
Sign in to add a comment |
|
Comment 1 by mgiuca@chromium.org
, Jul 14 2017Summary: git hyper-blame: many lines are blamed on apparently-unrelated commit (was: git hyper-blame doesn't work)