950.8% regression in media.tough_video_cases_extra at 451666:451706
Issue description
Feb 23 2017
Started bisect job https://chromeperf.appspot.com/buildbucket_job_status/8986826612443103488
Feb 24 2017
=== BISECT JOB RESULTS ===
NO Perf regression found

Bisect Details
  Configuration: win_perf_bisect
  Benchmark    : media.tough_video_cases_extra
  Metric       : seek/video.html?src_tulip2.ogg_type_audio

  Revision             Result                  N
  chromium@451665      1.06429 +- 0.131203     21      good
  chromium@451706      4.71214 +- 41.3564      21      bad

To Run This Test
  src/tools/perf/run_benchmark -v --browser=release --output-format=chartjson --upload-results --pageset-repeat=1 --also-run-disabled-tests --story-filter=video.html.src.tulip2.ogg.type.audio media.tough_video_cases_extra

Debug Info
  https://chromeperf.appspot.com/buildbucket_job_status/8986826612443103488

Is this bisect wrong?
  https://chromeperf.appspot.com/bad_bisect?try_job_id=5341844243742720

| O O | Visit http://www.chromium.org/developers/speed-infra/perf-bug-faq
|  X  | for more information addressing perf regression bugs. For feedback,
| / \ | file a bug with component Speed>Bisection. Thank you!
Feb 24 2017
It would be really helpful to be able to see the raw data in the logs. It was extremely difficult for me to find it (I wrote up my steps in this doc: https://docs.google.com/document/d/1cTB1ltAyoMsiqBYogOiyqQbhu5YBReSPMg9TThEBRy0/edit). The raw data is here:
https://luci-logdog.appspot.com/v/?s=chrome%2Fbb%2Ftryserver.chromium.perf%2Fwin_perf_bisect%2F7157%2F%2B%2Frecipes%2Fsteps%2FRe-testing_reference_range%2F0%2Fsteps%2FCompare_samples__4_%2F0%2Flogs%2Fjson.output%2F0
https://luci-logdog.appspot.com/v/?s=chrome%2Fbb%2Ftryserver.chromium.perf%2Fwin_perf_bisect%2F7157%2F%2B%2Frecipes%2Fsteps%2FCompare_samples%2F0%2Flogs%2Fjson.output%2F0

In case the links stop working:
{
  "result": {
    "U": 88.5,
    "p": 0.6784703000834179,
    "significance": "NEED_MORE_DATA"
  },
  "sampleA": [1.095, 1.035, 1.025, 1.08, 1.065, 1.14, 1.065, 1.075, 1.095, 1.08, 1.04, 1.07, 1.1, 1.045],
  "sampleB": [16.49, 1.02, 1.04, 0.99, 1.07, 1.04, 15.235, 1.045, 1.095, 10.085, 1.105, 1.04, 1.05, 1.03]
}

Clearly this is a regression, but bisect won't continue testing it: it gives up because the regression shows up as a consistently high rate of outliers. hubbe@'s suggested solution is to allow bisecting on standard deviation instead of only on the mean. Another solution would be to let an advanced user turn off bisect's significance checking and fall back to a straight comparison of means. If the user also had links to the actual data, they could decide what happened instead of depending on the algorithm.
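To see why bisect reported NEED_MORE_DATA, here is a minimal standard-library Python sketch that recomputes a two-sided Mann-Whitney U test on the two samples quoted above. The tie correction and the 0.5 continuity correction in the normal approximation are assumptions about what the bisect code does, but with them the sketch lands on the reported U = 88.5 and p of about 0.678. Because the test only looks at ranks, three runs that are roughly an order of magnitude slower move U very little when the other 11 runs of sampleB overlap sampleA.

```python
import math

# Samples quoted in the comment: sample_a is the good revision, sample_b the bad one.
sample_a = [1.095, 1.035, 1.025, 1.08, 1.065, 1.14, 1.065, 1.075,
            1.095, 1.08, 1.04, 1.07, 1.1, 1.045]
sample_b = [16.49, 1.02, 1.04, 0.99, 1.07, 1.04, 15.235, 1.045,
            1.095, 10.085, 1.105, 1.04, 1.05, 1.03]


def mann_whitney_u(a, b):
    """Two-sided Mann-Whitney U test via the normal approximation.

    Returns (U, p), where U is the smaller of U_a and U_b. The variance is
    tie-corrected and z uses a 0.5 continuity correction (assumed settings).
    """
    n1, n2 = len(a), len(b)
    # U_a counts pairs where the value from a is larger; ties count as half.
    u_a = sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)
    u = min(u_a, n1 * n2 - u_a)

    # Tie correction: sum of (t^3 - t) over groups of identical values.
    counts = {}
    for v in a + b:
        counts[v] = counts.get(v, 0) + 1
    ties = sum(t ** 3 - t for t in counts.values())
    n = n1 + n2
    mean = n1 * n2 / 2.0
    var = n1 * n2 / 12.0 * ((n + 1) - ties / (n * (n - 1)))

    z = (abs(u - mean) - 0.5) / math.sqrt(var)
    p = math.erfc(z / math.sqrt(2))  # two-sided tail probability
    return u, p


u, p = mann_whitney_u(sample_a, sample_b)
print(f"U = {u}, p = {p:.4f}")  # U = 88.5, p = 0.6785
```

A p-value this large means the ranks of the two samples are nearly interchangeable, which is why the comparison asks for more data instead of classifying the revision.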
Feb 24 2017
Are we sure the code is doing the significance testing correctly? If you run a Student's t-test (http://www.socscistatistics.com/tests/studentttest/Default2.aspx) on the data above, you get a significant difference, with a p-value of .040646.
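For comparison, the sketch below runs a pooled-variance (Student's) two-sample t-test on the same samples, again with only the Python standard library; the t-distribution tail is integrated numerically with Simpson's rule rather than taken from a statistics package. It gives t of about 1.81 with 26 degrees of freedom. The quoted .040646 appears to be the one-tailed p-value; the two-tailed value is roughly twice that, about 0.08. The t-test compares means, so the three large outliers drive its result, whereas the rank-based test used by bisect largely ignores how large they are.

```python
import math
import statistics

sample_a = [1.095, 1.035, 1.025, 1.08, 1.065, 1.14, 1.065, 1.075,
            1.095, 1.08, 1.04, 1.07, 1.1, 1.045]
sample_b = [16.49, 1.02, 1.04, 0.99, 1.07, 1.04, 15.235, 1.045,
            1.095, 10.085, 1.105, 1.04, 1.05, 1.03]


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0: 0.5 minus the density integrated over [0, t]."""
    h = t / steps  # Simpson's rule needs an even number of steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


def pooled_t_test(a, b):
    """Student's two-sample t-test assuming equal variances.

    Returns (t, df, one_tailed_p, two_tailed_p); the one-tailed p is taken
    in the direction of the observed difference.
    """
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * statistics.variance(a)
                  + (n2 - 1) * statistics.variance(b)) / df
    t = (statistics.mean(b) - statistics.mean(a)) / math.sqrt(
        pooled_var * (1 / n1 + 1 / n2))
    one_tailed = t_upper_tail(abs(t), df)
    return t, df, one_tailed, 2 * one_tailed


t, df, p_one, p_two = pooled_t_test(sample_a, sample_b)
print(f"t = {t:.3f}, df = {df}, one-tailed p = {p_one:.4f}, two-tailed p = {p_two:.4f}")
```

Whether the one- or two-tailed value is appropriate depends on whether the question is "did it get slower" or "did it change at all"; either way, the t-test's normality assumption is a poor fit for data dominated by a few outliers.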
Feb 24 2017
Started bisect job https://chromeperf.appspot.com/buildbucket_job_status/8986813444316321312
Feb 24 2017
I started another bisect job: https://chromeperf.appspot.com/buildbucket_job_status/8986813444316321312 and told it to bisect from 451675 to 451696. This will get us much closer to the solution.
Feb 24 2017
=== BISECT JOB RESULTS ===
NO Perf regression found

Bisect Details
  Configuration: win_perf_bisect
  Benchmark    : media.tough_video_cases_extra
  Metric       : seek/video.html?src_tulip2.ogg_type_audio

  Revision             Result                  N
  chromium@451675      1.04405 +- 0.121782     21      good
  chromium@451696      5.5981 +- 50.7584       21      bad

To Run This Test
  src/tools/perf/run_benchmark -v --browser=release --output-format=chartjson --upload-results --pageset-repeat=1 --also-run-disabled-tests --story-filter=video.html.src.tulip2.ogg.type.audio media.tough_video_cases_extra

Debug Info
  https://chromeperf.appspot.com/buildbucket_job_status/8986813444316321312

Is this bisect wrong?
  https://chromeperf.appspot.com/bad_bisect?try_job_id=6053508110876672

| O O | Visit http://www.chromium.org/developers/speed-infra/perf-bug-faq
|  X  | for more information addressing perf regression bugs. For feedback,
| / \ | file a bug with component Speed>Bisection. Thank you!
Feb 24 2017
Started bisect job https://chromeperf.appspot.com/buildbucket_job_status/8986805323774654704
Feb 24 2017
=== BISECT JOB RESULTS ===
NO Perf regression found

Bisect Details
  Configuration: win_perf_bisect
  Benchmark    : media.tough_video_cases_extra
  Metric       : seek/video.html?src_tulip2.ogg_type_audio

  Revision             Result                  N
  chromium@451682      1.08857 +- 0.18002      21      good
  chromium@451689      4.34286 +- 30.9647      21      bad

To Run This Test
  src/tools/perf/run_benchmark -v --browser=release --output-format=chartjson --upload-results --pageset-repeat=1 --also-run-disabled-tests --story-filter=video.html.src.tulip2.ogg.type.audio media.tough_video_cases_extra

Debug Info
  https://chromeperf.appspot.com/buildbucket_job_status/8986805323774654704

Is this bisect wrong?
  https://chromeperf.appspot.com/bad_bisect?try_job_id=6138512761421824

| O O | Visit http://www.chromium.org/developers/speed-infra/perf-bug-faq
|  X  | for more information addressing perf regression bugs. For feedback,
| / \ | file a bug with component Speed>Bisection. Thank you!
Feb 24 2017
Started bisect job https://chromeperf.appspot.com/buildbucket_job_status/8986799131270046672
Feb 24 2017
=== BISECT JOB RESULTS ===
Perf regression found but unable to continue
Bisect was stopped because a commit couldn't be classified as either good or bad.

Bisect Details
  Configuration: win_perf_bisect
  Benchmark    : media.tough_video_cases_extra
  Metric       : seek/video.html?src_tulip2.ogg_type_audio

To Run This Test
  src/tools/perf/run_benchmark -v --browser=release --output-format=chartjson --upload-results --pageset-repeat=1 --also-run-disabled-tests --story-filter=video.html.src.tulip2.ogg.type.audio media.tough_video_cases_extra

Debug Info
  https://chromeperf.appspot.com/buildbucket_job_status/8986799131270046672

Is this bisect wrong?
  https://chromeperf.appspot.com/bad_bisect?try_job_id=5258015743148032

| O O | Visit http://www.chromium.org/developers/speed-infra/perf-bug-faq
|  X  | for more information addressing perf regression bugs. For feedback,
| / \ | file a bug with component Speed>Bisection. Thank you!
Feb 24 2017
Started bisect job https://chromeperf.appspot.com/buildbucket_job_status/8986765245639240864
Feb 24 2017
The last bisect attempt failed to build: https://luci-logdog.appspot.com/v/?s=chrome%2Fbb%2Ftryserver.chromium.perf%2Fwin_perf_bisect%2F7160%2F%2B%2Frecipes%2Fsteps%2FFailure_reason%2F0%2Flogs%2Freason%2F0
Bisect cannot identify a culprit: Testing the "good" revision failed: Failed to compile revision chromium@451685. Buildbucket job id 8986798035781998192
I was testing the range 451685 to 451687. What we know so far:
451682 good
451685 doesn't build
451687 unknown
451689 bad
I've already started another one.
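When a revision in the range fails to compile, a bisect can still narrow the range by skipping it and testing a neighbour, much like `git bisect skip`. The sketch below is a hypothetical illustration of that idea, not the bisect recipe's actual logic: `bisect_with_skips`, the `classify` callback and the status strings are made up for the example. It keeps the last known good and first known bad revision and, when the midpoint does not build, tests the untested revision closest to it instead.

```python
from typing import Callable, Tuple

GOOD, BAD, UNBUILDABLE = "good", "bad", "unbuildable"


def bisect_with_skips(good: int, bad: int,
                      classify: Callable[[int], str]) -> Tuple[int, int]:
    """Narrow (good, bad] as far as possible, skipping unbuildable revisions.

    `classify` is a hypothetical callback that builds and tests a revision
    and returns GOOD, BAD or UNBUILDABLE. Returns (last_good, first_bad);
    if revisions in between could not be built, the final range may still
    span more than one commit.
    """
    skipped = set()
    while True:
        candidates = [r for r in range(good + 1, bad) if r not in skipped]
        if not candidates:
            return good, bad
        mid = (good + bad) // 2
        # Prefer the midpoint; otherwise the closest untested revision (lower on ties).
        rev = min(candidates, key=lambda r: (abs(r - mid), r))
        result = classify(rev)
        if result == GOOD:
            good = rev
        elif result == BAD:
            bad = rev
        else:
            skipped.add(rev)


# Simulated outcomes (hypothetical), mirroring what this thread eventually found.
outcomes = {451683: BAD, 451684: BAD, 451685: UNBUILDABLE,
            451686: BAD, 451687: BAD, 451688: BAD}
print(bisect_with_skips(451682, 451689, outcomes.__getitem__))  # (451682, 451683)
```

If the unbuildable revision sits right next to the culprit, the sketch returns a range wider than one commit rather than guessing, which matches how a human would report it.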
Feb 24 2017
Feb 24 2017
+dtu

Re #c4/5: It's unfortunate that the raw data isn't easier to access. Perhaps I can surface it in the top-level steps as a link like "Bisecting revision <foo>: raw data". I'll file an issue to see what I can do there.

Some notes on your doc: I don't think you *have* to go to the cached logs; you should be able to access the logdog links directly from the build page. I think they persist a lot longer than the old stdio links used to. Btw, if you find yourself spending a lot of time digging through logs, feel free to ping me over chat or email and maybe I can save you time in the future :)

I think we ran into this issue on this benchmark before: the MWU test isn't very good for these types of regressions. Since the change is basically only a few outliers, the test can't get the confidence it needs to continue. Pinpoint will support these kinds of use cases, where one can interactively guide or change the comparison, but making you wait until that's ready isn't ideal. Maybe there's something we can do in the short term to at least make this work for this test. Dave mentioned in crbug.com/683184 that letting it run more samples might help; since this test only takes 23 seconds, I can land something on the staging server to let it run many more samples.
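One way to target outlier-driven regressions directly is to count how many runs exceed a cutoff at each revision and compare the counts. The sketch below does that with a one-sided Fisher's exact test using only the Python standard library (`math.comb` needs Python 3.8+). The cutoff, twice the median of the good sample, is an arbitrary assumption for illustration, and nothing here is known to be what bisect or Pinpoint implements. On the 14 + 14 samples quoted earlier in the thread it finds 0 slow runs versus 3 and gives p = 1/9, about 0.11, so even a test aimed at outliers cannot reach significance with this few runs, which is consistent with the suggestion to collect more samples.

```python
import math
import statistics

sample_a = [1.095, 1.035, 1.025, 1.08, 1.065, 1.14, 1.065, 1.075,
            1.095, 1.08, 1.04, 1.07, 1.1, 1.045]
sample_b = [16.49, 1.02, 1.04, 0.99, 1.07, 1.04, 15.235, 1.045,
            1.095, 10.085, 1.105, 1.04, 1.05, 1.03]


def count_outliers(samples, threshold):
    """Number of runs slower than `threshold` (a hypothetical cutoff)."""
    return sum(1 for x in samples if x > threshold)


def fisher_one_sided(k_a, n_a, k_b, n_b):
    """One-sided Fisher's exact test: P(at least k_b of the outliers land in B).

    Conditions on the total number of outliers (k_a + k_b) and uses the
    hypergeometric distribution for how many of them fall in sample B.
    """
    total = k_a + k_b
    denom = math.comb(n_a + n_b, total)
    # math.comb returns 0 when asked for more items than are available.
    tail = sum(math.comb(n_b, k) * math.comb(n_a, total - k)
               for k in range(k_b, total + 1))
    return tail / denom


# Assumed cutoff: twice the median of the good sample.
threshold = 2 * statistics.median(sample_a)
k_a = count_outliers(sample_a, threshold)
k_b = count_outliers(sample_b, threshold)
p = fisher_one_sided(k_a, len(sample_a), k_b, len(sample_b))
print(k_a, k_b, round(p, 4))  # 0 3 0.1111
```

Comparing outlier counts rather than ranks keeps the signal that MWU discards, but the sample size still bounds how small the p-value can get.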
Feb 24 2017
=== BISECT JOB RESULTS ===
NO Perf regression found

Bisect Details
  Configuration: win_perf_bisect
  Benchmark    : media.tough_video_cases_extra
  Metric       : seek/tulip2.ogg_seek_warm

  Revision             Result                  N
  chromium@451683      4.86595 +- 39.5491      21      good
  chromium@451688      4.97881 +- 37.9918      21      bad

To Run This Test
  src/tools/perf/run_benchmark -v --browser=release --output-format=chartjson --upload-results --pageset-repeat=1 --also-run-disabled-tests media.tough_video_cases_extra

Debug Info
  https://chromeperf.appspot.com/buildbucket_job_status/8986765245639240864

Is this bisect wrong?
  https://chromeperf.appspot.com/bad_bisect?try_job_id=5057481572614144

| O O | Visit http://www.chromium.org/developers/speed-infra/perf-bug-faq
|  X  | for more information addressing perf regression bugs. For feedback,
| / \ | file a bug with component Speed>Bisection. Thank you!
Feb 24 2017
The culprit for the occasional long seeks is 451683 https://chromium.googlesource.com/chromium/src/+/6d0ecacfb66907f1fa666551055c3a748b4817cd
Feb 24 2017
I'm out for the rest of the day. Hubbe, could you please find someone to own this bug? Flakiness is bad...
Feb 24 2017
Started bisect job https://chromeperf.appspot.com/buildbucket_job_status/8986746935168020720
Feb 24 2017
Triage points to a V8 roll; assigning to hablish@ for further analysis.
Feb 24 2017
=== BISECT JOB RESULTS ===
NO Perf regression found

Bisect Details
  Configuration: staging_win_perf_bisect
  Benchmark    : media.tough_video_cases_extra
  Metric       : seek/video.html?src_tulip2.ogg_type_audio

  Revision             Result                  N
  chromium@451665      1.04071 +- 0.114953     21      good
  chromium@451706      4.57714 +- 36.5892      21      bad

To Run This Test
  src/tools/perf/run_benchmark -v --browser=release --output-format=chartjson --upload-results --pageset-repeat=1 --also-run-disabled-tests --story-filter=video.html.src.tulip2.ogg.type.audio media.tough_video_cases_extra

Debug Info
  https://chromeperf.appspot.com/buildbucket_job_status/8986746935168020720

Is this bisect wrong?
  https://chromeperf.appspot.com/bad_bisect?try_job_id=5780000701153280

| O O | Visit http://www.chromium.org/developers/speed-infra/perf-bug-faq
|  X  | for more information addressing perf regression bugs. For feedback,
| / \ | file a bug with component Speed>Bisection. Thank you!
Feb 27 2017
Started bisect job https://chromeperf.appspot.com/buildbucket_job_status/8986480342839257296
Feb 27 2017
Summary: bisect doesn't work for this because it always treats the slow runs as outliers, even though they happen consistently (see comment 4). So I started a bunch of bisect jobs so that I could bisect manually:
For chromium@451682, we get 1.08857 +- 0.18002 (from comment 10).
For chromium@451683, we get 4.86595 +- 39.5491 (from comment 17).
Both the mean and the standard deviation have gone up, so I'm pretty sure about this one. I just started one more bisect from 451682 to 451683 to double-check. The only change in the V8 roll is https://chromium.googlesource.com/v8/v8/+/2b9840d86f3b51f42918b6bf1a06ff8b62b2464d
Feb 27 2017
This is mysterious. Why would a change to SAB (SharedArrayBuffer) suddenly cause regressions? Binji/littledan, do you have any insights? Given that the M58 branch is going to be cut this week, I think this should be reverted. Unfortunately, the revert does not apply cleanly.
Feb 27 2017
Yeah, I don't see why this CL could affect the given test, but I can revert anyway.
Feb 27 2017
It's pretty suspicious that 451682 is a //media change related to an H.264 parser: https://codereview.chromium.org/2702973002. Any chance we're getting some flakiness in the bisect?
Feb 27 2017
I did schedule the bisect to run one more time to be certain (and we're waiting on the results for that), but I trust the logic in comment 24. Also, H264 should not have anything to do with this: H264 is a video codec, and this is a regression in audio, not video.
Feb 27 2017
But yes, if there is random variation governing when the bisect has outliers, then the result could be false and the culprit could be 451682.
Feb 27 2017
Ah, good point. I was thrown off by "media.tough_video_cases_extra" and missed the "seek/video.html?src_tulip2.ogg_type_audio" part. :-) The logic in #24 seems reasonable; I just thought it was strange that a change to SharedArrayBuffers (in V8, behind a flag) would have anything to do with a media regression. This change seemed closer in spirit (and the commit position is very close too). I'm still working on the V8 revert; there were a number of changes on top of it, so it's a bit tricky to untangle.
Feb 27 2017
Sorry you're having to do a lot of work for this... Hopefully the bisect comes back soon to confirm things.
Feb 27 2017
Landed revert in v8: https://codereview.chromium.org/2715223003 Don't have a v8 roll yet, but the auto-roller should kick in at some point.
Feb 27 2017
=== BISECT JOB RESULTS ===
NO Perf regression found

Bisect Details
  Configuration: win_perf_bisect
  Benchmark    : media.tough_video_cases_extra
  Metric       : seek/video.html?src_tulip2.ogg_type_audio

  Revision             Result                  N
  chromium@451682      1.06595 +- 0.126811     21      good
  chromium@451683      4.77238 +- 35.1662      21      bad

To Run This Test
  src/tools/perf/run_benchmark -v --browser=release --output-format=chartjson --upload-results --pageset-repeat=1 --also-run-disabled-tests --story-filter=video.html.src.tulip2.ogg.type.audio media.tough_video_cases_extra

Debug Info
  https://chromeperf.appspot.com/buildbucket_job_status/8986480342839257296

Is this bisect wrong?
  https://chromeperf.appspot.com/bad_bisect?try_job_id=5788413199908864

| O O | Visit http://www.chromium.org/developers/speed-infra/perf-bug-faq
|  X  | for more information addressing perf regression bugs. For feedback,
| / \ | file a bug with component Speed>Bisection. Thank you!
Feb 28 2017
Thanks for landing the revert. As comment 33 shows, we are sure that 451683 is the culprit.
Feb 28 2017
Thanks binji. Revert rolling in right now: https://codereview.chromium.org/2722463005/
Feb 28 2017
The next V8 roll is in progress at https://codereview.chromium.org/2722463005/ . When this lands, it would be interesting to know whether the regression has been addressed. (Sorry, but I won't be able to check the graphs myself.)
Feb 28 2017
The V8 roll is chromium 453568. The regression was still present in 453556, but it is fixed in chromium 453609: https://chromeperf.appspot.com/group_report?bug_id=695653 This means that we're fairly sure that the V8 roll was the fix. I'm marking this fixed. Please file a new bug and link to this one for further work in figuring out what went wrong.
Feb 28 2017
Yes, looks like it was the roll. I'm pretty surprised, but that's why we measure instead of guess. Thanks!
Mar 2 2017
The following revision refers to this bug:
https://chromium.googlesource.com/v8/v8.git/+/5acd2a207fb9a4c432e3868be77cd870eed7e169

commit 5acd2a207fb9a4c432e3868be77cd870eed7e169
Author: Michael Hablich <hablich@chromium.org>
Date: Thu Mar 02 12:54:02 2017

Merged: This is a speculative chain of reverts to improve a Chrome perf regression. See crbug.co ...

Revision: 5a04f4fd68d1d35d704cdc0dee0719c5354a8094

BUG= chromium:695653
LOG=N
NOTRY=true
NOPRESUBMIT=true
NOTREECHECKS=true
TBR=binji@chromium.org

Review-Url: https://codereview.chromium.org/2725873004 .
Cr-Commit-Position: refs/branch-heads/5.8@{#7}
Cr-Branched-From: eda659cc5e307f20ac1ad542ba12ab32eaf4c7ef-refs/heads/5.8.283@{#1}
Cr-Branched-From: 4310cd02d2160b1457baed81a2f40063eb264a21-refs/heads/master@{#43429}

[modify] https://crrev.com/5acd2a207fb9a4c432e3868be77cd870eed7e169/BUILD.gn
[modify] https://crrev.com/5acd2a207fb9a4c432e3868be77cd870eed7e169/src/bootstrapper.cc
[modify] https://crrev.com/5acd2a207fb9a4c432e3868be77cd870eed7e169/src/bootstrapper.h
[modify] https://crrev.com/5acd2a207fb9a4c432e3868be77cd870eed7e169/src/builtins/builtins-sharedarraybuffer.cc
[modify] https://crrev.com/5acd2a207fb9a4c432e3868be77cd870eed7e169/src/builtins/builtins.h
[modify] https://crrev.com/5acd2a207fb9a4c432e3868be77cd870eed7e169/src/debug/mirrors.js
[modify] https://crrev.com/5acd2a207fb9a4c432e3868be77cd870eed7e169/src/heap/heap.cc
[modify] https://crrev.com/5acd2a207fb9a4c432e3868be77cd870eed7e169/src/heap/heap.h
[add] https://crrev.com/5acd2a207fb9a4c432e3868be77cd870eed7e169/src/js/harmony-atomics.js
[modify] https://crrev.com/5acd2a207fb9a4c432e3868be77cd870eed7e169/src/js/prologue.js
[modify] https://crrev.com/5acd2a207fb9a4c432e3868be77cd870eed7e169/src/runtime/runtime-atomics.cc
[modify] https://crrev.com/5acd2a207fb9a4c432e3868be77cd870eed7e169/src/runtime/runtime-futex.cc
[modify] https://crrev.com/5acd2a207fb9a4c432e3868be77cd870eed7e169/src/runtime/runtime.h
[modify] https://crrev.com/5acd2a207fb9a4c432e3868be77cd870eed7e169/src/snapshot/natives-common.cc
[modify] https://crrev.com/5acd2a207fb9a4c432e3868be77cd870eed7e169/src/snapshot/natives-external.cc
[modify] https://crrev.com/5acd2a207fb9a4c432e3868be77cd870eed7e169/src/snapshot/natives.h
[modify] https://crrev.com/5acd2a207fb9a4c432e3868be77cd870eed7e169/src/v8.gyp
[modify] https://crrev.com/5acd2a207fb9a4c432e3868be77cd870eed7e169/test/cctest/heap/test-spaces.cc
[modify] https://crrev.com/5acd2a207fb9a4c432e3868be77cd870eed7e169/test/mjsunit/harmony/futex.js
[add] https://crrev.com/5acd2a207fb9a4c432e3868be77cd870eed7e169/test/mjsunit/minmax-simple.js
Mar 2 2017
Filed a new bug to address the performance issues and reland the change: https://bugs.chromium.org/p/v8/issues/detail?id=6033. I was mistaken; I can view this performance graph. It looks like performance regressed again in a V8 roll that included the related patch https://codereview.chromium.org/2697013009 .
Apr 11 2017
Started bisect job https://chromeperf.appspot.com/buildbucket_job_status/8982652053865000784
Apr 11 2017
Started bisect job https://chromeperf.appspot.com/buildbucket_job_status/8982632394884768336
Apr 11 2017
Started bisect job https://chromeperf.appspot.com/buildbucket_job_status/8982619786831825552
Apr 11 2017
=== BISECT JOB RESULTS ===
NO Perf regression found

Bisect Details
  Configuration: win_perf_bisect
  Benchmark    : media.tough_video_cases_extra
  Metric       : seek/tulip2.ogg_seek_warm

  Revision             Result                  N
  chromium@451683      5.10738 +- 41.4188      21      good
  chromium@451688      4.28262 +- 32.8693      21      bad

To Run This Test
  src/tools/perf/run_benchmark -v --browser=release --output-format=chartjson --upload-results --pageset-repeat=1 --also-run-disabled-tests media.tough_video_cases_extra

Debug Info
  https://chromeperf.appspot.com/buildbucket_job_status/8982652053865000784

Is this bisect wrong?
  https://chromeperf.appspot.com/bad_bisect?try_job_id=5057481572614144

| O O | Visit http://www.chromium.org/developers/speed-infra/perf-bug-faq
|  X  | for more information addressing perf regression bugs. For feedback,
| / \ | file a bug with component Speed>Bisection. Thank you!
Apr 11 2017
Started bisect job https://chromeperf.appspot.com/buildbucket_job_status/8982592499577142224
Apr 11 2017
Started bisect job https://chromeperf.appspot.com/buildbucket_job_status/8982584698436322576
Apr 11 2017
=== BISECT JOB RESULTS ===
NO Perf regression found

Bisect Details
  Configuration: win_perf_bisect
  Benchmark    : media.tough_video_cases_extra
  Metric       : seek/video.html?src_tulip2.ogg_type_audio

  Revision             Result                  N
  chromium@451685      6.5919 +- 57.4585       21      good
  chromium@451687      4.8069 +- 35.9          21      bad

To Run This Test
  src/tools/perf/run_benchmark -v --browser=release --output-format=chartjson --upload-results --pageset-repeat=1 --also-run-disabled-tests --story-filter=video.html.src.tulip2.ogg.type.audio media.tough_video_cases_extra

Debug Info
  https://chromeperf.appspot.com/buildbucket_job_status/8982632394884768336

Is this bisect wrong?
  https://chromeperf.appspot.com/bad_bisect?try_job_id=5258015743148032

| O O | Visit http://www.chromium.org/developers/speed-infra/perf-bug-faq
|  X  | for more information addressing perf regression bugs. For feedback,
| / \ | file a bug with component Speed>Bisection. Thank you!
Comment 1 by hubbe@google.com, Feb 23 2017