
Issue 695653

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Closed: Feb 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug-Regression

Blocked on:
issue 683184
issue 695884




950.8% regression in media.tough_video_cases_extra at 451666:451706

Project Member Reported by hubbe@google.com, Feb 23 2017

Issue description

See the link to graphs below.
 

Comment 1 by hubbe@google.com, Feb 23 2017

All graphs for this bug:
  https://chromeperf.appspot.com/group_report?bug_id=695653

Original alerts at time of bug-filing:
  https://chromeperf.appspot.com/group_report?keys=agxzfmNocm9tZXBlcmZyFAsSB0Fub21hbHkYgIDghJWOrAkM,agxzfmNocm9tZXBlcmZyFAsSB0Fub21hbHkYgIDghNyLoAsM


Bot(s) for this bug's original alert(s):

chromium-rel-win7-dual
Project Member

Comment 3 by 42576172...@developer.gserviceaccount.com, Feb 24 2017


=== BISECT JOB RESULTS ===
NO Perf regression found

Bisect Details
  Configuration: win_perf_bisect
  Benchmark    : media.tough_video_cases_extra
  Metric       : seek/video.html?src_tulip2.ogg_type_audio

Revision             Result                   N
chromium@451665      1.06429 +- 0.131203      21      good
chromium@451706      4.71214 +- 41.3564       21      bad

To Run This Test
  src/tools/perf/run_benchmark -v --browser=release --output-format=chartjson --upload-results --pageset-repeat=1 --also-run-disabled-tests --story-filter=video.html.src.tulip2.ogg.type.audio media.tough_video_cases_extra

Debug Info
  https://chromeperf.appspot.com/buildbucket_job_status/8986826612443103488

Is this bisect wrong?
  https://chromeperf.appspot.com/bad_bisect?try_job_id=5341844243742720


| O O | Visit http://www.chromium.org/developers/speed-infra/perf-bug-faq
|  X  | for more information addressing perf regression bugs. For feedback,
| / \ | file a bug with component Speed>Bisection.  Thank you!
Cc: simonhatch@chromium.org crouleau@chromium.org
It would be really helpful to be able to see the raw data in the logs. It was extremely difficult for me to find (I documented my steps in this doc: https://docs.google.com/document/d/1cTB1ltAyoMsiqBYogOiyqQbhu5YBReSPMg9TThEBRy0/edit):

https://luci-logdog.appspot.com/v/?s=chrome%2Fbb%2Ftryserver.chromium.perf%2Fwin_perf_bisect%2F7157%2F%2B%2Frecipes%2Fsteps%2FRe-testing_reference_range%2F0%2Fsteps%2FCompare_samples__4_%2F0%2Flogs%2Fjson.output%2F0

https://luci-logdog.appspot.com/v/?s=chrome%2Fbb%2Ftryserver.chromium.perf%2Fwin_perf_bisect%2F7157%2F%2B%2Frecipes%2Fsteps%2FCompare_samples%2F0%2Flogs%2Fjson.output%2F0

In case the links stop working: 


{
  "result": {
    "U": 88.5,
    "p": 0.6784703000834179,
    "significance": "NEED_MORE_DATA"
  },
  "sampleA": [
    1.095,
    1.035,
    1.025,
    1.08,
    1.065,
    1.14,
    1.065,
    1.075,
    1.095,
    1.08,
    1.04,
    1.07,
    1.1,
    1.045
  ],
  "sampleB": [
    16.49,
    1.02,
    1.04,
    0.99,
    1.07,
    1.04,
    15.235,
    1.045,
    1.095,
    10.085,
    1.105,
    1.04,
    1.05,
    1.03
  ]
}

Clearly this is a regression, but bisect gives up instead of continuing to test because the regression shows up as a consistently high rate of outliers rather than a shift in every sample. hubbe@'s suggested solution is to allow bisecting on standard deviation instead of only on the mean. Another solution would be to let an advanced user turn off bisect's significance check and fall back to a straight mean comparison. With links to the actual data, the user could then decide what happened instead of depending on the algorithm.
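
A minimal sketch of the mean versus standard-deviation comparison discussed here, using the two samples from the JSON output above. The helper name `summarize` is hypothetical and this is not what the bisect recipe itself runs.

```python
import statistics

# Samples copied from the Compare_samples JSON output in this comment.
sample_a = [1.095, 1.035, 1.025, 1.08, 1.065, 1.14, 1.065,
            1.075, 1.095, 1.08, 1.04, 1.07, 1.1, 1.045]
sample_b = [16.49, 1.02, 1.04, 0.99, 1.07, 1.04, 15.235,
            1.045, 1.095, 10.085, 1.105, 1.04, 1.05, 1.03]

def summarize(sample):
    """Return (mean, median, sample standard deviation)."""
    return (statistics.mean(sample),
            statistics.median(sample),
            statistics.stdev(sample))

mean_a, median_a, stdev_a = summarize(sample_a)
mean_b, median_b, stdev_b = summarize(sample_b)

print(f"A: mean={mean_a:.3f} median={median_a:.3f} stdev={stdev_a:.3f}")
print(f"B: mean={mean_b:.3f} median={median_b:.3f} stdev={stdev_b:.3f}")
```

The medians of the two samples stay close while the mean and standard deviation of sampleB are much larger, which is the pattern a few large outliers produce.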
Are we sure the code is doing the significance testing correctly? If you use http://www.socscistatistics.com/tests/studentttest/Default2.aspx on the data above, you get a significant difference and a p-value of .040646
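
For reference, the t statistic for these two samples can be computed by hand. This sketch uses Welch's form (with equal sample sizes the statistic equals the pooled Student's form). It does not compute a p-value, since the Python standard library has no t distribution, and the comment does not say whether the quoted p-value is one- or two-tailed.

```python
import math
import statistics

# Samples copied from the Compare_samples JSON output above.
sample_a = [1.095, 1.035, 1.025, 1.08, 1.065, 1.14, 1.065,
            1.075, 1.095, 1.08, 1.04, 1.07, 1.1, 1.045]
sample_b = [16.49, 1.02, 1.04, 0.99, 1.07, 1.04, 15.235,
            1.045, 1.095, 10.085, 1.105, 1.04, 1.05, 1.03]

def welch_t(a, b):
    """Welch's t statistic for the difference in means (b minus a)."""
    var_a = statistics.variance(a)
    var_b = statistics.variance(b)
    standard_error = math.sqrt(var_a / len(a) + var_b / len(b))
    return (statistics.mean(b) - statistics.mean(a)) / standard_error

t_stat = welch_t(sample_a, sample_b)
print(f"t = {t_stat:.3f}")
```

A t-test compares means, so it is sensitive to the three large values in sampleB in a way that a rank-based test is not.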
I started another bisect job: https://chromeperf.appspot.com/buildbucket_job_status/8986813444316321312

I told it to bisect from 451675 to 451696. This will get us much closer to the solution.
Project Member

Comment 8 by 42576172...@developer.gserviceaccount.com, Feb 24 2017


=== BISECT JOB RESULTS ===
NO Perf regression found

Bisect Details
  Configuration: win_perf_bisect
  Benchmark    : media.tough_video_cases_extra
  Metric       : seek/video.html?src_tulip2.ogg_type_audio

Revision             Result                   N
chromium@451675      1.04405 +- 0.121782      21      good
chromium@451696      5.5981 +- 50.7584        21      bad

To Run This Test
  src/tools/perf/run_benchmark -v --browser=release --output-format=chartjson --upload-results --pageset-repeat=1 --also-run-disabled-tests --story-filter=video.html.src.tulip2.ogg.type.audio media.tough_video_cases_extra

Debug Info
  https://chromeperf.appspot.com/buildbucket_job_status/8986813444316321312

Is this bisect wrong?
  https://chromeperf.appspot.com/bad_bisect?try_job_id=6053508110876672


Project Member

Comment 10 by 42576172...@developer.gserviceaccount.com, Feb 24 2017


=== BISECT JOB RESULTS ===
NO Perf regression found

Bisect Details
  Configuration: win_perf_bisect
  Benchmark    : media.tough_video_cases_extra
  Metric       : seek/video.html?src_tulip2.ogg_type_audio

Revision             Result                  N
chromium@451682      1.08857 +- 0.18002      21      good
chromium@451689      4.34286 +- 30.9647      21      bad

To Run This Test
  src/tools/perf/run_benchmark -v --browser=release --output-format=chartjson --upload-results --pageset-repeat=1 --also-run-disabled-tests --story-filter=video.html.src.tulip2.ogg.type.audio media.tough_video_cases_extra

Debug Info
  https://chromeperf.appspot.com/buildbucket_job_status/8986805323774654704

Is this bisect wrong?
  https://chromeperf.appspot.com/bad_bisect?try_job_id=6138512761421824


Project Member

Comment 12 by 42576172...@developer.gserviceaccount.com, Feb 24 2017


=== BISECT JOB RESULTS ===
Perf regression found but unable to continue

Bisect was stopped because a commit couldn't be classified as either
good or bad.


Bisect Details
  Configuration: win_perf_bisect
  Benchmark    : media.tough_video_cases_extra
  Metric       : seek/video.html?src_tulip2.ogg_type_audio


To Run This Test
  src/tools/perf/run_benchmark -v --browser=release --output-format=chartjson --upload-results --pageset-repeat=1 --also-run-disabled-tests --story-filter=video.html.src.tulip2.ogg.type.audio media.tough_video_cases_extra

Debug Info
  https://chromeperf.appspot.com/buildbucket_job_status/8986799131270046672

Is this bisect wrong?
  https://chromeperf.appspot.com/bad_bisect?try_job_id=5258015743148032


The last bisect try failed to build: https://luci-logdog.appspot.com/v/?s=chrome%2Fbb%2Ftryserver.chromium.perf%2Fwin_perf_bisect%2F7160%2F%2B%2Frecipes%2Fsteps%2FFailure_reason%2F0%2Flogs%2Freason%2F0#

Bisect cannot identify a culprit: Testing the "good" revision failed: Failed to compile revision chromium@451685. Buildbucket job id 8986798035781998192

I was bisecting from 451685 to 451687.

451682 good
451685 doesn't build
451687 unknown
451689 bad
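
The revision states listed above determine which revisions are still worth testing. A minimal sketch, assuming commit positions are consecutive integers and using hypothetical status labels ("good", "bad", "broken", "unknown") that are not part of the bisect tooling:

```python
def untested_between(status):
    """List revisions strictly between the last good and first bad
    revision that have no result yet or are marked 'unknown'.
    Revisions marked 'broken' (failed to compile) are skipped."""
    last_good = max(rev for rev, s in status.items() if s == "good")
    first_bad = min(rev for rev, s in status.items() if s == "bad")
    return [rev for rev in range(last_good + 1, first_bad)
            if status.get(rev, "unknown") == "unknown"]

status = {
    451682: "good",
    451685: "broken",   # doesn't build
    451687: "unknown",
    451689: "bad",
}
candidates = untested_between(status)
print(candidates)
```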

I already started another one.
Blockedon: 695884
Blockedon: 683184
+dtu

re: #c4/5

It's unfortunate that the raw data isn't easier to access. Perhaps I can surface it on the top-level steps as a link like "Bisecting revision <foo>: raw data"; I'll file an issue to see what I can do there. Some notes on your doc: I don't think you *have* to go to the cached logs; you should be able to access the logdog links directly from the build page. I think they persist for a lot longer than the old stdio links used to.

Btw, if you're finding you're spending a lot of time digging through logs, feel free to ping me over chat or via email and maybe I can save you time in the future :)

I think we ran into this issue on this benchmark before: the MWU (Mann-Whitney U) test isn't very good for these types of regressions. Since the regression is basically only a few outliers, the test can't get the confidence it needs to continue.
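
To illustrate why a rank-based test struggles here, this sketch recomputes the Mann-Whitney U statistic for the samples quoted in comment 4 by direct pair counting. The p-value uses a normal approximation with continuity correction and no tie correction. That approximation is an assumption, since the thread does not show how the bisect tooling computes its p-value.

```python
import math

# Samples copied from the Compare_samples JSON output earlier in the thread.
sample_a = [1.095, 1.035, 1.025, 1.08, 1.065, 1.14, 1.065,
            1.075, 1.095, 1.08, 1.04, 1.07, 1.1, 1.045]
sample_b = [16.49, 1.02, 1.04, 0.99, 1.07, 1.04, 15.235,
            1.045, 1.095, 10.085, 1.105, 1.04, 1.05, 1.03]

def mann_whitney_u(a, b):
    """Return (U_a, U_b) by direct pair counting; ties count as 0.5."""
    u_a = 0.0
    for x in a:
        for y in b:
            if x > y:
                u_a += 1.0
            elif x == y:
                u_a += 0.5
    return u_a, len(a) * len(b) - u_a

def two_sided_p_normal(u, n_a, n_b):
    """Two-sided p-value from a normal approximation with continuity
    correction (no tie correction)."""
    mean_u = n_a * n_b / 2
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = max(0.0, (abs(u - mean_u) - 0.5) / sd_u)
    return math.erfc(z / math.sqrt(2))

u_a, u_b = mann_whitney_u(sample_a, sample_b)
u = min(u_a, u_b)
p = two_sided_p_normal(u, len(sample_a), len(sample_b))
print(f"U = {u}, p ~ {p:.3f}")
```

The smaller U value matches the 88.5 in the JSON output. Only ranks enter the statistic, so the three samples above 10 count the same as if they were just above the rest, and the remaining eleven sampleB values interleave with sampleA.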

Pinpoint will support these kinds of use cases, where one might interactively guide or change the comparison, but making you wait until that's ready isn't ideal. Maybe there's something we can do in the short term to at least make this work for this test. Dave mentioned in crbug.com/683184 that letting it run for more samples might help. Since this test only takes 23 seconds, I can land something on the staging server to let it run way more samples.
Project Member

Comment 17 by 42576172...@developer.gserviceaccount.com, Feb 24 2017


=== BISECT JOB RESULTS ===
NO Perf regression found

Bisect Details
  Configuration: win_perf_bisect
  Benchmark    : media.tough_video_cases_extra
  Metric       : seek/tulip2.ogg_seek_warm

Revision             Result                  N
chromium@451683      4.86595 +- 39.5491      21      good
chromium@451688      4.97881 +- 37.9918      21      bad

To Run This Test
  src/tools/perf/run_benchmark -v --browser=release --output-format=chartjson --upload-results --pageset-repeat=1 --also-run-disabled-tests media.tough_video_cases_extra

Debug Info
  https://chromeperf.appspot.com/buildbucket_job_status/8986765245639240864

Is this bisect wrong?
  https://chromeperf.appspot.com/bad_bisect?try_job_id=5057481572614144


The culprit for the occasional long seeks is 451683 https://chromium.googlesource.com/chromium/src/+/6d0ecacfb66907f1fa666551055c3a748b4817cd


Owner: hubbe@chromium.org
I'm out for the rest of the day.

Hubbe, could you please find someone to own this bug? Flakiness is bad...

Comment 21 by hubbe@chromium.org, Feb 24 2017

Cc: hubbe@chromium.org
Owner: hablich@chromium.org
Triage points to a V8 roll; assigning to hablich@ for further analysis.

Project Member

Comment 22 by 42576172...@developer.gserviceaccount.com, Feb 24 2017


=== BISECT JOB RESULTS ===
NO Perf regression found

Bisect Details
  Configuration: staging_win_perf_bisect
  Benchmark    : media.tough_video_cases_extra
  Metric       : seek/video.html?src_tulip2.ogg_type_audio

Revision             Result                   N
chromium@451665      1.04071 +- 0.114953      21      good
chromium@451706      4.57714 +- 36.5892       21      bad

To Run This Test
  src/tools/perf/run_benchmark -v --browser=release --output-format=chartjson --upload-results --pageset-repeat=1 --also-run-disabled-tests --story-filter=video.html.src.tulip2.ogg.type.audio media.tough_video_cases_extra

Debug Info
  https://chromeperf.appspot.com/buildbucket_job_status/8986746935168020720

Is this bisect wrong?
  https://chromeperf.appspot.com/bad_bisect?try_job_id=5780000701153280


Cc: hablich@chromium.org
Owner: littledan@chromium.org
Status: Assigned (was: Untriaged)
Summary:

bisect doesn't work for this because it always just thinks things are outliers
but it is really consistent.

see comment 4
so I started a bunch of bisect jobs so that I could manually bisect
for chromium@451682, we get       1.08857 +- 0.18002
(from comment 10)

for chromium@451683, we get      4.86595 +- 39.5491
(from comment 17)

so the mean and the standard deviation have both gone up,
so I'm pretty sure about this one.
I just started one more bisect from 451682 to 451683 to double check.

The only change in the V8 roll is https://chromium.googlesource.com/v8/v8/+/2b9840d86f3b51f42918b6bf1a06ff8b62b2464d
Cc: binji@chromium.org
Components: Blink>JavaScript
This is mysterious. Why should a change to SAB suddenly create regressions? Binji/littledan do you have any insights?

Given that the 58 branch is going to be cut this week, I think this should be reverted. Unfortunately, the revert does not apply cleanly.

Comment 26 by binji@chromium.org, Feb 27 2017

Yeah, I don't see why this CL could affect the given test, but I can revert anyway.

Comment 27 by binji@chromium.org, Feb 27 2017

It's pretty suspicious that 451682 is a //media change related to an H264 parser: https://codereview.chromium.org/2702973002. Any chance we're getting some flakiness in the bisect?
I did schedule the bisect to run one more time to be certain (and we're waiting on the results for that), but I trust the logic in comment 24. 

Also, H264 should not have anything to do with this: H264 is a video codec, and this is a regression in audio, not video.
But yes, if there is random variation governing when the bisect has outliers, then the result could be false and the culprit could be 451682.

Comment 30 by binji@chromium.org, Feb 27 2017

Ah, good point. I was thrown off by "media.tough_video_cases_extra" and missed the "seek/video.html?src_tulip2.ogg_type_audio" part. :-)

Logic seems reasonable in #24, just thought it seemed strange that a change to SharedArrayBuffers (in v8, behind a flag) would have anything to do with a media regression. This change seemed closer in spirit (and the commit position is very close too...)

Still working on the v8 revert, there were a number of changes on top of it, so it's a bit tricky to untangle.
Sorry you're having to do a lot of work for this... Hopefully the bisect comes back soon to confirm things.

Comment 32 by binji@chromium.org, Feb 27 2017

Landed revert in v8: https://codereview.chromium.org/2715223003

Don't have a v8 roll yet, but the auto-roller should kick in at some point.
Project Member

Comment 33 by 42576172...@developer.gserviceaccount.com, Feb 27 2017


=== BISECT JOB RESULTS ===
NO Perf regression found

Bisect Details
  Configuration: win_perf_bisect
  Benchmark    : media.tough_video_cases_extra
  Metric       : seek/video.html?src_tulip2.ogg_type_audio

Revision             Result                   N
chromium@451682      1.06595 +- 0.126811      21      good
chromium@451683      4.77238 +- 35.1662       21      bad

To Run This Test
  src/tools/perf/run_benchmark -v --browser=release --output-format=chartjson --upload-results --pageset-repeat=1 --also-run-disabled-tests --story-filter=video.html.src.tulip2.ogg.type.audio media.tough_video_cases_extra

Debug Info
  https://chromeperf.appspot.com/buildbucket_job_status/8986480342839257296

Is this bisect wrong?
  https://chromeperf.appspot.com/bad_bisect?try_job_id=5788413199908864


Thanks for landing the revert. As comment 33 shows, we are sure that 451683 is the culprit.
Thanks binji.

Revert rolling in right now: https://codereview.chromium.org/2722463005/
The next V8 roll is in progress at https://codereview.chromium.org/2722463005/ . When this lands, it would be interesting to know whether the regression has been addressed. (Sorry, but I won't be able to check the graphs myself.)
Status: Fixed (was: Assigned)
The V8 roll is chromium 453568. The regression was still present in 453556, but it is fixed in chromium 453609: https://chromeperf.appspot.com/group_report?bug_id=695653

This means that we're fairly sure that the V8 roll was the fix.

I'm marking this fixed. Please file a new bug and link to this one for further work in figuring out what went wrong.

Comment 38 by binji@chromium.org, Feb 28 2017

Yes, looks like it was the roll. I'm pretty surprised, but that's why we measure instead of guess. Thanks!
Project Member

Comment 39 by bugdroid1@chromium.org, Mar 2 2017

Labels: merge-merged-5.8
The following revision refers to this bug:
  https://chromium.googlesource.com/v8/v8.git/+/5acd2a207fb9a4c432e3868be77cd870eed7e169

commit 5acd2a207fb9a4c432e3868be77cd870eed7e169
Author: Michael Hablich <hablich@chromium.org>
Date: Thu Mar 02 12:54:02 2017

Merged: This is a speculative chain of reverts to improve a Chrome perf regression. See crbug.co ...

Revision: 5a04f4fd68d1d35d704cdc0dee0719c5354a8094

BUG= chromium:695653 
LOG=N
NOTRY=true
NOPRESUBMIT=true
NOTREECHECKS=true
TBR=binji@chromium.org

Review-Url: https://codereview.chromium.org/2725873004 .
Cr-Commit-Position: refs/branch-heads/5.8@{#7}
Cr-Branched-From: eda659cc5e307f20ac1ad542ba12ab32eaf4c7ef-refs/heads/5.8.283@{#1}
Cr-Branched-From: 4310cd02d2160b1457baed81a2f40063eb264a21-refs/heads/master@{#43429}

[modify] https://crrev.com/5acd2a207fb9a4c432e3868be77cd870eed7e169/BUILD.gn
[modify] https://crrev.com/5acd2a207fb9a4c432e3868be77cd870eed7e169/src/bootstrapper.cc
[modify] https://crrev.com/5acd2a207fb9a4c432e3868be77cd870eed7e169/src/bootstrapper.h
[modify] https://crrev.com/5acd2a207fb9a4c432e3868be77cd870eed7e169/src/builtins/builtins-sharedarraybuffer.cc
[modify] https://crrev.com/5acd2a207fb9a4c432e3868be77cd870eed7e169/src/builtins/builtins.h
[modify] https://crrev.com/5acd2a207fb9a4c432e3868be77cd870eed7e169/src/debug/mirrors.js
[modify] https://crrev.com/5acd2a207fb9a4c432e3868be77cd870eed7e169/src/heap/heap.cc
[modify] https://crrev.com/5acd2a207fb9a4c432e3868be77cd870eed7e169/src/heap/heap.h
[add] https://crrev.com/5acd2a207fb9a4c432e3868be77cd870eed7e169/src/js/harmony-atomics.js
[modify] https://crrev.com/5acd2a207fb9a4c432e3868be77cd870eed7e169/src/js/prologue.js
[modify] https://crrev.com/5acd2a207fb9a4c432e3868be77cd870eed7e169/src/runtime/runtime-atomics.cc
[modify] https://crrev.com/5acd2a207fb9a4c432e3868be77cd870eed7e169/src/runtime/runtime-futex.cc
[modify] https://crrev.com/5acd2a207fb9a4c432e3868be77cd870eed7e169/src/runtime/runtime.h
[modify] https://crrev.com/5acd2a207fb9a4c432e3868be77cd870eed7e169/src/snapshot/natives-common.cc
[modify] https://crrev.com/5acd2a207fb9a4c432e3868be77cd870eed7e169/src/snapshot/natives-external.cc
[modify] https://crrev.com/5acd2a207fb9a4c432e3868be77cd870eed7e169/src/snapshot/natives.h
[modify] https://crrev.com/5acd2a207fb9a4c432e3868be77cd870eed7e169/src/v8.gyp
[modify] https://crrev.com/5acd2a207fb9a4c432e3868be77cd870eed7e169/test/cctest/heap/test-spaces.cc
[modify] https://crrev.com/5acd2a207fb9a4c432e3868be77cd870eed7e169/test/mjsunit/harmony/futex.js
[add] https://crrev.com/5acd2a207fb9a4c432e3868be77cd870eed7e169/test/mjsunit/minmax-simple.js

Filed a new bug to address the performance issues and reland in https://bugs.chromium.org/p/v8/issues/detail?id=6033. I was mistaken; I can view this performance graph. It looks like the performance regressed again, in a V8 roll that included the related patch https://codereview.chromium.org/2697013009 .
Project Member

Comment 44 by 42576172...@developer.gserviceaccount.com, Apr 11 2017


=== BISECT JOB RESULTS ===
NO Perf regression found

Bisect Details
  Configuration: win_perf_bisect
  Benchmark    : media.tough_video_cases_extra
  Metric       : seek/tulip2.ogg_seek_warm

Revision             Result                  N
chromium@451683      5.10738 +- 41.4188      21      good
chromium@451688      4.28262 +- 32.8693      21      bad

To Run This Test
  src/tools/perf/run_benchmark -v --browser=release --output-format=chartjson --upload-results --pageset-repeat=1 --also-run-disabled-tests media.tough_video_cases_extra

Debug Info
  https://chromeperf.appspot.com/buildbucket_job_status/8982652053865000784

Is this bisect wrong?
  https://chromeperf.appspot.com/bad_bisect?try_job_id=5057481572614144


Project Member

Comment 47 by 42576172...@developer.gserviceaccount.com, Apr 11 2017


=== BISECT JOB RESULTS ===
NO Perf regression found

Bisect Details
  Configuration: win_perf_bisect
  Benchmark    : media.tough_video_cases_extra
  Metric       : seek/video.html?src_tulip2.ogg_type_audio

Revision             Result                 N
chromium@451685      6.5919 +- 57.4585      21      good
chromium@451687      4.8069 +- 35.9         21      bad

To Run This Test
  src/tools/perf/run_benchmark -v --browser=release --output-format=chartjson --upload-results --pageset-repeat=1 --also-run-disabled-tests --story-filter=video.html.src.tulip2.ogg.type.audio media.tough_video_cases_extra

Debug Info
  https://chromeperf.appspot.com/buildbucket_job_status/8982632394884768336

Is this bisect wrong?
  https://chromeperf.appspot.com/bad_bisect?try_job_id=5258015743148032


