
Issue 633118

Starred by 2 users

Issue metadata

Status: WontFix
Owner:
Closed: Sep 2016
Cc:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug-Regression




47.3% regression in speedometer at 408809:408851

Project Member Reported by alexclarke@chromium.org, Aug 1 2016

Issue description

See the link to graphs below.
 
All graphs for this bug:
  https://chromeperf.appspot.com/group_report?bug_id=633118

Original alerts at time of bug-filing:
  https://chromeperf.appspot.com/group_report?keys=agxzfmNocm9tZXBlcmZyFAsSB0Fub21hbHkYgICgxv-SuAoM


Bot(s) for this bug's original alert(s):

win-zenbook
Cc: qyears...@chromium.org
Owner: qyears...@chromium.org

=== Auto-CCing suspected CL author qyearsley@chromium.org ===

Hi qyearsley@chromium.org, the bisect results pointed to your CL below as possibly
causing a regression. Please have a look at this info and see whether
your CL may be related.


===== BISECT JOB RESULTS =====
Status: completed


===== SUSPECTED CL(s) =====
Subject : Add new baselines for mac-retina after swapping the machine.
Author  : qyearsley
Commit description:
  
Background: In order to have a mac retina layout test try bot with a
matching continuous bot, the slave for "WebKit Mac10.11 (retina)", which
was a MacBookPro11,3 with Nvidia GPU, was swapped for a MacBookPro11,5
with AMD GPU (http://crrev.com/2184313002).

After the builder came back online with the new slave,
svg/text/combining-character-queries.html was failing consistently.

See results:
https://storage.googleapis.com/chromium-layout-test-archives/WebKit_Mac10_11__retina_/5141/layout-test-results/results.html

Review-Url: https://codereview.chromium.org/2193353004
Cr-Commit-Position: refs/heads/master@{#408828}
Commit  : 77aba94595e0cab5b370933cd5ef6865b55128d6
Date    : Sat Jul 30 01:13:34 2016


===== TESTED REVISIONS =====
Revision         Mean     Std Dev  N    Good?
chromium@408808  6055.73  188.278  8    good
chromium@408819  5976.28  141.639  210  good
chromium@408824  6007.06  205.535  210  good
chromium@408827  5986.98  41.0731  18   good
chromium@408828  5952.56  49.1998  12   bad    <--
chromium@408829  5949.1   99.9408  210  bad
chromium@408851  5923.47  29.9338  8    bad

Bisect job ran on: winx64_zen_perf_bisect
Bug ID: 633118

Test Command: src/tools/perf/run_benchmark -v --browser=release_x64 --output-format=chartjson --upload-results --also-run-disabled-tests speedometer
Test Metric: Total/Total
Relative Change: 1.33%
Score: 99.5

Buildbot stdio: http://build.chromium.org/p/tryserver.chromium.perf/builders/winx64_zen_perf_bisect/builds/325
Job details: https://chromeperf.appspot.com/buildbucket_job_status/9005542101985729456


Not what you expected? We'll investigate and get back to you!
  https://chromeperf.appspot.com/bad_bisect?try_job_id=5814255685730304

| O O | Visit http://www.chromium.org/developers/speed-infra/perf-bug-faq
|  X  | for more information addressing perf regression bugs. For feedback,
| / \ | file a bug with component Tests>AutoBisect.  Thank you!
Owner: alexclarke@chromium.org
Couldn't have been r408828, since that CL just changed layout test baselines, and would only affect layout tests.
The ref shows some movement too, so this might be a false alert. Still, the ref doesn't track the regression exactly (possibly because the data is noisy). Trying another bisect with a lot of iterations.
Cc: robert...@chromium.org
Note, I believe that the repeat_count parameter doesn't do anything now (https://github.com/catapult-project/catapult/issues/2602).

The actual repeat count is variable: bisect repeats as many times as it needs to in order to try to find a statistically significant result. For example, in the bisect job above, some revisions were repeated 210 times.

I wonder if doing this led to finding a false positive in this case?
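
To illustrate the concern above, here is a minimal Python sketch of how re-testing until a result looks significant can inflate false positives. It is not the bisect tool's actual logic: the statistical test the bisect uses is not stated in this thread, so the Welch t statistic, the cutoff of 2.0, the noise parameters (mean 6000, std dev 150), and all function names are assumptions chosen for illustration.

```python
import math
import random


def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Welch's t statistic computed from summary statistics of two samples."""
    se = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / se


def summarize(values):
    """Return (mean, sample standard deviation, count)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(var), n


def looks_significant(a, b, threshold=2.0):
    # Assumed decision rule: |t| above a fixed cutoff counts as "significant".
    return abs(welch_t(*summarize(a), *summarize(b))) > threshold


def false_positive_rate(trials, start_n, max_n, step, rng):
    """Both simulated revisions share one distribution, so any hit is false."""
    hits = 0
    for _ in range(trials):
        a = [rng.gauss(6000, 150) for _ in range(start_n)]
        b = [rng.gauss(6000, 150) for _ in range(start_n)]
        while True:
            if looks_significant(a, b):
                hits += 1
                break
            if len(a) >= max_n:
                break
            # Not significant yet: collect more runs and test again.
            a.extend(rng.gauss(6000, 150) for _ in range(step))
            b.extend(rng.gauss(6000, 150) for _ in range(step))
    return hits / trials


rng = random.Random(0)
fixed = false_positive_rate(400, 8, 8, 8, rng)       # a single look at N=8
adaptive = false_positive_rate(400, 8, 208, 8, rng)  # re-test up to N=208
print(f"single look: {fixed:.2f}, repeat until significant: {adaptive:.2f}")
```

Both simulated revisions draw from the same distribution, so every "significant" result is a false positive; the repeat-until-significant rate should come out noticeably higher than the single-look rate. As a side check, welch_t(5986.98, 41.0731, 18, 5952.56, 49.1998, 12), using the r408827 and r408828 rows from the table above, gives about 2.0, which is right at the assumed cutoff.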

===== BISECT JOB RESULTS =====
Status: completed


=== Bisection aborted ===
The bisect was aborted because the metric values for the initial "good" and "bad" revisions do not represent a clear regression.
Please contact the team (see below) if you believe this is in error.

=== Warnings ===
The following warnings were raised by the bisect job:

 * Bisect failed to reproduce the regression with enough confidence.

===== TESTED REVISIONS =====
Revision         Mean     Std Dev  N   Good?
chromium@404000  7274.57  627.125  18  good
chromium@409000  6911.0   550.37   18  bad

Bisect job ran on: winx64_zen_perf_bisect
Bug ID: 633118

Test Command: src/tools/perf/run_benchmark -v --browser=release_x64 --output-format=chartjson --upload-results --also-run-disabled-tests speedometer
Test Metric: Total/Total
Relative Change: 11.77%
Score: 0

Buildbot stdio: http://build.chromium.org/p/tryserver.chromium.perf/builders/winx64_zen_perf_bisect/builds/347
Job details: https://chromeperf.appspot.com/buildbucket_job_status/9005420267585263088


Not what you expected? We'll investigate and get back to you!
  https://chromeperf.appspot.com/bad_bisect?try_job_id=5859066958577664

Cc: -qyears...@chromium.org
Cc: alexclarke@chromium.org
Owner: toyoshim@chromium.org
Status: WontFix (was: Assigned)
The original regression around r408809 seems to have recovered around r409065.
I am not strongly confident, but it could have been caused by noise, because the reference run showed slightly worse results in the same range.
