
Issue 813853

Starred by 4 users

Issue metadata

Status: WontFix
Owner: kbr@chromium.org
Closed: Feb 2018
Cc:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug-Regression

Blocking: issue 804174

14.4%-18% regression in media.desktop at 536997:537090

Project Member Reported by crouleau@google.com, Feb 20 2018

Issue description

See the link to graphs below.

Comment 1 by 42576172...@developer.gserviceaccount.com, Feb 20 2018

All graphs for this bug:
  https://chromeperf.appspot.com/group_report?bug_id=813853

(For debugging:) Original alerts at time of bug-filing:
  https://chromeperf.appspot.com/group_report?sid=80fe9b815ec9fd1903f307aa39370da3de204d1f1d31a33f3187a40a7cf807a0


Bot(s) for this bug's original alert(s):

chromium-rel-mac11-air

Comment 3 by 42576172...@developer.gserviceaccount.com, Feb 20 2018

Cc: d...@chromium.org dpranke@chromium.org tandrii@chromium.org kbr@chromium.org
Owner: kbr@chromium.org
Status: Assigned (was: Untriaged)
📍 Found a significant difference after 1 commit.
https://pinpoint-dot-chromeperf.appspot.com/job/11e02307840000

Revert "[test_env.py] Warm up vpython virtualenv cache on swarming task shards." by kbr@chromium.org
https://chromium.googlesource.com/chromium/src/+/7dde857919af2f59fcab10264d696301a15c64da

Understanding performance regressions:
  http://g.co/ChromePerformanceRegressions

Comment 4 by kbr@chromium.org, Feb 20 2018

Blocking: 804174
Cc: nednguyen@chromium.org
Status: WontFix (was: Assigned)
If the change to test_env.py really boosted benchmark scores, then that's pretty interesting and perhaps concerning, because benchmarks should be more isolated from their environment than that.

Cc: charliea@chromium.org
Very interesting, but I think this may be somewhat unavoidable, because the regressed metric here is power, which is measured for the whole machine by a BattOr device. Anything that affects the platform will affect the metric.
+charliea@ is working on reducing noise.

I'm trying to understand what the issue is a little bit better here.

I think that I understand what vpython is: it's Python that goes through an additional layer of indirection (like rvm, the Ruby version manager) that lets you better control the dependencies (e.g. Python version, Python packages) that your program is pulling in.

So the situation here seems to be that vpython startup time can vary and, because some of our benchmarks have that vpython startup time on their critical path, this startup time introduces noise into the tests.

Also, it seems that we're able to warm up the vpython cache just by booting up vpython in our test harness before starting the test. Warming up this cache reduces that startup time variability and therefore reduces noise.
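
A minimal sketch of the warm-up idea (this is not the actual test_env.py change, and the exact vpython invocation is an assumption, not something confirmed in this thread):

    import subprocess

    def warm_vpython_cache():
        # Run a no-op through vpython before the measured test starts.
        # The first launch pays the one-time setup cost (fetching wheels,
        # building the virtualenv); later launches find the virtualenv
        # already in place and start quickly.
        subprocess.check_call(['vpython', '-c', 'pass'])

    warm_vpython_cache()
    # ...then launch the actual test command as usual.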

So the graphs above clearly show that warming up the vpython cache beforehand adversely impacted power during media.desktop. But what instance of vpython is started while we're measuring the average power of media.desktop? It seems like Telemetry should be long since started by that point, no?

Comment 8 by kbr@chromium.org, Feb 21 2018

Cc: iannucci@chromium.org
+iannucci, who is the vpython expert

I suspect that warming up vpython so much earlier -- before the test target even had a chance to run -- eliminated the variability of warming it up when running the test target.

I gather that all LUCI bots are going to use vpython by default, so that should further reduce variability.

Comment 9 by kbr@chromium.org, Feb 21 2018

Cc: briander...@chromium.org
Issue 813954 has been merged into this issue.

kbr@ yep, I kind of understand how that might change the results of some sort of microbenchmark in which many of the measurements take place 1s after Python startup or something, but it seems *nuts* that it'd change the overall power consumed by a long (10-20s) story by something like 20%. Based on the graphs, it obviously does, but that kind of shatters my belief that we have even close to a grasp on the factors that might affect power noise in our benchmark harness.

Comment 11 by kbr@chromium.org, Feb 21 2018

Cc: alexclarke@chromium.org
Issue 814203 has been merged into this issue.

Comment 12 by kbr@chromium.org, Feb 21 2018

Issue 814204 has been merged into this issue.

Comment 13 by kbr@chromium.org, Feb 21 2018

Issue 814205 has been merged into this issue.

FTR: vpython startup time definitely can vary a lot:
  * If the virtualenv indicated by the .vpython spec file doesn't exist:
    * It fetches the wheels from CIPD (which may be cached)
    * It generates a new virtualenv
    * It installs the wheels into that new virtualenv
  * It launches python in the virtualenv

Warming up the cache beforehand skips all the variable bits, though, and is the recommended solution. Swarming tasks should also persist these virtualenvs between runs (using a named cache directory), which should also make the warm-up phase a no-op most of the time.
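
To make that variability concrete, here is an illustrative (hypothetical) way to time a cold launch against a warm one; it assumes vpython is on PATH and that the spec's virtualenv doesn't exist yet when the first call runs:

    import subprocess
    import time

    def time_vpython_noop():
        # Time a bare vpython launch that starts the interpreter and exits.
        start = time.time()
        subprocess.check_call(['vpython', '-c', 'pass'])
        return time.time() - start

    cold = time_vpython_noop()  # may include wheel fetch + virtualenv setup
    warm = time_vpython_noop()  # virtualenv already exists, so much faster
    print('cold: %.2fs, warm: %.2fs' % (cold, warm))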

Comment 15 by dtu@chromium.org, Feb 22 2018

Cc: sullivan@chromium.org
Sorry, the Pinpoint message is a little misleading here. The vpython CL didn't cause a perf regression -- the "difference" it's talking about is the test going from failing to passing.

Cc: -dpranke@chromium.org

Comment 17 by kbr@chromium.org, Feb 23 2018

Cc: dpranke@chromium.org
Issue 814184 has been merged into this issue.

Comment 18 by kbr@chromium.org, Feb 23 2018

Issue 814202 has been merged into this issue.

Comment 19 by kbr@chromium.org, Mar 15 2018

Cc: hjd@google.com
Issue 821381 has been merged into this issue.
