
Issue 785999


Issue metadata

Status: Assigned
Pri: 2




Figure out how to run a subset of Chromium perf benchmarks for each V8 revision

Project Member Reported by serg...@chromium.org, Nov 16 2017

Issue description

There are a number of builders that run on master.chromium.perf.fyi for as many Chromium revisions as we have capacity for:
 - Win Builder FYI
 - Win Clang Builder
 - Android Builder FYI
 - Android arm64 Builder FYI
 - Battor Agent Linux
 - Battor Agent Mac
 - Battor Agent Win
 - Linux Compile FYI

V8 is rolled into Chromium, so even if we had capacity to run tests on each Chromium revision, we'd still have a large bisect range for V8. Instead, we'd like to set up builder(s) that run a subset of benchmarks on each V8 revision.

Open questions:
 - Which benchmarks do we want to run? On which platforms?
 - Where do we get additional hardware for the bot? Which hardware specs should it have? Do we want to configure this bot to be less noisy, e.g. set the CPU governor to performance or powersave mode?
 - Which Chromium revision should each V8 revision be integrated into?
 
All of these builders run the chromium recipe: https://chromium.googlesource.com/chromium/tools/build/+/master/scripts/slave/recipes/chromium.py.

It uses specs from the chromium_tests recipe module to define what's being checked out: https://chromium.googlesource.com/chromium/tools/build/+/master/scripts/slave/recipe_modules/chromium_tests/chromium_perf_fyi.py. Here we can override gclient_config with a custom config that lets us specify the V8 version, e.g.

  @CHROMIUM_CONFIG_CTX(includes=['chromium'])
  def chromium_v8_perf(c):
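    # Note: chromium_revision and v8_revision are placeholders for values
    # that would need to be plumbed in (e.g. from the scheduler, see below).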
    c.solutions[0].revision = chromium_revision
    c.revisions['v8'] = v8_revision

To have it run on each V8 revision, we'll need a scheduler on the V8 repo. I am not sure yet how we can pass the revision from the scheduler to the config above.
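One way this might look (a sketch only, not the actual setup: it assumes the scheduler sets the standard 'revision' build property to the triggering V8 commit, and CHROMIUM_PIN is a placeholder for however we pick the fixed Chromium revision):

  def RunSteps(api):
    api.gclient.set_config('chromium')
    # Pin Chromium; let the V8 revision come from the scheduler property.
    api.gclient.c.solutions[0].revision = CHROMIUM_PIN
    api.gclient.c.revisions['v8'] = api.properties.get('revision', 'HEAD')
    api.bot_update.ensure_checkout()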
Passing the config through and running things on the V8 side should be trivial. What's non-trivial is how we update Chromium and how we report those updates to chrome-perf. We didn't have a good solution to this in the past.

If we just asynchronously update Chromium by setting it to HEAD, we'll have to deal with all the upstream noise and known issues from Chromium on the V8 side as well. We need a way to hide all this: e.g. update Chromium rarely (say, once a day), do those updates separately from updating V8, and somehow hide them in Chromeperf, either by normalizing the numbers we post (subtracting the diffs between the Chromium builds) or by marking those points so that Chromeperf ignores them.
Thanks, Michael, for pointing out this issue. Perhaps we can run all V8 revisions against the same Chromium revision during a day. At the end of the day, we'd re-run the last tested V8 revision with the newer Chrome, take the difference, and adjust the offset applied to the next section of the graph.
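As a rough sketch of that end-of-day bookkeeping (plain Python, not recipe code; measure() and upload() are hypothetical stand-ins for running a benchmark and posting to Chromeperf):

  # Cumulative correction attributed to Chromium updates rather than V8.
  offset = 0.0

  def report(v8_rev, chrome_rev):
    # Post the Chromium-noise-corrected value for a V8 revision.
    upload(v8_rev, measure(v8_rev, chrome_rev) - offset)

  def end_of_day_roll(last_v8_rev, old_chrome, new_chrome):
    # Re-run the last tested V8 revision against both Chromium builds;
    # the difference is attributable to Chromium alone, so fold it into
    # the offset applied to the next section of the graph.
    global offset
    offset += (measure(last_v8_rev, new_chrome)
               - measure(last_v8_rev, old_chrome))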

hablich@: Can you please help answer the questions from post #0?
sullivan@: Do you have a server-side solution that would allow hiding regressions caused by Chromium updates? If not, would your team have cycles to implement one?
Cc: nedngu...@google.com
I own the benchmarking infra, so please cc me on these bugs.
Friendly ping on #3.
Cc: simonhatch@chromium.org
+simonhatch to answer question in #3
Owner: simonhatch@chromium.org
Simon, can you please answer the question in #3?
Cc: -simonhatch@chromium.org serg...@chromium.org
Cc: benjhayden@chromium.org
cc-ing benjhayden since Simon is OOO. Any thoughts on how V8 could upload data with continuous V8 revisions, updating the Chrome build every once in a while, and ignoring benchmark regressions that come from Chrome? The problem seems similar to ignoring noise from device swapping on the perf waterfall.
Owner: benjhayden@chromium.org
Cc: -serg...@chromium.org
Owner: serg...@chromium.org
Sorry, was OOO.

Seems like we could allow some sort of metadata along with the upload, akin to device id, that find_anomalies could read and use to ignore alerts.
That would address the issue of alerts, but graphs would still be confusing. I wonder if we could run certain builds with different Chromium versions but the same V8 version, and report the difference to chromeperf as a baseline at a given revision. Then chromeperf would subtract that value from all points at subsequent revisions up to the next baseline.

WDYT?
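
For illustration, an upload row for that idea might look like the sketch below (the master/bot/test names are made up, and 'a_ignore_alerts' is a hypothetical annotation column that find_anomalies would have to learn to honor):

  point = {
      'master': 'ClientV8',          # made-up names, for illustration
      'bot': 'v8-perf-linux',
      'test': 'v8/octane/Total',
      'revision': 123456,
      'value': 210.0,
      'supplemental_columns': {
          'r_chromium': 'deadbeef',  # Chromium build used for this point
          'a_ignore_alerts': '1',    # hypothetical: set on update points
      },
  }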
Sorry, to be clear: in my suggestion we'd have something on the graph to show that the version changed.

If you want the graphs to be continuous, without even potential jumps around these version changes, could you not keep track of the baseline value yourself and subtract it before uploading?
That's a good point. The only issue I see is persistent storage for the last known baseline. On LUCI, we can no longer store files on the bot, so we'd need an external service to persist the data. I wonder if we could use the chromeperf API to query the last 100 data points and store the baseline in the supplemental columns, e.g. data points would look as follows:

  1. chrome_rev=1, v8_rev=1, baseline=200, actual=210, metric=10   (first build)
  2. chrome_rev=1, v8_rev=2, actual=211, metric=11                 (new V8 revision, Chrome is the same, no regression - just noise)
  3. chrome_rev=2, v8_rev=2, baseline=230, actual=240, metric=10   (new Chrome revision without changing V8 revision,
                                                                    no V8 regression reported despite Chrome regression)
  4. chrome_rev=2, v8_rev=3, actual=250, metric=20                 (V8 regression detected)

Above, the 'baseline' value is stored as a supplemental column, the actual metric value is known only to the recipe (not uploaded to ChromePerf), and the metric value is the one displayed on the graph.
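
A minimal sketch of that bookkeeping (plain Python, not recipe code; prev_points stands in for the last N points queried back from the chromeperf API, since LUCI bots cannot persist local state between builds, and the exact query API is left out):

  def next_point(chrome_rev, v8_rev, actual, prev_points):
    # 'baseline' is only stored on points where the Chromium build
    # changed, so search backwards for the most recent one.
    baseline = next((p['baseline'] for p in reversed(prev_points)
                     if 'baseline' in p), None)
    last = prev_points[-1] if prev_points else None
    if last is None or last['chrome_rev'] != chrome_rev:
      # First build, or Chromium updated without a V8 change: absorb the
      # whole difference into a fresh baseline so no V8 regression is
      # reported for the Chromium update itself.
      prev_metric = last['metric'] if last else 0
      baseline = actual - prev_metric
      return {'chrome_rev': chrome_rev, 'v8_rev': v8_rev,
              'baseline': baseline, 'actual': actual,
              'metric': actual - baseline}
    # Same Chromium build: reuse the stored baseline, so the metric only
    # moves when V8 itself moves.
    return {'chrome_rev': chrome_rev, 'v8_rev': v8_rev,
            'actual': actual, 'metric': actual - baseline}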
