Figure out how to run a subset of Chromium perf benchmarks for each V8 revision
Issue description

There are a number of builders that run on master.chromium.perf.try for as many Chromium revisions as we have capacity for:
- Win Builder FYI
- Win Clang Builder
- Android Builder FYI
- Android arm64 Builder FYI
- Battor Agent Linux
- Battor Agent Mac
- Battor Agent Win
- Linux Compile FYI

V8 is rolled into Chromium, so even if we had the capacity to run tests on each Chromium revision, we would still have a large bisect range for V8. Instead, we'd like to set up builder(s) that run a subset of benchmarks on each V8 revision.

Open questions:
- Which benchmarks do we want to run? On which platforms?
- Where do we get additional hardware for the bot? Which hardware specs should it have? Do we want to configure this bot to be less noisy, e.g. set the CPU governor to performance or to powersave mode?
- Which Chromium revision should each V8 revision be integrated into?
Nov 17 2017
Passing the config and running it on the V8 side should be trivial. What is non-trivial is how we update Chromium and how we report those updates to Chromeperf. We didn't have a good solution for this in the past. If we just asynchronously update Chromium by setting it to HEAD, we'll have to deal with all the upstream noise and known issues from Chromium on the V8 side as well. We need a way to hide all of this, e.g. update Chromium rarely (say, once a day), do those updates separately from updating V8, and somehow hide them in Chromeperf: either normalize the numbers we post by subtracting the diffs between the Chromium builds, or mark those points so that Chromeperf ignores them.
Nov 17 2017
Thanks, Michael, for pointing out this issue. Perhaps we can run all V8 revisions against the same Chromium revision during a day. At the end of the day we'll re-run the last tested V8 revision with the newer Chrome, take the difference, and adjust the offset applied to the next section of the graph. hablich@: Can you please help answer the questions from post #0? sullivan@: Do you have a server-side solution for hiding regressions caused by updating Chromium? If not, would your team have cycles to implement one?
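A rough sketch of this daily offset scheme, in Python; run_benchmark() and upload_point() are hypothetical stand-ins for whatever the recipe would actually call, not an existing API:

    # Sketch of the daily offset-adjustment idea. All names are hypothetical.
    def end_of_day_rebaseline(offset, last_v8_rev, old_chrome_rev, new_chrome_rev):
        """Re-run the last tested V8 revision against the old and the new
        Chrome build, and fold the difference into the cumulative offset."""
        old_value = run_benchmark(chrome_rev=old_chrome_rev, v8_rev=last_v8_rev)
        new_value = run_benchmark(chrome_rev=new_chrome_rev, v8_rev=last_v8_rev)
        # Any difference here comes from Chromium, not V8, so hide it.
        return offset + (new_value - old_value)

    def report(offset, chrome_rev, v8_rev):
        value = run_benchmark(chrome_rev=chrome_rev, v8_rev=v8_rev)
        # Subtract the accumulated Chromium-only drift before posting.
        upload_point(v8_rev=v8_rev, value=value - offset)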
Nov 17 2017
I own the benchmarking infra, so please cc me on these bugs.
Dec 11 2017
Friendly ping on #3.
Dec 11 2017
+simonhatch to answer question in #3
Aug 7
Simon, can you please answer the question in #3?
Aug 7
cc-ing benjhayden since Simon is OOO. Any thoughts on how V8 could upload data with continuous V8 revisions, updating the Chrome build every once in a while, and ignoring benchmark regressions that come from Chrome? The problem seems similar to ignoring noise from device swapping on the perf waterfall.
Aug 13
Sorry, was OOO. It seems like we could allow some sort of metadata along with the upload, akin to a device id, that find_anomalies could read and use to ignore alerts.
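For illustration only, a hypothetical shape such an upload could take; the field names are invented and do not reflect an existing dashboard format:

    # Hypothetical data point carrying a marker that anomaly detection
    # could check. Field names are invented for illustration.
    point = {
        "test": "v8/octane/score",
        "revision": 12345,             # V8 revision
        "value": 211.0,
        "supplemental": {
            "chromium_revision": "abc123",
            "chromium_updated": True,  # set on points where Chrome changed
        },
    }

    def should_alert(point):
        # A find_anomalies-style check could skip points explicitly marked
        # as Chrome-update boundaries instead of flagging the jump.
        return not point["supplemental"].get("chromium_updated", False)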
Aug 13
That would address the issue of alerts, but the graphs would still be confusing. I wonder if we could run certain builds with different versions of Chromium but the same V8 version, and report the difference to Chromeperf as a baseline at a given revision. Chromeperf would then deduct that value from all points at subsequent revisions up to the next baseline. WDYT?
Aug 13
Sorry, to be clear: in my suggestion we'd have something on the graph to show that the version changed. If you want the graphs to be continuous, without even potential jumps around these version changes, could you not keep track of the baseline value yourself and subtract it before uploading?
Aug 15
That's a good point. The only issue I can see here is persistent storage for the last known baseline. On LUCI we can no longer store files on the bot, so we'd need an external service to persist data. I wonder if we can somehow use the Chromeperf API to query the last 100 data points and store the baseline in the supplemental columns, e.g. the data points would look as follows:
1. chrome_rev=1, v8_rev=1, baseline=200, actual=210, metric=10 (first build)
2. chrome_rev=1, v8_rev=2, actual=211, metric=11 (new V8 revision, Chrome is the same; no regression, just noise)
3. chrome_rev=2, v8_rev=2, baseline=230, actual=240, metric=10 (new Chrome revision without changing the V8 revision; no V8 regression reported despite the Chrome regression)
4. chrome_rev=2, v8_rev=3, actual=250, metric=20 (V8 regression detected)
Above, the 'baseline' value is stored as a supplemental column, the 'actual' value is only known to the recipe (not uploaded to Chromeperf), and the 'metric' value is the one displayed on the graph.
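A small Python sketch replaying exactly these four points; it assumes the recipe measures the baseline shift by re-running the same V8 revision under the new Chrome build (240 against the earlier 210, treating the 211 as noise):

    def metric(actual, baseline):
        # The value shown on the graph: raw result minus current baseline.
        return actual - baseline

    baseline = 200                  # measured once at the first build
    print(metric(210, baseline))    # chrome_rev=1, v8_rev=1 -> 10
    print(metric(211, baseline))    # chrome_rev=1, v8_rev=2 -> 11, just noise

    # Chrome updated while V8 stayed at v8_rev=2: shift the baseline by the
    # Chrome-only delta (240 - 210 = 30) so no V8 regression is reported.
    baseline += 240 - 210           # baseline becomes 230
    print(metric(240, baseline))    # chrome_rev=2, v8_rev=2 -> 10
    print(metric(250, baseline))    # chrome_rev=2, v8_rev=3 -> 20, V8 regression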