Enable reference builds for media_perftests and other performance gtests.
Issue description: media_perftests, angle_perftests, and other gtest perf tests don't have reference builds to compare against, so we can't easily tell whether a regression is caused by a change in Chrome or by an environmental factor such as a device driver, the temperature of the lab, or something else.

Here's an example: https://chromeperf.appspot.com/group_report?bug_id=710017 The media_perftests graph does not have a reference build, but the tough_video_cases graphs do. Here's another example: https://chromeperf.appspot.com/report?sid=3181fd43529579c50e9c1cad577a8f1058c1a97620154e978c70235126e72a3c

+watk@, johnchen@, dalecurtis@ (interested media team members)
+jmadill@ (angle_perftests owner)
+enne@ (cc_perftests owner)
+reveman@ (gpu_perftests owner)
+primiano@ (tracing_perftests owner)

I'm unsure how this works. Perhaps +sullivan@ could point us to someone who can point us to the relevant code?
Apr 24 2017

Apr 24 2017
The way this is done in Telemetry is that the binary under test (BUT) is a controllable variable. It could be a ToT browser, or it could be a reference browser stored in cloud storage.
For C++ tests like media_perftests and angle_perftests, there is no detachment between the BUT and the test driver, hence the same model doesn't work here. So to make it work, I think there are two options:
1) Detach the BUT and the test driver (the current Telemetry way). I am not sure whether we can do some dynamic linking magic here. (primiano?)
Another way to do this is to create a lightweight test driver, e.g. a Python script that does:
if options.ref_binary:
    # Download the ref binary from cloud storage.
    subprocess.Popen('ref_media_perftests')
else:
    subprocess.Popen('out/media_perftests')
2) Redesign the way ref builds work from the ground up at the recipe level. A ref build run is a swarming run pinned to a fixed revision. Then we can use the swarming service to kick off a run at that fixed revision (very similar to how Pinpoint would trigger the job).
The nice thing about this option is updating the ref build is probably as simple as updating that fixed revision.
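Option 1's lightweight driver could look roughly like the following. This is a minimal sketch only: the `--ref-binary` flag, the binary paths, and the `download_ref_binary` helper are hypothetical illustrations, not existing Chromium tooling.

```python
import argparse
import subprocess
import sys

def parse_args(argv):
    # Hypothetical flags: a real driver would match the recipe's conventions.
    parser = argparse.ArgumentParser()
    parser.add_argument('--ref-binary', action='store_true',
                        help='run the pinned reference build instead of ToT')
    parser.add_argument('--tot-binary', default='out/media_perftests',
                        help='path to the tip-of-tree test binary')
    return parser.parse_args(argv)

def download_ref_binary(dest):
    # Placeholder: a real driver would fetch the pinned reference build
    # from cloud storage (e.g. with gsutil) into `dest`.
    raise NotImplementedError('fetch reference build from cloud storage')

def main(argv):
    args = parse_args(argv)
    if args.ref_binary:
        binary = 'ref_media_perftests'  # hypothetical local path
        download_ref_binary(binary)
    else:
        binary = args.tot_binary
    # Run the gtest perf binary and propagate its exit code, so the same
    # driver serves both reference and ToT runs, selected by a flag.
    return subprocess.call([binary])

if __name__ == '__main__':
    sys.exit(main(sys.argv[1:]))
```

The key design point is that the driver, not the gtest binary, owns the choice of which build runs, which restores the Telemetry-style separation between the test logic and the binary under test.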
Apr 25 2017
Isn't this a problem we can solve with bisect? I.e., if the bisect cannot repro, ignore? Honestly, at least as far as tracing_perftests is concerned, ref builds are IMHO not worth the effort and are going to increase waterfall maintenance cost.
Apr 25 2017
Issue 710017 is an example of the lack of a reference build causing additional work for sheriffs. When there isn't a reference build, it takes a good bit of work to understand what happened (1. run bisect, 2. see that no culprit was found, 3. re-run bisect, 4. figure out that it was maybe a device change, 5. close the bug WontFix). When there is a reference build, it only takes a glance.

It is correct that this will increase costs short term, but gtest perf tests are generally very quick, especially compared to Telemetry tests, so the increase should be small. media_perftests, for example, takes only ~17 seconds to run all of its cases.

In general, we should encourage the use of gtest perf tests wherever possible, because they tend to isolate regressions in an easily reproducible way and they are quicker to run. In terms of the testing pyramid, Telemetry tests are E2E tests, and gtest perf tests are unit or integration tests. If we provide better features like reference build runs for the gtests, developers will write those instead of Telemetry tests where possible. So increasing the features of the gtest perf tests will *reduce* waterfall maintenance long term, because we will have more maintainable test cases.

nednyguyen@'s solutions both seem reasonable and not too difficult.
Apr 26 2017
+ Yuri for performance_browser_tests
May 10 2017
Yes, I'd love to see this for performance_browser_tests as well. :) BTW, I'm also a perf sheriff, and having the ref signals seems to eliminate a HUGE number of false positives. When we don't have the ref signals, the bisects have to be run multiple times, with varying ranges, just to be sure there isn't a regression. It's very time-consuming to manage.
Jun 29 2017
I hit this during my media perf sheriff rotation today -- bug 738085 is an example where hopefully the bisect will find a culprit. If it doesn't, more human time will be spent per the first paragraph of https://crbug.com/714812#c5.
Feb 20 2018
Hit again: bug 812708.
Comment 1 by crouleau@chromium.org, Apr 24 2017