
Issue 714812

Starred by 1 user

Issue metadata

Status: Untriaged
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug

Blocking:
issue 710017




Enable reference builds for media_perftests and other performance gtests.

Project Member Reported by crouleau@chromium.org, Apr 24 2017

Issue description

media_perftests, angle_perftests, and other gtest perf tests don't have reference builds to compare against, so we can't easily tell whether a regression is caused by a change in Chrome or by some environmental factor like a device driver update, the temperature of the lab, or something else like that.

Here's an example:
https://chromeperf.appspot.com/group_report?bug_id=710017
The media_perftests graph does not have a reference build, but the tough_video_cases graphs do.

Here's another example: https://chromeperf.appspot.com/report?sid=3181fd43529579c50e9c1cad577a8f1058c1a97620154e978c70235126e72a3c

+watk@, johnchen@, dalecurtis@ interested media team members
+jmadill@ angle_perftests owner
+enne@ cc_perftests owner
+reveman@ gpu_perftests owner
+primiano@ tracing_perftests owner

I'm unsure about how this works. Perhaps +sullivan@ could point us to someone who can point us to relevant code?
 
Blocking: 710017
Cc: nedngu...@google.com
Components: Speed>Benchmarks
The way this is done in Telemetry is that the binary under test (BUT) is a controllable variable. It could be a TOT browser, or it could be a reference browser stored in cloud storage.
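For illustration, a minimal sketch of that model (the function and path names here are hypothetical, not Telemetry's actual code): the binary to run is resolved from an option, so the same test driver works for either build.

import subprocess

def download_from_cloud_storage(path):
    # Placeholder: the real implementation would fetch the pinned
    # reference binary from cloud storage and return its local path.
    raise NotImplementedError(path)

def resolve_browser(options):
    # The BUT is just a controllable variable: the same driver runs
    # either the ToT build or a pinned reference build.
    if options.browser == 'reference':
        return download_from_cloud_storage('reference/chrome')
    return 'out/Release/chrome'

def run_test(options, test_args):
    subprocess.check_call([resolve_browser(options)] + test_args)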

For C++ tests like media_perftests and angle_perftests, there is no detachment between the BUT & the test driver, hence the same model doesn't work here. To make it work, I think there are two options:

1) Detach the BUT & the test driver (the current Telemetry way). I am not sure whether we can do some dynamic linking magic here. (primiano?)

Another option is to create a lightweight test driver, e.g. a Python script that does something like this (see the fuller sketch after this snippet):

import subprocess

if options.ref_binary:
    # Download the ref binary from cloud storage, then run it.
    subprocess.check_call(['ref_media_perftests'])
else:
    subprocess.check_call(['out/media_perftests'])
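
Fleshed out a little, such a driver might look like the following. The flag name, output paths, and the download step are all hypothetical, just to show the shape of the idea:

#!/usr/bin/env python
import argparse
import subprocess
import sys

def download_ref_binary():
    # Placeholder: fetch the pinned reference binary from cloud storage
    # (e.g. via the existing download_from_google_storage tooling) and
    # return its local path.
    raise NotImplementedError

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('--ref', action='store_true',
                        help='Run the pinned reference binary instead of ToT.')
    args, passthrough = parser.parse_known_args()
    binary = download_ref_binary() if args.ref else 'out/Release/media_perftests'
    # Forward all remaining arguments to the gtest binary unchanged.
    return subprocess.call([binary] + passthrough)

if __name__ == '__main__':
    sys.exit(main())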

2) Redesign the way ref builds work from the ground up at the recipe level. A ref build run is a swarming run pinned to a fixed revision; we can then use the swarming service to kick off a run at that fixed revision (very similar to how Pinpoint would trigger the job).

The nice thing about this option is that updating the ref build is probably as simple as updating that fixed revision (see the sketch below).
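Illustrative pseudocode for this option; the api calls below are stand-ins for whatever the real recipe modules expose, not actual method names:

REF_REVISION = 'abc123'  # Updating the ref build == updating this pin.

def run_perf_test(api, revision):
    # Build and isolate the gtest at the given revision, then trigger it
    # on swarming so the ToT and ref runs use identical bot dimensions.
    isolated = api.build_and_isolate('media_perftests', revision)
    return api.swarming.trigger(isolated)

def run_with_reference(api):
    tot_task = run_perf_test(api, api.tot_revision())
    ref_task = run_perf_test(api, REF_REVISION)
    # Collect both results; the dashboard can then chart ToT vs. ref.
    return api.swarming.collect([tot_task, ref_task])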
Isn't this a problem we can solve with bisect? i.e. if the bisect cannot repro, ignore?
Honestly, at least as far as tracing_perftests is concerned, ref builds are IMHO not worth the effort and are going to increase waterfall maintenance cost.
Issue 710017 is an example of the lack of a reference build causing additional work for sheriffs. When there isn't a reference build, it takes a good bit of work to understand what happened (1. run bisect, 2. see that a culprit wasn't found, 3. re-run bisect, 4. figure out that it was maybe a device change, 5. close the bug WontFix). When there is a reference build, it only takes a glance.

It is correct that this will increase costs short term, but gtest perf tests are generally very quick, especially compared to Telemetry tests, so the increase should be small. media_perftests, for example, only takes ~17 seconds to run all of its cases. In general, we should be encouraging the use of gtest perf tests wherever possible because they tend to isolate regressions in an easily reproducible way and they are quicker to run. In terms of the testing pyramid, Telemetry tests are E2E tests, and gtest perf tests are unit or integration tests. If we can provide better features like reference build runs for the gtests, then developers will write those instead of Telemetry tests where possible. So increasing the features of the gtest perf tests will actually *reduce* waterfall maintenance long term, because we will have more maintainable test cases.

nednguyen@'s solutions both seem reasonable and not too difficult.
Cc: m...@chromium.org
+ Yuri for performance_browser_tests

Comment 7 by m...@chromium.org, May 10 2017

Yes, I'd love to see this for performance_browser_tests as well. :)

BTW, I'm also a perf sheriff, and having the ref signals seems to eliminate a HUGE number of false positives. When we don't have the ref signals, the bisects have to be run multiple times, with varying ranges, just to be sure there isn't a regression. It's very time-consuming to manage.
Cc: wolenetz@chromium.org
I hit this during my media perf sheriff rotation today; bug 738085 is an example where hopefully the bisect will find a culprit. If it doesn't, more human time will be spent, per the 1st paragraph of https://crbug.com/714812#c5.
Hit again: bug 812708.
