Reducing the cycle time of perf waterfall from 10 hours to 1 hour
Issue description

Currently, the cycle time of the perf waterfall is almost 10 hours (see the graph in go/perf-waterfall-cycle-time - Googler only). We are opening this as the meta bug to get it down to 1 hour. This is an aggressive goal, but we believe this project has a lot of benefits:

1) A long cycle time makes it hard for perf sheriffs to green up the perf waterfall. Each perf build accumulates many failures (because it covers many CLs), and a sheriff's CL to disable a failing test does not affect test runs until much later (at which point the test may be passing again).
2) A long cycle time delays the point at which a developer who landed a CL that causes a regression gets notified. This makes it significantly harder for them to fix the issue because many CLs may have landed on top of their CL.
3) A long cycle time means there are many CLs between builds, which makes it hard for both the bisect bot & humans to figure out which CL caused the perf regression.

The project road map & blocking bugs will be added to this cover bug later.
,
Apr 19 2017
Sorry I wasn't more involved in this earlier, but could we come up with a non-arbitrary rule about how much time budget we can give per user story? For example, if we stick with 1 hour (I like this as it's consistent with the CQ), the other variables are:
- # of devices
- time it takes a device to run a test (overhead)
- test timeout
- # of tests

When we control the # of devices (which we currently do), we become prescriptive about the number of tests that can possibly run to get us under an hour. If we filtered all of our tests with this in mind:
first pass == no tests that take longer than x minutes to run (enforcement of timeout).
second pass == no tests that are duplicated by other tests (mutual exclusion).
third pass == prioritize tests and draw a line.

Once we have a line, we can then see what more hardware will buy in terms of tests AND buffer for future tests.
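To make the three passes concrete, here is a rough sketch of how the budget arithmetic could work. It is not existing waterfall tooling: the `PerfTest` fields, the function names, and every default value (60-minute budget, 1-minute per-test overhead, 10-minute timeout) are assumptions for illustration. It also treats the fleet as one pool of device-minutes, which ignores how tests actually pack onto individual devices.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class PerfTest:
    """Hypothetical record for one perf test (all fields are assumptions)."""
    name: str
    runtime_min: float                  # time the test itself takes on a device
    priority: int                       # lower number = more important
    duplicate_of: Optional[str] = None  # name of a test that already covers this one


def select_tests(
    tests: List[PerfTest],
    num_devices: int,
    budget_min: float = 60.0,
    overhead_min: float = 1.0,
    timeout_min: float = 10.0,
) -> Tuple[List[PerfTest], float]:
    """Apply the three passes and return (selected tests, device-minutes used)."""
    # First pass: drop tests that exceed the per-test timeout.
    kept = [t for t in tests if t.runtime_min <= timeout_min]
    # Second pass: drop tests that are marked as duplicated by another test.
    kept = [t for t in kept if t.duplicate_of is None]
    # Third pass: order by priority and draw a line where the budget runs out.
    kept.sort(key=lambda t: t.priority)

    capacity = num_devices * budget_min  # total device-minutes per cycle
    selected: List[PerfTest] = []
    used = 0.0
    for t in kept:
        cost = t.runtime_min + overhead_min
        if used + cost > capacity:
            break  # this is the line: everything below it is cut
        selected.append(t)
        used += cost
    return selected, used


def devices_needed(
    tests: List[PerfTest],
    budget_min: float = 60.0,
    overhead_min: float = 1.0,
) -> int:
    """Lower bound on devices required to run all given tests within the budget."""
    total = sum(t.runtime_min + overhead_min for t in tests)
    return math.ceil(total / budget_min)
```

Calling `select_tests` with the current device count shows where the line falls today, and calling `devices_needed` on the tests that survive the first two passes gives a rough idea of what more hardware would buy.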
,
Apr 20 2017
Ben: that's a fine idea & has been used in many test frameworks. I think we should consider it, though the challenge is that the cycle time of integration perf tests can vary widely. Taking loading as an example: loading a site can take anywhere from 0.5s to tens of seconds. And folks do want to test the whole spectrum for coverage completeness.
,
Apr 20 2017
That's a good point. But even with these examples, we could potentially limit what tests we're willing to cover in the lab. There's a trade-off between coverage and...well...this cost/time-to-bisect/test-run-time thing we keep bringing up.
,
Dec 27 2017
Moving over to ned.
,
Jul 6
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/b7e98079d9dc9a2230d62c587144097171bff499

commit b7e98079d9dc9a2230d62c587144097171bff499
Author: Ned Nguyen <nednguyen@google.com>
Date: Fri Jul 06 02:05:47 2018

Reduce the swarming timeout of perf tests

NOTRY=true
TBR=eyaich@chromium.org
Bug: 713345
Cq-Include-Trybots: master.tryserver.chromium.perf:obbs_fyi
Change-Id: Ibc1954a26040e8899f9ff13fee0756ca0a9b1481
Reviewed-on: https://chromium-review.googlesource.com/1127480
Commit-Queue: Ned Nguyen <nednguyen@google.com>
Reviewed-by: Ned Nguyen <nednguyen@google.com>
Cr-Commit-Position: refs/heads/master@{#572882}

[modify] https://crrev.com/b7e98079d9dc9a2230d62c587144097171bff499/testing/buildbot/chromium.perf.fyi.json
[modify] https://crrev.com/b7e98079d9dc9a2230d62c587144097171bff499/testing/buildbot/chromium.perf.json
[modify] https://crrev.com/b7e98079d9dc9a2230d62c587144097171bff499/tools/perf/core/perf_data_generator.py
,
Jan 7
So I am not sure we are actively working on this goal anymore. We were able to reduce to ~2-3 hours and our current goal is to at least maintain this. Adding Caleb as an FYI of the goal and closing.
,
Jan 7
Since we aren't quite at 1 hour yet, I will take this bug and see if there is anything we can do with it. This bug is helpful because we can link to it as a reason to increase hardware and keep test runs short.
Comment 1 by nedngu...@google.com
, Apr 19 2017