
Issue 713345

Starred by 3 users

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug

Blocked on:
issue 708363
issue 757933
issue 857283
issue 863768
issue 700086
issue 710253
issue 711794
issue 713346
issue 713357
issue 715177
issue 716025

Blocking:
issue 657183
issue 728882




Reducing the cycle time of perf waterfall from 10 hours to 1 hour

Project Member Reported by nedngu...@google.com, Apr 19 2017

Issue description

Currently, the cycle time of the perf waterfall is almost 10 hours (see the graph in go/perf-waterfall-cycle-time - Googler only).

We are opening this as the meta bug to get it down to 1 hour. This is an aggressive goal, but we believe this project has a lot of benefits:
1) Long cycle time makes it hard for perf sheriffs to green up the perf waterfall. This is because each perf build will have many failures (due to the many CLs it contains), and sheriffs' CLs to disable failing tests won't affect test runs until much later (at which point the test may be passing again).
2) Long cycle time increases the delay between when a developer lands a CL that causes a regression and when they get notified. This makes it significantly harder for them to fix the issue because many CLs may have landed on top of their CL.
3) Long cycle time means there will be many CLs in between builds, which makes it hard for both the bisect bot & humans to figure out which change list caused the perf regression.

Project road map & blocking bugs will be added to this cover bug later.
 
Cc: dpranke@chromium.org
Status: Assigned (was: Untriaged)
Summary: Reducing the cycle time of perf waterfall from 10 hours to 1 hour (was: Reducing the cycle time of perf waterfall)
Blockedon: 713346
Blockedon: 710253
Blockedon: 713357
Sorry I wasn't more involved in this earlier, but could we come up with a non-arbitrary rule about how much time budget we can give per user story? For example, if we stick with 1 hour (I like this as it's consistent with the CQ), the other variables are:
- # of devices
- time it takes a device to run a test (overhead)
- test timeout
- # of tests

When we control the # of devices (which we currently do), we become prescriptive about the number of tests that can possibly run to get us under an hour. We could filter all of our tests with this in mind: first pass == no tests that take longer than x minutes to run (enforcement of timeout). second pass == no tests that are duplicated by other tests (mutual exclusion). third pass == prioritize the tests and draw a line.

Once we have a line, we can then see what more hardware will buy in terms of tests AND buffer for future tests.
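
The three passes described above could be sketched roughly as follows. This is only an illustration, not code from the perf waterfall: the `Story` fields, the `select_stories` helper, the story names, and all of the numbers are made up, and it assumes stories shard perfectly across devices so that the time available in one cycle is simply devices x budget.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Story:
    # All fields are hypothetical; they are not taken from the real test configs.
    name: str
    runtime_min: float  # measured time to run the story on one device
    priority: int       # lower number = more important
    covers: str         # what the story measures, used to spot duplicates


def select_stories(stories, num_devices, budget_min=60.0,
                   timeout_min=10.0, overhead_min=1.0):
    """Return the stories that fit in one cycle, in priority order."""
    # First pass: drop anything that runs longer than the timeout.
    candidates = [s for s in stories if s.runtime_min <= timeout_min]

    # Second pass: keep only the most important story per covered area.
    candidates.sort(key=lambda s: (s.priority, s.runtime_min))
    unique = {}
    for s in candidates:
        unique.setdefault(s.covers, s)

    # Third pass: walk the stories in priority order and draw the line
    # where the device time available in one cycle runs out.
    capacity = num_devices * budget_min  # assumes perfect sharding
    selected, used = [], 0.0
    for s in unique.values():  # dicts keep insertion (priority) order
        cost = s.runtime_min + overhead_min
        if used + cost > capacity:
            break
        selected.append(s)
        used += cost
    return selected


stories = [
    Story("load_fast_site", 0.5, 1, "loading"),
    Story("load_slow_site", 12.0, 1, "loading_slow"),
    Story("load_fast_site_copy", 0.6, 2, "loading"),
    Story("scroll_page", 3.0, 2, "scrolling"),
    Story("long_idle", 8.0, 3, "idle"),
]

one_device = select_stories(stories, num_devices=1, budget_min=10.0)
two_devices = select_stories(stories, num_devices=2, budget_min=10.0)
print([s.name for s in one_device])
print([s.name for s in two_devices])
```

Running the same selection with a larger `num_devices` shows where the line moves, which is one way to estimate what more hardware would buy in terms of tests and buffer.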
Ben: that's a fine idea & it has been used in many test frameworks. I think we should consider it, though the challenge is that integration perf tests' cycle times can be vastly diverse.
Taking loading for example: loading a site can take anywhere from 0.5s to 10s of seconds. And folks do want to test the whole spectrum for coverage completeness.
That's a good point. But even with these examples, we could potentially limit what tests we're willing to cover in the lab. There's a trade-off between coverage and...well...this cost/time-to-bisect/test-run-time thing we keep bringing up.
Blockedon: 711794
Blockedon: 700086
Blockedon: 715177
Blockedon: 716025
Blockedon: 708363
Cc: martiniss@chromium.org charliea@chromium.org
Issue 691582 has been merged into this issue.
Blocking: 728882
Project Member

Comment 17 by sheriffbot@chromium.org, Jul 19 2017

Labels: Hotlist-Google
Blockedon: 757933
Owner: nedngu...@google.com
Moving over to ned.
Cc: -nedngu...@google.com nednguyen@chromium.org ashleymarie@chromium.org
Owner: eyaich@chromium.org
Blocking: 657183
Blockedon: 857283
Project Member

Comment 23 by bugdroid1@chromium.org, Jul 6

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/b7e98079d9dc9a2230d62c587144097171bff499

commit b7e98079d9dc9a2230d62c587144097171bff499
Author: Ned Nguyen <nednguyen@google.com>
Date: Fri Jul 06 02:05:47 2018

Reduce the swarming timeout of perf tests

NOTRY=true
TBR=eyaich@chromium.org

Bug: 713345
Cq-Include-Trybots: master.tryserver.chromium.perf:obbs_fyi
Change-Id: Ibc1954a26040e8899f9ff13fee0756ca0a9b1481
Reviewed-on: https://chromium-review.googlesource.com/1127480
Commit-Queue: Ned Nguyen <nednguyen@google.com>
Reviewed-by: Ned Nguyen <nednguyen@google.com>
Cr-Commit-Position: refs/heads/master@{#572882}
[modify] https://crrev.com/b7e98079d9dc9a2230d62c587144097171bff499/testing/buildbot/chromium.perf.fyi.json
[modify] https://crrev.com/b7e98079d9dc9a2230d62c587144097171bff499/testing/buildbot/chromium.perf.json
[modify] https://crrev.com/b7e98079d9dc9a2230d62c587144097171bff499/tools/perf/core/perf_data_generator.py

Blockedon: 863768
Cc: crouleau@google.com
Status: Fixed (was: Assigned)
So I am not sure we are actively working on this goal anymore. We were able to reduce the cycle time to ~2-3 hours, and our current goal is to at least maintain this.

Adding Caleb as an FYI of the goal and closing.
Cc: -nednguyen@chromium.org
Owner: crouleau@chromium.org
Status: Assigned (was: Fixed)
Since we aren't quite at 1 hour yet, I will take this bug and see if there is anything we can do with it. This bug is hopefully useful because we can link to it as a reason to increase hardware and keep test runs short.
