
Issue 642093 link


Issue metadata

Status: Fixed
Owner:
Closed: Aug 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug




Buildbucket builds time out after 24 hours

Project Member Reported by dtu@chromium.org, Aug 29 2016

Issue description

Sometimes the perf sheriff will triage a lot of bugs at the same time, and kick off a lot of perf bisect jobs, temporarily causing long waiting times. Especially if they're long-running benchmarks, they may wait hours before they start to run. But buildbucket kills the build after 24 hours.

I think this timeout is specified in the code here?
https://chromium.googlesource.com/infra/infra/+/master/appengine/cr-buildbucket/model.py#16

Here's a sample build that was killed with a timeout.
https://cr-buildbucket.appspot.com/_ah/api/buildbucket/v1/builds/9003524776114588688

The documentation also needs to be updated, because it says "cancelation_reason": "TIMEOUT" is not supported. Also, the timestamps are in microseconds, not milliseconds.
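For anyone reading these timestamps by hand, here is a minimal sketch of interpreting them as microseconds since the Unix epoch. The helper names and the sample value are made up for illustration, and I'm assuming the epoch is the usual 1970-01-01 UTC; this is not taken from the buildbucket code.

```python
from datetime import datetime, timedelta, timezone

# Assumption: timestamps count microseconds since the Unix epoch (UTC).
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ts_to_datetime(ts_usec):
    """Convert a microsecond timestamp to a timezone-aware UTC datetime."""
    return EPOCH + timedelta(microseconds=int(ts_usec))


def elapsed(start_ts_usec, end_ts_usec):
    """Return the time between two microsecond timestamps as a timedelta."""
    return ts_to_datetime(end_ts_usec) - ts_to_datetime(start_ts_usec)


# Hypothetical value, not copied from the sample build above.
sample_ts = 1472428800000000
print(ts_to_datetime(sample_ts).isoformat())
```

Treating the same value as milliseconds would put the date tens of thousands of years in the future, which is a quick way to tell the two units apart.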


Annie's numbers in this doc imply that this is happening for <= ~20% of bisect jobs.
https://docs.google.com/document/d/15MHxbrr-qpPSCjeqQjEcUBa2AxcXgoXaYuVQ2dGhc1s/edit

We are also working to decrease the runtime of bisects, but this may take a little longer.
https://github.com/catapult-project/catapult/issues/1811
 

Comment 1 by dtu@chromium.org, Aug 29 2016

Cc: dtu@chromium.org

Comment 2 by dtu@chromium.org, Aug 29 2016

Cc: no...@chromium.org
Cc: benhenry@chromium.org vadimsh@chromium.org
Components: Infra>Client>Perf
Owner: vadimsh@chromium.org
Status: Assigned (was: Untriaged)
I'll take a look.
You've correctly identified BUILD_TIMEOUT as responsible for this. It is a global constant. The correct solution would be to give each build its own timeout, but that's a non-trivial change and I don't really know the buildbucket code that well. So I'll just bump BUILD_TIMEOUT to 36 hours.

Would 36 hours be enough?
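
To illustrate the difference between the global constant and a per-build timeout, here is a rough sketch. Only the name BUILD_TIMEOUT and the 24-hour limit come from this thread; the `Build` class, its `timeout` field, and the method names are hypothetical and do not reflect the actual buildbucket model.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Global fallback, mirroring the 24-hour limit described in the report.
BUILD_TIMEOUT = timedelta(hours=24)


@dataclass
class Build:
    """Hypothetical stand-in for a build record, not the real model class."""
    create_time: datetime
    timeout: Optional[timedelta] = None  # hypothetical per-build override

    def effective_timeout(self) -> timedelta:
        # Use the build's own timeout if set, otherwise the global constant.
        return self.timeout if self.timeout is not None else BUILD_TIMEOUT

    def is_expired(self, now: datetime) -> bool:
        return now - self.create_time >= self.effective_timeout()


created = datetime(2016, 8, 29, tzinfo=timezone.utc)
now = created + timedelta(hours=30)

default_build = Build(created)                             # global limit applies
bisect_build = Build(created, timeout=timedelta(hours=48))  # longer, per build

print(default_build.is_expired(now), bisect_build.is_expired(now))
```

With something like this, long-queued bisect jobs could ask for a longer limit without raising it for every other build.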

Comment 6 by dtu@chromium.org, Aug 29 2016

I think 48 might be safer?
Ok.
Status: Fixed (was: Assigned)
It is now 2 days (deployed the change just now).

https://chromium.googlesource.com/infra/infra/+/87eb792374dd4e4aaf2c175cd12b4d195c3dfa64

(not sure why Bugdroid ignored the CL...)
