Buildbucket builds time out after 24 hours
Issue description

Sometimes the perf sheriff will triage many bugs at the same time and kick off many perf bisect jobs, temporarily causing long wait times. Long-running benchmarks in particular may wait hours before they start to run, but buildbucket kills the build after 24 hours.

I think this timeout is specified in the code here? https://chromium.googlesource.com/infra/infra/+/master/appengine/cr-buildbucket/model.py#16

Here's a sample build that was killed with a timeout: https://cr-buildbucket.appspot.com/_ah/api/buildbucket/v1/builds/9003524776114588688

The documentation also needs to be updated, because it says "cancelation_reason": "TIMEOUT" is not supported. Also, the timestamps are in microseconds, not milliseconds.

Annie's numbers in this doc imply that this is happening for up to ~20% of bisect jobs: https://docs.google.com/document/d/15MHxbrr-qpPSCjeqQjEcUBa2AxcXgoXaYuVQ2dGhc1s/edit

We are also working to decrease the runtime of bisects, but this may take a little longer: https://github.com/catapult-project/catapult/issues/1811
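The microsecond point matters when reading the JSON for a build like the sample above. Here is a minimal sketch that assumes the v1 API returns timestamp fields such as `created_ts` and `completed_ts` as integer microseconds since the Unix epoch (serialized as strings), as the report states. The helper names are hypothetical.

```python
import datetime

# Assumption: buildbucket v1 timestamps are microseconds since the Unix epoch,
# possibly serialized as strings in the JSON response.
EPOCH = datetime.datetime(1970, 1, 1)


def ts_to_datetime(ts_us):
    """Convert a microsecond epoch timestamp (int or numeric string) to a naive UTC datetime."""
    return EPOCH + datetime.timedelta(microseconds=int(ts_us))


def waited_hours(created_ts, completed_ts):
    """Elapsed hours between two microsecond epoch timestamps."""
    delta = ts_to_datetime(completed_ts) - ts_to_datetime(created_ts)
    return delta.total_seconds() / 3600.0
```

For example, `ts_to_datetime("1472428800000000")` is midnight UTC on Aug 29 2016. Reading the same value as milliseconds would shift it roughly 46,000 years into the future, and every computed duration would be 1000 times too large.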
Aug 29 2016
I'll take a look.
Aug 29 2016
You've correctly identified BUILD_TIMEOUT as responsible for this. It is a global constant. The correct solution would be to give each build its own timeout, but that's a non-trivial change and I don't know the buildbucket code that well. So I'll just bump BUILD_TIMEOUT to 36 hours. Would 36 hours be enough?
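The following minimal Python sketch illustrates the difference between the global constant and the per-build alternative mentioned above. `Build`, its `timeout` field, `effective_timeout`, and `is_timed_out` are hypothetical stand-ins, not buildbucket's actual model. Measuring the timeout from creation time is also an assumption, consistent with queued builds being killed before they start.

```python
import datetime
from dataclasses import dataclass
from typing import Optional

# Global default (the thread eventually settles on 2 days).
BUILD_TIMEOUT = datetime.timedelta(days=2)


@dataclass
class Build:
    """Hypothetical stand-in for a build record."""
    create_time: datetime.datetime
    # Hypothetical per-build override; None means "use the global default".
    timeout: Optional[datetime.timedelta] = None


def effective_timeout(build: Build) -> datetime.timedelta:
    """Return the build's own timeout if set, otherwise BUILD_TIMEOUT."""
    return build.timeout if build.timeout is not None else BUILD_TIMEOUT


def is_timed_out(build: Build, now: datetime.datetime) -> bool:
    """True if more than the effective timeout has elapsed since creation."""
    return now - build.create_time > effective_timeout(build)
```

With this shape, bisect jobs could request a longer timeout while other builds keep the default, instead of the limit being raised for everyone.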
Aug 29 2016
I think 48 hours might be safer?
Aug 29 2016
Ok.
Aug 29 2016
BUILD_TIMEOUT is now 2 days (I deployed the change just now): https://chromium.googlesource.com/infra/infra/+/87eb792374dd4e4aaf2c175cd12b4d195c3dfa64 (not sure why Bugdroid ignored the CL...)
Comment 1 by dtu@chromium.org, Aug 29 2016