
Issue 908521


Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug

Blocked on:
issue 850113




Builds that time out should set an appropriate failure reason.

Project Member Reported by erikc...@chromium.org, Nov 26

Issue description

Here are two examples of builds that timed out after 3 hours:

https://ci.chromium.org/p/chromium/builders/luci.chromium.try/win7_chromium_rel_ng/134701
https://ci.chromium.org/p/chromium/builders/luci.chromium.try/win7_chromium_rel_ng/134609
Both builds report "Results: Internal Failure".

In both builds, neither the build results page nor the buildbucket results explain why the build failed:

https://apis-explorer.appspot.com/apis-explorer/?base=https://cr-buildbucket.appspot.com/_ah/api#p/buildbucket/v1/buildbucket.get?id=8929296488882468544&_h=4&

By contrast, a build that fails because of a test failure sets a flag in the buildbucket 'result_details_json' property: FAILURE_TYPE: 'TEST_FAILURE'
https://apis-explorer.appspot.com/apis-explorer/?base=https://cr-buildbucket.appspot.com/_ah/api#p/buildbucket/v1/buildbucket.get?id=8929222739657332880&_h=1&

And the corresponding build results page:
https://ci.chromium.org/b/8929222739657332880

It also sets an explanation: "Failure unit_tests (with patch) Failure unit_tests (retry summary) Failure unit_tests (retry with patch) Failure unit_tests (retry with patch summary)".

My proposal is that all failing builds should set a failure_reason in result_details_json. For timeouts, it should be 'TIMED_OUT'.
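A minimal Python sketch of this proposal follows. It assumes a hypothetical `failure_reason` helper, which gives a timeout precedence over failed steps, and a hypothetical `result_details_json` serializer. Only the `'TEST_FAILURE'` value comes from existing builds. The `failure_reason` key and the `'TIMED_OUT'` value are the ones proposed here and are not part of an existing buildbucket schema.

```python
import json
from typing import List, Optional

TEST_FAILURE = "TEST_FAILURE"  # value seen in existing test-failure builds
TIMED_OUT = "TIMED_OUT"        # value proposed in this issue


def failure_reason(timed_out: bool, failed_steps: List[str]) -> Optional[str]:
    """Return a reason for a failed build, or None if none is known (sketch)."""
    if timed_out:
        # A timeout wins even if some steps had already failed.
        return TIMED_OUT
    if failed_steps:
        return TEST_FAILURE
    return None


def result_details_json(reason: Optional[str]) -> str:
    """Serialize the payload; 'failure_reason' is the proposed key name."""
    details = {}
    if reason is not None:
        details["failure_reason"] = reason
    return json.dumps(details, sort_keys=True)


print(result_details_json(failure_reason(True, [])))
# prints {"failure_reason": "TIMED_OUT"}
```

Because the timeout is checked first, a build that runs out of time partway through a test is reported as `TIMED_OUT` rather than as a test failure. Whether that precedence is right is a design choice that this thread does not settle.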

+jbudorick -- WDYT?
+stgao -- We'll probably want to make sure that this information gets plumbed through Find-It & Flakiness detector.
 
Cc: no...@chromium.org
Components: Infra>Platform>Milo Infra>Platform>Buildbucket
I definitely agree that this isn't currently clear. I think this should be handled either in buildbucket or in milo; not sure which, so adding both components. I think result_details_json is a buildbucket thing, so +nodir can probably comment on whether your specific proposal is reasonable.
jbudorick: My first thought was that we would just update the chromium_test recipe to have a handler for SIGTERM that populates the relevant properties, which would propagate the relevant information to buildbucket and the Milo UI.

Could you clarify why this isn't a good approach?
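For illustration, the following shows roughly what a SIGTERM handler looks like in an ordinary Python process. It is not recipe-engine code; whether recipes can install signal handlers at all is an open question in this thread. The `PROPERTIES_FILE` path, the `write_timeout_properties` helper, and the `failure_reason` property name are all hypothetical. The approach also only helps if the build is sent SIGTERM, and is given time to run the handler, before it is killed outright.

```python
import json
import signal

# Hypothetical location for properties that a later step would upload.
PROPERTIES_FILE = "build_properties.json"


def write_timeout_properties(path: str = PROPERTIES_FILE) -> None:
    """Record the (proposed) failure reason before the process exits."""
    with open(path, "w") as f:
        json.dump({"failure_reason": "TIMED_OUT"}, f)


def handle_sigterm(signum, frame) -> None:
    write_timeout_properties()
    # Exit with the conventional shell status for "killed by signal N".
    raise SystemExit(128 + signum)


# Install the handler; SIGKILL, by contrast, cannot be caught at all.
signal.signal(signal.SIGTERM, handle_sigterm)
```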
Blockedon: 850113
Cc: iannucci@chromium.org
result_details_json is a legacy protocol. The new protocol is go/build-proto and there is a place for this kind of info: Build.infra_failure_reason field. We can add a timeout field to InfraFailureReason.

Then Milo can render it appropriately. This also requires fixing issue 850113.
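To make the suggestion concrete, the following Python model (not the real protobuf) shows a build carrying the proposed timeout flag, along with one way a UI such as Milo could render it. The `timeout` field is the proposed addition. The class shapes, the `render_result` function, and its labels are hypothetical; the actual field would be defined in go/build-proto.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    SUCCESS = "Success"
    FAILURE = "Failure"
    INFRA_FAILURE = "Infra Failure"


@dataclass
class InfraFailureReason:
    # Hypothetical field: the timeout flag proposed for InfraFailureReason.
    timeout: bool = False


@dataclass
class Build:
    status: Status
    infra_failure_reason: Optional[InfraFailureReason] = None


def render_result(build: Build) -> str:
    """Return a result label; a timeout is called out explicitly."""
    reason = build.infra_failure_reason
    if build.status is Status.INFRA_FAILURE and reason is not None and reason.timeout:
        return f"{build.status.value} (timed out)"
    return build.status.value


print(render_result(Build(Status.INFRA_FAILURE, InfraFailureReason(timeout=True))))
# prints Infra Failure (timed out)
```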

--

I am not sure it is possible to set up a signal handler in recipes. +Robbie
