
Issue 724281

Starred by 2 users

Issue metadata

Status: Available
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 3
Type: Bug




Master failures should cause self-destruction

Project Member Reported by davidri...@chromium.org, May 18 2017

Issue description

The following master build was doomed but self-destruction didn't kick in because it didn't self-inspect:
https://uberchromegw.corp.google.com/i/chromeos/builders/master-paladin/builds/14638

RegenPortageCache failed early in the run, but the master still waited for the entire set of slaves to succeed before failing, instead of recognizing that the run was already doomed.

From https://luci-logdog.appspot.com/v/?s=chromeos%2Fbb%2Fchromeos%2Fmaster-paladin%2F14638%2F%2B%2Frecipes%2Fsteps%2FCommitQueueCompletion%2F0%2Fstdout
@@@STEP_TEXT@master-paladin: The RegenPortageCache stage failed: <class 'chromite.lib.cros_build_lib.RunCommandError'>: return code: 1; command: cros_sdk -- egencache --update --repo chromiumos --jobs 8
cmd=['cros_sdk', '--', 'egencache', '--update', '--repo', 'chr@@@
14:17:02: WARNING: The following builders failed with this manifest:
master-paladin
Please check the logs of the failing builders for details.

The run also hit https://bugs.chromium.org/p/chromium/issues/detail?id=724269, so I'm not 100% sure whether it was going to try to land changes, given that master-paladin had failed.

(I'm not 100% sure this is the desired behaviour, but it's worth discussing.)
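
To make the request concrete, here is a minimal sketch of the behaviour being asked for: a wait loop that inspects the master's own status on every poll and gives up as soon as the master has failed. This is not chromite code. `wait_for_slaves`, `get_finished`, `master_has_failed`, and the return strings are all hypothetical names chosen for illustration.

```python
import time
from typing import Callable, Iterable, Set


def wait_for_slaves(
    slaves: Iterable[str],
    get_finished: Callable[[], Set[str]],
    master_has_failed: Callable[[], bool],
    poll_interval: float = 0.0,
    max_polls: int = 100,
) -> str:
    """Hypothetical wait loop with self-inspection.

    Returns "doomed" if the master's own stages have failed,
    "complete" once every slave has finished, or "timeout" if
    max_polls is reached first.
    """
    pending = set(slaves)
    for _ in range(max_polls):
        # Self-inspect first: if the master already failed, waiting
        # for the remaining slaves cannot change the outcome.
        if master_has_failed():
            return "doomed"
        pending -= get_finished()
        if not pending:
            return "complete"
        time.sleep(poll_interval)
    return "timeout"
```

With a check like this, a master whose RegenPortageCache stage had already failed would return "doomed" on the first poll rather than waiting on the slaves.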
 

Comment 1 by nxia@chromium.org, May 19 2017

Cc: -nxia@chromium.org
Owner: nxia@chromium.org
mysql> select * from failureTable where build_stage_id=45163424;
+---------+----------------+------------------+-------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------+------------+---------------------+
| id      | build_stage_id | outer_failure_id | exception_type    | exception_message                                                                                                                                                                                                                                | exception_category | extra_info | timestamp           |
+---------+----------------+------------------+-------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------+------------+---------------------+
| 2021102 |       45163424 |             NULL | BackgroundFailure | <class 'chromite.lib.cros_build_lib.RunCommandError'>: return code: 1; command: cros_sdk -- egencache --update --repo chromiumos --jobs 8
cmd=['cros_sdk', '--', 'egencache', '--update', '--repo', 'chromiumos', '--jobs', '8'], cwd=/b/cbuild/ | unknown            | NULL       | 2017-05-18 18:54:08 |
| 2021103 |       45163424 |          2021102 | RunCommandError   | return code: 1; command: cros_sdk -- egencache --update --repo chromiumos --jobs 8
cmd=['cros_sdk', '--', 'egencache', '--update', '--repo', 'chromiumos', '--jobs', '8'], cwd=/b/cbuild/repository/src/third_party/chromiumos-overlay           | unknown            | NULL       | 2017-05-18 18:54:08 |



The errors on RegenPortageCache are in the UNKNOWN category, so in theory all changes should be rejected. I think it makes sense to stop this CQ run early and reject all changes.
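
A rough sketch of that decision, assuming failure rows shaped like the columns in the table above. The `StageFailure` class and `should_reject_all` function are hypothetical and only illustrate the rule "any master failure with category unknown means reject everything"; they are not the actual CQ logic.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StageFailure:
    """Hypothetical record mirroring a few failureTable columns."""
    id: int
    build_stage_id: int
    exception_type: str
    exception_category: str
    outer_failure_id: Optional[int] = None


def should_reject_all(master_failures: List[StageFailure]) -> bool:
    """Return True if any master failure has category 'unknown'.

    Assumption: an unknown-category failure cannot be attributed to
    specific changes, so the run can stop early and reject them all.
    """
    return any(f.exception_category == "unknown" for f in master_failures)


# The two rows from the query above, minus the message text.
rows = [
    StageFailure(2021102, 45163424, "BackgroundFailure", "unknown"),
    StageFailure(2021103, 45163424, "RunCommandError", "unknown", 2021102),
]
reject = should_reject_all(rows)
```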

Comment 2 by nxia@chromium.org, Jun 2 2017

Cc: pho...@chromium.org nxia@chromium.org davidjames@chromium.org dgarr...@chromium.org
Issue 728401 has been merged into this issue.
Another instance: https://uberchromegw.corp.google.com/i/chromeos/builders/master-paladin/builds/15133

Addendum:
In this case, the bad CL was: https://chrome-internal-review.googlesource.com/c/397209/
It was detected, so the other CLs will be retried.

But what about detecting irrelevant changes when the master fails?
We don't build packages on the master (it has no board), so we can't use the same logic to figure out which changes are relevant.
Cc: -davidjames@chromium.org
Components: -Infra>Client>ChromeOS Infra>Client>ChromeOS>CI
Owner: ----
Status: Available (was: Untriaged)
Looks like a good starter bug.

Comment 7 by nxia@chromium.org, Jun 8 2018

Cc: -nxia@chromium.org
