
Issue 797572 link

Starred by 1 user

Issue metadata

Status: Duplicate
Owner:
Closed: Dec 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 0
Type: Bug




Commit queue failing almost every run with "aborted by self-destruction" from paladins

Project Member Reported by derat@chromium.org, Dec 25 2017

Issue description

The Chrome OS commit queue has been failing almost every run since build #17269: https://luci-milo.appspot.com/buildbot/chromeos/master-paladin/17269

There were a few failures just before this, but I think they were a different problem. Since then, nearly all of the child builders die with "aborted by self-destruction".

I don't see any meaningful errors on the child builders. For example, at https://luci-milo.appspot.com/buildbot/chromeos/cyan-paladin/4833, the "HWTest [provision]" stage passed, but "HWTest [bvt-arc]" failed after 30 minutes. The job's logs are extremely short and just end with this:

12-22-2017 [08:41:23] Created suite job: http://cautotest-prod.corp.google.com/afe/#tab_id=view_job&object_id=164561876
--create_and_return was specified, terminating now.
Will return from run_suite with status: OK
08:41:23: INFO: Re-run swarming_cmd to avoid buildbot salency check.
08:41:23: INFO: RunCommand: /b/c/cbuild/repository/chromite/third_party/swarming.client/swarming.py run --swarming chromeos-proxy.appspot.com --task-summary-json /tmp/cbuildbot-tmpZMqkiO/tmp2c_S5h/temp_summary.json --raw-cmd --task-name cyan-paladin/R65-10239.0.0-rc2-bvt-arc --dimension os Ubuntu-14.04 --dimension pool default --print-status-updates --timeout 9000 --io-timeout 9000 --hard-timeout 9000 --expiration 1200 '--tags=priority:CQ' '--tags=suite:bvt-arc' '--tags=build:cyan-paladin/R65-10239.0.0-rc2' '--tags=task_name:cyan-paladin/R65-10239.0.0-rc2-bvt-arc' '--tags=board:cyan' -- /usr/local/autotest/site_utils/run_suite.py --build cyan-paladin/R65-10239.0.0-rc2 --board cyan --suite_name suite_attr_wrapper --pool cq --file_bugs False --priority CQ --timeout_mins 90 --retry True --max_retries 5 --minimum_duts 4 --offload_failures_only False --suite_args "{'attr_filter': u'(suite:bvt-arc) and (subsystem:default)'}" --job_keyvals "{'cidb_build_stage_id': 65928655L, 'cidb_build_id': 2153542, 'datastore_parent_key': ('Build', 2153542, 'BuildStage', 65928655L)}" --test_args "{'fast': 'True'}" -m 164561876

The job is at http://cautotest-prod.corp.google.com/afe/#tab_id=view_job&object_id=164561876, but it shows a status of "1 Completed", and I don't see any obvious problems there either.

The CQ has been passing occasionally since these failures started, e.g. https://luci-milo.appspot.com/buildbot/chromeos/master-paladin/17301, but when I look at those runs, there are usually still a bunch of "aborted by self-destruction" messages from the paladin builders. It's not clear to me why these runs have different outcomes from the others.
 
Status: WontFix (was: Assigned)
From https://logs.chromium.org/v/?s=chromeos%2Fbb%2Fchromeos%2Fmaster-paladin%2F17302%2F%2B%2Frecipes%2Fsteps%2FCommitQueueCompletion%2F0%2Fstdout I see:

"""
23:31:02: INFO: Processing relevant changes of build whirlwind-paladin status COMPLETED result FAILURE
23:31:02: INFO: Processing relevant changes of build sentry-paladin status COMPLETED result FAILURE
23:31:02: INFO: Build sentry-paladin failed with not ignorable failures, will not submit changes: CL:843596 CL:843597 CL:843654
23:31:02: INFO: will_submit set contains 0 changes: []
might_submit set contains 0 changes: []
will_not_submit set contains 3 changes: [CL:843596 CL:843597 CL:843654]
23:31:02: WARNING: This build will self-destruct given the results of relevant change triages.
"""

It looks like the master received a failure that can't be ignored from at least one slave, and thus determined that the CLs under test will not be submitted. The master then stops waiting for the other unfinished slaves and returns early, since there's no point wasting time running the remaining tests. That is why the other slaves show "aborted by self-destruction".

As for the green CQ runs with several "aborted by self-destruction" messages, e.g. https://luci-milo.appspot.com/buildbot/chromeos/master-paladin/17301, the situation is similar, except that the master believes the CLs under test can be submitted without waiting for the remaining tests to finish:

"""
21:34:00: INFO: Moving CL:841905 to will_submit set, because their relevant builds completed successfully or all failures are ignorable or passed in CQ history.
21:34:00: INFO: will_submit set contains 1 changes: [CL:841905]
might_submit set contains 0 changes: []
will_not_submit set contains 0 changes: []
21:34:00: WARNING: This build will self-destruct given the results of relevant change triages.
21:34:00: INFO: This build will self-destruct with success.
"""
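
To make the early-exit behavior in the two log excerpts above concrete, here is a minimal Python sketch of the decision as it can be inferred from those logs. It is not the actual chromite code: SlaveStatus, triage and should_self_destruct are hypothetical names, the rule "self-destruct once might_submit is empty" is an assumption read off the log output, and the "passed in CQ history" case is left out for brevity.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class SlaveStatus:
    """Hypothetical record of one slave builder as seen by the master."""
    passed: Optional[bool]    # None while the slave is still running
    ignorable: bool = False   # True if a failure does not block submission


def triage(relevant: Dict[str, List[str]],
           slaves: Dict[str, SlaveStatus]) -> Tuple[Set[str], Set[str], Set[str]]:
    """Sort CLs into (will_submit, might_submit, will_not_submit)."""
    will_submit: Set[str] = set()
    might_submit: Set[str] = set()
    will_not_submit: Set[str] = set()
    for cl, builders in relevant.items():
        statuses = [slaves[b] for b in builders]
        if any(s.passed is False and not s.ignorable for s in statuses):
            will_not_submit.add(cl)   # one blocking failure is enough
        elif any(s.passed is None for s in statuses):
            might_submit.add(cl)      # still waiting on a relevant slave
        else:
            will_submit.add(cl)       # all relevant slaves passed or are ignorable
    return will_submit, might_submit, will_not_submit


def should_self_destruct(might_submit: Set[str],
                         will_not_submit: Set[str]) -> Tuple[bool, bool]:
    """Return (self_destruct, with_success).

    Assumption: the master stops as soon as no CL's fate is still undecided,
    and reports success only if nothing was rejected.
    """
    destruct = not might_submit
    success = destruct and not will_not_submit
    return destruct, success


# Illustrative data modeled on the failing run; the mapping of CLs to
# relevant builders is assumed, not taken from the logs.
slaves = {
    "sentry-paladin": SlaveStatus(passed=False),
    "whirlwind-paladin": SlaveStatus(passed=False),
    "cyan-paladin": SlaveStatus(passed=None),   # still running, gets aborted
}
relevant = {cl: ["sentry-paladin"]
            for cl in ("CL:843596", "CL:843597", "CL:843654")}

will_submit, might_submit, will_not_submit = triage(relevant, slaves)
destruct, success = should_self_destruct(might_submit, will_not_submit)
```

Under these assumptions, every CL in the failing run is already decided while cyan-paladin is still running, so the master exits and the unfinished slave is aborted. The green run takes the same path, only with the CL landing in will_submit instead of will_not_submit.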

Closing this bug as WAI.

Comment 2 by derat@chromium.org, Dec 25 2017

Mergedinto: 797314
Status: Duplicate (was: WontFix)
Sorry, I wasn't filing this to track a supposed issue with how the CQ processes results, but rather to figure out why almost all CQ runs are failing. It looks like you disabled a test at issue 797314 that will hopefully fix this, though -- thanks!
