
Issue 719789

Starred by 2 users

Issue metadata

Status: Archived
Owner:
Closed: Jun 2017
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug




CancelObsoleteSlaveBuilds needs to only kill master's slaves.

Project Member Reported by dgarr...@chromium.org, May 9 2017

Issue description

CancelObsoleteSlaveBuilds kills all slave builds of the same build type on the same branch.

Given two Android PFQ masters on the same branch, each PFQ will kill the other's slaves every time it starts.

This will be... counterproductive.

We need to do a better job of restricting which builds are killed.
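
To make the failure mode concrete, here is a minimal sketch of a selection that filters only on build type and branch, as described above. It is not the chromite implementation: the `Build` record, the function name, the config names, and the branch name are all hypothetical, chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Build:
    """Hypothetical record for a build; not a chromite type."""
    build_id: int
    config: str               # hypothetical config name
    build_type: str
    branch: str
    master_id: Optional[int]  # None for a master build


def slaves_matching_type_and_branch(running: List[Build], master: Build) -> List[Build]:
    """Select every running slave that shares the master's build type and branch."""
    return [
        b for b in running
        if b.master_id is not None
        and b.build_type == master.build_type
        and b.branch == master.branch
    ]


# PFQ "a" is mid-run (master id 1) with one live slave.
# PFQ "b" left a slave behind from its previous run (master id 2),
# and a new PFQ "b" master (id 3) is starting up.
new_master_b = Build(3, "pfq-b-master", "android_pfq", "branch-x", None)
running = [
    Build(11, "pfq-a-slave", "android_pfq", "branch-x", master_id=1),
    Build(21, "pfq-b-slave", "android_pfq", "branch-x", master_id=2),
]

# Only build 21 is obsolete from PFQ "b"'s point of view, but the
# type-and-branch filter also picks up PFQ "a"'s live slave (build 11).
selected = slaves_matching_type_and_branch(running, new_master_b)
selected_ids = [b.build_id for b in selected]
print(selected_ids)
```

In this sketch the new PFQ "b" master selects both builds, so PFQ "a" loses a slave it still needs.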

Two options:

1) Find the buildbucket_id of the previous master build, then kill slave builds associated with it.

2) Look up the slave config names for the current master, and kill in-progress builds of all of those slave configs.
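
The two options can be sketched as two different selection predicates. This is only an illustration under assumptions: `RunningBuild`, its fields, the function names, and the ids and config names below are hypothetical, and how a slave actually records its master's buildbucket_id is not specified in this issue.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class RunningBuild:
    """Hypothetical view of an in-progress slave build."""
    buildbucket_id: str
    config: str
    master_buildbucket_id: str  # assumed to be recorded when the slave is scheduled


def slaves_of_previous_master(
    running: Iterable[RunningBuild], previous_master_id: str
) -> List[RunningBuild]:
    """Option 1: only slaves scheduled by the previous master build."""
    return [b for b in running if b.master_buildbucket_id == previous_master_id]


def slaves_with_configs(
    running: Iterable[RunningBuild], slave_configs: Set[str]
) -> List[RunningBuild]:
    """Option 2: in-progress builds whose config belongs to the current master."""
    return [b for b in running if b.config in slave_configs]


running = [
    RunningBuild("b1", "pfq-a-slave", "m-a-1"),
    RunningBuild("b2", "pfq-b-slave", "m-b-1"),
]

# Both options leave the other master's slave (b2) alone in this example.
option_1 = [b.buildbucket_id for b in slaves_of_previous_master(running, "m-a-1")]
option_2 = [b.buildbucket_id for b in slaves_with_configs(running, {"pfq-a-slave"})]
print(option_1, option_2)
```

Option 1 depends on finding the right previous master; option 2 depends on slave config names being unique to one master.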
 
Passing to Ningning to come up with a good solution. I'll land a CL to disable this functionality for the Android PFQ, for now.

Comment 2 by aut...@google.com, May 9 2017

Labels: -current-issue

Comment 3 by nxia@chromium.org, May 10 2017

1) has a potential issue:

master_1 is canceled, but its slaves keep running; master_2 starts and fails before it reaches the CleanUpStage that would abort the slaves of master_1; master_3 starts and tries to abort the slaves of master_2, but the slaves still running are actually from master_1.

2) Are the slave config names of the two masters different?
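
The master_1 / master_2 / master_3 sequence in 1) can be replayed as a small simulation. This is a sketch under assumptions, not chromite code: the `running` mapping, the slave names, and `cancel_slaves_of` are hypothetical stand-ins for "abort the slaves of the previous master".

```python
from typing import Dict, List


def cancel_slaves_of(master: str, running: Dict[str, str]) -> List[str]:
    """Cancel (remove) every running slave that was scheduled by `master`."""
    cancelled = sorted(name for name, owner in running.items() if owner == master)
    for name in cancelled:
        del running[name]
    return cancelled


# Slave name -> master that scheduled it.
running = {"slave-x": "master_1", "slave-y": "master_1"}

# master_1 is canceled; its slaves keep running, so `running` is unchanged.
# master_2 starts and fails before its cleanup step; no cancel call is made.
# master_3 starts and cleans up after its immediate predecessor, master_2.
cancelled = cancel_slaves_of("master_2", running)
still_running = sorted(running)

print(cancelled)
print(still_running)
```

Nothing matches master_2, so the slaves scheduled by master_1 are never aborted.
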
1) True.
2) Today slave names are unique between masters. Post-swarming this might not be true. In particular, trybots may be running with the same config names.
This problem will also affect the release waterfall, since all of the builders there have the same build type.
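
The concern about config names no longer being unique can be illustrated the same way. In this sketch the tuples, the `origin` field, and the config name are hypothetical; it only shows what a name-based selection does once a trybot runs with the same config name as a slave.

```python
from typing import List, Set, Tuple

# (build id, config name, origin); all values are made up for illustration.
RunningEntry = Tuple[str, str, str]

running: List[RunningEntry] = [
    ("b1", "board-pfq", "slave"),
    ("b2", "board-pfq", "trybot"),
]


def select_by_config(entries: List[RunningEntry], slave_configs: Set[str]) -> List[str]:
    """Select builds purely by config name, ignoring who scheduled them."""
    return [build_id for build_id, config, _origin in entries if config in slave_configs]


# The trybot build b2 is selected alongside the real slave b1.
selected = select_by_config(running, {"board-pfq"})
print(selected)
```

A name-only match cannot tell the trybot run apart from the master's own slave.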

Comment 6 by bugdroid1@chromium.org, May 12 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/52acedd403d15fdc71c90ee1a80bbb8f71245438

commit 52acedd403d15fdc71c90ee1a80bbb8f71245438
Author: Don Garrett <dgarrett@google.com>
Date: Fri May 12 01:11:56 2017

build_stages: Disable slave killing for android pfq.

Our logic to kill old slaves when the master starts up will break when
we have two Android PFQs running in parallel. Disable it until we have
a clean solution in place.

BUG= chromium:719789 
TEST=run_tests

Change-Id: If53419ebefdc302a18d8b5acf11140bb41e5fd1b
Reviewed-on: https://chromium-review.googlesource.com/499630
Tested-by: Don Garrett <dgarrett@chromium.org>
Reviewed-by: Ningning Xia <nxia@chromium.org>
Commit-Queue: Don Garrett <dgarrett@chromium.org>

[modify] https://crrev.com/52acedd403d15fdc71c90ee1a80bbb8f71245438/cbuildbot/stages/build_stages.py

Comment 7 by nxia@chromium.org, Jun 8 2017

Status: Started (was: Untriaged)
Sweet!

How soon will we trust this enough to re-enable the auto-builder cleanup for Android PFQ?

Comment 9 by nxia@chromium.org, Jun 19 2017

As long as the current Android PFQ master builds have passed the CleanUpStage, we can enable the cleanup for Android PFQ masters.

Comment 10 by nxia@chromium.org, Jun 21 2017

Owner: dgarr...@chromium.org
The fix has been merged at https://chromium-review.googlesource.com/c/527271/. You can revert your CL at #6.

Comment 11 by bugdroid1@chromium.org, Jun 24 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/d31b035c95f971529d9b36d0fddc8d334965ca13

commit d31b035c95f971529d9b36d0fddc8d334965ca13
Author: Don Garrett <dgarrett@chromium.org>
Date: Sat Jun 24 05:56:49 2017

Revert "build_stages: Disable slave killing for android pfq."

This reverts commit 52acedd403d15fdc71c90ee1a80bbb8f71245438.

Ningning fixed the bug this was working around, so we don't need the work around any more.

BUG= chromium:719789 

Change-Id: Ib8dc9321b71381ba7dda1b65812223b2399bd67a
Reviewed-on: https://chromium-review.googlesource.com/542943
Commit-Ready: Don Garrett <dgarrett@chromium.org>
Tested-by: Don Garrett <dgarrett@chromium.org>
Reviewed-by: Ningning Xia <nxia@chromium.org>

[modify] https://crrev.com/d31b035c95f971529d9b36d0fddc8d334965ca13/cbuildbot/stages/build_stages.py

Owner: nxia@chromium.org
Status: Fixed (was: Started)

Comment 13 by dchan@chromium.org, Jan 22 2018

Status: Archived (was: Fixed)
