
Issue 851152 link

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Oct 15
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug

Blocked on:
issue 847602
issue 851183
issue 880550

Blocking:
issue 895498




Implement ChromeOS side of build affinity on LUCI swarming

Project Member Reported by dgarr...@chromium.org, Jun 8 2018

Issue description

I chatted with maruel@ about build affinity.

The final solution is this:

A) He will add "optional" dimensions to the buildbucket request API interface that are translated into fallback tasks (the internal swarming implementation).

B) We will update our build request logic on master builders to fetch the name of the bot used for the previous build of each slave and request that bot's name as an optional dimension. This will result in the same bot being preferred but not required.

C) We will create a pool of builders dedicated to affinity slaves. Since no non-affinity builders will be able to use this pool, these builders should always sit idle except when needed for an affinity build.
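
To make step B concrete, here is a rough sketch of a request that prefers a bot without requiring it. The function name and every field name (`builder`, `dimensions`, `optional_dimensions`) are hypothetical placeholders for illustration, not the real buildbucket schema.

```python
def build_request(builder, required_dimensions, previous_bot_id=None):
    """Return a request body that prefers, but does not require, one bot.

    All field names are hypothetical placeholders, not the real
    buildbucket request format.
    """
    request = {
        "builder": builder,
        # Dimensions every candidate bot must match.
        "dimensions": dict(required_dimensions),
    }
    if previous_bot_id:
        # Optional dimension: the scheduler may fall back to any bot that
        # matches the required dimensions if this bot is unavailable.
        request["optional_dimensions"] = {"id": previous_bot_id}
    return request
```

With no previous bot known (for example, the first build of a new slave), the request simply carries no preference.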

 
Summary: Implement ChromeOS side of build affinity on LUCI swarming (was: Convert Incremental builders to master/slave)
We have two groups of builders which require affinity:

1) Incremental builders.
2) CQ.

Since step B will be implemented in the master builder for the CQ (it's the one scheduling the slaves), it makes sense to reuse that logic for incremental builders.

Therefore, I'll convert our incremental builders to a master/slave group.
Owner: dgarr...@chromium.org
Blockedon: 851183
Blockedon: 847602
Components: -Infra>Client>ChromeOS>Test Infra>Client>ChromeOS>CI
Status: Assigned (was: Untriaged)
Blockedon: 880550
Labels: LUCI-ChromeInternal LUCI-Blocker-ChromeInternal
Owner: la...@chromium.org
Owner: athilenius@chromium.org
Here is an example of what the new build request should look like, in hacked-up form:

https://chromium-review.googlesource.com/c/chromiumos/chromite/+/1263435
Owner: dgarr...@chromium.org
Status: Started (was: Assigned)
I'm impatient, starting now.
Ha ha ha. That's fine, I'm on-call all week, so not much happening right now. I also picked up crbug.com/825241 as a starter in favor of this one.
Project Member

Comment 13 by bugdroid1@chromium.org, Oct 10

The following revision refers to this bug:
  https://chrome-internal.googlesource.com/infradata/config/+/45b0b4d4a09ad28619e5c0c0742d3a8d40bdb682

commit 45b0b4d4a09ad28619e5c0c0742d3a8d40bdb682
Author: Don Garrett <dgarrett@google.com>
Date: Wed Oct 10 22:59:10 2018

Project Member

Comment 14 by bugdroid1@chromium.org, Oct 10

Labels: merge-merged-config
The following revision refers to this bug:
  https://chrome-internal.googlesource.com/chromeos/manifest-internal/+/ddfad7487e046a2ec39020da8de77e78fcd7b7ae

commit ddfad7487e046a2ec39020da8de77e78fcd7b7ae
Author: Don Garrett <dgarrett@google.com>
Date: Wed Oct 10 23:08:27 2018

Project Member

Comment 15 by bugdroid1@chromium.org, Oct 12

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/ca5f91f78dea9b5199a838c0e4becceec1c7270a

commit ca5f91f78dea9b5199a838c0e4becceec1c7270a
Author: Don Garrett <dgarrett@google.com>
Date: Fri Oct 12 22:18:26 2018

request_build: Add support for bot requests.

Add the ability to request a specific bot for a build with fallback to
the LUCI Builder bot requirements. This is a required feature for
build affinity.

BUG= chromium:851152 
TEST=run_tests

Change-Id: I939674a61a893b84c5317567d1fa2177403775b9
Reviewed-on: https://chromium-review.googlesource.com/1263435
Commit-Ready: ChromeOS CL Exonerator Bot <chromiumos-cl-exonerator@appspot.gserviceaccount.com>
Tested-by: Don Garrett <dgarrett@chromium.org>
Reviewed-by: Alec Thilenius <athilenius@google.com>

[modify] https://crrev.com/ca5f91f78dea9b5199a838c0e4becceec1c7270a/lib/request_build.py
[modify] https://crrev.com/ca5f91f78dea9b5199a838c0e4becceec1c7270a/lib/request_build_unittest.py

Project Member

Comment 16 by bugdroid1@chromium.org, Oct 12

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/f873587814e0e310e3625cb30ac996101dcb71a1

commit f873587814e0e310e3625cb30ac996101dcb71a1
Author: Don Garrett <dgarrett@google.com>
Date: Fri Oct 12 22:18:27 2018

config_lib: Add build_affinity config option.

The "build_affinity" option will tell master builders to make a
best-effort attempt to run this build config on the same bot each
time.

This is only useful for slave builds that run on swarming.

Also, add the LUCI_BUILDER_INCREMENTAL and LUCI_BUILDER_CQ constants,
and a unittest to ensure all builds with affinity are running on them.

BUG= chromium:851152 
TEST=chromeos_config_unittest

Change-Id: Iedd97fac8ec5cfc196165c59d46a96d3b9122ce5
Reviewed-on: https://chromium-review.googlesource.com/1273970
Commit-Ready: ChromeOS CL Exonerator Bot <chromiumos-cl-exonerator@appspot.gserviceaccount.com>
Tested-by: Don Garrett <dgarrett@chromium.org>
Reviewed-by: Alec Thilenius <athilenius@google.com>
Reviewed-by: Mike Nichols <mikenichols@chromium.org>

[modify] https://crrev.com/f873587814e0e310e3625cb30ac996101dcb71a1/lib/config_lib.py
[modify] https://crrev.com/f873587814e0e310e3625cb30ac996101dcb71a1/config/config_dump.json
[modify] https://crrev.com/f873587814e0e310e3625cb30ac996101dcb71a1/config/chromeos_config_unittest.py

Project Member

Comment 17 by bugdroid1@chromium.org, Oct 12

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/3a211357f21546dda3362a71fef889049d368a69

commit 3a211357f21546dda3362a71fef889049d368a69
Author: Don Garrett <dgarrett@google.com>
Date: Fri Oct 12 22:18:27 2018

buildbucket_lib: Add bot_id helpers.

Add helper methods for extracting result_details_json from a build
result, and for extracting the bot id from a build result.

These are needed to be able to look up the previous bot id for build
affinity purposes.

BUG= chromium:851152 
TEST=None (yet)

Change-Id: I8beb36d98d41224c63b38f389aa9de999759007e
Reviewed-on: https://chromium-review.googlesource.com/1273971
Commit-Ready: ChromeOS CL Exonerator Bot <chromiumos-cl-exonerator@appspot.gserviceaccount.com>
Tested-by: Don Garrett <dgarrett@chromium.org>
Reviewed-by: Alec Thilenius <athilenius@google.com>

[modify] https://crrev.com/3a211357f21546dda3362a71fef889049d368a69/lib/buildbucket_lib.py
[modify] https://crrev.com/3a211357f21546dda3362a71fef889049d368a69/lib/buildbucket_lib_unittest.py
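
For readers following along, a minimal sketch of what such helpers might look like. The function names and the nested layout of the decoded details (`swarming` / `bot_dimensions` / `id`) are assumptions made for illustration; the real code is in lib/buildbucket_lib.py in the commit above.

```python
import json


def get_result_details(build):
    """Decode a build's result_details_json field, or return None."""
    raw = build.get("result_details_json")
    if not raw:
        return None
    return json.loads(raw)


def get_bot_id(build):
    """Return the id of the bot that ran the build, or None if unknown."""
    details = get_result_details(build)
    if not details:
        return None
    # Assumed layout: {"swarming": {"bot_dimensions": {"id": ["<bot>"]}}}
    ids = details.get("swarming", {}).get("bot_dimensions", {}).get("id")
    return ids[0] if ids else None
```

Returning None rather than raising lets the caller treat "no previous bot" as "no preference".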

Project Member

Comment 18 by bugdroid1@chromium.org, Oct 12

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/b345f4161503600ec30744b13c10ff48eb06ed1a

commit b345f4161503600ec30744b13c10ff48eb06ed1a
Author: Don Garrett <dgarrett@google.com>
Date: Fri Oct 12 22:18:27 2018

ScheduleSlavesStage: Implement build affinity for slave builds.

Implement logic to support build affinity for slave builds, if they
are running on swarming, and the slave config requests it.

BUG= chromium:851152 
TEST=None

Change-Id: I604f4746335593b3ef397cd4ca4be42299b02757
Reviewed-on: https://chromium-review.googlesource.com/1273972
Commit-Ready: ChromeOS CL Exonerator Bot <chromiumos-cl-exonerator@appspot.gserviceaccount.com>
Tested-by: Don Garrett <dgarrett@chromium.org>
Reviewed-by: Alec Thilenius <athilenius@google.com>

[modify] https://crrev.com/b345f4161503600ec30744b13c10ff48eb06ed1a/cbuildbot/stages/scheduler_stages.py
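
The two conditions in that commit message can be sketched roughly as below. `build_affinity` is the config option named in this thread; the function name and the `on_swarming` flag are hypothetical stand-ins for however the real config records where a slave runs.

```python
def pick_requested_bot(slave_config, previous_bot_id):
    """Return the bot to prefer for this slave build, or None for no preference."""
    # Affinity is opt-in per slave config.
    if not slave_config.get("build_affinity"):
        return None
    # Hypothetical flag: affinity only applies to slaves running on swarming.
    if not slave_config.get("on_swarming"):
        return None
    # May still be None if the previous bot could not be determined.
    return previous_bot_id
```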

Believed fixed, verifying against the incremental builders now.
Blocking: 895498
This has been deployed for the chromeos incremental builders, and appears to be working.

I'm currently shutting down a builder, to ensure that the build properly migrates to a different machine in the pool (which has a single spare). Afterwards, I'll bring the downed machine back, to ensure the build doesn't migrate back unexpectedly.
Cc: mikenichols@chromium.org athilenius@chromium.org
Cool! Random side questions: I assume swarming has a heterogeneous builder pool (so these will be running on n1-highmem-32 equivalents)?
Labels: -Pri-3 Pri-1
Status: Fixed (was: Started)
Yep.

Testing shows that this is working correctly, EXCEPT:

If a build is aborted, there is no result_details_json value in its buildbucket entry. Given the current implementation, that means that the previous builder is not detected, and affinity is lost.

From a builder point of view, this might be fine since we wipe chroot contents after an aborted build, but it wasn't anticipated behavior.

I'm going to call this acceptable.
Note, aborted builds are common in the CQ, because the master will abort some slaves early.
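
A toy simulation of that behavior, assuming the lookup only ever inspects the immediately preceding build. The function name and the list-of-bot-ids history format are made up for illustration.

```python
def simulate_affinity(history):
    """For each build in order, return the bot that would be requested.

    `history` lists the bot id recorded for each build, with None where
    the build was aborted and left no result details behind.
    """
    requested = []
    previous = None  # Nothing ran before the first build.
    for recorded_bot in history:
        requested.append(previous)
        previous = recorded_bot
    return requested
```

For a history of `["bot-7", None, "bot-3"]`, the third build is requested with no preference, because the aborted second build recorded no bot.
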
Project Member

Comment 26 by bugdroid1@chromium.org, Oct 16

The following revision refers to this bug:
  https://chrome-internal.googlesource.com/infradata/config/+/9eb69bf314cc66618efee341bea5b011e4b8557f

commit 9eb69bf314cc66618efee341bea5b011e4b8557f
Author: Don Garrett <dgarrett@google.com>
Date: Tue Oct 16 00:12:48 2018

Project Member

Comment 27 by bugdroid1@chromium.org, Oct 18

The following revision refers to this bug:
  https://chrome-internal.googlesource.com/infradata/config/+/c686be21aa381844d886e71d8c8724ffd9a6f59e

commit c686be21aa381844d886e71d8c8724ffd9a6f59e
Author: Don Garrett <dgarrett@google.com>
Date: Thu Oct 18 16:11:27 2018
