Implement ChromeOS side of build affinity on LUCI swarming
Issue description

I chatted with maruel@ about build affinity. The final solution is this:

A) He will add "optional" dimensions to the buildbucket request API interface that are translated into fallback tasks (the internal swarming implementation).

B) We will update our build request logic on master builders to fetch the name of the bot used for the previous build of the slave and request that bot's name as an optional dimension. This will result in the same bot being preferred but not required.

C) We will create a pool of builders dedicated to affinity slaves. Since no non-affinity builders will be able to use this pool, these builders should always sit idle except when needed for an affinity build.
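As a rough illustration of step B, here is a minimal sketch of how a master might attach the previous bot as a preference rather than a requirement. The function name and the request layout (`dimensions`, `optional_dimensions`) are hypothetical and chosen only for illustration; they are not the actual buildbucket or chromite request format.

```python
def build_affinity_request(builder, base_dimensions, previous_bot_id=None):
    """Return a hypothetical build request dict (not the real buildbucket schema).

    base_dimensions: required swarming dimensions, e.g. the dedicated pool.
    previous_bot_id: bot that ran the previous build of this slave, or None.
    """
    request = {
        'builder': builder,
        # Required: any bot that runs the build must match these.
        'dimensions': dict(base_dimensions),
        # Preferred only: the scheduler may ignore these and fall back.
        'optional_dimensions': {},
    }
    if previous_bot_id:
        request['optional_dimensions']['id'] = previous_bot_id
    return request


# The first build has no history, so no preference is attached.
first = build_affinity_request('Incremental', {'pool': 'affinity'})
# Later builds prefer whichever bot ran the previous build.
later = build_affinity_request('Incremental', {'pool': 'affinity'}, 'bot-7')
```

Keeping the bot name out of the required dimensions is what lets the build still run when the preferred bot is busy or offline.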
,
Jun 8 2018
,
Jun 8 2018
,
Jun 9 2018
,
Jul 25
,
Sep 4
,
Sep 12
,
Oct 3
,
Oct 4
,
Oct 9
Here is an example of what the new build request should look like, in hacked up format: https://chromium-review.googlesource.com/c/chromiumos/chromite/+/1263435
,
Oct 10
I'm impatient, starting now.
,
Oct 10
Ha ha ha. That's fine, I'm on-call all week, so not much happening right now. I also picked up crbug.com/825241 as a starter in favor of this one.
,
Oct 10
The following revision refers to this bug:
  https://chrome-internal.googlesource.com/infradata/config/+/45b0b4d4a09ad28619e5c0c0742d3a8d40bdb682

commit 45b0b4d4a09ad28619e5c0c0742d3a8d40bdb682
Author: Don Garrett <dgarrett@google.com>
Date: Wed Oct 10 22:59:10 2018
,
Oct 10
The following revision refers to this bug:
  https://chrome-internal.googlesource.com/chromeos/manifest-internal/+/ddfad7487e046a2ec39020da8de77e78fcd7b7ae

commit ddfad7487e046a2ec39020da8de77e78fcd7b7ae
Author: Don Garrett <dgarrett@google.com>
Date: Wed Oct 10 23:08:27 2018
,
Oct 12
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/ca5f91f78dea9b5199a838c0e4becceec1c7270a

commit ca5f91f78dea9b5199a838c0e4becceec1c7270a
Author: Don Garrett <dgarrett@google.com>
Date: Fri Oct 12 22:18:26 2018

request_build: Add support for bot requests.

Add the ability to request a specific bot for a build with fallback to the LUCI Builder bot requirements. This is a required feature for build afinity.

BUG= chromium:851152
TEST=run_tests

Change-Id: I939674a61a893b84c5317567d1fa2177403775b9
Reviewed-on: https://chromium-review.googlesource.com/1263435
Commit-Ready: ChromeOS CL Exonerator Bot <chromiumos-cl-exonerator@appspot.gserviceaccount.com>
Tested-by: Don Garrett <dgarrett@chromium.org>
Reviewed-by: Alec Thilenius <athilenius@google.com>

[modify] https://crrev.com/ca5f91f78dea9b5199a838c0e4becceec1c7270a/lib/request_build.py
[modify] https://crrev.com/ca5f91f78dea9b5199a838c0e4becceec1c7270a/lib/request_build_unittest.py
,
Oct 12
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/f873587814e0e310e3625cb30ac996101dcb71a1

commit f873587814e0e310e3625cb30ac996101dcb71a1
Author: Don Garrett <dgarrett@google.com>
Date: Fri Oct 12 22:18:27 2018

config_lib: Add build_affinity config option.

The "build_affinity" option will tell master builders to attempt to make a best effort attempt to run this build config on the same bot each time. This is only useful for slave builds that run on swarming.

Also, add the LUCI_BUILDER_INCREMENTAL and LUCI_BUILDER_CQ constants, and a unittest to ensure all builds with affinity are running on them.

BUG= chromium:851152
TEST=chromeos_config_unittest

Change-Id: Iedd97fac8ec5cfc196165c59d46a96d3b9122ce5
Reviewed-on: https://chromium-review.googlesource.com/1273970
Commit-Ready: ChromeOS CL Exonerator Bot <chromiumos-cl-exonerator@appspot.gserviceaccount.com>
Tested-by: Don Garrett <dgarrett@chromium.org>
Reviewed-by: Alec Thilenius <athilenius@google.com>
Reviewed-by: Mike Nichols <mikenichols@chromium.org>

[modify] https://crrev.com/f873587814e0e310e3625cb30ac996101dcb71a1/lib/config_lib.py
[modify] https://crrev.com/f873587814e0e310e3625cb30ac996101dcb71a1/config/config_dump.json
[modify] https://crrev.com/f873587814e0e310e3625cb30ac996101dcb71a1/config/chromeos_config_unittest.py
,
Oct 12
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/3a211357f21546dda3362a71fef889049d368a69

commit 3a211357f21546dda3362a71fef889049d368a69
Author: Don Garrett <dgarrett@google.com>
Date: Fri Oct 12 22:18:27 2018

buildbucket_lib: Add bot_id helpers.

Add helper methods for extracting result_details_json from a build result, and for extracting the bot id from a build result. These are needed to be able to look up the previous bot id for build affinity purposes.

BUG= chromium:851152
TEST=None (yet)

Change-Id: I8beb36d98d41224c63b38f389aa9de999759007e
Reviewed-on: https://chromium-review.googlesource.com/1273971
Commit-Ready: ChromeOS CL Exonerator Bot <chromiumos-cl-exonerator@appspot.gserviceaccount.com>
Tested-by: Don Garrett <dgarrett@chromium.org>
Reviewed-by: Alec Thilenius <athilenius@google.com>

[modify] https://crrev.com/3a211357f21546dda3362a71fef889049d368a69/lib/buildbucket_lib.py
[modify] https://crrev.com/3a211357f21546dda3362a71fef889049d368a69/lib/buildbucket_lib_unittest.py
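For readers without the CL open, here is a rough sketch of what helpers like these could look like. The function names and the nested layout assumed under `result_details_json` (`swarming` / `bot_dimensions` / `id` as a list) are assumptions for illustration and may not match the actual buildbucket_lib code or buildbucket response format.

```python
import json


def get_result_details(build):
    """Parse the result_details_json string of a build dict, or return None."""
    raw = build.get('result_details_json')
    if not raw:
        return None
    try:
        return json.loads(raw)
    except ValueError:
        # Malformed JSON is treated the same as missing details.
        return None


def get_bot_id(build):
    """Return the id of the bot that ran the build, or None if unknown.

    Assumed layout: {"swarming": {"bot_dimensions": {"id": ["<bot>"]}}}
    """
    details = get_result_details(build)
    if not isinstance(details, dict):
        return None
    swarming = details.get('swarming') or {}
    bot_dimensions = swarming.get('bot_dimensions') or {}
    ids = bot_dimensions.get('id') or []
    return ids[0] if ids else None


finished = {
    'result_details_json': json.dumps(
        {'swarming': {'bot_dimensions': {'id': ['bot-7']}}}),
}
previous_bot = get_bot_id(finished)
```

Returning None for anything missing or unparsable means the caller can simply skip the bot preference when no previous bot is known.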
,
Oct 12
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/b345f4161503600ec30744b13c10ff48eb06ed1a

commit b345f4161503600ec30744b13c10ff48eb06ed1a
Author: Don Garrett <dgarrett@google.com>
Date: Fri Oct 12 22:18:27 2018

ScheduleSlavesStage: Implement build affinity for slave builds.

Implement logic to support build affinity for slave builds, if they are running on swarming, and they slave config requests it.

BUG= chromium:851152
TEST=None

Change-Id: I604f4746335593b3ef397cd4ca4be42299b02757
Reviewed-on: https://chromium-review.googlesource.com/1273972
Commit-Ready: ChromeOS CL Exonerator Bot <chromiumos-cl-exonerator@appspot.gserviceaccount.com>
Tested-by: Don Garrett <dgarrett@chromium.org>
Reviewed-by: Alec Thilenius <athilenius@google.com>

[modify] https://crrev.com/b345f4161503600ec30744b13c10ff48eb06ed1a/cbuildbot/stages/scheduler_stages.py
,
Oct 15
Believed fixed, verifying against the incremental builders now.
,
Oct 15
,
Oct 15
This has been deployed for the chromeos incremental builders, and appears to be working. I'm currently shutting down a builder, to ensure that the build properly migrates to a different machine in the pool (which has a single spare). Afterwards, I'll bring the downed machine back, to ensure the build doesn't migrate back unexpectedly.
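The behavior being checked here can be modeled with a toy selection function. This is only a simplified sketch of "preferred but not required"; it is not how the swarming scheduler is actually implemented, and `pick_bot` and the bot names are made up for illustration.

```python
def pick_bot(idle_bots, preferred_bot=None):
    """Pick the preferred bot if it is idle, else any idle bot, else None."""
    if preferred_bot in idle_bots:
        return preferred_bot
    return idle_bots[0] if idle_bots else None


# Normal case: the previous bot is idle, so the build stays on it.
run1 = pick_bot(['bot-a', 'spare'], preferred_bot='bot-a')
# bot-a is shut down: the build migrates to the single spare.
run2 = pick_bot(['spare'], preferred_bot=run1)
# bot-a comes back: the preference now follows the spare, so no migration back.
run3 = pick_bot(['bot-a', 'spare'], preferred_bot=run2)
```

Because the preference always tracks the bot used by the previous build, a recovered machine does not pull the build back.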
,
Oct 15
,
Oct 15
Cool! Random side question: I assume swarming has a heterogeneous builder pool (so these will be running on n1-highmem-32 equivalents)?
,
Oct 15
Yep. Testing shows that this is working correctly, EXCEPT: If a build is aborted, there is no result_details_json value in its buildbucket entry. Given the current implementation, that means that the previous builder is not detected, and affinity is lost. From a builder point of view, this might be fine since we wipe chroot contents after an aborted build, but it wasn't anticipated behavior. I'm going to call this acceptable.
,
Oct 15
Note, aborted builds are common in the CQ, because the master will abort some slaves early.
,
Oct 16
The following revision refers to this bug:
  https://chrome-internal.googlesource.com/infradata/config/+/9eb69bf314cc66618efee341bea5b011e4b8557f

commit 9eb69bf314cc66618efee341bea5b011e4b8557f
Author: Don Garrett <dgarrett@google.com>
Date: Tue Oct 16 00:12:48 2018
,
Oct 18
The following revision refers to this bug:
  https://chrome-internal.googlesource.com/infradata/config/+/c686be21aa381844d886e71d8c8724ffd9a6f59e

commit c686be21aa381844d886e71d8c8724ffd9a6f59e
Author: Don Garrett <dgarrett@google.com>
Date: Thu Oct 18 16:11:27 2018
Comment 1 by dgarr...@chromium.org
, Jun 8 2018