De-complicate swarming CIPD support |
||||
Issue descriptionDoc: https://docs.google.com/document/d/1jDVvRR2UgfTLW3JFcdc74EsZXSIs34ZYFwrJLtQDxsw/edit# TL;DR: * Swarming will accept a list-of-strings which is a cipd ensurefile * UI will display ensure file * Swarming pipes this file as-is all the way to the cipd client on the bot * cipd client on the bot spits out a pinned version of the ensure file * UI displays the pinned ensure file This came up as part of the fix for the git rollout; currently swarming's cipd support is convoluted to the point where it's validation logic and cipd's actual validation logic don't match. In particular, different platforms need different versions of the same package; cipd supports this in the ensurefile (and folks working with cipd understand this format). However, swarming assumes that all cipd package patterns are unique identifiers (they're not), and that every package pattern that it asks cipd to install will actually create an installed package (it doesn't). This bug is about removing knowledge from swarming about how cipd works, and making it just a dumb transit. This way users can use the format they know (ensure file), and it will behave the way they expect (this file is fed verbatim into the cipd client). Any extensions to the ensure file format will thus only need to change the client (and not swarming).
,
Dec 13 2017
The following revision refers to this bug: https://chromium.googlesource.com/infra/luci/luci-go.git/+/9cfcdf6dc94830ff2674e5b079fa36e21e156c7f commit 9cfcdf6dc94830ff2674e5b079fa36e21e156c7f Author: Robert Iannucci <iannucci@chromium.org> Date: Wed Dec 13 20:10:42 2017 [cipd] Hook up ensure file ServiceURL and refactor makeCipdClient. The ensure file $ServiceURL was previously unconnected, and makeCipdClient invocation was awkward (would have become more awkward in a subsequent CL). R=vadimsh@chromium.org Bug: 794657 Change-Id: I3fce4eb43a4563f47300128be73f2bf164e6566c Reviewed-on: https://chromium-review.googlesource.com/823763 Reviewed-by: Vadim Shtayura <vadimsh@chromium.org> [modify] https://crrev.com/9cfcdf6dc94830ff2674e5b079fa36e21e156c7f/cipd/client/cli/friendly.go [modify] https://crrev.com/9cfcdf6dc94830ff2674e5b079fa36e21e156c7f/cipd/client/cli/main.go
,
Dec 13 2017
The following revision refers to this bug: https://chromium.googlesource.com/infra/luci/luci-go.git/+/83bd7db390ce83a506b6585394e3daa31ffb9a9d commit 83bd7db390ce83a506b6585394e3daa31ffb9a9d Author: Robert Iannucci <iannucci@chromium.org> Date: Wed Dec 13 20:11:00 2017 [cipd] Add -ensure-file-output option to ensure command. Outputs a canonicalized version of the resolved ensure-file. This means that all subdirs, package patterns and versions have been resolved, and all extraneous information (like $VerifiedPlatform and comments) are stripped. Any package patterns which don't match the current platform are removed. R=vadimsh@chromium.org Bug: 794657 Change-Id: I74c2698f463f9435c7fed4f95cf0dda9bf570fa3 Reviewed-on: https://chromium-review.googlesource.com/823822 Commit-Queue: Robbie Iannucci <iannucci@chromium.org> Reviewed-by: Vadim Shtayura <vadimsh@chromium.org> [modify] https://crrev.com/83bd7db390ce83a506b6585394e3daa31ffb9a9d/cipd/client/cipd/ensure/file.go [modify] https://crrev.com/83bd7db390ce83a506b6585394e3daa31ffb9a9d/cipd/client/cipd/ensure/file_test.go [modify] https://crrev.com/83bd7db390ce83a506b6585394e3daa31ffb9a9d/cipd/client/cipd/ensure/good_test.go [modify] https://crrev.com/83bd7db390ce83a506b6585394e3daa31ffb9a9d/cipd/client/cli/main.go
,
Dec 13 2017
The following revision refers to this bug: https://chromium.googlesource.com/infra/luci/luci-go.git/+/1166f0f8ee9545a678e118095d9dacab77761f3b commit 1166f0f8ee9545a678e118095d9dacab77761f3b Author: Robert Iannucci <iannucci@chromium.org> Date: Wed Dec 13 21:39:43 2017 [cipd] Remove magical <path> default value for flag. This causes the command line parser to assume it's a required flag. R=nodir@chromium.org, vadimsh@chromium.org Bug: 794657 Change-Id: I0d6ba99ea1b433ef0333c5dc1ece896bb13cf1a4 Reviewed-on: https://chromium-review.googlesource.com/825767 Reviewed-by: Vadim Shtayura <vadimsh@chromium.org> Commit-Queue: Robbie Iannucci <iannucci@chromium.org> [modify] https://crrev.com/1166f0f8ee9545a678e118095d9dacab77761f3b/cipd/client/cli/main.go
,
Feb 27 2018
,
Jun 15 2018
Not working on this, but I'd like to pick it up again.
,
Oct 18
,
Oct 22
Realized I pretty much need to do this to deploy git. The semantic versions that we get by building git ourselves vs. the versions we get from git-for-windows are quite different, and there's not a good way to reconcile them (e.g. 2.19.1.chromium16 vs 2.19.1.windows.1). I could hack something there, but it won't last.
So, I'd like to finally fix this bug (i.e. so we can have packages specified in full CIPD ensure-file syntax like `package/${platform=windows-amd64} version` without having swarming interpret that/freak out. Currently it believes that this is an invalid package name). The alternative would be to make task template application conditional on the task request dimensions, but I think this would be way overkill, and would just pull us further away from a simple system. CIPD already has this per-platform-package-selection thing on lock, so I think the right thing to do is to change swarming to stop being judgy and just let CIPD do its thing :). The other benefit of this is that the Swarming API will require fewer mental gymnastics when using CIPD packages (copy and paste on the swarming UI would work, for example). This will also move us closer to fixing issue 897193.
So, I think my plan of attack is:
1. Remove task_request.CipdInput.client_package and task_result.CipdPins.client_package
* No one uses this.
* Since it's run with the 'system' service account, having the binary able to change at the task level seems like a misfeature at best.
2. Change internal ChromeInfraEvent event mon proto to have new cipd_ensure_file_input field.
* Is this necessary, or should I just stop populating the cipd_input field from swarming? I don't think anyone has used/uses this data, and
if we start logging to BQ soon, maybe it won't matter?
3. Add TaskProperties.cipd_ensure_file && TaskResult.resolved_cipd_ensure_file (any ideas on how to break this CL up?)
* Convert incoming cipd_input field into cipd_ensure_file (for new tasks)
* Send data to bot as cipd_ensure_file (instead of cipd_input)
* Have bot send resolved ensure-file back as resolved_cipd_ensure_file (instead of cipd_pins)
* Change UI to show Input/Resolved files instead of cipd_input&&cipd_pins
* Change task template proto to have chunks of ensure-file that get concatenated (replacing the current cipd_package stuff)
* Swarming will have a stateless ensure_file parser to:
* extract @Subdir statements (to match existing conflict-detection behavior with task templates)
* extract $ServiceURL statements (to match existing cipd_input.server default-setting behavior)
* Swarming will otherwise be oblivious to the file content
* On load, render task_request.TaskProperties.cipd_input into the ensure-file format transparently (for old tasks)
* On load, render task_result._TaskResultCommon.cipd_pins into the ensure-file format transparently (for old tasks)
I'm nervous about the potential size of the third CL there. Is there something I can do to break it up?
,
Oct 22
Actually, currently swarming will accept `package/${platform=windows-amd64}`, but it will freak out when the linux bots decide to (correctly) not deploy that package due to the way that cipd_input/cipd_pins are communicated to/from the bot.
,
Oct 23
Why not make the task template for specific pools? 2 doesn't matter for the short term. "Change task template proto to have chunks of ensure-file that get concatenated" will be tricky, as I feel it'll be very easy to create malformed ensure file. You can break the last CL by creating a @property which loads either the old or new one, but always returns the new format. You do this without even storing the new format yet. Once everything is migrated, including the APIs, then you can start storing the new format.
,
Oct 23
Making the task template for specific pools doesn't help; we have (intentionally) mixed platforms in the same pool. To make per-pool templates work, we'd need to segregate all pools by os and cpu, which defeats the point of dimensions I think. FWIW, the ensure-file format is pretty concatenatable (which is why I proposed it). I think in practice it will be fine (all the task template stuff for a given task lives in a single .cfg file, so its all local-context). Your idea for breaking up step 3 sgtm.
,
Nov 30
The following revision refers to this bug: https://chromium.googlesource.com/infra/luci/luci-py.git/+/bb1c9c53e703124c350f82f250a1970b6fa0f92e commit bb1c9c53e703124c350f82f250a1970b6fa0f92e Author: Robert Iannucci <iannucci@chromium.org> Date: Fri Nov 30 02:04:23 2018 [swarming] Update details API to work with pool-provided isolate defaults. This also improves the testing for pools_config. Bug: 794657, 908361 Change-Id: I0eabda89b2900d86181ac2be6465d3cf9099114a Reviewed-on: https://chromium-review.googlesource.com/c/1351362 Reviewed-by: Marc-Antoine Ruel <maruel@chromium.org> Commit-Queue: Robbie Iannucci <iannucci@chromium.org> [modify] https://crrev.com/bb1c9c53e703124c350f82f250a1970b6fa0f92e/appengine/swarming/handlers_bot_test.py [modify] https://crrev.com/bb1c9c53e703124c350f82f250a1970b6fa0f92e/appengine/swarming/handlers_endpoints.py [modify] https://crrev.com/bb1c9c53e703124c350f82f250a1970b6fa0f92e/appengine/swarming/handlers_endpoints_test.py [modify] https://crrev.com/bb1c9c53e703124c350f82f250a1970b6fa0f92e/appengine/swarming/server/pools_config.py [modify] https://crrev.com/bb1c9c53e703124c350f82f250a1970b6fa0f92e/appengine/swarming/server/pools_config_test.py [modify] https://crrev.com/bb1c9c53e703124c350f82f250a1970b6fa0f92e/appengine/swarming/server/task_request.py [modify] https://crrev.com/bb1c9c53e703124c350f82f250a1970b6fa0f92e/appengine/swarming/test_env_handlers.py |
||||
►
Sign in to add a comment |
||||
Comment 1 by iannucci@chromium.org
, Dec 13 2017