New issue
Advanced search Search tips

Issue 794657 link

Starred by 3 users

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug

Blocking:
issue 731547
issue 749709



Sign in to add a comment

De-complicate swarming CIPD support

Project Member Reported by iannucci@chromium.org, Dec 13 2017

Issue description

Doc: https://docs.google.com/document/d/1jDVvRR2UgfTLW3JFcdc74EsZXSIs34ZYFwrJLtQDxsw/edit#

TL;DR:
  * Swarming will accept a list-of-strings which is a cipd ensurefile
  * UI will display ensure file
  * Swarming pipes this file as-is all the way to the cipd client on the bot
  * cipd client on the bot spits out a pinned version of the ensure file
  * UI displays the pinned ensure file

This came up as part of the fix for the git rollout; currently swarming's cipd support is convoluted to the point where it's validation logic and cipd's actual validation logic don't match. In particular, different platforms need different versions of the same package; cipd supports this in the ensurefile (and folks working with cipd understand this format). However, swarming assumes that all cipd package patterns are unique identifiers (they're not), and that every package pattern that it asks cipd to install will actually create an installed package (it doesn't).

This bug is about removing knowledge from swarming about how cipd works, and making it just a dumb transit. This way users can use the format they know (ensure file), and it will behave the way they expect (this file is fed verbatim into the cipd client). Any extensions to the ensure file format will thus only need to change the client (and not swarming).
 
Cc: aga...@chromium.org
+agable for fyi
Project Member

Comment 2 by bugdroid1@chromium.org, Dec 13 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/luci/luci-go.git/+/9cfcdf6dc94830ff2674e5b079fa36e21e156c7f

commit 9cfcdf6dc94830ff2674e5b079fa36e21e156c7f
Author: Robert Iannucci <iannucci@chromium.org>
Date: Wed Dec 13 20:10:42 2017

[cipd] Hook up ensure file ServiceURL and refactor makeCipdClient.

The ensure file $ServiceURL was previously unconnected, and
makeCipdClient invocation was awkward (would have become more awkward
in a subsequent CL).

R=vadimsh@chromium.org

Bug: 794657
Change-Id: I3fce4eb43a4563f47300128be73f2bf164e6566c
Reviewed-on: https://chromium-review.googlesource.com/823763
Reviewed-by: Vadim Shtayura <vadimsh@chromium.org>

[modify] https://crrev.com/9cfcdf6dc94830ff2674e5b079fa36e21e156c7f/cipd/client/cli/friendly.go
[modify] https://crrev.com/9cfcdf6dc94830ff2674e5b079fa36e21e156c7f/cipd/client/cli/main.go

Project Member

Comment 3 by bugdroid1@chromium.org, Dec 13 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/luci/luci-go.git/+/83bd7db390ce83a506b6585394e3daa31ffb9a9d

commit 83bd7db390ce83a506b6585394e3daa31ffb9a9d
Author: Robert Iannucci <iannucci@chromium.org>
Date: Wed Dec 13 20:11:00 2017

[cipd] Add -ensure-file-output option to ensure command.

Outputs a canonicalized version of the resolved ensure-file. This means
that all subdirs, package patterns and versions have been resolved, and
all extraneous information (like $VerifiedPlatform and comments) are
stripped. Any package patterns which don't match the current platform
are removed.

R=vadimsh@chromium.org

Bug: 794657
Change-Id: I74c2698f463f9435c7fed4f95cf0dda9bf570fa3
Reviewed-on: https://chromium-review.googlesource.com/823822
Commit-Queue: Robbie Iannucci <iannucci@chromium.org>
Reviewed-by: Vadim Shtayura <vadimsh@chromium.org>

[modify] https://crrev.com/83bd7db390ce83a506b6585394e3daa31ffb9a9d/cipd/client/cipd/ensure/file.go
[modify] https://crrev.com/83bd7db390ce83a506b6585394e3daa31ffb9a9d/cipd/client/cipd/ensure/file_test.go
[modify] https://crrev.com/83bd7db390ce83a506b6585394e3daa31ffb9a9d/cipd/client/cipd/ensure/good_test.go
[modify] https://crrev.com/83bd7db390ce83a506b6585394e3daa31ffb9a9d/cipd/client/cli/main.go

Project Member

Comment 4 by bugdroid1@chromium.org, Dec 13 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/luci/luci-go.git/+/1166f0f8ee9545a678e118095d9dacab77761f3b

commit 1166f0f8ee9545a678e118095d9dacab77761f3b
Author: Robert Iannucci <iannucci@chromium.org>
Date: Wed Dec 13 21:39:43 2017

[cipd] Remove magical <path> default value for flag.

This causes the command line parser to assume it's a required flag.

R=nodir@chromium.org, vadimsh@chromium.org

Bug: 794657
Change-Id: I0d6ba99ea1b433ef0333c5dc1ece896bb13cf1a4
Reviewed-on: https://chromium-review.googlesource.com/825767
Reviewed-by: Vadim Shtayura <vadimsh@chromium.org>
Commit-Queue: Robbie Iannucci <iannucci@chromium.org>

[modify] https://crrev.com/1166f0f8ee9545a678e118095d9dacab77761f3b/cipd/client/cli/main.go

Blocking: 731547
Status: Assigned (was: Started)
Not working on this, but I'd like to pick it up again.
Owner: iannu...@google.com
Realized I pretty much need to do this to deploy git. The semantic versions that we get by building git ourselves vs. the versions we get from git-for-windows are quite different, and there's not a good way to reconcile them (e.g. 2.19.1.chromium16 vs 2.19.1.windows.1). I could hack something there, but it won't last.

So, I'd like to finally fix this bug (i.e. so we can have packages specified in full CIPD ensure-file syntax like `package/${platform=windows-amd64} version` without having swarming interpret that/freak out. Currently it believes that this is an invalid package name). The alternative would be to make task template application conditional on the task request dimensions, but I think this would be way overkill, and would just pull us further away from a simple system. CIPD already has this per-platform-package-selection thing on lock, so I think the right thing to do is to change swarming to stop being judgy and just let CIPD do its thing :). The other benefit of this is that the Swarming API will require fewer mental gymnastics when using CIPD packages (copy and paste on the swarming UI would work, for example). This will also move us closer to fixing issue 897193.

So, I think my plan of attack is:
  1. Remove task_request.CipdInput.client_package and  task_result.CipdPins.client_package
    * No one uses this.
    * Since it's run with the 'system' service account, having the binary able to change at the task level seems like a misfeature at best.

  2. Change internal ChromeInfraEvent event mon proto to have new cipd_ensure_file_input field.
    * Is this necessary, or should I just stop populating the cipd_input field from swarming? I don't think anyone has used/uses this data, and
      if we start logging to BQ soon, maybe it won't matter? 

  3. Add TaskProperties.cipd_ensure_file && TaskResult.resolved_cipd_ensure_file (any ideas on how to break this CL up?)
    * Convert incoming cipd_input field into cipd_ensure_file (for new tasks)
    * Send data to bot as cipd_ensure_file (instead of cipd_input)
    * Have bot send resolved ensure-file back as resolved_cipd_ensure_file (instead of cipd_pins)
    * Change UI to show Input/Resolved files instead of cipd_input&&cipd_pins
    * Change task template proto to have chunks of ensure-file that get concatenated (replacing the current cipd_package stuff)
    * Swarming will have a stateless ensure_file parser to:
      * extract @Subdir statements (to match existing conflict-detection behavior with task templates)
      * extract $ServiceURL statements (to match existing cipd_input.server default-setting behavior)
      * Swarming will otherwise be oblivious to the file content
    * On load, render task_request.TaskProperties.cipd_input into the ensure-file format transparently (for old tasks)
    * On load, render task_result._TaskResultCommon.cipd_pins into the ensure-file format transparently (for old tasks)

I'm nervous about the potential size of the third CL there. Is there something I can do to break it up?
Actually, currently swarming will accept `package/${platform=windows-amd64}`, but it will freak out when the linux bots decide to (correctly) not deploy that package due to the way that cipd_input/cipd_pins are communicated to/from the bot.
Why not make the task template for specific pools?

2 doesn't matter for the short term.

"Change task template proto to have chunks of ensure-file that get concatenated" will be tricky, as I feel it'll be very easy to create malformed ensure file.

You can break the last CL by creating a @property which loads either the old or new one, but always returns the new format. You do this without even storing the new format yet. Once everything is migrated, including the APIs, then you can start storing the new format.
Making the task template for specific pools doesn't help; we have (intentionally) mixed platforms in the same pool. To make per-pool templates work, we'd need to segregate all pools by os and cpu, which defeats the point of dimensions I think.

FWIW, the ensure-file format is pretty concatenatable (which is why I proposed it). I think in practice it will be fine (all the task template stuff for a given task lives in a single .cfg file, so its all local-context).

Your idea for breaking up step 3 sgtm.

Sign in to add a comment