
Issue 639975 link

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Oct 2016
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug

Blocking:
issue 550684




DM+swarming needs to correctly pin CIPD packages

Project Member Reported by iannucci@chromium.org, Aug 22 2016

Issue description

Scenario:

Create a quest Q which depends on CIPD package P at "latest".

An attempt A is created for this quest, which begins by sending execution E1 to swarming. Swarming then runs and resolves "latest" to some concrete value (say "deadbeef"). The task runs with "deadbeef" until E1 takes a dependency on something. At this point E1 terminates, and A goes into the blocked state. At some point later, A becomes unblocked, and DM sends E2 to swarming. E2 runs and resolves "latest" again, except that this time it pins it to "badc0ffee".

In practice, this means that several different versions of the various CIPD packages can be used during the life of a single Attempt, which is counter-intuitive and could introduce difficult-to-debug problems (e.g. "deadbeef" and "badc0ffee" have different formats for their "state" that carries over from execution to execution).

To counteract this, the swarming distributor should remember the pinning resolutions that it makes and re-use them from execution to execution of the same attempt.
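
The scenario above can be sketched roughly as follows. This is a toy model only: `FakeCipdRepo`, `run_execution`, and the `pins` dict are hypothetical names invented for illustration and are not part of DM, swarming, or the CIPD client.

```python
# Hypothetical sketch: none of these names are real DM/swarming/CIPD APIs.

class FakeCipdRepo:
    """Stands in for a CIPD backend where the "latest" ref can move."""

    def __init__(self):
        self._refs = {}

    def set_ref(self, package, ref, instance_id):
        self._refs[(package, ref)] = instance_id

    def resolve(self, package, version):
        # A ref such as "latest" resolves to whatever it points at right now;
        # anything else is treated as an already-resolved instance ID.
        return self._refs.get((package, version), version)


def run_execution(repo, package, version, pins):
    """Resolve `version` for one execution, re-using a recorded pin if any."""
    if package not in pins:
        pins[package] = repo.resolve(package, version)
    return pins[package]


repo = FakeCipdRepo()

# Without remembered pins, each execution resolves "latest" afresh.
repo.set_ref("P", "latest", "deadbeef")
e1 = run_execution(repo, "P", "latest", pins={})
repo.set_ref("P", "latest", "badc0ffee")
e2 = run_execution(repo, "P", "latest", pins={})

# With pins remembered across executions of one attempt, E2 matches E1.
repo.set_ref("P", "latest", "deadbeef")
attempt_pins = {}
p1 = run_execution(repo, "P", "latest", attempt_pins)
repo.set_ref("P", "latest", "badc0ffee")
p2 = run_execution(repo, "P", "latest", attempt_pins)
```

In the first pair, `e1` and `e2` differ because the ref moved between executions; in the second pair, both executions see the version recorded by the first one.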

Plan:
  * swarming's client exposes these resolutions as part of the task metadata
  * dm+swarming distributor implementation reads this metadata as part of the GetStatus call and records it in the execution state when the execution finishes.
  * dm+swarming uses the recorded pins when building new executions (e.g. if "PreviousState" includes cipd resolutions, use them).
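
A minimal sketch of how the three plan steps might fit together, assuming the task metadata is a plain dict and the execution state is a JSON string. The field names (`state`, `cipd_pins`, `user_state`) and the functions `get_status` and `build_execution` are assumptions made for this example; the real distributor is written in Go and its interfaces may look quite different.

```python
import json

# Hypothetical sketch: field and function names are assumptions, not the
# actual swarming task metadata schema or the DM distributor interface.


def get_status(task_result):
    """Return (finished, state); state carries the pins once the task is done."""
    if task_result["state"] != "COMPLETED":
        return False, None
    state = {
        "user_state": task_result.get("user_state", ""),
        # Step 2: record the resolutions reported in the task metadata.
        "cipd_pins": dict(task_result.get("cipd_pins", {})),
    }
    return True, json.dumps(state, sort_keys=True)


def build_execution(quest_packages, previous_state=None):
    """Step 3: prefer pins from the previous state over the quest's versions."""
    pins = {}
    if previous_state:
        pins = json.loads(previous_state).get("cipd_pins", {})
    return {pkg: pins.get(pkg, ver) for pkg, ver in quest_packages.items()}


quest = {"P": "latest"}

# E1 has no previous state, so it is sent with the unresolved version.
e1_request = build_execution(quest)

# Step 1: the finished task's metadata exposes what "latest" resolved to.
finished, state = get_status(
    {"state": "COMPLETED", "cipd_pins": {"P": "deadbeef"}})

# E2 is built from the recorded state and therefore re-uses the pin.
e2_request = build_execution(quest, state)
```

Packages that have no recorded pin fall back to the version named in the quest, so the first execution behaves exactly as before.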
 
Owner: iannucci@chromium.org
Status: Started (was: Untriaged)
Project Member

Comment 3 by bugdroid1@chromium.org, Aug 30 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/external/github.com/luci/luci-py.git/+/1200be8efa2bc9543d6404887b6746f58e0de827

commit 1200be8efa2bc9543d6404887b6746f58e0de827
Author: iannucci <iannucci@chromium.org>
Date: Tue Aug 30 22:52:22 2016

Add CIPD pin reporting to swarming.

This will cause run_isolated.py to report the fully resolved versions of the CIPD packages that it actually ended up using. DM will use this information to ensure that all executions of a given Attempt use the same versions.

R=maruel@chromium.org, nodir@chromium.org, vadimsh@chromium.org
BUG= 639975 

Review-Url: https://codereview.chromium.org/2267363004

[modify] https://crrev.com/1200be8efa2bc9543d6404887b6746f58e0de827/appengine/swarming/cipd.py
[add] https://crrev.com/1200be8efa2bc9543d6404887b6746f58e0de827/appengine/swarming/cipd_test.py
[modify] https://crrev.com/1200be8efa2bc9543d6404887b6746f58e0de827/appengine/swarming/handlers_bot.py
[modify] https://crrev.com/1200be8efa2bc9543d6404887b6746f58e0de827/appengine/swarming/handlers_bot_test.py
[modify] https://crrev.com/1200be8efa2bc9543d6404887b6746f58e0de827/appengine/swarming/handlers_frontend.py
[modify] https://crrev.com/1200be8efa2bc9543d6404887b6746f58e0de827/appengine/swarming/message_conversion.py
[modify] https://crrev.com/1200be8efa2bc9543d6404887b6746f58e0de827/appengine/swarming/server/task_request.py
[modify] https://crrev.com/1200be8efa2bc9543d6404887b6746f58e0de827/appengine/swarming/server/task_result.py
[modify] https://crrev.com/1200be8efa2bc9543d6404887b6746f58e0de827/appengine/swarming/server/task_result_test.py
[modify] https://crrev.com/1200be8efa2bc9543d6404887b6746f58e0de827/appengine/swarming/server/task_scheduler.py
[modify] https://crrev.com/1200be8efa2bc9543d6404887b6746f58e0de827/appengine/swarming/server/task_scheduler_test.py
[modify] https://crrev.com/1200be8efa2bc9543d6404887b6746f58e0de827/appengine/swarming/swarming_bot/bot_code/task_runner.py
[modify] https://crrev.com/1200be8efa2bc9543d6404887b6746f58e0de827/appengine/swarming/swarming_rpcs.py
[modify] https://crrev.com/1200be8efa2bc9543d6404887b6746f58e0de827/appengine/swarming/templates/user_task.html
[modify] https://crrev.com/1200be8efa2bc9543d6404887b6746f58e0de827/client/cipd.py
[modify] https://crrev.com/1200be8efa2bc9543d6404887b6746f58e0de827/client/run_isolated.py
[modify] https://crrev.com/1200be8efa2bc9543d6404887b6746f58e0de827/client/tests/run_isolated_test.py

Project Member

Comment 5 by bugdroid1@chromium.org, Sep 20 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/external/github.com/luci/luci-go.git/+/fa6f67917f657b339926828ac352e26e4d4dfb1e

commit fa6f67917f657b339926828ac352e26e4d4dfb1e
Author: iannucci <iannucci@chromium.org>
Date: Tue Sep 20 01:40:14 2016

Refactor distributor API so that methods always get the Quest_Desc too.

This is so that various methods can see the original quest description body,
which can contain data directed at the distributor (or the distributor adaptor
in the case of swarming).

R=dnj@chromium.org, vadimsh@chromium.org
BUG= 639975 

Review-Url: https://codereview.chromium.org/2347973003

[modify] https://crrev.com/fa6f67917f657b339926828ac352e26e4d4dfb1e/dm/appengine/distributor/config.go
[modify] https://crrev.com/fa6f67917f657b339926828ac352e26e4d4dfb1e/dm/appengine/distributor/distributor.go
[modify] https://crrev.com/fa6f67917f657b339926828ac352e26e4d4dfb1e/dm/appengine/distributor/fake/fake.go
[modify] https://crrev.com/fa6f67917f657b339926828ac352e26e4d4dfb1e/dm/appengine/distributor/jobsim/distributor.go
[modify] https://crrev.com/fa6f67917f657b339926828ac352e26e4d4dfb1e/dm/appengine/distributor/notify_execution.go
[modify] https://crrev.com/fa6f67917f657b339926828ac352e26e4d4dfb1e/dm/appengine/distributor/swarming/v1/distributor.go
[modify] https://crrev.com/fa6f67917f657b339926828ac352e26e4d4dfb1e/dm/appengine/distributor/swarming/v1/isolate.go
[delete] https://crrev.com/3518c0e283b1f61e65913b651c9d87ec43c0fd93/dm/appengine/distributor/task_description.go
[modify] https://crrev.com/fa6f67917f657b339926828ac352e26e4d4dfb1e/dm/appengine/distributor/test_registry.go
[modify] https://crrev.com/fa6f67917f657b339926828ac352e26e4d4dfb1e/dm/appengine/mutate/schedule_execution.go
[modify] https://crrev.com/fa6f67917f657b339926828ac352e26e4d4dfb1e/dm/appengine/mutate/timeout_execution.go

Project Member

Comment 6 by bugdroid1@chromium.org, Sep 28 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/external/github.com/luci/luci-go.git/+/d5fba5ee301b374e9c161fcfa8c4693b1a058e2c

commit d5fba5ee301b374e9c161fcfa8c4693b1a058e2c
Author: iannucci <iannucci@chromium.org>
Date: Tue Sep 27 22:55:07 2016

Add snapshotting for CIPD packages and dimensions to DM.

This makes DM automatically pin all cipd packages used by an attempt, and optionally allows the quest to specify dimensions to pin as well. This prevents confusing bugs where e.g. a certain version of kitchen is used for the first execution of an attempt, but a different version is used for a subsequent execution. This means things like the 'state' that DM passes between executions doesn't need to worry about being interpreted by different versions of the various cipd packages in the job.

Pinning dimensions could be useful for pinning instance ids (for affinity), but more trivially is useful for pinning os/cpu when pinning a VERY generic cipd spec (e.g. things with ${platform} in the spec). In this case you could potentially formulate a quest that could run on ANY platform and get consistent re-executions.
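
As an illustration of the os/cpu case, here is a small sketch that treats `${platform}` as expanding to `<os>-<cpu>`. The helper names, the package path `example/tool/${platform}`, and the simplified dimension values are all hypothetical; real swarming dimension values and the real DM snapshotting code are not shown here.

```python
from string import Template

# Hypothetical sketch: helper names, package path, and dimension values are
# made up for illustration and do not reflect the DM implementation.


def snapshot_dimensions(bot_dimensions, keys_to_pin):
    """Keep only the dimensions the quest asked to pin."""
    return {key: bot_dimensions[key] for key in keys_to_pin}


def expand_package(spec, dimensions):
    """Expand ${platform} in a package spec from pinned os/cpu dimensions."""
    platform = "{}-{}".format(dimensions["os"], dimensions["cpu"])
    return Template(spec).substitute(platform=platform)


# The first execution happens to land on a linux/amd64 bot.
first_bot = {"os": "linux", "cpu": "amd64", "id": "bot-1"}
pinned = snapshot_dimensions(first_bot, ["os", "cpu"])

# Later executions request the pinned dimensions, so the generic spec
# expands to the same concrete package every time.
package = expand_package("example/tool/${platform}", pinned)
```

Pinning only `os` and `cpu` keeps later executions free to run on any bot of that platform, whereas also pinning `id` would give bot affinity.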

BUG= 639975 

Review-Url: https://codereview.chromium.org/2338153003

[add] https://crrev.com/d5fba5ee301b374e9c161fcfa8c4693b1a058e2c/dm/api/distributor/swarming/v1/cipd.go
[add] https://crrev.com/d5fba5ee301b374e9c161fcfa8c4693b1a058e2c/dm/api/distributor/swarming/v1/cipd.pb.go
[add] https://crrev.com/d5fba5ee301b374e9c161fcfa8c4693b1a058e2c/dm/api/distributor/swarming/v1/cipd.proto
[modify] https://crrev.com/d5fba5ee301b374e9c161fcfa8c4693b1a058e2c/dm/api/distributor/swarming/v1/config.pb.go
[modify] https://crrev.com/d5fba5ee301b374e9c161fcfa8c4693b1a058e2c/dm/api/distributor/swarming/v1/isolate_ref.pb.go
[modify] https://crrev.com/d5fba5ee301b374e9c161fcfa8c4693b1a058e2c/dm/api/distributor/swarming/v1/normalize.go
[modify] https://crrev.com/d5fba5ee301b374e9c161fcfa8c4693b1a058e2c/dm/api/distributor/swarming/v1/params.pb.go
[modify] https://crrev.com/d5fba5ee301b374e9c161fcfa8c4693b1a058e2c/dm/api/distributor/swarming/v1/params.proto
[modify] https://crrev.com/d5fba5ee301b374e9c161fcfa8c4693b1a058e2c/dm/api/distributor/swarming/v1/result.pb.go
[modify] https://crrev.com/d5fba5ee301b374e9c161fcfa8c4693b1a058e2c/dm/api/distributor/swarming/v1/result.proto
[modify] https://crrev.com/d5fba5ee301b374e9c161fcfa8c4693b1a058e2c/dm/appengine/distributor/swarming/v1/distributor.go
[modify] https://crrev.com/d5fba5ee301b374e9c161fcfa8c4693b1a058e2c/dm/tools/jobsim_client/generate_ensure_graph_data_req.py

Status: Fixed (was: Started)
This was fixed by that last patch btw.
