Issue 901936

Starred by 2 users

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Task

Blocking:
issue 914807
issue 916198
issue 923548




Use GCE VM signed metadata for Swarming bot authentication

Project Member Reported by vadimsh@chromium.org, Nov 5

Issue description

(Internal) design doc: http://go/swarming-gce-vm-auth
 
Project Member

Comment 1 by bugdroid1@chromium.org, Nov 5

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/luci/luci-py.git/+/77a08a0e81f9950884cff5ca46de9b7cfe9f8503

commit 77a08a0e81f9950884cff5ca46de9b7cfe9f8503
Author: Vadim Shtayura <vadimsh@chromium.org>
Date: Mon Nov 05 22:13:39 2018

[auth] A function to verify JWTs produced by Google auth backends.

It is easy enough to be just implemented (rather than pulled in via some third
party library that will need periodic maintenance to be up-to-date).

Usage:
  certs = signature.get_google_oauth2_certs()
  header, payload = tokens.verify_jwt(jwt, certs)

R=fmatenaar@chromium.org
BUG=901936

Change-Id: I7d30322e2358a85d038c562173d8e0508758bd4c
Reviewed-on: https://chromium-review.googlesource.com/c/1318135
Reviewed-by: Felix Matenaar <fmatenaar@chromium.org>
Commit-Queue: Vadim Shtayura <vadimsh@chromium.org>

[modify] https://crrev.com/77a08a0e81f9950884cff5ca46de9b7cfe9f8503/appengine/components/components/auth/signature.py
[modify] https://crrev.com/77a08a0e81f9950884cff5ca46de9b7cfe9f8503/appengine/components/components/auth/tokens.py
[modify] https://crrev.com/77a08a0e81f9950884cff5ca46de9b7cfe9f8503/appengine/components/components/auth/tokens_test.py
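
For readers unfamiliar with the token format, the structural half of JWT checking can be sketched as below. This is only an illustration, not the code from this commit: the function names `decode_jwt_unverified` and `check_not_expired` and the 60-second clock skew are assumptions, and the RSA signature check against Google's certificates (the part that actually makes a token trustworthy) is deliberately left out.

```python
import base64
import json
import time


def _b64url_decode(segment):
    # JWT segments are base64url-encoded with padding stripped; restore it.
    padding = '=' * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def decode_jwt_unverified(jwt):
    """Splits a JWT into (header, payload, signature) WITHOUT verifying it.

    Hypothetical helper: a real verifier must also check the signature
    against the issuer's public certificates before trusting the payload.
    """
    parts = jwt.split('.')
    if len(parts) != 3:
        raise ValueError('expected <header>.<payload>.<signature>')
    header = json.loads(_b64url_decode(parts[0]))
    payload = json.loads(_b64url_decode(parts[1]))
    signature = _b64url_decode(parts[2])
    return header, payload, signature


def check_not_expired(payload, now=None, skew_sec=60):
    """Raises ValueError if the 'exp' claim is missing or in the past.

    The allowed clock skew is an assumption made for this sketch.
    """
    now = time.time() if now is None else now
    exp = payload.get('exp')
    if not isinstance(exp, (int, float)):
        raise ValueError("no numeric 'exp' claim")
    if now > exp + skew_sec:
        raise ValueError('token expired')
```

Doing the expiry check on the decoded payload is only meaningful once the signature has been validated; the order here is for illustration.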

Project Member

Comment 2 by bugdroid1@chromium.org, Nov 6

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/luci/luci-py.git/+/1868a1be32a6e7905c23ac1368b0a31545a03f70

commit 1868a1be32a6e7905c23ac1368b0a31545a03f70
Author: Vadim Shtayura <vadimsh@chromium.org>
Date: Tue Nov 06 20:27:13 2018

[swarming] Add gce.signed_metadata_token(...) call.

It asks the GCE metadata server to produce a signed metadata JWT with the given
audience. The token is then cached in memory until its expiration time.

Will be used from get_authentication_headers() hook.

R=maruel@chromium.org
BUG=901936

Change-Id: I22cadfb1d321b32145c8f02cb863cadaf2b1bc88
Reviewed-on: https://chromium-review.googlesource.com/c/1318142
Commit-Queue: Vadim Shtayura <vadimsh@chromium.org>
Reviewed-by: Marc-Antoine Ruel <maruel@chromium.org>

[modify] https://crrev.com/1868a1be32a6e7905c23ac1368b0a31545a03f70/appengine/swarming/swarming_bot/api/platforms/gce.py
[modify] https://crrev.com/1868a1be32a6e7905c23ac1368b0a31545a03f70/appengine/swarming/swarming_bot/api/platforms/gce_test.py
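
The fetch-and-cache behaviour described in this commit message might look roughly like the sketch below. It is not the code in gce.py: the class name `SignedMetadataTokenCache`, the injectable `fetch` parameter and the `min_lifetime_sec` refresh margin are assumptions made for illustration. The metadata server identity endpoint and the `Metadata-Flavor: Google` header are the publicly documented way to request an instance identity token.

```python
import base64
import json
import threading
import time
import urllib.parse
import urllib.request

_METADATA_URL = (
    'http://metadata.google.internal/computeMetadata/v1/instance/'
    'service-accounts/default/identity')


def fetch_from_metadata_server(audience):
    """Asks the GCE metadata server for a signed identity JWT."""
    query = urllib.parse.urlencode({'audience': audience, 'format': 'full'})
    req = urllib.request.Request(
        _METADATA_URL + '?' + query, headers={'Metadata-Flavor': 'Google'})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read().decode('ascii')


def _jwt_expiry(jwt):
    # Reads the 'exp' claim from the payload; no signature check is needed
    # here because the token came straight from the local metadata server.
    payload = jwt.split('.')[1]
    payload += '=' * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))['exp']


class SignedMetadataTokenCache(object):
    """In-memory cache of tokens, keyed by audience (hypothetical class)."""

    def __init__(self, fetch=fetch_from_metadata_server, min_lifetime_sec=300):
        self._fetch = fetch
        # Assumption: refresh slightly before the actual expiration time.
        self._min_lifetime_sec = min_lifetime_sec
        self._lock = threading.Lock()
        self._cache = {}  # audience -> (jwt, expiry timestamp)

    def get(self, audience, now=None):
        now = time.time() if now is None else now
        with self._lock:
            cached = self._cache.get(audience)
            if cached and cached[1] - now > self._min_lifetime_sec:
                return cached[0]
            jwt = self._fetch(audience)
            self._cache[audience] = (jwt, _jwt_expiry(jwt))
            return jwt
```

Passing the fetcher in as a parameter keeps the cache testable off GCE, where the metadata server is not reachable.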

Project Member

Comment 3 by bugdroid1@chromium.org, Nov 9

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/luci/luci-py.git/+/a283f1aa4dec97d9b162b1bc55e572018e2cbd6a

commit a283f1aa4dec97d9b162b1bc55e572018e2cbd6a
Author: Vadim Shtayura <vadimsh@chromium.org>
Date: Fri Nov 09 01:50:55 2018

[swarming] Preparation for making 'auth' in BotGroup proto repeated.

This CL refactors API between bot_groups_config and bot_auth modules. The next
step would be to make the proto itself repeated too.

One complication is that when using multiple auth methods, we don't want to
log errors for the ones that fail if others succeed (otherwise we'd end up with
a constant stream of errors, even though this is an expected situation). So error
logging is now delayed until all auth methods are checked.

R=maruel@chromium.org
BUG=901936

Change-Id: I6da2ce1c5249ac9572b053c62b3a5d76aaed5790
Reviewed-on: https://chromium-review.googlesource.com/c/1322129
Commit-Queue: Vadim Shtayura <vadimsh@chromium.org>
Reviewed-by: Marc-Antoine Ruel <maruel@chromium.org>

[modify] https://crrev.com/a283f1aa4dec97d9b162b1bc55e572018e2cbd6a/appengine/swarming/handlers_bot_test.py
[modify] https://crrev.com/a283f1aa4dec97d9b162b1bc55e572018e2cbd6a/appengine/swarming/server/bot_auth.py
[modify] https://crrev.com/a283f1aa4dec97d9b162b1bc55e572018e2cbd6a/appengine/swarming/server/bot_groups_config.py
[modify] https://crrev.com/a283f1aa4dec97d9b162b1bc55e572018e2cbd6a/appengine/swarming/server/bot_groups_config_test.py
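
The delayed error logging described in this commit message can be illustrated with a small sketch. This is not the code in bot_auth.py: `AuthCheckError`, `check_any` and the (name, callable) representation of an auth method are all hypothetical.

```python
import logging


class AuthCheckError(Exception):
    """Raised by a single auth method when the bot fails its check."""


def check_any(methods, request):
    """Returns the name of the first auth method that accepts the request.

    `methods` is a list of (name, check) pairs, where check(request) raises
    AuthCheckError on failure. Failures are collected rather than logged
    immediately, so a bot that passes a later method produces no error logs.
    """
    errors = []
    for name, check in methods:
        try:
            check(request)
            return name
        except AuthCheckError as exc:
            errors.append((name, exc))
    # Only now, when every method has failed, are the errors worth logging.
    for name, exc in errors:
        logging.error('bot auth via %s failed: %s', name, exc)
    raise AuthCheckError('all auth methods failed')
```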

Project Member

Comment 5 by bugdroid1@chromium.org, Nov 12

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/luci/luci-py.git/+/ed7b7ee29ce5da8953a313d6570b5be7db941e2a

commit ed7b7ee29ce5da8953a313d6570b5be7db941e2a
Author: Vadim Shtayura <vadimsh@chromium.org>
Date: Mon Nov 12 22:06:29 2018

[swarming] Add 'swarming/bot_auth/success' metric.

It is a number of successful bot authentication events per auth method.

R=maruel@chromium.org
BUG=901936

Change-Id: I002f138fdd2cb4b73d99bba251e0fa1708bdf918
Reviewed-on: https://chromium-review.googlesource.com/c/1330303
Reviewed-by: Marc-Antoine Ruel <maruel@chromium.org>
Commit-Queue: Vadim Shtayura <vadimsh@chromium.org>

[modify] https://crrev.com/ed7b7ee29ce5da8953a313d6570b5be7db941e2a/appengine/swarming/server/bot_auth.py
[modify] https://crrev.com/ed7b7ee29ce5da8953a313d6570b5be7db941e2a/appengine/swarming/ts_mon_metrics.py

Project Member

Comment 6 by bugdroid1@chromium.org, Nov 16

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/luci/luci-py.git/+/9a720faf564128218b5ecb1135aee4c61c5a7bdd

commit 9a720faf564128218b5ecb1135aee4c61c5a7bdd
Author: Vadim Shtayura <vadimsh@chromium.org>
Date: Fri Nov 16 19:11:22 2018

[auth] Put additional info extracted from credentials into the auth state.

This generalizes 'is_superuser' handling to also include other stuff which can
be extracted from credentials during authentication.

In particular, AuthDetails will carry GCE project name extracted from the signed
VM metadata token.

R=maruel@chromium.org
BUG=901936

Change-Id: I4034f8c23ef7a4f688460817eef11044a6d65ff0
Reviewed-on: https://chromium-review.googlesource.com/c/1332792
Reviewed-by: Marc-Antoine Ruel <maruel@chromium.org>
Commit-Queue: Vadim Shtayura <vadimsh@chromium.org>

[modify] https://crrev.com/9a720faf564128218b5ecb1135aee4c61c5a7bdd/appengine/components/components/auth/api.py
[modify] https://crrev.com/9a720faf564128218b5ecb1135aee4c61c5a7bdd/appengine/components/components/auth/api_test.py
[modify] https://crrev.com/9a720faf564128218b5ecb1135aee4c61c5a7bdd/appengine/components/components/auth/check.py
[modify] https://crrev.com/9a720faf564128218b5ecb1135aee4c61c5a7bdd/appengine/components/components/auth/endpoints_support.py
[modify] https://crrev.com/9a720faf564128218b5ecb1135aee4c61c5a7bdd/appengine/components/components/auth/endpoints_support_test.py
[modify] https://crrev.com/9a720faf564128218b5ecb1135aee4c61c5a7bdd/appengine/components/components/auth/handler.py
[modify] https://crrev.com/9a720faf564128218b5ecb1135aee4c61c5a7bdd/appengine/components/components/auth/handler_test.py
[modify] https://crrev.com/9a720faf564128218b5ecb1135aee4c61c5a7bdd/appengine/components/components/auth/machine_auth.py
[modify] https://crrev.com/9a720faf564128218b5ecb1135aee4c61c5a7bdd/appengine/components/components/auth/prpc.py
[modify] https://crrev.com/9a720faf564128218b5ecb1135aee4c61c5a7bdd/appengine/components/components/auth/prpc_test.py

Project Member

Comment 7 by bugdroid1@chromium.org, Nov 20

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/luci/luci-py.git/+/22d9c4408044589d22b77cec31aa45cabe1076a9

commit 22d9c4408044589d22b77cec31aa45cabe1076a9
Author: Vadim Shtayura <vadimsh@chromium.org>
Date: Tue Nov 20 20:03:38 2018

[auth] Add an authentication method that understands GCE VM tokens.

This is not wired to anything yet.

GCE VMs are authenticated as 'bot:<instance-name>@gce.<project>[.<realm>]',
e.g. 'bot:swarm1-c4@gce.chromecompute.google.com'. This string will show up
in logs and in UI.

But users of 'gce_vm_authentication' are encouraged to use get_auth_details()'s
'gce_instance' and 'gce_project' fields for authorization checks instead of
parsing 'bot:...' string.

For the example above their values would be gce_instance='swarm1-c4' and
gce_project='google.com:chromecompute'.

R=maruel@chromium.org
BUG=901936

Change-Id: Idce13fe6b4b1f0aaa5ae8ea33eea110445d8a9a8
Reviewed-on: https://chromium-review.googlesource.com/c/1340547
Commit-Queue: Vadim Shtayura <vadimsh@chromium.org>
Reviewed-by: Marc-Antoine Ruel <maruel@chromium.org>

[modify] https://crrev.com/22d9c4408044589d22b77cec31aa45cabe1076a9/appengine/components/components/auth/__init__.py
[modify] https://crrev.com/22d9c4408044589d22b77cec31aa45cabe1076a9/appengine/components/components/auth/api.py
[add] https://crrev.com/22d9c4408044589d22b77cec31aa45cabe1076a9/appengine/components/components/auth/gce_vm_auth.py
[add] https://crrev.com/22d9c4408044589d22b77cec31aa45cabe1076a9/appengine/components/components/auth/gce_vm_auth_test.py
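
As an illustration of the identity format in this commit message, the sketch below builds the 'bot:...' string from the two fields. The function name `gce_bot_identity` is hypothetical, and the rule that a domain-scoped project such as 'google.com:chromecompute' contributes its domain as the trailing realm is inferred from the single example given above, not taken from gce_vm_auth.py.

```python
def gce_bot_identity(gce_instance, gce_project):
    """Formats 'bot:<instance-name>@gce.<project>[.<realm>]'.

    Assumption: in a domain-scoped project ID like 'google.com:chromecompute'
    the part before ':' is the realm and the part after it is the project.
    """
    if ':' in gce_project:
        realm, project = gce_project.split(':', 1)
        return 'bot:%s@gce.%s.%s' % (gce_instance, project, realm)
    return 'bot:%s@gce.%s' % (gce_instance, gce_project)
```

Because the mapping goes only one way here, authorization code is better off comparing the structured fields directly, as the commit message recommends, than trying to parse this string back.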

Project Member

Comment 8 by bugdroid1@chromium.org, Dec 7

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/luci/luci-py.git/+/66fee4ed9a8ba8a28b042270b512781d8f32a9d6

commit 66fee4ed9a8ba8a28b042270b512781d8f32a9d6
Author: Vadim Shtayura <vadimsh@chromium.org>
Date: Fri Dec 07 22:53:14 2018

[swarming] Add require_gce_vm_token field to BotAuth proto.

Allows defining GCE-authenticated bots as:
auth {
  require_gce_vm_token: { project: "gce-project-name" }
}

This also regenerates all _pb2.py with protoc v3.6.1 as a side effect.

R=maruel@chromium.org
BUG=901936

Change-Id: I6c97154d17c8b52611ce57a56e20d1acc804b103
Reviewed-on: https://chromium-review.googlesource.com/c/1368628
Reviewed-by: Marc-Antoine Ruel <maruel@chromium.org>
Commit-Queue: Vadim Shtayura <vadimsh@chromium.org>

[modify] https://crrev.com/66fee4ed9a8ba8a28b042270b512781d8f32a9d6/appengine/swarming/handlers_bot_test.py
[modify] https://crrev.com/66fee4ed9a8ba8a28b042270b512781d8f32a9d6/appengine/swarming/proto/bots.proto
[modify] https://crrev.com/66fee4ed9a8ba8a28b042270b512781d8f32a9d6/appengine/swarming/proto/bots_pb2.py
[modify] https://crrev.com/66fee4ed9a8ba8a28b042270b512781d8f32a9d6/appengine/swarming/proto/config_pb2.py
[modify] https://crrev.com/66fee4ed9a8ba8a28b042270b512781d8f32a9d6/appengine/swarming/proto/plugin_pb2.py
[modify] https://crrev.com/66fee4ed9a8ba8a28b042270b512781d8f32a9d6/appengine/swarming/proto/plugin_prpc_pb2.py
[modify] https://crrev.com/66fee4ed9a8ba8a28b042270b512781d8f32a9d6/appengine/swarming/proto/pools_pb2.py
[modify] https://crrev.com/66fee4ed9a8ba8a28b042270b512781d8f32a9d6/appengine/swarming/proto/tasks_pb2.py
[modify] https://crrev.com/66fee4ed9a8ba8a28b042270b512781d8f32a9d6/appengine/swarming/server/bot_auth.py
[modify] https://crrev.com/66fee4ed9a8ba8a28b042270b512781d8f32a9d6/appengine/swarming/server/bot_groups_config.py
[modify] https://crrev.com/66fee4ed9a8ba8a28b042270b512781d8f32a9d6/appengine/swarming/server/bot_groups_config_test.py

Project Member

Comment 9 by bugdroid1@chromium.org, Dec 13

The following revision refers to this bug:
  https://chrome-internal.googlesource.com/infradata/config/+/0ae3864c38fd8408fd4ab18981cef2cedcc85aeb

commit 0ae3864c38fd8408fd4ab18981cef2cedcc85aeb
Author: Vadim Shtayura <vadimsh@chromium.org>
Date: Thu Dec 13 02:28:02 2018

Project Member

Comment 10 by bugdroid1@chromium.org, Dec 13

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/luci/luci-py.git/+/f62fb59c6d5e6963c5f5946322e973e672d4c138

commit f62fb59c6d5e6963c5f5946322e973e672d4c138
Author: Vadim Shtayura <vadimsh@chromium.org>
Date: Thu Dec 13 19:39:50 2018

[swarming] Start accepting X-Luci-Gce-Vm-Token headers.

They are optional for now, in the sense that if such a header is present but
cannot be parsed or verified, it is just skipped.

During the migration bots will keep sending both X-Luci-Gce-Vm-Token and
X-Luci-Machine-Token headers. If X-Luci-Gce-Vm-Token breaks for some reason,
authentication will fall back to X-Luci-Machine-Token.

BUG=901936
R=maruel@chromium.org

Change-Id: I818f138c7f34b3bf05bf57368edaeeb8a0d55b1d
Reviewed-on: https://chromium-review.googlesource.com/c/1375015
Reviewed-by: Marc-Antoine Ruel <maruel@chromium.org>
Commit-Queue: Vadim Shtayura <vadimsh@chromium.org>

[modify] https://crrev.com/f62fb59c6d5e6963c5f5946322e973e672d4c138/appengine/components/components/auth/api.py
[modify] https://crrev.com/f62fb59c6d5e6963c5f5946322e973e672d4c138/appengine/swarming/handlers_bot.py
[modify] https://crrev.com/f62fb59c6d5e6963c5f5946322e973e672d4c138/appengine/swarming/server/bot_auth.py
[modify] https://crrev.com/f62fb59c6d5e6963c5f5946322e973e672d4c138/appengine/swarming/server/bot_auth_test.py
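
The skip-and-fall-back behaviour described in this commit message could be sketched as follows. Only the two header names come from the commit message; `pick_auth_method`, the verifier callables and the use of ValueError to signal a bad token are assumptions made for this example.

```python
import logging

GCE_VM_TOKEN_HEADER = 'X-Luci-Gce-Vm-Token'
MACHINE_TOKEN_HEADER = 'X-Luci-Machine-Token'


def pick_auth_method(headers, verify_gce_vm_token, verify_machine_token):
    """Returns (method name, verified identity), or (None, None).

    `headers` is a plain dict. Both verifiers are hypothetical callables
    that return an identity or raise ValueError on a bad token.
    """
    token = headers.get(GCE_VM_TOKEN_HEADER)
    if token:
        try:
            return 'gce_vm_token', verify_gce_vm_token(token)
        except ValueError as exc:
            # Optional during the migration: a broken GCE VM token is
            # skipped instead of failing the whole request.
            logging.warning('ignoring bad %s: %s', GCE_VM_TOKEN_HEADER, exc)
    token = headers.get(MACHINE_TOKEN_HEADER)
    if token:
        # A bad machine token is NOT skipped: ValueError propagates.
        return 'luci_machine_token', verify_machine_token(token)
    return None, None
```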

Blocking: 914807
Project Member

Comment 12 by bugdroid1@chromium.org, Dec 14

The following revision refers to this bug:
  https://chrome-internal.googlesource.com/infradata/config/+/57e2295517bca4c526fde69ee3d5eb0ef360a4ca

commit 57e2295517bca4c526fde69ee3d5eb0ef360a4ca
Author: Vadim Shtayura <vadimsh@chromium.org>
Date: Fri Dec 14 01:04:46 2018

Project Member

Comment 13 by bugdroid1@chromium.org, Dec 14

The following revision refers to this bug:
  https://chrome-internal.googlesource.com/infradata/config/+/310ca8d818b35dcf23450a678171f7b377e2828b

commit 310ca8d818b35dcf23450a678171f7b377e2828b
Author: Vadim Shtayura <vadimsh@chromium.org>
Date: Fri Dec 14 01:54:24 2018

The rollout process (currently being done on chromium-swarm-dev) is a bit clunky:
1. Modify bots.cfg first: add require_gce_vm_token for ALL GCE bots that currently use require_luci_machine_token, while keeping require_luci_machine_token.
2. Land the change and wait until it applies everywhere.
3. Modify bot_config.py to start sending GCE VM tokens if the bot runs on GCE.
4. Land the change, then confirm everything still works and bots use GCE VM auth for real (via the viceroy console).
5. Wait a month or so (in case something breaks horribly).
6. Remove the require_luci_machine_token{} fallbacks from bots.cfg.
7. Remove reading of LUCI token files from bot_config.py, and remove the luci_machine_tokend cron and other related stuff.

Project Member

Comment 15 by bugdroid1@chromium.org, Dec 14

The following revision refers to this bug:
  https://chrome-internal.googlesource.com/infradata/config/+/b4daaf6fbad3cf5f00be9de5a8202367e1aceda0

commit b4daaf6fbad3cf5f00be9de5a8202367e1aceda0
Author: Vadim Shtayura <vadimsh@chromium.org>
Date: Fri Dec 14 02:11:24 2018

Project Member

Comment 16 by bugdroid1@chromium.org, Dec 14

The following revision refers to this bug:
  https://chrome-internal.googlesource.com/infradata/config/+/aa06df8a5252671b4dad8c076b7c88fcd227926e

commit aa06df8a5252671b4dad8c076b7c88fcd227926e
Author: Vadim Shtayura <vadimsh@chromium.org>
Date: Fri Dec 14 02:17:44 2018

Project Member

Comment 17 by bugdroid1@chromium.org, Dec 14

The following revision refers to this bug:
  https://chrome-internal.googlesource.com/infradata/config/+/1aa083e32d59ecbc33e4d161acc37509b54dd93a

commit 1aa083e32d59ecbc33e4d161acc37509b54dd93a
Author: Vadim Shtayura <vadimsh@chromium.org>
Date: Fri Dec 14 02:25:50 2018

Fully deployed to -dev, works fine. It even works on Windows GCE bots and on dockerized GCE bots (so it looks like the metadata server is reachable from inside Docker containers).

Boring graph: http://shortn/_vwFvKDW4xj (the 2 QPS with LUCI tokens come from non-GCE bots).
Next step: fully deploy this to prod (perhaps in stages) before the production freeze. Keep the fallback in place. Wait until mid-January. Remove the fallback.
I didn't finish this before the production freeze, so the deployment to prod will have to wait until Jan 3, 2019.
Blocking: 916198

Comment 22 by jbudorick@chromium.org, Today (11 hours ago)

Blocking: 923548

Comment 23 by jbudorick@chromium.org, Today (11 hours ago)

Did this roll out post-freeze?

Comment 24 by vadimsh@chromium.org, Today (11 hours ago)

Project Member

Comment 25 by bugdroid1@chromium.org, Today (11 hours ago)

The following revision refers to this bug:
  https://chrome-internal.googlesource.com/infradata/config/+/15b6c3273e07d1d6167fab3b3932acf084e0034f

commit 15b6c3273e07d1d6167fab3b3932acf084e0034f
Author: Vadim Shtayura <vadimsh@chromium.org>
Date: Tue Jan 22 20:40:53 2019

Project Member

Comment 26 by bugdroid, Today (7 hours ago)

The following revision refers to this bug:
  https://chrome-internal.googlesource.com/infradata/config/+/5ccc656e05bb84444454fdf19617e23f32e979ad

commit 5ccc656e05bb84444454fdf19617e23f32e979ad
Author: Vadim Shtayura <vadimsh@chromium.org>
Date: Wed Jan 23 00:27:01 2019

Project Member

Comment 27 by bugdroid, Today (6 hours ago)

The following revision refers to this bug:
  https://chrome-internal.googlesource.com/infradata/config/+/8cf9666d85590741f7872447a32dc3ff6810782f

commit 8cf9666d85590741f7872447a32dc3ff6810782f
Author: Vadim Shtayura <vadimsh@chromium.org>
Date: Wed Jan 23 01:22:01 2019

Comment 28 by vadimsh@chromium.org, Today (6 hours ago)

Deployed to chrome-swarming. Will attempt to deploy to chromium-swarm tomorrow. It is going to be brutal: bots.cfg is 11350 lines of code :(