Issue 766387 link

Starred by 2 users

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: 2
NextAction: ----
OS: Chrome
Pri: 0
Type: Feature

Blocking:
issue 766386


Hotlists containing this issue:
CrOSParallelCQ


Remove ChromeOS IP Whitelisting for Swarming and Isolate

Project Member Reported by dgarr...@chromium.org, Sep 18 2017

Issue description

ChromeOS currently uses these two instances of swarming:

https://chrome-swarming.appspot.com/
https://chromeos-proxy.appspot.com/

And isolate as referenced here:

https://chromereviews.googleplex.com/458637013/

We would like to use service accounts and remove the whitelists, so that we can stop using static IP addresses for our builders.

Passing to vadimsh@ to advise on the best path and timeline forward.
 
Components: -Infra>Client>ChromeOS Infra>Client>ChromeOS>Build
Components: -Infra>Client>ChromeOS>Build Infra>Client>ChromeOS>CI
Labels: -Type-Bug Type-Feature
Status: Assigned (was: Untriaged)
vadimsh, I think everything we would need is available now?
Labels: -Pri-3 CrOSParallelCQ OS-Chrome Pri-0
Vadim, friendly ping apropos the email thread. Anything preventing us from doing this?
Status: Available (was: Assigned)
This is a big topic.

I think we need to scope this down first a bit. I see there are GCE bots running builds (swarm-cros-*), there are Skylab bots, and there are chromeos proxy bots. (I couldn't find Golo Swarming bots that run builds, are they gone?)

Each flavor of bot needs a different approach, since they run in different environments and do different things.

I assume we primarily care about swarm-cros-* for now. 

They actually already have bot credentials deployed on them (by being based on Chrome Infra's GCE image and using our Puppet). E.g. here: https://chrome-swarming.appspot.com/bot?id=swarm-cros-0&sort_stats=total%3Adesc Notice: https://screenshot.googleplex.com/SUECQgd4OZ2.png

We can start enforcing these by setting 'require_luci_machine_token: true' in bots.cfg https://chrome-internal.googlesource.com/infradata/config/+/321a4c2fe170faaf34e256e51834555f35216a07/configs/chrome-swarming/bots.cfg#107. This would allow us to stop relying on the IP whitelist for Bot <-> Swarming authentication.

But this is not enough, since I suspect there are various scripts inside the ChromeOS build that just call 'isolate.py' or 'swarming.py' without passing any credentials. These calls are also currently authenticated via the IP whitelist and will break if bots move to a non-whitelisted IP.

How can we discover all such calls? Can we set up an experimental non-IP-whitelisted bot that runs typical builds and see where it breaks?
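As a cheap complement to the experimental bot, a static scan of the build scripts could list candidate call sites up front. The following is only a rough sketch: the function name, the file extensions scanned, and the idea that a plain text match on 'isolate.py' or 'swarming.py' is a good enough signal are all assumptions, and it will miss calls that are constructed dynamically.

```python
import os
import re

# Matches textual references to the two client scripts named in this bug.
CALL_RE = re.compile(r"\b(isolate|swarming)\.py\b")


def find_client_calls(root, extensions=(".py", ".sh")):
    """Return (path, line number, line text) for each line mentioning the scripts.

    `root` is the checkout directory to scan (hypothetical; pick whatever
    tree holds the build scripts). Only files with the given extensions
    are read.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="replace") as f:
                    for lineno, line in enumerate(f, start=1):
                        if CALL_RE.search(line):
                            hits.append((path, lineno, line.strip()))
            except OSError:
                # Unreadable file: skip it rather than abort the scan.
                continue
    return hits
```

Each hit still has to be checked by hand to see whether the call passes credentials. The scan only narrows down where to look, and the experimental bot remains the real test.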

Once we know what needs to be fixed, we can come up with a strategy for fixing it. Most likely it will look like:
1. Set up a Swarming task account for all ChromeOS Buildbucket tasks (easy config change).
2. Set up a Swarming task account for ChromeOS tasks triggered directly via swarming.py (are there any?). Amounts to passing a flag to swarming.py with the account email.
3. Make sure all scripts called by the build pick up the account. If they are inside chroot, we'll need to propagate a portion of Swarming task environment to the chroot:
  a. LUCI_CONTEXT env var points to a file with parameters of how to grab a token. This file should be copied to chroot and processes inside chroot should have LUCI_CONTEXT set pointing to it.
  b. If we want to start using Swarming task accounts for git and gsutil, there are similar steps to propagate their configs to chroot.
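Step 3a could look roughly like the sketch below. The helper name, the destination path inside the chroot, and the assumption that the token endpoint described in the file is reachable from inside the chroot (i.e. the chroot shares the host's network) are all illustrative, not taken from any existing ChromeOS code.

```python
import os
import shutil


def propagate_luci_context(chroot_dir, environ=None,
                           rel_path="tmp/luci_context.json"):
    """Copy the LUCI_CONTEXT file into the chroot.

    Returns the env override to merge into the environment of processes
    started inside the chroot, or an empty dict if LUCI_CONTEXT is unset.
    `rel_path` is a hypothetical location relative to the chroot root.
    """
    environ = os.environ if environ is None else environ
    src = environ.get("LUCI_CONTEXT")
    if not src:
        return {}
    dst = os.path.join(chroot_dir, rel_path)
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    # The file describes how to grab a token, so treat the copy as
    # sensitive and review its permissions for the chroot's users.
    shutil.copyfile(src, dst)
    # Inside the chroot the same file is seen relative to the chroot root.
    return {"LUCI_CONTEXT": "/" + rel_path}
```

The caller would merge the returned dict into the environment it passes when entering the chroot. The copy has to be refreshed whenever the outer LUCI_CONTEXT changes, e.g. once per task.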

Lastly, many of these steps require knowledge of how ChromeOS builds work. I'm a bad person to deal with this. I can help with the LUCI parts, but we should find someone from the ChromeOS side to drive this.
Cc: vadimsh@chromium.org
Owner: ----
Owner: mikenichols@chromium.org
Status: Assigned (was: Available)
> I think we need to scope this down first a bit. I see there are GCE bots running builds (swarm-cros-*), there are Skylab bots, and there are chromeos proxy bots. (I couldn't find Golo Swarming bots that run builds, are they gone?)

Let's focus only on "GCE bots running builds (swarm-cros-*)" for this bug. Golo Swarming bots are completely gone, AFAICT.

> I assume we primarily care about swarm-cros-* for now. 
> 
> They are actually already have bot credentials deployed on them (by being based on Chrome Infra's GCE image and using our Puppet). E.g. here: https://chrome-swarming.appspot.com/bot?id=swarm-cros-0&sort_stats=total%3Adesc Notice: https://screenshot.googleplex.com/SUECQgd4OZ2.png
> 
> We can start enforcing these by setting 'require_luci_machine_token: true' in bots.cfg https://chrome-internal.googlesource.com/infradata/config/+/321a4c2fe170faaf34e256e51834555f35216a07/configs/chrome-swarming/bots.cfg#107. This would allow to stop relying on IP whitelist for Bot <-> Swarming authentication.
> But this is not enough, since I suspect there are various scripts inside ChromeOS build that just call 'isolate.py' or 'swarming.py' without passing any credentials. These calls are also authenticated via IP whitelist currently and will break if bots move to a non-whitelisted IP.
> 
> How can we discover all such calls? Can we setup an experimental non-IP whitelisted bot that runs typical builds and see where it breaks?

Actually, we're fairly certain that there's just one script, CBuildBot, that is calling Buildbucket right now. We should be able to change that to use creds and test it.

> 1. Setup a Swarming task account for all ChromeOS Buildbucket tasks (easy config change).

What's the recommended security domain for a Swarming task account? Business use-case (e.g. chromeos-cq@, chromeos-release@)? Or higher level (e.g. chromeos-continuous-integration@)?

> 2. Setup a Swarming task account for ChromeOS tasks triggered directly via swarming.py (are there any?). Amounts to passing a flag to swarming.py with account email.

I don't think we are ever calling swarming.py directly.

> 3. Make sure all scripts called by the build pick up the account. If they are inside chroot, we'll need to propagate a portion of Swarming task environment to the chroot:

AFAICT, we never call out to Buildbucket or Swarming from inside the chroot.

> Lastly, many of this steps require knowledge of how ChromeOS builds work. I'm a bad person to deal with this. I can help with LUCI parts, but we should find someone from ChromeOS side to drive this.

Agreed. You've given us the information that we need except for the one minor question above. Once we have that, I think that we can own this.

> What's the recommended security domain for a Swarming task account?

I think "business use-case" is preferred, since it will be easier to improve later.

Note that while all ChromeOS bots are in a single swarming pool (per pools.cfg definition, e.g. they are all pool:ChromeOS), there are really no security boundaries between different task accounts that hit that pool. A malicious task can "spoil" a bot and eventually observe tokens from all accounts that hit the pool.

So from a security point of view, using chromeos-cq@, chromeos-release@, ... vs using a single chromeos-continuous-integration@ inside a single pool is approximately the same thing.

But having separate accounts simplifies splitting the pool into security domains in the future (e.g. cq bots, release bots, ...).
Project Member

Comment 9 by sheriffbot@chromium.org, Nov 13

Pri-0 bugs are critical regressions or serious emergencies, and this bug has not been updated in three days. Could you please provide an update, or adjust the priority to a more appropriate level if applicable?

If a fix is in active development, please set the status to Started.

Thanks for your time! To disable nags, add the Disable-Nags label.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
That shouldn't be pinging features. I'll set disable-nags on all of these.
Labels: Disable-Nags
EstimatedDays: 2
The decision is to proceed with a single account for now, to simplify the task and validation, with the intent of moving to separate accounts in the future.

-- Mike
