CrOS in Docker on Swarming
Issue description

Feature tracking bug for building ChromeOS images inside Docker on Swarming.

ChromeOS infra is exploring the possibility of running builds (including binary CrOS image generation) inside Docker containers as part of CI on LUCI Swarming. This has a lot of advantages for us and reduces our overall infra overhead. For this to be a tenable goal, Swarming bots would need to support caching/GCing (potentially large) Docker images on local disk.

Currently collected requirements are as follows:
- The Docker daemon should be used for all Docker cache maintenance; trying to handle caching via /var/lib/docker files would either not work or be a mess.
  - A client library for Docker should be used for this
- Only one large image would need to be cached at any one point in time (around 30GB)
  - This image would be specific to CrOS builds (needs bot/build affinity)
  - This image would be a :latest on gcr.io; updates there should result in:
    - A `docker pull` from gcr.io (Docker does this for you by default)
    - A flush of the old image from cache (this can be lazy; it can be part of the Swarming GC)
- Several other ephemeral images could be created / pushed during a build that need to be flushed reliably
- Containers need to be cleaned up (100% of them can be nuked between runs, as long as the image cache persists)
- Several Docker commands should be supported from Recipe including (but not limited to):
  - `docker build` from within a workspace with a Dockerfile
  - `docker run` with the privileged flag and volume mounting like `-v /dev:/dev` (used for kernel EXT4 image mounting from within the container)
  - `docker commit` for committing a container to an image
  - `docker push` for pushing images to gcr.io

Experiments done (currently ACL restricted, ping athilenius for access; sorry, a random project was used for this):
- CrOS has been fully built inside a Docker container (including image gen)
  - cros_sdk image (the large one that needs to be cached): gcr.io/the-em-drive/doccros_cros_sdk
  - fully_build_image (this image would only be pushed to enable local debugging on workstations in the event of a failure): gcr.io/the-em-drive/doccros_build_image
- These steps were replicated in isolation on Cloud Build:
  - depot_tools checkout -> fully built cros_sdk: https://pantheon.corp.google.com/cloud-build/builds/228cc47d-2023-4054-aa5e-6befe45b222c?project=the-em-drive&organizationId=433637338589
  - setup_board -> build_image: https://pantheon.corp.google.com/cloud-build/builds/a4850b34-62a7-4d71-a214-3e7d8898da0e?project=the-em-drive&organizationId=433637338589

Other refs:
- crbug.com/808836 - Swarming bot: Design containment API
- crbug.com/764493 - Use cgroups in Swarming task
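As a rough illustration of the Docker commands listed in the requirements, the sketch below builds the argument lists a recipe step might hand to the Docker CLI. It is only a sketch under stated assumptions: the helper names, the `gcr.io/example-project/...` image names, the container name `cros_build`, and the `./build_image` command are all hypothetical, and nothing here is taken from an existing recipe module. Only standard Docker CLI flags (`-t`, `--name`, `--privileged`, `-v`) are used.

```python
import shlex
from typing import List, Sequence


def docker_build(tag: str, context_dir: str = ".") -> List[str]:
    """`docker build` from a workspace directory that contains a Dockerfile."""
    return ["docker", "build", "-t", tag, context_dir]


def docker_run(image: str, command: Sequence[str], name: str,
               privileged: bool = False,
               volumes: Sequence[str] = ()) -> List[str]:
    """`docker run` with a named container so it can be committed afterwards."""
    argv = ["docker", "run", "--name", name]
    if privileged:
        argv.append("--privileged")
    for volume in volumes:
        # Each volume is a "host_path:container_path" string, e.g. "/dev:/dev".
        argv += ["-v", volume]
    return argv + [image, *command]


def docker_commit(container: str, image: str) -> List[str]:
    """`docker commit` a (stopped) container to a new image."""
    return ["docker", "commit", container, image]


def docker_push(image: str) -> List[str]:
    """`docker push` an image to its registry."""
    return ["docker", "push", image]


if __name__ == "__main__":
    # Hypothetical image names, used for illustration only.
    sdk = "gcr.io/example-project/cros_sdk:latest"
    debug = "gcr.io/example-project/build_image:debug"
    steps = [
        docker_run(sdk, ["./build_image"], name="cros_build",
                   privileged=True, volumes=["/dev:/dev"]),
        docker_commit("cros_build", debug),
        docker_push(debug),
    ]
    for argv in steps:
        print(shlex.join(argv))
```

Returning argument lists rather than running the commands keeps the sketch testable without a Docker daemon; a real step would pass each list to whatever process runner the recipe uses.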
Oct 22
lannm@/jclinton@ ... is this a priority for us based on our re-prioritizations a couple of weeks back? If not, please bump to p3 for now.
Oct 23
Note that not all Swarming bots have Docker installed. This is controlled via Puppet, and issue 803675 was used to track deploying Docker to all Linux bots. However, progress on that issue has somewhat stalled and atm only a handful of bots actually have Docker. If you need to install Docker on more bots, you may want to add the chrome_infra::packages::docker rule to the Puppet config here: http://shortn/_JhDav37yEh.
Oct 23
This would be a performance optimization & infra cleanup, so not a high priority. (Although IMO "Want" accurately describes its priority, but that's not my call :p)

#3 Just Docker without Swarming cache management of images/containers would technically be sufficient if we had 100% affinity, which as of today I think we do (we could run GC at the start of every build like we do today). If this proposal gets traction at some future point, we would probably look into Swarming support for GC though.
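The "run GC at the start of every build" idea could be sketched roughly as follows. This is not how the existing CrOS builders do it; it is a minimal illustration under assumptions. The function name `plan_gc` and the example image and container names are hypothetical, and the caller is assumed to have already listed containers and images (for example with `docker ps -a -q` and `docker images`). All containers are removed, and every image except the pinned one is removed.

```python
from typing import Iterable, List, Set


def plan_gc(images: Iterable[str], containers: Iterable[str],
            keep: Set[str]) -> List[List[str]]:
    """Return the docker commands for a pre-build cleanup.

    images:     image references currently on the bot (supplied by the caller).
    containers: container IDs or names currently on the bot.
    keep:       image references that must stay cached (e.g. the large SDK image).
    """
    plan: List[List[str]] = []
    # Containers go first: an image cannot be removed without force while a
    # container still references it.
    for container in containers:
        plan.append(["docker", "rm", "-f", container])
    for image in images:
        if image not in keep:
            plan.append(["docker", "rmi", image])
    return plan


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    pinned = "gcr.io/example-project/cros_sdk:latest"
    for argv in plan_gc([pinned, "example/ephemeral:1"], ["abc123"], {pinned}):
        print(" ".join(argv))
```

Separating the plan from its execution means the cleanup policy can be checked without a Docker daemon; whether this runs in a recipe or in Swarming itself is exactly the open question in this thread.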
Oct 23
ChromeOS is currently using custom configured bots, so that would be an issue anyway.
Oct 24
Foundation, we're unlikely to do anything related to Docker within the next calendar year, and we have several high priority requests to your team, so don't do anything with this just yet. We'd rather that you worked on those higher priority requests.
Comment 1 by mar...@chromium.org, Oct 22