Extend warm cache support to physical builders.
Issue description
After today's clobber experience, we'd like to investigate extending our warm-cache mechanism to physical builders. First thought is that we somehow create a tarball in GS that's kept roughly in sync with what's on the GCE images, and use Puppet to download and install it to the cache directory of our physical builders.
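Very roughly, the Puppet-managed step on each physical builder might boil down to something like the following. This is purely illustrative; the bucket name and cache path are placeholders, not real locations.

  # Hypothetical: fetch the cache tarball and unpack it into the builder's cache dir.
  gsutil cp gs://SOME-WARM-CACHE-BUCKET/repo-cache.tar /tmp/repo-cache.tar
  mkdir -p /path/to/warm-cache/cros-internal
  tar -xf /tmp/repo-cache.tar -C /path/to/warm-cache/cros-internal
  rm /tmp/repo-cache.tar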
Jun 30 2016
What we are using for the GCE builders is a directory that contains the ".repo" directory of a ChromeOS checkout. Our builders know how to copy that when creating a new buildroot (either for the first build, or after a clobber). We can then efficiently move it to a different (but similar) manifest, branch, etc. If CIPD is what I remember (not sure), it's gclient-specific and only works for git repos that are public, because no authentication is possible.
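In other words, the reuse step is conceptually something like this (illustrative shell only; the real logic lives in the builder code, and the manifest URL and branch below are placeholders):

  # Seed a fresh buildroot from the cached .repo, then re-point it.
  cp -a /var/cache/chrome-infra/ccompute-setup/cros-internal/.repo "$BUILDROOT/.repo"
  cd "$BUILDROOT"
  repo init -u <manifest-url> -b <branch>   # switch to the desired manifest/branch
  repo sync                                 # incremental: most git objects are already local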
Jun 30 2016
No, it uses authentication and isn't dependent on much aside from Ruby, which is included via Puppet. I'm concerned about the extra bandwidth, though. Is this just a strategy to pre-seed large git checkouts? I wonder if we could stage the repo on some mirrors or something local to the Golo?
Jun 30 2016
CIPD is unrelated to git and could have been used for this, but it doesn't handle multi-GB files currently :( (It verifies SHA1 hashes on the backend, and for a multi-GB file it will most probably time out). How large is the zipped '.repo' directory?
Jun 30 2016
On the current GCE images:

  du -sh /var/cache/chrome-infra/ccompute-setup/cros-internal/.repo
  18G   /var/cache/chrome-infra/ccompute-setup/cros-internal/.repo

That's uncompressed, and is mostly full of git data, so..... multi-GB.
Jun 30 2016
I'll try to upload it as a CIPD package and see what happens :) There's hope it compresses well. (A CIPD package is essentially a zip file in Google Storage, with some additional metadata and ACLs attached to it. We use CIPD primarily to deploy binaries on bots via Puppet, but it can be used for anything). I'll use cros-wimpy1-c2 for my experiments.
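For reference, the experiment is roughly the following (the package name is just something made up for the test):

  # Build and upload a CIPD package straight from the cache directory.
  cd /var/cache/chrome-infra/ccompute-setup/cros-internal
  cipd create -name experimental/cros_repo_cache -in .repo
  # If this works, deployment to bots would go through the usual
  # Puppet/CIPD machinery rather than ad-hoc commands.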
Jun 30 2016
Welp, it doesn't compress well (17 GB after compression that took 30 min...). Uploading now. It is also very slow. I have little hope it will work :)
Jun 30 2016
Yep, times out. There is a way to fix it, but I don't want to spend time implementing it if it isn't going to be used...
Jun 30 2016
I'm not sure if a warm "repo" cache makes sense on a baremetal builder. Those stick around a lot longer than GCE, and so the cache will either have to be updated in the background periodically or will be out of date soon. The GCE one works nicely b/c it is constructed at the same time as the image, so all of the infrastructure for building, storage, and distribution is in place.

For baremetals, we'd probably need:
1) An updater daemon to periodically update the repo cache in the background.
2) A BuildBot cron builder to periodically build the repo cache.
2a) Bonus here is that we can switch the GCE image's cache to use this so we don't use twice as much quota.

I don't know if we want to use CIPD for this b/c these are really large files (gigabytes?) and CIPD, AFAIK, keeps things around indefinitely.
Jun 30 2016
I just had a thought. Our existing warm cache support depends on a directory being on the image. What if we revamped it to always use GS? It's very rare to actually copy from the cache, so we can afford to make it somewhat expensive. This change would allow us to have more free space on the builders, simplify the process for creating images, and allow updated cache contents to be 'distributed' to all builders (both physical and GCE) for free.
Jun 30 2016
That seems like a good idea if we go with #9. Would you remove the warm cache entirely from the image, then, and have cbuildbot do the GS copy?
Jun 30 2016
I can roll a solution in cbuildbot that goes straight to GS, and we create a builder to update the GS data. If CIPD doesn't work for this, are there any other Chrome Infra services that do?
Jun 30 2016
Google Storage seems more than adequate for this. If we want to contain costs, we could create a new bucket and set a maximum lifespan. If we sync every day or two, with a maximum lifetime of 7 days on the bucket, there will be 3-4 sync archives in the bucket at any given time. The builder can do a "gsutil ls", find the latest image, and pull that without any races. If anything goes wrong, it's all fail-open anyway.
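For illustration, with a made-up bucket name, that could look roughly like:

  # One-time bucket setup: auto-delete objects older than 7 days.
  cat > lifecycle.json <<'EOF'
  {"rule": [{"action": {"type": "Delete"}, "condition": {"age": 7}}]}
  EOF
  gsutil lifecycle set lifecycle.json gs://chromeos-repo-cache

  # On the builder: pick the newest archive and pull it. This assumes a
  # sortable timestamp in the object name, e.g. repo-cache-20160630.tar.
  LATEST=$(gsutil ls gs://chromeos-repo-cache/repo-cache-*.tar | sort | tail -n 1)
  gsutil cp "$LATEST" /tmp/repo-cache.tar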
Jun 30 2016
I was thinking of creating a new builder that creates the tarball, uploads it as one of its own build artifacts (which means limited lifespan), then another copy to a single fixed location. All of the builders will pull from that fixed location when doing a cold sync. This doesn't give us any history, but it's dead simple.
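Roughly (the paths and bucket names below are placeholders, including the same made-up fixed-location bucket as above, not the real artifact locations):

  # On the cache builder, after a clean sync:
  tar -cf repo-cache.tar -C "$BUILDROOT" .repo
  # 1) normal per-build artifact upload (gets the usual limited lifespan)
  gsutil cp repo-cache.tar gs://BUILD-ARTIFACT-BUCKET/$BUILDER/$BUILD_NUMBER/repo-cache.tar
  # 2) second copy to the single fixed location everyone pulls from
  gsutil cp repo-cache.tar gs://chromeos-repo-cache/repo-cache.tar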
Jun 30 2016
Yeah, something like that seems fine.
Jun 30 2016
Please use a GCE bot to keep that image up to date if possible to limit bandwidth to/from the golo.
Jun 30 2016
GCE is fine, and even then, the image doesn't need to be updated very often. Once a week should be more than enough. The only heavy bandwidth I'd expect into the Golo is if a large number of Golo builders get clobbered at the same time, since they will all download that image at about the same time. Even then, this should still turn out to be less bandwidth than syncing the same data from GoB would have been.
Jun 30 2016
Implementation Plan:
1) Create a new builder which syncs to TOT, then creates a .repo tarball artifact.
2) Have the builder upload the artifact to a well-known GS location.
3) Update the existing cbuildbot warm cache code to pull from the well-known GS location.
4) Remove warm cache from GCE image creation.
5) Remove the warm cache command line option from the cbuildbot recipe.
6) Remove the warm cache command line option from cbuildbot.
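As a sanity check on step 3, the builder-side logic would be roughly the following (shell pseudocode only; the actual change goes in cbuildbot, and the GS path is the same made-up one used in the examples above):

  # Only pay the cost on a cold buildroot (first build or post-clobber).
  if [ ! -d "$BUILDROOT/.repo" ]; then
    gsutil cp gs://chromeos-repo-cache/repo-cache.tar /tmp/repo-cache.tar
    tar -xf /tmp/repo-cache.tar -C "$BUILDROOT"
    rm /tmp/repo-cache.tar
  fi
  # The subsequent repo sync brings the seeded checkout up to TOT.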