Issue 624610

Starred by 2 users

Issue metadata

Status: Archived
Owner:
Closed: Oct 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug

Blocking:
issue 632203




Extend warm cache support to physical builders.

Project Member Reported by dgarr...@chromium.org, Jun 29 2016

Issue description

After today's clobber experience, we'd like to investigate extending our warm-cache mechanism to physical builders.

First thought is that we somehow create a tarball in GS that's kept roughly in sync with what's on the GCE images, and use Puppet to download and install it to the cache directory of our physical builders.
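A minimal sketch of what that Puppet-driven install step might run on a physical builder (the bucket URI and cache directory below are hypothetical placeholders, not real paths):

# Hypothetical: fetch the latest warm-cache tarball from GS and unpack it
# into the builder's cache directory. All paths here are placeholders.
CACHE_DIR=/var/cache/chrome-infra/cros-internal
TARBALL_URI=gs://chromeos-warm-cache/latest/repo-cache.tar.gz

gsutil cp "${TARBALL_URI}" /tmp/repo-cache.tar.gz
mkdir -p "${CACHE_DIR}"
tar -xzf /tmp/repo-cache.tar.gz -C "${CACHE_DIR}"
rm -f /tmp/repo-cache.tar.gz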
 
Can you describe this tarball in a little more detail?  We have CIPD, our Chrome Infra package system, which might be easier to use.
What we are using for the GCE builders is a directory that contains the ".repo" directory of a ChromeOS checkout. Our builders know how to copy that when creating a new buildroot (either on the first build, or after a clobber). We can then efficiently move it to a different (but similar) manifest, branch, etc.
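Roughly, that reuse step amounts to something like this (a sketch only; the cache path, manifest URL, and branch variables are placeholders):

# Hypothetical: seed a fresh buildroot from the cached .repo, then
# re-point it at the desired manifest/branch and sync.
cp -a "${CACHE_DIR}/.repo" "${BUILDROOT}/.repo"
cd "${BUILDROOT}"
repo init -u "${MANIFEST_URL}" -b "${BRANCH}"
repo sync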

If CIPD is what I'm remembering (not sure), it's gclient-specific and only works for public git repos, because no authentication is possible.

Cc: vadimsh@chromium.org
No, it uses authentication and doesn't depend on much aside from Ruby, which is installed via Puppet.

I'm concerned about the extra bandwidth, though.  Is this just a strategy to pre-seed large git checkouts?  I wonder if we could stage the repo on some mirrors or something local to the Golo?
CIPD is unrelated to git and could have been used for this, but it doesn't handle multi-GB files currently :( (It verifies SHA1 hashes on the backend, and for a multi-GB file it will most probably time out).

How large is zipped '.repo' directory?
On the current GCE images:

du -sh /var/cache/chrome-infra/ccompute-setup/cros-internal/.repo
18G     /var/cache/chrome-infra/ccompute-setup/cros-internal/.repo

That's uncompressed, and is mostly full of git data, so..... multi-GB.
I'll try to upload it as cipd package and see what happens :) There's hope it compresses well.

(CIPD package is essentially a zip file in Google Storage, with some additional metadata and ACLs attached to it. We use CIPD primarily to deploy binaries on bots via Puppet, but it can be used for anything).
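For reference, a package round trip with the cipd CLI looks roughly like this (the package name and paths are made up, and exact flags may vary between cipd versions):

# Hypothetical package name: zip up the directory and upload it.
cipd create -in "${CACHE_DIR}" -name chromeos/warm-cache/repo
# Deploy it on a bot (this is the step Puppet normally drives).
echo 'chromeos/warm-cache/repo latest' > ensure.txt
cipd ensure -ensure-file ensure.txt -root "${TARGET_DIR}"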

I'll use cros-wimpy1-c2 for my experiments.
Welp, it doesn't compress well (17 GB after compression, which took 30 min...). Uploading now. It's also very slow. I have little hope it will work :)
Yep, it times out. There's a way to fix that, but I don't want to spend time implementing it if it isn't going to be used...

Comment 9 by d...@chromium.org, Jun 30 2016

I'm not sure if a warm "repo" cache makes sense on a baremetal builder. Those machines stick around a lot longer than GCE instances, so the cache will either have to be updated periodically in the background or it will quickly go stale.

The GCE one works nicely b/c it is constructed at the same time as the image, so all of the infrastructure for building, storage, and distribution is in place. For baremetals, we'd probably need:
1) An updater daemon to periodically update the repo cache in the background.
2) A BuildBot cron builder to periodically build the repo cache.
2a) A bonus here is that we can switch the GCE image's cache to use this, so we don't use twice as much quota.

I don't know if we want to use CIPD for this b/c these are really large files (gigabytes?) and CIPD, AFAIK, keeps things around indefinitely.
Cc: akes...@chromium.org
I just had a thought.

Our existing warm cache support depends on a directory being on the image.

What if we revamped it to always use GS? It's very rare to actually copy from the cache, so we can afford to make it somewhat expensive. This change would allow us to have more free space on the builders, simplify the process for creating images, and allow updated cache contents to be 'distributed' to all builders (both physical and GCE) for free.

Comment 12 by d...@chromium.org, Jun 30 2016

That seems like a good idea if we go with #9. Would you remove the warm cache entirely from the image, then, and have cbuildbot do the GS copy?
I can roll a solution in cbuildbot that goes straight to GS, and we create a builder to update the GS data.

If CIPD doesn't work for this, are there any other Chrome Infra services that do?

Comment 14 by d...@chromium.org, Jun 30 2016

Google Storage seems more than adequate for this. If we want to contain costs, we could create a new bucket and set a maximum lifespan. If we sync every day or two, with a maximum lifetime of 7 days on the bucket, there will be 3-4 sync archives in the bucket at any given time. The builder can do a "gsutil ls", find the latest image, and pull that without any races. If anything goes wrong, it's all fail-open anyway.
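Both halves of that are short gsutil invocations. A sketch with a hypothetical bucket name (the 7-day rule and the archive naming are assumptions):

# Set a 7-day object lifetime on the (hypothetical) bucket.
cat > lifecycle.json <<'EOF'
{"rule": [{"action": {"type": "Delete"}, "condition": {"age": 7}}]}
EOF
gsutil lifecycle set lifecycle.json gs://chromeos-repo-cache

# On the builder: pick the newest archive and pull it. This assumes the
# archive names sort chronologically (e.g. they embed a date stamp).
LATEST=$(gsutil ls gs://chromeos-repo-cache/repo-cache-*.tar | sort | tail -n 1)
gsutil cp "${LATEST}" /tmp/repo-cache.tar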
I was thinking of creating a new builder that creates the tarball, uploads it as one of its own build artifacts (which gives it a limited lifespan), then makes another copy to a single fixed location.

All of the builders will pull from that fixed location when doing a cold sync.

This doesn't give us any history, but it's dead simple.
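Sketched as shell, the cron builder's upload step might be (bucket and path names are placeholders, not the real layout):

# Hypothetical: tar the freshly synced .repo, upload it alongside the
# other build artifacts (limited lifespan), then copy it to the single
# fixed location that builders pull from.
tar -cf repo-cache.tar -C "${BUILDROOT}" .repo
gsutil cp repo-cache.tar "gs://${ARTIFACT_BUCKET}/${BUILDER}/${BUILD_ID}/repo-cache.tar"
gsutil cp "gs://${ARTIFACT_BUCKET}/${BUILDER}/${BUILD_ID}/repo-cache.tar" "gs://${ARTIFACT_BUCKET}/repo-cache/latest.tar"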

Comment 16 by d...@chromium.org, Jun 30 2016

Yeah, something like that seems fine.
Please use a GCE bot to keep that image up to date if possible, to limit bandwidth to/from the Golo.
GCE is fine, and even then, the image doesn't need to be updated very often. Once a week should be more than enough.

The only heavy bandwidth I'd expect into the Golo is if a large number of Golo builders get clobbered at the same time, since they would all download that image at about the same time. Even then, this will still be less bandwidth than syncing the same data from GoB would have been.


Owner: nxia@chromium.org
Implementation Plan:

1) Create a new builder which syncs to ToT, then creates a .repo tarball artifact.
2) Have the builder upload the artifact to a well-known GS location.
3) Update the existing cbuildbot warm cache code to pull from the well-known GS location (see the sketch below).
4) Remove warm cache from GCE image creation.
5) Remove the warm cache command line option from the cbuildbot recipe.
6) Remove the warm cache command line option from cbuildbot.
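A rough sketch of what step 3's cold-sync path could do (the well-known location and variables are placeholders, not the actual implementation):

# Hypothetical: if the buildroot has no .repo, seed it from the
# well-known GS location before running the normal repo sync.
if [ ! -d "${BUILDROOT}/.repo" ]; then
  gsutil cp gs://chromeos-repo-cache/latest.tar /tmp/repo-cache.tar
  tar -xf /tmp/repo-cache.tar -C "${BUILDROOT}"
  rm -f /tmp/repo-cache.tar
fi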
Labels: -current-issue
Blocking: 632203
Labels: BuildHealth iptaskforce
Status: Fixed (was: Untriaged)

Comment 24 by dchan@google.com, Jan 21 2017

Labels: VerifyIn-57

Comment 25 by dchan@google.com, Mar 4 2017

Labels: VerifyIn-58

Comment 26 by dchan@google.com, Apr 17 2017

Labels: VerifyIn-59

Comment 27 by dchan@google.com, May 30 2017

Labels: VerifyIn-60
Labels: VerifyIn-61

Comment 29 by dchan@chromium.org, Oct 14 2017

Status: Archived (was: Fixed)
