Issue 800447

Starred by 4 users

Issue metadata

Status: Available
Owner: ----
Cc:
Components:
EstimatedDays: 14
NextAction: ----
OS: ----
Pri: 2
Type: Feature

Blocking:
issue 825366




git wrapper: implement ~transparent git cache

Project Member Reported by no...@chromium.org, Jan 9 2018

Issue description

Implement caching in the git wrapper, as transparent as possible.

Optimize for the following use cases:
- kitchen fetches recipe repo, e.g. build.git
- recipe engine fetches recipe package deps
- bot_update fetches a repo, e.g. chromium/src

Once done, delete the existing git-cache from depot_tools, along with its usages.
 
Cc: nxia@chromium.org
Any thought to how desktop users can take advantage of git-cache?

And please give us a heads up BEFORE deleting git-cache, since I think it will break our version of the repo tool.
omg I really want to work on this...

I think there's a fully transparent solution using .git/objects/info/alternates. The basic strategy is:

  * Set the cache repo dir via $GIT_ALTERNATE_OBJECT_DIRECTORIES. This will indicate to the wrapper where to stuff the objects, but is also interpreted by native git as "look in these directories when searching for object files".
  * intercept `git fetch`, `git clone`, and `git pull` commands
  * parse them to see what fetch specs they're going for (which can involve running a quick `git config` invocation to look for `remote.*.fetch` definitions)
  * map the remote url to a hash
  * fetch into the cache (one big object store) using the url hash as a ref namespace. So if the user wants to fetch 'refs/heads/*' from url 'https://something' (whose hash is 'deadbeef'), we'd run `git fetch https://something 'refs/heads/*:refs/deadbeef/heads/*'` in the cache.
  * This would be protected by a single flock. We could monitor contention on this lock to see if/how long the wrapper ends up blocking on it. I estimate that it won't be very much.
  * if the original command was `git fetch`, transform it into a series of `update-ref` calls on the underlying repo, similar behavior modifications for pull and clone (e.g. do update-ref and merge/rebase invocation).
  * use event monitoring to discover if any 'local' repos end up having object files in them (maybe once per 24 hours). Can use this to evaluate the effectiveness of the caching support, and if we see significant object accumulation (besides just committing patches or similar), we can tweak this strategy as necessary.
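
The url-hash and ref-namespace steps above can be sketched roughly as follows. This is only an illustration, not the wrapper's actual code: the function names `urlNamespace` and `cacheRefspec` are hypothetical, and using SHA-256 truncated to 8 hex characters is an assumption (the proposal only says "map the remote url to a hash").

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// urlNamespace maps a remote URL to a short, stable ref namespace.
// Assumption: SHA-256 truncated to 8 hex characters; the proposal does
// not specify a hash function or length.
func urlNamespace(remoteURL string) string {
	sum := sha256.Sum256([]byte(remoteURL))
	return hex.EncodeToString(sum[:])[:8]
}

// cacheRefspec turns a source ref pattern such as "refs/heads/*" into a
// refspec whose destination lives under refs/<namespace>/ in the cache,
// e.g. "refs/heads/*:refs/deadbeef/heads/*".
func cacheRefspec(namespace, src string) (string, error) {
	const prefix = "refs/"
	if !strings.HasPrefix(src, prefix) {
		return "", fmt.Errorf("ref pattern %q does not start with %q", src, prefix)
	}
	return src + ":" + prefix + namespace + "/" + strings.TrimPrefix(src, prefix), nil
}

func main() {
	remote := "https://something"
	spec, err := cacheRefspec(urlNamespace(remote), "refs/heads/*")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Print the command the wrapper would run inside the cache repo.
	fmt.Printf("git fetch %s '%s'\n", remote, spec)
}
```

Because every remote gets its own namespace under refs/, fetches from different urls never overwrite each other's refs even though they share one object store.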

We would allow git-gc to work as normal (or maybe schedule offline garbage collection tasks to keep the objects under control). We could also enhance this later by making a giant git-ball on a cron builder which includes objects from all git repos mentioned in chromium DEPS (and maybe V8+friends DEPS too). We can archive and fetch this git ball from cipd in the event of a blank cache. (The alternative is to put it in google storage... in which case, why not just put it in CIPD?)

Once this is done, the git wrapper (a go binary) could be distributed to bots and users alike. I've been running the git wrapper on my work machine on mac for quite a while now. The git wrapper would then implement the behavior of git-retry.py and git-cache.py.
+1 to what dgarrett says in #1. I really want this to be easily used by users. I think if we implement #2, making this work for devs would be as easy as:
  * use the git wrapper in depot_tools (or instruct users to get it via go or via cipd)
  * set $GIT_ALTERNATE_OBJECT_DIRECTORIES to where you want the cache to go
(I would also expect the wrapper to do a no-op on dev machines which don't have $GIT_ALTERNATE_OBJECT_DIRECTORIES set)

Comment 5 by no...@chromium.org, Jan 9 2018

Cc: estaab@chromium.org
This is a great feature to work on. It may fit our new "tech debt" budget. +estaab

Comment 6 by nxia@chromium.org, Jan 9 2018

If we could remove git-cache from depot_tools, we would no longer need to maintain our own fork of repo.

Comment 7 by nxia@chromium.org, Jan 9 2018

Cc: vapier@chromium.org
at least until the next feature we develop that comes along that isn't merged upstream first (or at all).  or we want to cherry-pick back something because upstream isn't cutting new releases (like right now).  so i think we'll still keep our own copy in depot_tools, even if the only difference is that it has a few extra Chromium keys.

it would be nice though so that people who aren't using our fork of repo can reliably get a checkout and work with it.

Comment 9 by hinoka@chromium.org, Jan 12 2018

Components: -Infra>Platform Infra>SDK
Status: Available (was: Untriaged)
Components: Infra>Git
Let's also keep this in infra>git... it happens to also affect SDK, but is primarily for bots.
Owner: iannucci@chromium.org
Status: Assigned (was: Available)
Self assigning as this is in the OKRs anyway.
Is it reasonable to bump this to P1 given last week's outage?
I think the current priority is probably still right, given that the likelihood of a similar outage is pretty low and that clearing any strict LUCI migration blockers is higher priority. That said, as Robbie mentioned, it *is* on our OKRs for this quarter, so there will likely be work done on this.
Blocking: 825366

Comment 15 by efoo@chromium.org, Jun 2 2018

Labels: cit-pm-71

Comment 16 by efoo@chromium.org, Jun 2 2018

Friendly ping. This is a blocking bug for cit-pm-71. Please update pri and comment accordingly. Thanks!
I believe this is much less relevant to cit-pm-71 now. This caused git quota exhaustion at the time, but now we are using cipd to deploy recipes. We do have git caching in bot_update.

Comment 18 by nxia@chromium.org, Jun 8 2018

Cc: -nxia@chromium.org
Labels: LUCI-Backlog
Owner: ----
Status: Available (was: Assigned)
Putting this on backlog
Cc: -iannucci@chromium.org iannu...@google.com
