Do not fetch history of DEPSed repositories with pinned revisions
Issue description

When requesting to add a repo to DEPS, one of the issues raised was its size. libaom was forked from libvpx and contains git history going back to 2010, encompassing vp8 and vp9 development, vp10 investigation, av1 development, and massive copies, renames and refactoring between those projects. libaom is almost twice the size of libvpx (110 MB vs. 58 MB). None of this git history is valuable for building libaom for chromium. If we would like to add av1 support to chromium, it will put an extra 110 MB in every single developer's checkout.

To mitigate this, I would like to request that DEPS entries fetch only the pinned revision. A developer who wants the development history would still be able to enter the relevant directory and 'git fetch' it.

While I am not very familiar with every other DEPS entry, my understanding is that they are intended to provide a snapshot for chromium to depend on, not to be an active point of development. This is certainly the case for libaom and libvpx. Development mostly occurs upstream, and we roll updates into chromium periodically.
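As a rough illustration of the request (not the actual gclient behaviour), the sketch below fetches only a pinned commit and later deepens the checkout on demand. Everything in it is an assumption made for the demo: the `upstream` and `dep` directories, the `g` wrapper and the three commits are hypothetical and live in a throwaway temp directory, so no network access is needed.

```shell
#!/bin/bash
# Sketch: fetch only a pinned revision, then deepen it when history is wanted.
# All paths and names below are hypothetical; nothing touches a real checkout.
set -euo pipefail

work=$(mktemp -d)
# Wrapper that supplies a commit identity without changing any git config.
g() { git -c user.name=demo -c user.email=demo@example.com "$@"; }

# Stand-in for an upstream dependency with three commits of history.
g init -q "$work/upstream"
for i in 1 2 3; do
  echo "change $i" > "$work/upstream/file.txt"
  g -C "$work/upstream" add file.txt
  g -C "$work/upstream" commit -q -m "commit $i"
done
pinned_rev=$(g -C "$work/upstream" rev-parse HEAD)

# The proposal: fetch just the pinned commit, with no history behind it.
# (A file:// URL is used because --depth is ignored for plain local paths.)
g init -q "$work/dep"
g -C "$work/dep" fetch -q --depth 1 "file://$work/upstream" "$pinned_rev"
g -C "$work/dep" checkout -q FETCH_HEAD
shallow_count=$(g -C "$work/dep" rev-list --count HEAD)

# A developer who wants the history deepens the checkout afterwards.
g -C "$work/dep" fetch -q --unshallow "file://$work/upstream"
full_count=$(g -C "$work/dep" rev-list --count HEAD)

echo "commits before deepening: $shallow_count, after: $full_count"
rm -rf "$work"
```

In this demo the pinned revision is the tip of the upstream branch; fetching an arbitrary older commit by hash may additionally depend on how the server is configured.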
Nov 2 2017
There are a few problems with doing this:

1) Creating a shallow checkout (with fetch --depth N) is very expensive for the remote git servers. They have pre-made packs of all the files necessary for a full clone and can serve them quickly. Serving shallow clones (especially many shallow clones, to all of the bots every time they sync) is computationally much harder.

2) Upgrading a shallow clone to a full clone is even more expensive than that. Although we wouldn't expect this to be done often, if you run smut@'s steps above and then run a plain "git fetch", that will be slower than if you had just done a full-depth clone in the first place.

Basically, when we first switched from SVN to Git in 2014, we tried to do exactly this. It didn't go well, so we put other measures in place instead (like the git cache, which is used by all bots). It may be worth investigating this solution again, but I am not aware of anyone who really has the cycles to do so.

In my opinion, a difference of ~0.075GB (since smut says the shallow clone is 3/4ths the size), and one that shrinks over time (since each subsequent shallow fetch leaves the old git objects on disk), is not worth fundamentally changing the way we get checkouts.
Nov 2 2017
Your performance points are quite unfortunate. However, I'm going to disagree strongly with respect to size. While I used libaom as an example, I would suggest doing this for all the DEPS repositories. I'm also not primarily interested in this for bots, but for developer checkouts.

For libaom:

$ du -hs libaom*
25M libaom-7b06dd5dbf11ee1cd65b974a2e46ec33eab65375
104M libaom-full

There are some much larger gains:

native_client 422 MB -> 45 MB
skia 347 MB -> 94 MB
icu 355 MB -> 177 MB
v8 323 MB -> 151 MB

I tried to script this but ran into problems with the way buildtools has nested .git directories, so I gave up on that one. But generally, it saves about 2 GB total: the checkout directory goes from 18 GB to 16 GB.

#!/bin/bash
for dir in `find . -mindepth 2 -type d -name .git | sort | sed 's/\/\.git//'`; do
  [[ -d ${dir} ]] || echo ${dir} disappeared
  [[ -d ${dir} ]] || exit 1
  pushd ${dir} > /dev/null
  project=$(pwd | awk -F"/" '{print $NF}')
  repo=$(grep -m1 url .git/config | cut -f2 -d'=')
  rev=$(git log | head -n 1 | cut -f2 -d' ')
  echo ${dir} ${project} ${repo} ${rev}
  [[ "${project}" = "buildtools" ]] && popd && continue
  cd ..
  du -hs ${project}
  rm -rf ${project}
  mkdir ${project}
  cd ${project}
  git init
  git fetch --depth 1 ${repo} ${rev}
  git checkout FETCH_HEAD
  cd ..
  du -hs ${project}
  popd > /dev/null
done
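To put the sizes quoted above on a common scale, they can be turned into percentages. This is only a sketch: `savings_pct` is a hypothetical helper, and the input numbers (in MB) are copied from the comment above rather than re-measured.

```shell
#!/bin/bash
# savings_pct FULL SHALLOW
# Prints the percentage of disk space saved when a checkout shrinks from
# FULL to SHALLOW (same unit for both), rounded to one decimal place.
savings_pct() {
  LC_ALL=C awk -v full="$1" -v shallow="$2" \
    'BEGIN { printf "%.1f\n", (full - shallow) / full * 100 }'
}

# Sizes in MB as quoted in the thread: name, full checkout, shallow checkout.
while read -r name full shallow; do
  printf '%-14s %4s MB -> %4s MB  (%s%% smaller)\n' \
    "$name" "$full" "$shallow" "$(savings_pct "$full" "$shallow")"
done <<'EOF'
libaom 104 25
native_client 422 45
skia 347 94
icu 355 177
v8 323 151
EOF
```

The same helper applied to the whole checkout (`savings_pct 18 16`, in GB) gives roughly 11%, which shows that the large per-repository gains are diluted once the rest of the checkout is counted.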
Nov 3 2017
A simpler script:

fetch --nohooks chromium
du -hs .
printf '#!/bin/bash\nrm -rf .git/objects\nmkdir .git/objects\ngit fetch --depth 1 origin `git rev-parse HEAD`\n' > /tmp/shallow
chmod +x /tmp/shallow
gclient recurse /tmp/shallow
du -hs .

But even so, the savings are minor:
* It looks like a ~10% savings or so
* A full compile produces more than this amount of executables + debug symbols anyway
* The old git objects aren't deleted when the repo rolls forward
* People with existing checkouts won't see these benefits at all

And certain kinds of common-ish actions become a much worse experience:
* Moving backwards in time (e.g. for a bisect) gets very slow
* Examining history (e.g. for a blame in a dependency) becomes impossible, or requires a sync first
* Running `git log` in a repo which has 5 out of the last 100 commits (e.g. v8, which you sync once a day) breaks entirely
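Two of the points above (old objects staying on disk after a roll, and `git log` no longer seeing real history) can be reproduced in miniature. The sketch below is an assumption-laden demo, not what gclient does: the `upstream` and `dep` repositories, the `g` wrapper and the `commit_upstream` helper are all hypothetical and live in a temp directory.

```shell
#!/bin/bash
# Sketch: what a checkout looks like after two successive --depth 1 syncs.
# All repositories here are hypothetical and created in a temp directory.
set -euo pipefail

work=$(mktemp -d)
g() { git -c user.name=demo -c user.email=demo@example.com "$@"; }
commit_upstream() {
  echo "$1" > "$work/upstream/file.txt"
  g -C "$work/upstream" add file.txt
  g -C "$work/upstream" commit -q -m "$1"
}

g init -q "$work/upstream"
commit_upstream "commit 1"
commit_upstream "commit 2"
old_rev=$(g -C "$work/upstream" rev-parse HEAD)

# First sync: shallow fetch of the revision pinned at that time.
g init -q "$work/dep"
g -C "$work/dep" fetch -q --depth 1 "file://$work/upstream" "$old_rev"
g -C "$work/dep" checkout -q FETCH_HEAD

# Upstream moves on and the pin is rolled forward.
commit_upstream "commit 3"
commit_upstream "commit 4"
new_rev=$(g -C "$work/upstream" rev-parse HEAD)
g -C "$work/dep" fetch -q --depth 1 "file://$work/upstream" "$new_rev"
g -C "$work/dep" checkout -q FETCH_HEAD

# History visible from HEAD: only the newly pinned commit.
visible=$(g -C "$work/dep" rev-list --count HEAD)

# The previously pinned commit is still stored in the object database.
if g -C "$work/dep" cat-file -e "$old_rev^{commit}"; then
  old_present=yes
else
  old_present=no
fi

# Locally, git cannot tell that the old pin is an ancestor of the new one.
if g -C "$work/dep" merge-base --is-ancestor "$old_rev" "$new_rev"; then
  linked=yes
else
  linked=no
fi

echo "visible commits: $visible, old objects present: $old_present, history linked: $linked"
rm -rf "$work"
```

Because the intermediate commits were never fetched, anything that walks history between the two pins (log, blame, bisect) would need a further fetch before it can work.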
Nov 9 2017
One last point: network traffic. <project> is a full checkout and <project>-shallow is a git fetch --depth 1 HEAD.

98M ./libaom/.git
3.6M ./libaom-shallow/.git
384M ./native_client/.git
5.9M ./native_client-shallow/.git
288M ./skia/.git
20M ./skia-shallow/.git

After doing 'git checkout FETCH_HEAD' the on-disk size is much larger, but I believe these accurately represent what is copied over the network.
Oct 18
,
Jan 10
Comment 1 by s...@google.com, Nov 1 2017
Components: -Infra>Git>Admin Infra>SDK
Summary: Do not fetch history of DEPSed repositories with pinned revisions (was: DEPS checkouts take up a lot of space)