Reduce the load of repo sync on GoB |
||||||||||
Issue descriptionLately we saw some repo sync issues on our builds (mainly on the kernel project), tracked at b/31202734. The main reason should be the growing size of our code base and the increasing load on the GoB services. Looking for a way to reduce our load on the GoB services by reducing our repo sync with network. According to the code, the slave builds sync to the latest manifest (network sync only) first, then sync to the local next_manifest (network sync + local sync). Wondering if it hurts to change the second sync to local_only sync, as the first network sync already fetched all the changes from the server. https://cs.corp.google.com/chromeos_public/chromite/cbuildbot/stages/sync_stages.py?dr&q=sync_&sq=package:%5Echromeos_(internal%7Cpublic)$&l=1187 The repo sync error mostly happened on the kernel project as each sync has 5 projects referring to the same kernel project. Reducing the network sync should help solve issues on the kernel project. Fetching CLs from Gerrit also caused some sync issues, which should be improved.
,
Oct 6 2016
Test #1 locally: 1) "repo sync" to the latest version 2) change the manifest to an old version 3) "repo sync -l" successfully updated the working tree by discarding commits in some projects. Looks like changing the second sync to local only works. So we now have two options to reduce the load: 1. 1) file a ticket to repo, repo sync to a pinned manifest should fetch all branches. 2) Once the fix for the repo is published, we revert the 268365 CL. 2. change the second sync to a local only sync. so we only run network sync once in the sync stage.
,
Oct 6 2016
2 seems like a good option to me, with a risk that there is some edge case we haven't thought of. Not that I can think of it. ;>
,
Oct 6 2016
I think that solution #2 will not work in all cases. If any project in the manifest is marked as sync-c=True or clone-depth=1, it may fail. Here's an example test case: 1. Add a new repo to your manifest with sync-c=True set on the project. 2. Run repo sync -n 3. Edit the manifest to point at a different branch 4. Run repo sync -l This should fail (hence why the second sync is a full sync and not repo sync -l) You can find the same failure with clone-depth=1 by trying to sync to an old SHA1 on a different branch.
,
Oct 6 2016
If one project is added to manifest with sync-c=True, should we fetch all branches while syncing? If not, we still have a problem with #1. 1. add a new repo to the manifest with sync-c=True. 2. pin manifest 3. full sync will work fine to fetch the specific branch. 4. apply a special CL which changes the manifest to a different branch. 4 will give the error like crbug.com/482077 To solve this issue, repo sync should still update all branches even sync-c is specified in the project. Probably repo should support an option like --force-all-branch, which ignore sync-c.
,
Oct 6 2016
Yes, --force-all-branch would help
,
Oct 6 2016
Just as background, we have one repository using sync-c now.
<project path="src/third_party/chromiumos-overlay" sync-c="true"
name="chromiumos/overlays/chromiumos-overlay"
groups="minilayout,labtools" />
The amount of history in that repo is MUCH larger than the actual files, since the history contains nearly every ebuild uprev in ChromiumOS.
And we would expect the TOT sha to be different in nearly every branch or pinned manifest case.
,
Oct 6 2016
That's the biggest offender. There's also one repository with clone-depth="1" (src/private-overlays/project-labstation-private)
,
Oct 6 2016
so --force-all-branch isn't preferred in this case? Then I guess we can let the code as it is now. Hope GoB can solve the OOM issue from their end.
,
Oct 6 2016
With --force-all-branch, you could just do a single repo sync ( you could revert https://chromium-review.googlesource.com/#/c/268365/ ) You might be able to tackle https://bugs.chromium.org/p/chromium/issues/detail?id=482077 a different way -- I'm suspecting there's some bug with the way repo switches branches that you could fix there that would alleviate the need for the --force-all-branch flag.
,
Oct 7 2016
Looked into our repo fork, when repo sync is running on a pinned manifest version, it skips RemoteFetch(update all branches) when it has the SHA1 already. https://chromium.googlesource.com/external/repo.git/+/stable/project.py#1211 This should have been fixed in the git-repo with the option '--optimized-fetch'. As long as we keep the option as False, the repo sync should always fetch all the branches from remote (when sync-c is not True: https://chromium.googlesource.com/external/repo.git/+/stable/project.py#1925). https://gerrit.googlesource.com/git-repo/+/master/project.py#1261 This fix isn't in our repo fork. I assume once vapier@ rolls out the new stable repo with the optimized-fetch feature, we can safely remove the first sync and so reduce our load on the GoB.
,
Oct 7 2016
,
Oct 7 2016
i can push out v1.12.34-cr2 again this evening. that way if the bots blow up, it's just over the weekend :).
,
Oct 7 2016
Just pointing out, turning off --optimized-fetch should put a good bit more load on Gerrit... so we'll really want to get rid of the initial sync -n :P
,
Oct 7 2016
Right. With optimized-fetch off in the repo sync, it's going to put a little more load on syncing when manifest is pinned to a version. we should get rid of the initial sync -n (based on the latest manifest file) and let the repo.sync(next_manifest) handle all sync operations(including updating branches). optimized-fetch is default to False in the repo, once the repo is pushed to the new version, the feature will be enabled.
,
Oct 7 2016
Just talked with nxia, and she meant: "optimized-fetch is default to False in the repo, and once the repo is pushed to the new version, the feature will be disabled." This means that we will no longer have optimized-fetch when we update, unless we explicitly specify this option in cbuildbot. This all sounds good -- just wanted to clarify.
,
Oct 11 2016
The --optimized-fetch can solve the issue we had at chromium:482077. but we may still have the issue mentioned in #5. Having the initial repo sync on unpinned manifest can't solve the issue either, as it only syncs the specific branch when sync-c is True. We may need to come up with an option like --force-all-branches. This is a different issue and now we only have one sync-c project, will open another bug to track it.
,
Oct 11 2016
So, if we upgrade to the new version of repo, I think things will still work. We then have three choices: 1) Revert https://chromium-review.googlesource.com/#/c/268365/ and get rid of the first sync 2) Use --optimized-fetch in second sync 3) Both 1 and 2 Not sure which of these 3 would work -- we could experiment. 3 would be the fastest, but you run the risk of running into the same bug again. Might be worth giving it a try!
,
Oct 11 2016
v1.12.34-cr2 has rolled out to all the systems now. if things stay stable, i'll give v1.12.37 a try next week.
,
Oct 11 2016
daividjames@ by 3), do you mean get rid of the first sync -n and add --optimized-fetch in the second sync? this wouldn't work as when --optimized-fetch is used, the second sync won't fetch other branches if it has the local SHA. 1) sounds good, as it looks to me there's no need to run sync twice when sync on pinned manifest also fetches all branches.
,
Oct 11 2016
Yes, in theory, implementing 3 would cause https://bugs.chromium.org/p/chromium/issues/detail?id=482077 to pop up again -- but I'm not sure because I don't fully understand the bug.
,
Oct 12 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/chromite/+/eb73f766d3b143acf964c2e39ea32b64240c9b18 commit eb73f766d3b143acf964c2e39ea32b64240c9b18 Author: Ningning Xia <nxia@chromium.org> Date: Tue Oct 11 00:58:43 2016 Revert "Sync with unpinned manifest first prior to syncing with pinned manifest." This reverts commit 7d034ce9230c36aaca5129c88eafeac24a291975. Previously, when syncing to a pinned manifest, repo sync skipped fetching from remote if the local project had been fixed to the pinned revision. As described in crbug.com/482077 , when a CL wanted to swich to a branch on one project in the manifest, it caused failures because the branch wasn't fetched by repo sync and so cannot be checked out. The new version of repo sync has fixed the issue by providing --optimized-fetch. With --optimized-fetch disabled (by default), repo sync fetches from remote no matter the local project has the SHA or not. As now repo sync on pinned manifest also fetches from remote like repo sync on unpinned manifest, we want to remove the initial repo sync on unpined manifest to reduce our load on the GoB server. BUG= chromium:653356 TEST=cbuildbot_run; confirmed that with the new repo version, repo sync on pinned manifest (sync-c not True) fetches from remote when it has the local SHA revision. Change-Id: I07e571360a16292061188b990db2b4c10945f4ee Reviewed-on: https://chromium-review.googlesource.com/396401 Commit-Ready: Ningning Xia <nxia@chromium.org> Tested-by: Ningning Xia <nxia@chromium.org> Reviewed-by: Don Garrett <dgarrett@chromium.org> [modify] https://crrev.com/eb73f766d3b143acf964c2e39ea32b64240c9b18/cbuildbot/stages/sync_stages.py
,
Oct 25 2016
,
Jan 21 2017
,
Mar 4 2017
,
Apr 17 2017
,
May 30 2017
,
Aug 1 2017
,
Oct 14 2017
|
||||||||||
►
Sign in to add a comment |
||||||||||
Comment 1 by davidjames@chromium.org
, Oct 6 2016