New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 653356 link

Starred by 1 user

Issue metadata

Status: Archived
Owner:
Last visit > 30 days ago
Closed: Oct 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 3
Type: Bug

Blocked on:
issue 650719

Blocking:
issue 651581



Sign in to add a comment

Reduce the load of repo sync on GoB

Project Member Reported by nxia@chromium.org, Oct 6 2016

Issue description


Lately we saw some repo sync issues on our builds (mainly on the kernel project), tracked at b/31202734. The main reason should be the growing size of our code base and the increasing load on the GoB services. Looking for a way to reduce our load on the GoB services by reducing our repo sync with network.

According to the code, the slave builds sync to the latest manifest (network sync only) first, then sync to the local next_manifest (network sync + local sync). Wondering if it hurts to change the second sync to local_only sync, as the first network sync already fetched all the changes from the server. 
https://cs.corp.google.com/chromeos_public/chromite/cbuildbot/stages/sync_stages.py?dr&q=sync_&sq=package:%5Echromeos_(internal%7Cpublic)$&l=1187

The repo sync error mostly happened on the kernel project as each sync has 5 projects referring to the same kernel project. Reducing the network sync should help solve issues on the kernel project. 


Fetching CLs from Gerrit also caused some sync issues, which should be improved.  
 
A better fix would be to revert https://chromium-review.googlesource.com/#/c/268365/, but that would require a fix to repo itself.

Comment 2 by nxia@chromium.org, Oct 6 2016

Test #1 locally:

1) "repo sync" to the latest version
2) change the manifest to an old version
3) "repo sync -l" successfully updated the working tree by discarding commits in some projects. 

Looks like changing the second sync to local only works. So we now have two options to reduce the load:

1. 1) file a ticket to repo, repo sync to a pinned manifest should fetch all branches. 2) Once the fix for the repo is published, we revert the 268365 CL.

2. change the second sync to a local only sync. so we only run network sync once in the sync stage. 

 
2 seems like a good option to me, with a risk that there is some edge case we haven't thought of. Not that I can think of it. ;>
I think that solution #2 will not work in all cases. If any project in the manifest is marked as sync-c=True or clone-depth=1, it may fail. Here's an example test case:

 1. Add a new repo to your manifest with sync-c=True set on the project.
 2. Run repo sync -n
 3. Edit the manifest to point at a different branch
 4. Run repo sync -l

This should fail (hence why the second sync is a full sync and not repo sync -l)

You can find the same failure with clone-depth=1 by trying to sync to an old SHA1 on a different branch.


Comment 5 by nxia@chromium.org, Oct 6 2016

If one project is added to manifest with sync-c=True, should we fetch all branches while syncing? If not, we still have a problem with #1.

1. add a new repo to the manifest with sync-c=True.
2. pin manifest 
3. full sync will work fine to fetch the specific branch.
4. apply a special CL which changes the manifest to a different branch.

4 will give the error like  crbug.com/482077 

To solve this issue, repo sync should still update all branches even sync-c is specified in the project. Probably repo should support an option like --force-all-branch, which ignore sync-c.
Yes, --force-all-branch would help
Just as background, we have one repository using sync-c now.

  <project path="src/third_party/chromiumos-overlay" sync-c="true"
           name="chromiumos/overlays/chromiumos-overlay"
           groups="minilayout,labtools" />

The amount of history in that repo is MUCH larger than the actual files, since the history contains nearly every ebuild uprev in ChromiumOS.

And we would expect the TOT sha to be different in nearly every branch or pinned manifest case.
That's the biggest offender. There's also one repository with clone-depth="1" (src/private-overlays/project-labstation-private)

Comment 9 by nxia@chromium.org, Oct 6 2016

so --force-all-branch isn't preferred in this case?

Then I guess we can let the code as it is now. Hope GoB can solve the OOM issue from their end. 
With --force-all-branch, you could just do a single repo sync ( you could revert https://chromium-review.googlesource.com/#/c/268365/ )

You might be able to tackle https://bugs.chromium.org/p/chromium/issues/detail?id=482077 a different way -- I'm suspecting there's some bug with the way repo switches branches that you could fix there that would alleviate the need for the --force-all-branch flag.

Comment 11 by nxia@chromium.org, Oct 7 2016

Cc: vapier@chromium.org
Looked into our repo fork, when repo sync is running on a pinned manifest version, it skips RemoteFetch(update all branches) when it has the SHA1 already.
 
https://chromium.googlesource.com/external/repo.git/+/stable/project.py#1211



This should have been fixed in the git-repo with the option '--optimized-fetch'. As long as we keep the option as False, the repo sync should always fetch all the branches from remote (when sync-c is not True: https://chromium.googlesource.com/external/repo.git/+/stable/project.py#1925). 

https://gerrit.googlesource.com/git-repo/+/master/project.py#1261

This fix isn't in our repo fork. I assume once vapier@ rolls out the new stable repo with the optimized-fetch feature, we can safely remove the first sync and so reduce our load on the GoB. 

Comment 12 by nxia@chromium.org, Oct 7 2016

Blockedon: 650719
i can push out v1.12.34-cr2 again this evening.  that way if the bots blow up, it's just over the weekend :).
Just pointing out, turning off --optimized-fetch should put a good bit more load on Gerrit... so we'll really want to get rid of the initial sync -n :P

Comment 15 by nxia@chromium.org, Oct 7 2016

Right. With optimized-fetch off in the repo sync, it's going to put a little more load on syncing when manifest is pinned to a version. we should get rid of the initial sync -n (based on the latest manifest file) and let the repo.sync(next_manifest) handle all sync operations(including updating branches). 

optimized-fetch is default to False in the repo, once the repo is pushed to the new version, the feature will be enabled.
 
Just talked with nxia, and she meant:

"optimized-fetch is default to False in the repo, and once the repo is pushed to the new version, the feature will be disabled." This means that we will no longer have optimized-fetch when we update, unless we explicitly specify this option in cbuildbot.

This all sounds good -- just wanted to clarify.

Comment 17 by nxia@chromium.org, Oct 11 2016

The --optimized-fetch can solve the issue we had at chromium:482077. 

but we may still have the issue mentioned in #5. Having the initial repo sync on unpinned manifest can't solve the issue either, as it only syncs the specific branch when sync-c is True. We may need to come up with an option like --force-all-branches. This is a different issue and now we only have one sync-c project, will open another bug to track it.
So, if we upgrade to the new version of repo, I think things will still work. We then have three choices:

  1) Revert https://chromium-review.googlesource.com/#/c/268365/ and get rid of the first sync
  2) Use --optimized-fetch in second sync
  3) Both 1 and 2

Not sure which of these 3 would work -- we could experiment. 3 would be the fastest, but you run the risk of running into the same bug again. Might be worth giving it a try!
v1.12.34-cr2 has rolled out to all the systems now.  if things stay stable, i'll give v1.12.37 a try next week.

Comment 20 by nxia@chromium.org, Oct 11 2016

daividjames@ by 3), do you mean get rid of the first sync -n and add --optimized-fetch in the second sync? this wouldn't work as when --optimized-fetch is used, the second sync won't fetch other branches if it has the local SHA.

1) sounds good, as it looks to me there's no need to run sync twice when sync on pinned manifest also fetches all branches.
Yes, in theory, implementing 3 would cause https://bugs.chromium.org/p/chromium/issues/detail?id=482077 to pop up again -- but I'm not sure because I don't fully understand the bug.
Project Member

Comment 22 by bugdroid1@chromium.org, Oct 12 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/eb73f766d3b143acf964c2e39ea32b64240c9b18

commit eb73f766d3b143acf964c2e39ea32b64240c9b18
Author: Ningning Xia <nxia@chromium.org>
Date: Tue Oct 11 00:58:43 2016

Revert "Sync with unpinned manifest first prior to syncing with pinned manifest."

This reverts commit 7d034ce9230c36aaca5129c88eafeac24a291975.

Previously, when syncing to a pinned manifest, repo sync skipped
fetching from remote if the local project had been fixed to the pinned
revision. As described in  crbug.com/482077 , when a CL wanted to swich to
a branch on one project in the manifest, it caused failures because the
branch wasn't fetched by repo sync and so cannot be checked out.

The new version of repo sync has fixed the issue by providing
--optimized-fetch. With --optimized-fetch disabled (by default), repo
sync fetches from remote no matter the local project has the SHA or
not. As now repo sync on pinned manifest also fetches from remote like
repo sync on unpinned manifest, we want to remove the initial repo
sync on unpined manifest to reduce our load on the GoB server.

BUG= chromium:653356 
TEST=cbuildbot_run; confirmed that with the new repo version, repo sync
on pinned manifest (sync-c not True) fetches from remote when it has
the local SHA revision.

Change-Id: I07e571360a16292061188b990db2b4c10945f4ee
Reviewed-on: https://chromium-review.googlesource.com/396401
Commit-Ready: Ningning Xia <nxia@chromium.org>
Tested-by: Ningning Xia <nxia@chromium.org>
Reviewed-by: Don Garrett <dgarrett@chromium.org>

[modify] https://crrev.com/eb73f766d3b143acf964c2e39ea32b64240c9b18/cbuildbot/stages/sync_stages.py

Comment 23 by nxia@chromium.org, Oct 25 2016

Status: Fixed (was: Untriaged)

Comment 24 by dchan@google.com, Jan 21 2017

Labels: VerifyIn-57

Comment 25 by dchan@google.com, Mar 4 2017

Labels: VerifyIn-58

Comment 26 by dchan@google.com, Apr 17 2017

Labels: VerifyIn-59

Comment 27 by dchan@google.com, May 30 2017

Labels: VerifyIn-60
Labels: VerifyIn-61

Comment 29 by dchan@chromium.org, Oct 14 2017

Status: Archived (was: Fixed)

Sign in to add a comment