
Issue 662214

Starred by 0 users

Issue metadata

Status: Archived
Owner:
Closed: Jan 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug

Blocking:
issue 663543




repo init failed with no retry

Project Member Reported by vapier@chromium.org, Nov 3 2016

Issue description

log:
https://uberchromegw.corp.google.com/i/chromiumos.tryserver/builders/full/builds/194

My CLs don't touch chromite, the sync path, or this repo, but Sync died:
error.GitError: rev-parse: [Errno 2] No such file or directory: '/b/cbuild/external_master/src/third_party/libapps'

Assigning to deputy.

 
That build slave uses a different build root for internal and external builds. All previous external builds that are still in history failed with the same error. The error is raised in repo code.

I wiped out all build roots on that slave, which should fix things.

I'm not sure what we could do here, other than be more aggressive about wiping buildroot on a sync error.


Status: Started (was: Unconfirmed)
Looks somewhat related to https://chromium-review.googlesource.com/#/c/404911/.
I'd like to add another level of retry that wipes the buildroot. However, that would wipe more than we really want because we reuse the buildroot for more than just the code checkout (manifest versions, .cache, chroot, etc).

Also, that sync code already has retries and a lot of other logic I don't fully understand, so I'm not sure if wrapping it with another level of retry is a good idea or not.

Comment 5 by nxia@chromium.org, Nov 4 2016

It failed at 'repo init manifest', before it could reach the repo sync code, so it shouldn't be affected by CL:404911?
Isn't that init the first step of the Sync code? Or did I get that wrong?


Comment 7 by nxia@chromium.org, Nov 4 2016

I wasn't clear: I meant it didn't reach 'repo sync -n', which is the part the change touched.


I'm still confused about why the old external history affected repo init, since repo init was given the external manifest URL.

Comment 8 by nxia@chromium.org, Nov 4 2016

I'll log into that builder and take a look. 
In theory, "repo init" and "repo sync" can handle any kind of corruption in the repo checkout. In practice, that's not 100% true.

We have code that assumes if you can "repo init" and run "repo manifest", then things are healthy. If not, .repo is removed, but the source checkouts are left alone.

A theoretical example of corruption that won't recover without help:

Some source files are owned by root. Repo is running as the normal user, and so can't update or remove them without error.
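
A minimal sketch of how that kind of ownership problem could be detected before a sync, assuming a POSIX builder. The function name `find_foreign_owned` is hypothetical and is not part of repo or chromite:

```python
import os


def find_foreign_owned(root, uid=None):
    """Return paths under root whose owner differs from uid.

    uid defaults to the effective user running this process.
    Hypothetical helper for illustration only.
    """
    if uid is None:
        uid = os.geteuid()
    foreign = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # lstat checks a symlink itself rather than its target.
            if os.lstat(path).st_uid != uid:
                foreign.append(path)
    return sorted(foreign)
```

A non-empty result would be a hint that the checkout needs a wipe (or a chown) before repo can update it as the normal user.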
Looks like we were hit by it again last night:
https://uberchromegw.corp.google.com/i/chromiumos.tryserver/builders/pre_cq/builds/14294
https://uberchromegw.corp.google.com/i/chromiumos.tryserver/builders/pre_cq/builds/14295
https://uberchromegw.corp.google.com/i/chromiumos.tryserver/builders/pre_cq/builds/14296
https://uberchromegw.corp.google.com/i/chromiumos.tryserver/builders/pre_cq/builds/14297


Most of them were the same:
Traceback (most recent call last):
  File "/b/cbuild/internal_master/.repo/repo/main.py", line 529, in <module>
    _Main(sys.argv[1:])
  File "/b/cbuild/internal_master/.repo/repo/main.py", line 505, in _Main
    result = repo._Run(argv) or 0
  File "/b/cbuild/internal_master/.repo/repo/main.py", line 161, in _Run
    result = cmd.Execute(copts, cargs)
  File "/b/cbuild/internal_master/.repo/repo/subcmds/manifest.py", line 80, in Execute
    self._Output(opt)
  File "/b/cbuild/internal_master/.repo/repo/subcmds/manifest.py", line 70, in _Output
    peg_rev_upstream = opt.peg_rev_upstream)
  File "/b/cbuild/internal_master/.repo/repo/manifest_xml.py", line 309, in Save
    output_projects(None, root, list(sorted(projects)))
  File "/b/cbuild/internal_master/.repo/repo/manifest_xml.py", line 233, in output_projects
    output_project(parent, parent_node, project)
  File "/b/cbuild/internal_master/.repo/repo/manifest_xml.py", line 260, in output_project
    value = p.work_git.rev_parse(HEAD + '^0')
  File "/b/cbuild/internal_master/.repo/repo/project.py", line 2540, in runner
    capture_stderr = True)
  File "/b/cbuild/internal_master/.repo/repo/git_command.py", line 218, in __init__
    raise GitError('%s: %s' % (command[1], e))
error.GitError: rev-parse: [Errno 2] No such file or directory: '/b/cbuild/internal_master/src/third_party/libapps'

One of the failures was on a different project:
Traceback (most recent call last):
  File "/b/cbuild/internal_master/.repo/repo/main.py", line 529, in <module>
    _Main(sys.argv[1:])
  File "/b/cbuild/internal_master/.repo/repo/main.py", line 505, in _Main
    result = repo._Run(argv) or 0
  File "/b/cbuild/internal_master/.repo/repo/main.py", line 161, in _Run
    result = cmd.Execute(copts, cargs)
  File "/b/cbuild/internal_master/.repo/repo/subcmds/manifest.py", line 80, in Execute
    self._Output(opt)
  File "/b/cbuild/internal_master/.repo/repo/subcmds/manifest.py", line 70, in _Output
    peg_rev_upstream = opt.peg_rev_upstream)
  File "/b/cbuild/internal_master/.repo/repo/manifest_xml.py", line 309, in Save
    output_projects(None, root, list(sorted(projects)))
  File "/b/cbuild/internal_master/.repo/repo/manifest_xml.py", line 233, in output_projects
    output_project(parent, parent_node, project)
  File "/b/cbuild/internal_master/.repo/repo/manifest_xml.py", line 260, in output_project
    value = p.work_git.rev_parse(HEAD + '^0')
  File "/b/cbuild/internal_master/.repo/repo/project.py", line 2540, in runner
    capture_stderr = True)
  File "/b/cbuild/internal_master/.repo/repo/git_command.py", line 218, in __init__
    raise GitError('%s: %s' % (command[1], e))
error.GitError: rev-parse: [Errno 2] No such file or directory: '/b/cbuild/internal_master/src/aosp/external/dbus-binding-generator'
We added some new build servers, which had been sitting idle for a very long time, to the trybot pool. It's probably the same cause as before.

If I'd landed the auto-wipe change, this would have been an invisible issue.

Instead, I wiped all non-active buildroots from all General Pool trybot builders.

For an idle builder:
  mv /b/cbuild/* /tmp
Status: Fixed (was: Started)
I believe this is fixed.

Comment 13 by nxia@chromium.org, Jan 21 2017

Owner: nxia@chromium.org
Status: Assigned (was: Fixed)

15:40:41: ERROR: return code: 1; command: repo init --repo-url https://chromium.googlesource.com/external/repo --manifest-url https://chromium.googlesource.com/chromiumos/manifest --manifest-name default.xml --manifest-branch master
cwd=/b/cbuild/external_master

15:40:41: ERROR: <class 'chromite.cbuildbot.repository.SrcCheckOutException'>: return code: 1; command: repo init --repo-url https://chromium.googlesource.com/external/repo --manifest-url https://chromium.googlesource.com/chromiumos/manifest --manifest-name default.xml --manifest-branch master
cwd=/b/cbuild/external_master
Traceback (most recent call last):
  File "/b/build/slave/full/build/chromite/lib/failures_lib.py", line 172, in wrapped_functor
    return functor(*args, **kwargs)
  File "/b/build/slave/full/build/chromite/cbuildbot/stages/sync_stages.py", line 587, in PerformStage
    self.ManifestCheckout(self.GetNextManifest())
  File "/b/build/slave/full/build/chromite/cbuildbot/stages/sync_stages.py", line 435, in ManifestCheckout
    self.repo.Sync(next_manifest)
  File "/b/build/slave/full/build/chromite/cbuildbot/repository.py", line 517, in Sync
    raise SrcCheckOutException(err_msg)
SrcCheckOutException: return code: 1; command: repo init --repo-url https://chromium.googlesource.com/external/repo --manifest-url https://chromium.googlesource.com/chromiumos/manifest --manifest-name default.xml --manifest-branch master
cwd=/b/cbuild/external_master




One possible solution is to remove the 'manifest.xml', 'manifests.git', 'manifests', and 'repo' entries under the .repo folder and retry 'repo init', which will fetch the required manifests and repo again.
I'll post a fix, and I'll see if I can find another builder with this issue to try it on.
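
A rough sketch of that clean-up-and-retry idea, not the actual chromite change. The names `clean_repo_state` and `init_with_retry`, the `attempts` parameter, and the injectable `run` argument are all assumptions made for illustration:

```python
import os
import shutil
import subprocess

# Entries under .repo named in the comment above.
REPO_STATE = ('manifest.xml', 'manifests.git', 'manifests', 'repo')


def clean_repo_state(buildroot):
    """Remove manifest and repo tool state under <buildroot>/.repo.

    Everything else under .repo, and the source checkouts, are left alone.
    """
    repo_dir = os.path.join(buildroot, '.repo')
    for name in REPO_STATE:
        path = os.path.join(repo_dir, name)
        if os.path.islink(path) or os.path.isfile(path):
            os.unlink(path)
        elif os.path.isdir(path):
            shutil.rmtree(path)


def init_with_retry(init_cmd, buildroot, attempts=2, run=subprocess.check_call):
    """Run init_cmd in buildroot; on failure, clean .repo state and retry.

    Returns the attempt number that succeeded, and re-raises the last
    failure once the attempts are used up.
    """
    for attempt in range(1, attempts + 1):
        try:
            run(init_cmd, cwd=buildroot)
            return attempt
        except subprocess.CalledProcessError:
            if attempt == attempts:
                raise
            clean_repo_state(buildroot)
```

Here `init_cmd` would be the full 'repo init ...' argument list from the log above, and `buildroot` the cwd it ran in.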

Comment 14 by nxia@chromium.org, Jan 21 2017

Summary: repo init failed with no retry (was: build276-m2 fails to sync due to missing libapps)
The builder build276-m2 is fine now; I just want to fix the issue in our code.

"rev-parse: [Errno 2] No such file or directory: '/b/cbuild/external_master/src/third_party/libapps'" should not be the critical error. The repo init failure should be the issue here.


Comment 15 by nxia@chromium.org, Jan 21 2017

Cc: -nxia@chromium.org dgarr...@chromium.org pprabhu@chromium.org
Project Member

Comment 16 by bugdroid1@chromium.org, Jan 25 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/5abf2a5034391183ae09de0f51ec53ce26259e21

commit 5abf2a5034391183ae09de0f51ec53ce26259e21
Author: Ningning Xia <nxia@chromium.org>
Date: Sat Jan 21 02:23:16 2017

Retry 'repo init' on failures.

Previously, once 'repo init' failed, it exited and failed the build in
sync stage. We want to catch the 'repo init' failure, clean up the
repo and manifest dirs, and retry 'repo init' in repo.Initialize.

BUG= chromium:662214 
TEST=unit_tests; run_tests with network

Change-Id: I18e445f3da919b924794f8bc5b357924ed97e340
Reviewed-on: https://chromium-review.googlesource.com/431006
Commit-Ready: Ningning Xia <nxia@chromium.org>
Tested-by: Ningning Xia <nxia@chromium.org>
Reviewed-by: Don Garrett <dgarrett@chromium.org>

[modify] https://crrev.com/5abf2a5034391183ae09de0f51ec53ce26259e21/cbuildbot/repository.py
[modify] https://crrev.com/5abf2a5034391183ae09de0f51ec53ce26259e21/cbuildbot/repository_unittest.py

Comment 17 by nxia@chromium.org, Jan 31 2017

Status: Fixed (was: Assigned)

Comment 18 by nxia@chromium.org, Jan 31 2017

Blocking: 663543
Project Member

Comment 19 by bugdroid1@chromium.org, Mar 2 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/6caa9097769833af908f3ca380550fdb4e5542b5

commit 6caa9097769833af908f3ca380550fdb4e5542b5
Author: Don Garrett <dgarrett@google.com>
Date: Thu Mar 02 02:42:13 2017

osutil.EmptyDir: New empty to delete the contents of a directory.

This new helper deletes the contents of a directory while leaving the
directory alone. It can exclude selected contents.

BUG= chromium:662214 
TEST=Unittests

Change-Id: I1a9794aa188b22ee50ba38da4c42091a6f26155e
Reviewed-on: https://chromium-review.googlesource.com/429878
Commit-Ready: Don Garrett <dgarrett@chromium.org>
Tested-by: Don Garrett <dgarrett@chromium.org>
Reviewed-by: Don Garrett <dgarrett@chromium.org>

[modify] https://crrev.com/6caa9097769833af908f3ca380550fdb4e5542b5/lib/osutils_unittest.py
[modify] https://crrev.com/6caa9097769833af908f3ca380550fdb4e5542b5/lib/osutils.py
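
For readers without the chromite tree handy, a minimal sketch of what such a helper might look like. This is not the code from the commit above: the name `empty_dir` and the `exclude` parameter are assumptions, and the real `osutils` helper may differ in signature and behavior:

```python
import os
import shutil


def empty_dir(path, exclude=()):
    """Delete everything directly inside path, keeping path itself.

    Names listed in exclude (top-level entries only) are preserved.
    Hypothetical sketch for illustration.
    """
    for name in os.listdir(path):
        if name in exclude:
            continue
        target = os.path.join(path, name)
        if os.path.isdir(target) and not os.path.islink(target):
            shutil.rmtree(target)
        else:
            # Regular files and symlinks, including symlinks to directories.
            os.unlink(target)
```

Something like `empty_dir(buildroot, exclude=('.cache', 'chroot'))` would address the earlier concern about a buildroot wipe removing more than the code checkout.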

Comment 20 by dchan@google.com, Apr 17 2017

Labels: VerifyIn-59

Comment 21 by dchan@google.com, May 30 2017

Labels: VerifyIn-60
Labels: VerifyIn-61

Comment 23 by dchan@chromium.org, Oct 14 2017

Status: Archived (was: Fixed)
