New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 654933 link

Starred by 1 user

Issue metadata

Status: Verified
Owner:
Closed: Oct 2016
Cc:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug



Sign in to add a comment

gslib: Remove is not tolerant of NotFoundExceptions when ignore_no_match=True

Project Member Reported by shchen@chromium.org, Oct 11 2016

Issue description

Getting weird gs:// address below.  Is that correct?

Traceback (most recent call last):
  File "/b/cbuild/internal_master/chromite/lib/parallel.py", line 602, in TaskRunner
    task(*x, **task_kwargs)
  File "/b/cbuild/internal_master/chromite/cbuildbot/stages/release_stages.py", line 441, in _RunPaygenInProcess
    skip_duts_check=skip_duts_check)
  File "/b/cbuild/internal_master/chromite/lib/paygen/paygen_build_lib.py", line 1485, in CreatePayloads
    skip_duts_check=skip_duts_check).CreatePayloads()
  File "/b/cbuild/internal_master/chromite/lib/paygen/paygen_build_lib.py", line 1392, in CreatePayloads
    self._CleanupBuild()
  File "/b/cbuild/internal_master/chromite/lib/paygen/paygen_build_lib.py", line 1288, in _CleanupBuild
    recurse=True, ignore_no_match=True)
  File "/b/cbuild/internal_master/chromite/lib/paygen/dryrun_lib.py", line 45, in __call__
    return self.Run(func, *args, **kwargs)
  File "/b/cbuild/internal_master/chromite/lib/paygen/dryrun_lib.py", line 82, in Run
    return self._Call(func, *args, **kwargs)
  File "/b/cbuild/internal_master/chromite/lib/paygen/dryrun_lib.py", line 86, in _Call
    return func(*args, **kwargs)
  File "/b/cbuild/internal_master/chromite/lib/paygen/gslib.py", line 115, in RetryHandler
    return func(*args, **kwargs)
  File "/b/cbuild/internal_master/chromite/lib/paygen/gslib.py", line 414, in Remove
    RunGsutilCommand(args, failed_exception=RemoveFail, **kwargs)
  File "/b/cbuild/internal_master/chromite/lib/paygen/gslib.py", line 237, in RunGsutilCommand
    raise failed_exception('%r failed' % cmd if headers else e.result.error)
RemoveFail: Removing gs://chromeos-releases/canary-channel/cave/8888.0.0/payloads/signing/18157-140066496866112/payload.hash.tar.bz2#1476221150426000...
Removing gs://chromeos-releases/canary-channel/cave/8888.0.0/payloads/signing/18167-140066496866112/1.payload.hash.update_signer.signed.bin#1476222212119000...
NotFoundException: 404 gs://chromeos-releases/canary-channel/cave/8888.0.0/payloads/signing/18167-140066496866112/1.payload.hash.update_signer.signed.bin#1476222212119000 does not exist.

logfile:
https://uberchromegw.corp.google.com/i/chromeos/builders/cave-release/builds/475/steps/Paygen/logs/stdio
 
The thing that looks weird to me is that extra tag at the end of the URI. I'll have to trace and see where that came from...
This is the gsutil command that was run.

15:15:49: INFO: RunCommand: /b/cbuild/internal_master/.cache/common/gsutil_4.19.tar.gz/gsutil/gsutil rm -R gs://chromeos-releases/canary-channel/cave/8888.0.0/payloads/signing

gsutil generated the above error message. The number after the '#' is an ID for a specific revision of the file that changes each time it is overwritten. The directory was eventually cleaned up, so I think this is an internal glitch in GS.

But, we shouldn't have failed the stage because of this. We could add another kwarg to the gslib remove method to ignore NotFoundExceptions. I see there's one for no match.

And maybe we could enable that for a recursive remove. If a user wants to remove a directory recursively, I don't think the user cares if a file exists or not.

What do you think? At least it'd prevent us barfing on these kind of GS glitches. 
Cc: xixuan@chromium.org
That's a good idea.

We might also file this with the GS team, but they will probably ask why we haven't moved the GS version forward to a newer version.

So... Xixuan, where to we stand on being able to use a newer version of gsutil?

Comment 5 by xixuan@chromium.org, Oct 12 2016

The CL is ready: https://chromium-review.googlesource.com/#/c/381693/, Do we want to try it now?
Is that the right CL? I'm not familiar with it, but I don't see how it's an upgrade of gsutil.
Sorry, there is background here.

We tried to roll out a new version of gsutil, and broke several code bases that use chromite to decide which version to use, but don't use our libraries. The new version changed some command line options, and the chromite libraries were updated to match, but not devserver and other users.

This broken everything, and so we rolled back. Xixuan has been working on making everyone use the chromite gs library, so that we can roll forward and not worry about these problems in the future. This will also mean that all the smart retries and things in gs.py will be shared everywhere. This is good, but it's hard because we use some of these libraries in a variety of environments that may/may not have a version of chromite, or might manage which revision it is badly.

It's relevant here, since that deletion bug might not have happened with a newer version of gsutil.
Labels: BuildHealth iptaskforce Week-1641 Week-1642 OS-Chrome
Status: Started (was: Untriaged)
Ah okay, thanks for the explanation of the background. I'm still going to move forward in creating a CL that will ignore those exceptions like I mentioned in c#3.
Good. I also think we should report this to the GS team.

We can do that with "go/bigstore-ticket", but need to include the gsutil logs, not the cbuildbot logs.
Summary: gslib: Remove is not tolerant of NotFoundExceptions when ignore_no_match=True (was: cave: paygen stage: file not found exception)
CL is posted here: https://chromium-review.googlesource.com/#/c/397864/

I'll try to gather the gsutil logs if you can point me to them. (Or do you mean just filtering them out from the buildbot logs?)
Actually, if we can't reproduce we can't get that output.

They really want to see the output of a debug run, which we don't do on the builders.
What are the drawbacks to running debug runs on the builders?
The debug logs include credential information we don't want to leave the builders. The GS people know this but say it can't be fixed without silencing the data they need from a lower level library.
Project Member

Comment 15 by bugdroid1@chromium.org, Oct 14 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/ce93e35f42f049d8120a47c5fbe1341b2516c7ab

commit ce93e35f42f049d8120a47c5fbe1341b2516c7ab
Author: Aseda Aboagye <aaboagye@google.com>
Date: Wed Oct 12 23:49:06 2016

gslib: ignore_no_match ignores NotFoundExceptions.

It's been observed that sometimes Google Storage may glitch and complain
about a specific revision of a file that is asked to be removed, but is
no longer there.  This throws a NotFoundException.

This commit will cause gslib to ignore NotFoundExceptions if the
ignore_no_match kwarg is set to True.

BUG= chromium:654933 
BRANCH=None
TEST=python -b gslib_unittest --network

Change-Id: If91967b14f7b0c441412456f070dee7f921f2bcd
Signed-off-by: Aseda Aboagye <aaboagye@google.com>
Reviewed-on: https://chromium-review.googlesource.com/397864
Commit-Ready: Aseda Aboagye <aaboagye@chromium.org>
Tested-by: Aseda Aboagye <aaboagye@chromium.org>
Reviewed-by: Don Garrett <dgarrett@chromium.org>
Reviewed-by: Mike Frysinger <vapier@chromium.org>

[modify] https://crrev.com/ce93e35f42f049d8120a47c5fbe1341b2516c7ab/lib/paygen/gslib_unittest.py
[modify] https://crrev.com/ce93e35f42f049d8120a47c5fbe1341b2516c7ab/lib/paygen/gslib.py

Status: Verified (was: Started)
Verified in unit test. Feel free to re-open if it surfaces again.
FYI, according to my logs this problem occurred 12 times in the canaries since Aug 1st.  (I only check 2 out of 3 canary builds.)

Sign in to add a comment