gslib: Remove is not tolerant of NotFoundExceptions when ignore_no_match=True |
|||||
Issue description
Getting weird gs:// address below. Is that correct?
Traceback (most recent call last):
File "/b/cbuild/internal_master/chromite/lib/parallel.py", line 602, in TaskRunner
task(*x, **task_kwargs)
File "/b/cbuild/internal_master/chromite/cbuildbot/stages/release_stages.py", line 441, in _RunPaygenInProcess
skip_duts_check=skip_duts_check)
File "/b/cbuild/internal_master/chromite/lib/paygen/paygen_build_lib.py", line 1485, in CreatePayloads
skip_duts_check=skip_duts_check).CreatePayloads()
File "/b/cbuild/internal_master/chromite/lib/paygen/paygen_build_lib.py", line 1392, in CreatePayloads
self._CleanupBuild()
File "/b/cbuild/internal_master/chromite/lib/paygen/paygen_build_lib.py", line 1288, in _CleanupBuild
recurse=True, ignore_no_match=True)
File "/b/cbuild/internal_master/chromite/lib/paygen/dryrun_lib.py", line 45, in __call__
return self.Run(func, *args, **kwargs)
File "/b/cbuild/internal_master/chromite/lib/paygen/dryrun_lib.py", line 82, in Run
return self._Call(func, *args, **kwargs)
File "/b/cbuild/internal_master/chromite/lib/paygen/dryrun_lib.py", line 86, in _Call
return func(*args, **kwargs)
File "/b/cbuild/internal_master/chromite/lib/paygen/gslib.py", line 115, in RetryHandler
return func(*args, **kwargs)
File "/b/cbuild/internal_master/chromite/lib/paygen/gslib.py", line 414, in Remove
RunGsutilCommand(args, failed_exception=RemoveFail, **kwargs)
File "/b/cbuild/internal_master/chromite/lib/paygen/gslib.py", line 237, in RunGsutilCommand
raise failed_exception('%r failed' % cmd if headers else e.result.error)
RemoveFail: Removing gs://chromeos-releases/canary-channel/cave/8888.0.0/payloads/signing/18157-140066496866112/payload.hash.tar.bz2#1476221150426000...
Removing gs://chromeos-releases/canary-channel/cave/8888.0.0/payloads/signing/18167-140066496866112/1.payload.hash.update_signer.signed.bin#1476222212119000...
NotFoundException: 404 gs://chromeos-releases/canary-channel/cave/8888.0.0/payloads/signing/18167-140066496866112/1.payload.hash.update_signer.signed.bin#1476222212119000 does not exist.
logfile:
https://uberchromegw.corp.google.com/i/chromeos/builders/cave-release/builds/475/steps/Paygen/logs/stdio
,
Oct 12 2016
This is the gsutil command that was run. 15:15:49: INFO: RunCommand: /b/cbuild/internal_master/.cache/common/gsutil_4.19.tar.gz/gsutil/gsutil rm -R gs://chromeos-releases/canary-channel/cave/8888.0.0/payloads/signing gsutil generated the above error message. The number after the '#' is an ID for a specific revision of the file that changes each time it is overwritten. The directory was eventually cleaned up, so I think this is an internal glitch in GS.
,
Oct 12 2016
But, we shouldn't have failed the stage because of this. We could add another kwarg to the gslib remove method to ignore NotFoundExceptions. I see there's one for no match. And maybe we could enable that for a recursive remove. If a user wants to remove a directory recursively, I don't think the user cares if a file exists or not. What do you think? At least it'd prevent us barfing on these kind of GS glitches.
,
Oct 12 2016
That's a good idea. We might also file this with the GS team, but they will probably ask why we haven't moved the GS version forward to a newer version. So... Xixuan, where to we stand on being able to use a newer version of gsutil?
,
Oct 12 2016
The CL is ready: https://chromium-review.googlesource.com/#/c/381693/, Do we want to try it now?
,
Oct 12 2016
Is that the right CL? I'm not familiar with it, but I don't see how it's an upgrade of gsutil.
,
Oct 12 2016
Sorry, there is background here. We tried to roll out a new version of gsutil, and broke several code bases that use chromite to decide which version to use, but don't use our libraries. The new version changed some command line options, and the chromite libraries were updated to match, but not devserver and other users. This broken everything, and so we rolled back. Xixuan has been working on making everyone use the chromite gs library, so that we can roll forward and not worry about these problems in the future. This will also mean that all the smart retries and things in gs.py will be shared everywhere. This is good, but it's hard because we use some of these libraries in a variety of environments that may/may not have a version of chromite, or might manage which revision it is badly.
,
Oct 12 2016
It's relevant here, since that deletion bug might not have happened with a newer version of gsutil.
,
Oct 12 2016
Ah okay, thanks for the explanation of the background. I'm still going to move forward in creating a CL that will ignore those exceptions like I mentioned in c#3.
,
Oct 12 2016
Good. I also think we should report this to the GS team. We can do that with "go/bigstore-ticket", but need to include the gsutil logs, not the cbuildbot logs.
,
Oct 13 2016
CL is posted here: https://chromium-review.googlesource.com/#/c/397864/ I'll try to gather the gsutil logs if you can point me to them. (Or do you mean just filtering them out from the buildbot logs?)
,
Oct 13 2016
Actually, if we can't reproduce we can't get that output. They really want to see the output of a debug run, which we don't do on the builders.
,
Oct 13 2016
What are the drawbacks to running debug runs on the builders?
,
Oct 13 2016
The debug logs include credential information we don't want to leave the builders. The GS people know this but say it can't be fixed without silencing the data they need from a lower level library.
,
Oct 14 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/chromite/+/ce93e35f42f049d8120a47c5fbe1341b2516c7ab commit ce93e35f42f049d8120a47c5fbe1341b2516c7ab Author: Aseda Aboagye <aaboagye@google.com> Date: Wed Oct 12 23:49:06 2016 gslib: ignore_no_match ignores NotFoundExceptions. It's been observed that sometimes Google Storage may glitch and complain about a specific revision of a file that is asked to be removed, but is no longer there. This throws a NotFoundException. This commit will cause gslib to ignore NotFoundExceptions if the ignore_no_match kwarg is set to True. BUG= chromium:654933 BRANCH=None TEST=python -b gslib_unittest --network Change-Id: If91967b14f7b0c441412456f070dee7f921f2bcd Signed-off-by: Aseda Aboagye <aaboagye@google.com> Reviewed-on: https://chromium-review.googlesource.com/397864 Commit-Ready: Aseda Aboagye <aaboagye@chromium.org> Tested-by: Aseda Aboagye <aaboagye@chromium.org> Reviewed-by: Don Garrett <dgarrett@chromium.org> Reviewed-by: Mike Frysinger <vapier@chromium.org> [modify] https://crrev.com/ce93e35f42f049d8120a47c5fbe1341b2516c7ab/lib/paygen/gslib_unittest.py [modify] https://crrev.com/ce93e35f42f049d8120a47c5fbe1341b2516c7ab/lib/paygen/gslib.py
,
Oct 14 2016
Verified in unit test. Feel free to re-open if it surfaces again.
,
Oct 14 2016
FYI, according to my logs this problem occurred 12 times in the canaries since Aug 1st. (I only check 2 out of 3 canary builds.) |
|||||
►
Sign in to add a comment |
|||||
Comment 1 by aaboagye@chromium.org
, Oct 11 2016