New issue
Advanced search Search tips

Issue 680248 link

Starred by 1 user

Issue metadata

Status: Archived
Owner:
Closed: Feb 2017
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug



Sign in to add a comment

Error in self._Finish not properly handled.

Project Member Reported by dgarr...@chromium.org, Jan 11 2017

Issue description

We saw the same error on several builders, which all displayed it incorrectly.

https://uberchromegw.corp.google.com/i/chromiumos.tryserver/builders/gcc_toolchain/builds/163
https://uberchromegw.corp.google.com/i/chromiumos.tryserver/builders/llvm_next_toolchain/builds/53
https://uberchromegw.corp.google.com/i/chromiumos.tryserver/builders/llvm_next_toolchain/builds/52

It appears that the failure was raised in the stages's _Finish() method, but that GenericStage.Run() and/or ._RunStage() didn't catch and handle it appropriately. This meant that no stage was blamed for the failure, and our waterfall incorrectly displayed success for whatever stage was at fault.

Very confusing.


Exception thrown, but all stages marked successful. This is an internal error,
because the stage that threw the exception should be marked as failing.
cbuildbot: Unhandled exception:
Traceback (most recent call last):
  File "chromite/bin/cbuildbot", line 168, in <module>
    DoMain()
  File "chromite/bin/cbuildbot", line 164, in DoMain
    commandline.ScriptWrapperMain(FindTarget)
  File "/b/cbuild/internal_master/chromite/lib/commandline.py", line 834, in ScriptWrapperMain
    ret = target(argv[1:])
  File "/b/cbuild/internal_master/chromite/scripts/cbuildbot.py", line 1317, in main
    logging.info('One stage exited early: %s', ex)
  File "/b/cbuild/internal_master/chromite/scripts/cbuildbot.py", line 1314, in main
    _RunBuildStagesWrapper(options, site_config, build_config)
  File "/b/cbuild/internal_master/chromite/scripts/cbuildbot.py", line 248, in _RunBuildStagesWrapper
    if not builder.Run():
  File "/b/cbuild/internal_master/chromite/cbuildbot/builders/generic_builders.py", line 316, in Run
    self.RunStages()
  File "/b/cbuild/internal_master/chromite/cbuildbot/builders/simple_builders.py", line 345, in RunStages
    self._RunDefaultTypeBuild()
  File "/b/cbuild/internal_master/chromite/cbuildbot/builders/simple_builders.py", line 330, in _RunDefaultTypeBuild
    self.RunEarlySyncAndSetupStages()
  File "/b/cbuild/internal_master/chromite/cbuildbot/builders/simple_builders.py", line 264, in RunEarlySyncAndSetupStages
    self._RunStage(android_stages.AndroidMetadataStage)
  File "/b/cbuild/internal_master/chromite/cbuildbot/builders/generic_builders.py", line 107, in _RunStage
    return stage_instance.Run()
  File "/b/cbuild/internal_master/chromite/cbuildbot/stages/generic_stages.py", line 648, in Run
    self._Finish()
  File "/b/cbuild/internal_master/chromite/cbuildbot/stages/android_stages.py", line 144, in _Finish
    self._WriteAndroidVersionToMetadata()
  File "/b/cbuild/internal_master/chromite/cbuildbot/stages/android_stages.py", line 137, in _WriteAndroidVersionToMetadata
    self.UploadMetadata(filename=constants.PARTIAL_METADATA_JSON)
  File "/b/cbuild/internal_master/chromite/lib/failures_lib.py", line 172, in wrapped_functor
    return functor(*args, **kwargs)
  File "/b/cbuild/internal_master/chromite/cbuildbot/stages/generic_stages.py", line 1056, in UploadMetadata
    self.UploadArtifact(filename, archive=False)
  File "/b/cbuild/internal_master/chromite/lib/failures_lib.py", line 172, in wrapped_functor
    return functor(*args, **kwargs)
  File "/b/cbuild/internal_master/chromite/cbuildbot/stages/generic_stages.py", line 992, in UploadArtifact
    update_list=True, acl=self.acl)
  File "/b/cbuild/internal_master/chromite/lib/failures_lib.py", line 172, in wrapped_functor
    return functor(*args, **kwargs)
  File "/b/cbuild/internal_master/chromite/cbuildbot/commands.py", line 1843, in UploadArchivedFile
    _UploadPathToGS(file_path, upload_urls, debug, timeout, acl=acl)
  File "/b/cbuild/internal_master/chromite/lib/failures_lib.py", line 187, in wrapped_functor
    raise self.category_exception(exc_infos=exc_infos)
chromite.lib.failures_lib.GSUploadFailure: <class 'chromite.lib.gs.GSCommandError'>: return code: 1; command: /b/cbuild/internal_master/.cache/common/gsutil_4.19.tar.gz/gsutil/gsutil -o 'Boto:num_retries=10' -m cp -v -- /b/cbuild/internal_master/trybot_archive/trybot-squawks-llvm-next-toolchain/R57-9170.0.0-b52/partial-metadata.json gs://chromeos-image-archive/trybot-squawks-llvm-next-toolchain/R57-9170.0.0-b52/partial-metadata.json
Copying file:///b/cbuild/internal_master/trybot_archive/trybot-squawks-llvm-next-toolchain/R57-9170.0.0-b52/partial-metadata.json [Content-Type=text/plain]...
Uploading   ...toolchain/R57-9170.0.0-b52/partial-metadata.json: 0 B/1.79 KiB    
Uploading   ...toolchain/R57-9170.0.0-b52/partial-metadata.json: 1.79 KiB/1.79 KiB    
ServiceException: 503 Backend Error
CommandException: 1 file/object could not be transferred.

cmd=['/b/cbuild/internal_master/.cache/common/gsutil_4.19.tar.gz/gsutil/gsutil', '-o', 'Boto:num_retries=10', '-m', 'cp', '-v', '--', '/b/cbuild/internal_master/trybot_archive/trybot-squawks-llvm-next-toolchain/R57-9170.0.0-b52/partial-metadata.json', 'gs://chromeos-image-archive/trybot-squawks-llvm-next-toolchain/R57-9170.0.0-b52/partial-metadata.json'], extra env={'BOTO_CONFIG': '/b/build/site_config/.boto'}
Traceback (most recent call last):
  File "/b/cbuild/internal_master/chromite/lib/failures_lib.py", line 172, in wrapped_functor
    return functor(*args, **kwargs)
  File "/b/cbuild/internal_master/chromite/cbuildbot/commands.py", line 1824, in _UploadPathToGS
    recursive=True)
  File "/b/cbuild/internal_master/chromite/lib/gs.py", line 584, in CopyInto
    **kwargs)
  File "/b/cbuild/internal_master/chromite/lib/gs.py", line 874, in Copy
    raise
  File "/b/cbuild/internal_master/chromite/lib/gs.py", line 851, in Copy
    result = self.DoCommand(cmd, **kwargs)
  File "/b/cbuild/internal_master/chromite/lib/gs.py", line 782, in DoCommand
    raise GSCommandError(e.msg, e.result, e.exception)
GSCommandError: return code: 1; command: /b/cbuild/internal_master/.cache/common/gsutil_4.19.tar.gz/gsutil/gsutil -o 'Boto:num_retries=10' -m cp -v -- /b/cbuild/internal_master/trybot_archive/trybot-squawks-llvm-next-toolchain/R57-9170.0.0-b52/partial-metadata.json gs://chromeos-image-archive/trybot-squawks-llvm-next-toolchain/R57-9170.0.0-b52/partial-metadata.json
Copying file:///b/cbuild/internal_master/trybot_archive/trybot-squawks-llvm-next-toolchain/R57-9170.0.0-b52/partial-metadata.json [Content-Type=text/plain]...
Uploading   ...toolchain/R57-9170.0.0-b52/partial-metadata.json: 0 B/1.79 KiB    
Uploading   ...toolchain/R57-9170.0.0-b52/partial-metadata.json: 1.79 KiB/1.79 KiB    
ServiceException: 503 Backend Error
CommandException: 1 file/object could not be transferred.
 
Owner: dgarr...@chromium.org
Status: Started (was: Untriaged)
I found that _Finish() declares that it can be overridden, and two stages do so. "SyncChromeStage" and "AndroidMetadataStage".

However, we don't invoke this method until after error handling is over, and the default does nothing except log that the stage is finished.

Both Chrome/Android stages use _Finish to ensure that metadata is updated, even if the stage is failing.
Project Member

Comment 3 by bugdroid1@chromium.org, Jan 14 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/c5c5f25f35f4c3cdf034b69f4077ce8ac8b988a0

commit c5c5f25f35f4c3cdf034b69f4077ce8ac8b988a0
Author: Don Garrett <dgarrett@google.com>
Date: Thu Jan 12 01:36:34 2017

SyncChromeStage, AndroidMetadataStage: Make _Finish error safe.

The _Finish method is available for stages to override, but errors in
it are not associated with the stage in question. This leads to very
bad waterfall reporting, confusion, and finally, cats and dogs living
together.

Stop the only two users of this method from raising errors if GS is
behaving badly.

BUG= chromium:680248 
TEST=Unittests

Change-Id: Id3b0696e057aae7fe4c5483d629b056f43c42485
Reviewed-on: https://chromium-review.googlesource.com/427384
Commit-Ready: Don Garrett <dgarrett@chromium.org>
Tested-by: Don Garrett <dgarrett@chromium.org>
Reviewed-by: Aviv Keshet <akeshet@chromium.org>

[modify] https://crrev.com/c5c5f25f35f4c3cdf034b69f4077ce8ac8b988a0/cbuildbot/stages/android_stages.py
[modify] https://crrev.com/c5c5f25f35f4c3cdf034b69f4077ce8ac8b988a0/cbuildbot/stages/chrome_stages.py

Comment 4 by autumn@chromium.org, Jan 17 2017

Labels: -current-issue
Labels: Hotlist-Fixit
Status: Available (was: Started)
Short term fix is in place, but a more general cleanup would be better.

In short, either ensure that we properly handle errors in _Finish, or (better) just get rid of the method.
Status: Started (was: Available)
Labels: cros-infra-fixedit-q117
Status: Fixed (was: Started)

Comment 8 by dchan@google.com, Apr 17 2017

Labels: VerifyIn-59

Comment 9 by dchan@google.com, May 30 2017

Labels: VerifyIn-60
Labels: VerifyIn-61

Comment 11 by dchan@chromium.org, Oct 14 2017

Status: Archived (was: Fixed)

Sign in to add a comment