
Issue 921764

Starred by 2 users

Issue metadata

Status: Unconfirmed
Merged: issue 917099
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 3
Type: Bug




failure of cros_mark_as_stable is hidden in the log

Project Member Reported by semenzato@chromium.org, Jan 14

Issue description

This is problematic for sheriffs because there's no clue as to what should be done next (other than filing this bug).

https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8924362051354688864

https://luci-logdog.appspot.com/logs/chromeos/buildbucket/cr-buildbucket.appspot.com/8924362051354688864/+/steps/Uprev/0/stdout

02:58:11: INFO: Old and new ebuild /b/swarming/w/ir/cache/cbuild/repository/src/private-overlays/overlay-fizz-private/chromeos-base/chromeos-bsp-fizz-private/chromeos-bsp-fizz-private-0.0.1-r9.ebuild are exactly identical; skipping uprev
02:58:07: INFO: Skip: Determined that none of the ebuild chromeos-config-bsp-fizz-private rev_subdirs was touched []
02:58:07: INFO: Skipping uprev of ebuild chromeos-config-bsp-fizz-private, none of the rev_subdirs have been modified, no files/, nor has the -9999 ebuild.
02:58:07: INFO: Determining whether to create new ebuild /b/swarming/w/ir/cache/cbuild/repository/src/private-overlays/overlay-fizz-private/chromeos-base/chromeos-firmware-fizz/chromeos-firmware-fizz-0.0.1-r187.ebuild
02:58:07: INFO: Old and new ebuild /b/swarming/w/ir/cache/cbuild/repository/src/private-overlays/overlay-fizz-private/chromeos-base/chromeos-firmware-fizz/chromeos-firmware-fizz-0.0.1-r187.ebuild are exactly identical; skipping uprev
cros_mark_as_stable: Unhandled exception:
Traceback (most recent call last):
  File "/b/swarming/w/ir/cache/cbuild/repository/chromite/bin/cros_mark_as_stable", line 170, in <module>
    DoMain()
  File "/b/swarming/w/ir/cache/cbuild/repository/chromite/bin/cros_mark_as_stable", line 166, in DoMain
    commandline.ScriptWrapperMain(FindTarget)
  File "/b/swarming/w/ir/cache/cbuild/repository/chromite/lib/commandline.py", line 912, in ScriptWrapperMain
    ret = target(argv[1:])
  File "/b/swarming/w/ir/cache/cbuild/repository/chromite/scripts/cros_mark_as_stable.py", line 306, in main
    git_project_overlays, manifest, package_list)
  File "/b/swarming/w/ir/cache/cbuild/repository/chromite/scripts/cros_mark_as_stable.py", line 367, in _WorkOnCommit
    parallel.RunTasksInProcessPool(_CommitOverlays, inputs)
  File "/b/swarming/w/ir/cache/cbuild/repository/chromite/lib/parallel.py", line 809, in RunTasksInProcessPool
    queue.put((idx, input_args))
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/b/swarming/w/ir/cache/cbuild/repository/chromite/lib/parallel.py", line 750, in BackgroundTaskRunner
    queue.put(_AllTasksComplete())
  File "/b/swarming/w/ir/cache/cbuild/repository/chromite/lib/parallel.py", line 750, in BackgroundTaskRunner
    queue.put(_AllTasksComplete())
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/b/swarming/w/ir/cache/cbuild/repository/chromite/lib/parallel.py", line 561, in ParallelTasks
    raise BackgroundFailure(exc_infos=errors)
chromite.lib.parallel.BackgroundFailure: <class 'chromite.lib.cros_build_lib.DieSystemExit'>: 1
Traceback (most recent call last):
  File "/b/swarming/w/ir/cache/cbuild/repository/chromite/lib/parallel.py", line 602, in TaskRunner
    task(*x, **task_kwargs)
  File "/b/swarming/w/ir/cache/cbuild/repository/chromite/lib/parallel.py", line 800, in <lambda>
    fn = lambda idx, task_args: out_queue.put((idx, task(*task_args)))
  File "/b/swarming/w/ir/cache/cbuild/repository/chromite/scripts/cros_mark_as_stable.py", line 496, in _WorkOnEbuild
    manifest)
  File "/b/swarming/w/ir/cache/cbuild/repository/chromite/lib/portage_util.py", line 959, in RevWorkOnEBuild
    self.version_no_rev)
  File "/b/swarming/w/ir/cache/cbuild/repository/chromite/lib/portage_util.py", line 897, in GetVersion
    (self.pkgname, ' '.join(srcdirs)))
  File "/b/swarming/w/ir/cache/cbuild/repository/chromite/lib/cros_build_lib.py", line 683, in Die
    raise DieSystemExit(1)
DieSystemExit: 1

02:58:47: ERROR: 
return code: 1; command: /b/swarming/w/ir/cache/cbuild/repository/chromite/bin/cros_mark_as_stable commit --all '--boards=scarlet' '--drop_file=/b/swarming/w/ir/cache/cbuild/repository/src/scripts/cbuildbot_package.list' --buildroot /b/swarming/w/ir/cache/cbuild/repository --overlay-type both
cmd=['/b/swarming/w/ir/cache/cbuild/repository/chromite/bin/cros_mark_as_stable', 'commit', '--all', u'--boards=scarlet', '--drop_file=/b/swarming/w/ir/cache/cbuild/repository/src/scripts/cbuildbot_package.list', '--buildroot', '/b/swarming/w/ir/cache/cbuild/repository', '--overlay-type', u'both'], cwd=/b/swarming/w/ir/cache/cbuild/repository

02:58:47: ERROR: /b/swarming/w/ir/cache/cbuild/repository/chromite/bin/cros_mark_as_stable failed (code=1)
02:58:47: INFO: Translating result /b/swarming/w/ir/cache/cbuild/repository/chromite/bin/cros_mark_as_stable failed (code=1) to fail.


 
Components: -Infra>Client>ChromeOS>CI Infra>Client>ChromeOS>Build
Status: Available (was: Untriaged)
Mergedinto: 917099
Status: Duplicate (was: Available)
the error is in the log:
02:58:15: ERROR: Package chromeos-kernel-3_10 has a chromeos-version.sh script but it returned no valid version for "/b/swarming/w/ir/cache/cbuild/repository/src/third_party/kernel/v3.10"
Status: Unconfirmed (was: Duplicate)
Summary: failure of cros_mark_as_stable is hidden in the log (was: scarlet-release: failure of cros_mark_as_stable with no error reported)
Thank you Mike, indeed the error is there, but it is preceded by 450 lines of log messages repeating almost identically, and followed by 1250 lines of the same.

02:57:42: INFO: Determining whether to create new ebuild /b/swarming/w/ir/cache/cbuild/repository/src/private-overlays/overlay-sand-private/chromeos-base/chromeos-firmware-sand/chromeos-firmware-sand-0.0.1-r172.ebuild
02:57:42: INFO: Old and new ebuild /b/swarming/w/ir/cache/cbuild/repository/src/private-overlays/overlay-sand-private/chromeos-base/chromeos-firmware-sand/chromeos-firmware-sand-0.0.1-r172.ebuild are exactly identical; skipping uprev

It would be nice if the error could also be reported near the end, in addition to this:

02:58:47: ERROR: 
return code: 1; command: /b/swarming/w/ir/cache/cbuild/repository/chromite/bin/cros_mark_as_stable commit --all '--boards=scarlet' '--drop_file=/b/swarming/w/ir/cache/cbuild/repository/src/scripts/cbuildbot_package.list' --buildroot /b/swarming/w/ir/cache/cbuild/repository --overlay-type both
cmd=['/b/swarming/w/ir/cache/cbuild/repository/chromite/bin/cros_mark_as_stable', 'commit', '--all', u'--boards=scarlet', '--drop_file=/b/swarming/w/ir/cache/cbuild/repository/src/scripts/cbuildbot_package.list', '--buildroot', '/b/swarming/w/ir/cache/cbuild/repository', '--overlay-type', u'both'], cwd=/b/swarming/w/ir/cache/cbuild/repository
02:58:47: ERROR: /b/swarming/w/ir/cache/cbuild/repository/chromite/bin/cros_mark_as_stable failed (code=1)

(I am not insisting on this---you be the judge.)
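One way this could be done (a rough sketch, not chromite's actual code; run_and_summarize and its details are hypothetical): the wrapper that invokes cros_mark_as_stable could capture its output and, on a nonzero exit, repeat the last few ERROR lines right next to the final "failed (code=1)" message, so they land near the end of the log.

import subprocess

def run_and_summarize(cmd, max_errors=5):
    """Run cmd; if it fails, repeat its last ERROR lines at the end.

    Hypothetical helper, not part of chromite: it scans the captured
    output for lines containing 'ERROR:' and re-prints the last few
    alongside the final failure message.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)
    output, _ = proc.communicate()
    output = output.decode('utf-8', 'replace')
    print(output, end='')
    if proc.returncode != 0:
        error_lines = [l for l in output.splitlines() if ': ERROR: ' in l]
        print('%s failed (code=%d)' % (cmd[0], proc.returncode))
        for line in error_lines[-max_errors:]:
            print('  (from log) %s' % line)
    return proc.returncode

E.g. run_and_summarize(['cros_mark_as_stable', 'commit', '--all']) would end with the kernel version error repeated after the return-code message.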
Cc: -akes...@chromium.org
Labels: -Pri-2 Pri-3
It's not easy to do this because of the way the code is currently structured to enable parallelism. I think performance is more important here than sparing people from having to search for "error" log lines.
It's not clear why the parallelism is an issue. An error code is reported at the end; it's only the error message that doesn't make it there.
The parallel module is bubbling up the exception, not the error/output directly.
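For illustration, a minimal sketch of that pattern and one way around it (generic Python, not chromite's parallel module; all names here are made up): when worker processes only propagate an exit status, the parent's final error loses the worker's ERROR text, but capturing each worker's traceback and attaching it to the aggregated exception would surface it where the failure is finally reported.

import multiprocessing
import traceback

class BackgroundFailure(Exception):
    """Aggregate failure raised in the parent; illustrative only."""

def _run_task(task, args, out_queue):
    """Run one task in a child process, capturing its traceback on failure."""
    try:
        task(*args)
        out_queue.put((True, ''))
    except BaseException:
        # Capture the full traceback text so the parent can surface it,
        # instead of only learning "exit code 1".
        out_queue.put((False, traceback.format_exc()))

def run_tasks(task, inputs):
    queue = multiprocessing.Queue()
    procs = [multiprocessing.Process(target=_run_task, args=(task, args, queue))
             for args in inputs]
    for p in procs:
        p.start()
    failures = []
    for _ in procs:
        ok, text = queue.get()
        if not ok:
            failures.append(text)
    for p in procs:
        p.join()
    if failures:
        # Include the captured tracebacks in the exception itself, so the
        # underlying error appears next to the final failure report.
        raise BackgroundFailure('\n'.join(failures))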
Labels: -Pri-3 Pri-1
This is also causing a critical failure in the ability for the release branches to uprev on M72 (and possibly the upcoming M73?):
https://luci-scheduler.appspot.com/jobs/chromeos/release-R72-11316.B-samus-chrome-pre-flight-branch

For comparison, the previous branch's PFQ (M71) is normally very quiet:
https://luci-scheduler.appspot.com/jobs/chromeos/release-R71-11151.B-samus-chrome-pre-flight-branch

Also, on this topic, I mostly stumbled upon this myself while putting the pieces together for M72. Is there any kind of alerting in place for these kinds of build failures so they can be identified and addressed as soon as they start happening?

Moving this to P1 as this affects the ability to get daily release candidates.
Cc: cindyb@chromium.org bhthompson@chromium.org kbleicher@chromium.org
Labels: -Pri-1 Pri-3
This bug is only about how the failure is reported, not about the actual failure (which is a dupe -- see above). So having to search the log for the string "error" is not a blocker.
Agree with #10 that this doesn't seem like P1 to me. However, on the topic of parallelism distorting the log messages, can we address it using something like https://stackoverflow.com/a/641488?
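The linked answer boils down to routing every worker's log records through one queue to a single listener in the parent, so records from parallel tasks are serialized rather than interleaved. A minimal sketch of that idea using the standard library's QueueHandler/QueueListener (Python 3; the code above is Python 2.7, where a hand-rolled queue handler as in the answer would be needed instead):

import logging
import logging.handlers
import multiprocessing

def worker(queue, name):
    """Worker process: send all log records through the shared queue."""
    root = logging.getLogger()
    root.handlers = [logging.handlers.QueueHandler(queue)]
    root.setLevel(logging.INFO)
    logging.info('uprev of %s starting', name)
    logging.error('uprev of %s failed', name)

def main():
    queue = multiprocessing.Queue()
    # One listener in the parent writes every record, in arrival order,
    # to a single handler, so parallel workers' messages do not interleave
    # and ERROR lines stay easy to find in one place.
    listener = logging.handlers.QueueListener(queue, logging.StreamHandler())
    listener.start()
    procs = [multiprocessing.Process(target=worker, args=(queue, name))
             for name in ('overlay-fizz-private', 'overlay-sand-private')]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    listener.stop()

if __name__ == '__main__':
    main()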
