archive step build_image failure: mkdir: cannot create directory "/mnt/host/source/src/build/images/gru/R57-9177.0.0-a3/rootfs/opt/google/containers": Permission denied
Issue description

This is a snippet of the archive step log for gru-release:
https://uberchromegw.corp.google.com/i/chromeos/builders/gru-release/builds/786/steps/Archive/logs/stdio

From the snippet I can guess that "gsutil cp" failed, since it's the last command before the Python stack trace. But the code invoking gsutil cp should (and should be able to) report the exact failure; otherwise a lot of people waste time either confirming it or discovering a different root failure.

05:49:54: INFO: RunCommand: /b/cbuild/internal_master/.cache/common/gsutil_4.19.tar.gz/gsutil/gsutil -o 'Boto:num_retries=10' -m cp -v -- /b/cbuild/internal_master/buildbot_archive/gru-release/R57-9177.0.0/stripped-packages.tar gs://chromeos-moblab-rockchip/gru-release/R57-9177.0.0/stripped-packages.tar
05:50:04: INFO: RunCommand: /b/cbuild/internal_master/.cache/common/gsutil_4.19.tar.gz/gsutil/gsutil -o 'Boto:num_retries=10' -m cp -v -- /b/cbuild/internal_master/buildbot_archive/gru-release/R57-9177.0.0/stripped-packages.tar gs://chromeos-moblab-inventec/gru-release/R57-9177.0.0/stripped-packages.tar
05:50:13: INFO: RunCommand: /b/cbuild/internal_master/.cache/common/gsutil_4.19.tar.gz/gsutil/gsutil -o 'Boto:num_retries=10' -m cp -v -- /b/cbuild/internal_master/buildbot_archive/gru-release/R57-9177.0.0/UPLOADED gs://chromeos-image-archive/gru-release/R57-9177.0.0/UPLOADED
05:50:18: INFO: RunCommand: /b/cbuild/internal_master/.cache/common/gsutil_4.19.tar.gz/gsutil/gsutil -o 'Boto:num_retries=10' -m cp -v -- /b/cbuild/internal_master/buildbot_archive/gru-release/R57-9177.0.0/UPLOADED gs://chromeos-moblab-rockchip/gru-release/R57-9177.0.0/UPLOADED
05:50:25: INFO: RunCommand: /b/cbuild/internal_master/.cache/common/gsutil_4.19.tar.gz/gsutil/gsutil -o 'Boto:num_retries=10' -m cp -v -- /b/cbuild/internal_master/buildbot_archive/gru-release/R57-9177.0.0/UPLOADED gs://chromeos-moblab-inventec/gru-release/R57-9177.0.0/UPLOADED
@@@STEP_FAILURE@@@
06:10:11: ERROR: <class 'chromite.lib.failures_lib.BuildScriptFailure'>: ./build_image failed (code=1)
Traceback (most recent call last):
  File "/b/cbuild/internal_master/chromite/lib/parallel.py", line 441, in _Run
    self._task(*self._task_args, **self._task_kwargs)
  File "/b/cbuild/internal_master/chromite/cbuildbot/stages/artifact_stages.py", line 205, in BuildAndArchiveFactoryImages
    alias = commands.BuildFactoryInstallImage(buildroot, board, extra_env)
  File "/b/cbuild/internal_master/chromite/cbuildbot/commands.py", line 1928, in BuildFactoryInstallImage
    enter_chroot=True)
  File "/b/cbuild/internal_master/chromite/cbuildbot/commands.py", line 139, in RunBuildScript
    raise failures_lib.BuildScriptFailure(ex, cmd[0])
BuildScriptFailure: ./build_image failed (code=1)
06:10:11: INFO: Running cidb query on pid 19861, repr(query) starts with <sqlalchemy.sql.expression.Update object at 0x3a80850>
06:10:16: INFO: cidb query succeeded after 1 retries
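As a minimal sketch of the kind of reporting this asks for (using only the standard library, not chromite's actual RunCommand helpers; the function name and arguments are made up for illustration): when a gsutil cp fails, the exact command and its output go into the log before the exception propagates.

import logging
import subprocess

def upload(local_path, gs_url):
    """Run gsutil cp and report the exact failure if it exits non-zero."""
    cmd = ['gsutil', '-o', 'Boto:num_retries=10', '-m', 'cp', '-v', '--',
           local_path, gs_url]
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # Put the exact failing command and its output in the log, so readers
        # don't have to guess the root failure from a bare traceback.
        logging.error('Command failed (code=%d): %s\nstdout:\n%s\nstderr:\n%s',
                      result.returncode, ' '.join(cmd),
                      result.stdout, result.stderr)
        result.check_returncode()  # raises CalledProcessError with the same details
    return result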
Comment 1 by semenzato@chromium.org, Jan 13 2017
The function RunBuildScript in the stack above has this code:
  try:
    return runcmd(cmd, **kwargs)
  except cros_build_lib.RunCommandError as ex:
    # Print the original exception.
    logging.error('\n%s', ex)
    # Check whether a specific package failed. If so, wrap the exception
    # appropriately. These failures are usually caused by a recent CL, so we
    # don't ever treat these failures as flaky.
    if status_file is not None:
      status_file.seek(0)
      failed_packages = status_file.read().split()
      if failed_packages:
        raise failures_lib.PackageBuildFailure(ex, cmd[0], failed_packages)
    # Looks like a generic failure. Raise a BuildScriptFailure.
    raise failures_lib.BuildScriptFailure(ex, cmd[0])
It looks like it should be logging the original exception, and then wrapping it into a BuildScriptFailure. But I don't see the original exception.
Interesting: in chromite/lib/failures_lib.py:
class BuildScriptFailure(StepFailure):
  """This exception is thrown when a build command failed.

  It is intended to provide a shorter summary of what command failed,
  for usage in failure messages from the Commit Queue, so as to ensure
  that developers aren't spammed with giant error messages when common
  commands (e.g. build_packages) fail.
  """
I tried git blame to understand these changes better, but the code has been moved around too much...
Anyway, this seems to be in conflict with the (perceived) logging strategy, which seems to be one of throwing *everything* into the logs. Not sure which "developers" this refers to, but sheriffs have to wade through huge logs already, and filtering out failure information seems pointless.
Aviv, can you elaborate?
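For comparison, here is a self-contained sketch (standard library only, Python 3, not the real chromite classes) of the behavior the snippet above appears to intend: log the full original failure, then raise a wrapper that carries only the short summary. If the build log really is missing the original exception, the logging call is the part worth checking.

import logging
import subprocess

class BuildScriptFailure(Exception):
    """Short summary of which command failed; the full output stays in the log."""
    def __init__(self, ex, cmd_name):
        super().__init__('%s failed (code=%s)' % (cmd_name, ex.returncode))
        self.original = ex  # keep the wrapped exception for callers that need details

def run_build_script(cmd):
    try:
        return subprocess.run(cmd, check=True, capture_output=True, text=True)
    except subprocess.CalledProcessError as ex:
        # Log the complete original failure *before* wrapping it, so the log
        # still contains everything even though the wrapper's message is short.
        logging.error('\n%s\nstdout:\n%s\nstderr:\n%s', ex, ex.stdout, ex.stderr)
        raise BuildScriptFailure(ex, cmd[0])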
Jan 14 2017
I am seeing more of these in the latest canary builds.
Jan 14 2017
Also see issue 680849. That bug is actually two bugs, I think. One was fixed (the udev/udevadm problem), but the other one (the same as this one) is still happening, although it was masked by the "all devservers are down" failures.
Jan 17 2017
I am seeing many canary failures in the Archive step with this error. For instance:
https://uberchromegw.corp.google.com/i/chromeos/builders/cave-release/builds/768/steps/Archive/logs/stdio
https://uberchromegw.corp.google.com/i/chromeos/builders/terra-release/builds/771/steps/Archive/logs/stdio
I'll be gathering more stats, but right now we have 32 red canaries and I am guessing most of them are failing with this error.
Jan 17 2017
29 of them:
auron_paine/767 build_image
celes/767 build_image
cave/768 build_image
lulu/768 build_image
setzer/772 build_image
kevin/776 build_image
gru/802 build_image
elm/769 build_image
gandof/759 build_image
oak/789 build_image
relm/769 build_image
kefka/771 build_image
veyron_jaq/769 build_image
veyron_mighty/769 build_image
asuka/767 build_image
terra/771 build_image
veyron_minnie/769 build_image
ultima/769 build_image
veyron_speedy/767 build_image
samus/3938 build_image
auron_yuna/767 build_image
reks/767 build_image
caroline/320 build_image
wizpig/775 build_image
sentry/767 build_image
edgar/771 build_image
umaro/770 build_image
veyron_jerry/768 build_image
banon/767 build_image
Jan 17 2017
I don't see a clear error from that build_image command line. The massive failure started around last Wednesday (1/11). Changes in chromite around that time are:
https://chromium-review.googlesource.com/#/c/422958/ (not related to build_image)
https://chromium-review.googlesource.com/#/c/427382/ (only affects the report stage, not archive)
https://chromium-review.googlesource.com/#/c/425849/ (seems only related to paygen)
Has anyone tried to run build_image in a local chroot?
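For anyone who wants to try, a rough reproduction sketch; the cros_sdk/build_image invocation and the board name are assumptions taken from the log above, not a verified recipe, and it has to be run from a chromiumos checkout.

import subprocess

def try_build_image(board='gru'):
    # cros_sdk enters the chroot; ./build_image is the script that failed in
    # the log above. Flags and board name are assumptions for illustration.
    cmd = ['cros_sdk', '--', './build_image', '--board=%s' % board]
    return subprocess.run(cmd, check=False)

if __name__ == '__main__':
    print('build_image exited with code', try_build_image().returncode)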
Jan 17 2017
Issue 681632 has been merged into this issue.
Jan 17 2017
This is the failure:
mkdir: cannot create directory ‘/mnt/host/source/src/build/images/gru/R57-9177.0.0-a3/rootfs/opt/google/containers’: Permission denied
It was mentioned in issue 680849, but I don't think it was followed up on, as the focus was on the recovery image failure.
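To illustrate the failure mode (this is not the actual build_image code; the helper name is made up and the path is just the one from the log): the image's rootfs tree is root-owned, so a plain mkdir by the build user fails with Permission denied, which is why build scripts normally create such paths with elevated privileges.

import os
import subprocess

def make_rootfs_dir(path):
    """Create a directory under a (possibly root-owned) rootfs tree."""
    try:
        os.makedirs(path, exist_ok=True)
    except PermissionError:
        # Same error as 'mkdir: cannot create directory ...: Permission denied'
        # in the log; retry with elevated privileges instead of the build user.
        subprocess.run(['sudo', 'mkdir', '-p', path], check=True)

# Example with the path from the log:
# make_rootfs_dir('/mnt/host/source/src/build/images/gru/R57-9177.0.0-a3/'
#                 'rootfs/opt/google/containers')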
Jan 17 2017
domlaskowski@ created https://chrome-internal-review.googlesource.com/#/c/317535/ to fix that.
Jan 18 2017
Should this be duped with https://bugs.chromium.org/p/chromium/issues/detail?id=676152, if that is the failure?
Jan 18 2017
Let's wait and see how the morning canaries go by 2pm. We will close this out if all images are there by that build.
Jan 18 2017
We have successful Archive steps!
https://chromegw.corp.google.com/i/chromeos/builders/oak-release/builds/793
https://chromegw.corp.google.com/i/chromeos/builders/elm-release/builds/773
(Never mind the VMTest sanity failures: "All devservers are currently down".)
Jan 18 2017
We had devserver downtime earlier (b/34389707). The devservers are back online now.