Project: chromium
Starred by 8 users
Status: Fixed
Closed: Jan 2017
OS: Chrome
Pri: 0
Type: Bug

Blocking:
issue 676152



archive step build_image failure: mkdir: cannot create directory "/mnt/host/source/src/build/images/gru/R57-9177.0.0-a3/rootfs/opt/google/containers": Permission denied
Project Member Reported by semenzato@chromium.org, Jan 12 2017
This is a snippet of the archive step log for gru-release:

https://uberchromegw.corp.google.com/i/chromeos/builders/gru-release/builds/786/steps/Archive/logs/stdio

From the snippet I can guess that "gsutil cp" failed, since it's the last command before the Python stack trace.  But the code invoking "gsutil cp" should (and should be able to) report the exact failure; otherwise a lot of people waste time either confirming the guess or discovering a different root failure.
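For illustration, here is a minimal sketch (not chromite's actual RunCommand; the function name and error format are assumptions) of a wrapper that keeps the failing command's stderr in the exception it raises, so the exact failure gets reported instead of being buried in the log:

import subprocess

def run_and_report(cmd):
    """Hypothetical wrapper: preserve the root failure in the raised error."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # Attach the command's stderr so whoever catches this sees the
        # actual error text (e.g. "mkdir: ...: Permission denied"), not
        # just an exit code.
        raise RuntimeError('%s failed (code=%d): %s' %
                           (cmd[0], result.returncode, result.stderr.strip()))
    return result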

05:49:54: INFO: RunCommand: /b/cbuild/internal_master/.cache/common/gsutil_4.19.tar.gz/gsutil/gsutil -o 'Boto:num_retries=10' -m cp -v -- /b/cbuild/internal_master/buildbot_archive/gru-release/R57-9177.0.0/stripped-packages.tar gs://chromeos-moblab-rockchip/gru-release/R57-9177.0.0/stripped-packages.tar
05:50:04: INFO: RunCommand: /b/cbuild/internal_master/.cache/common/gsutil_4.19.tar.gz/gsutil/gsutil -o 'Boto:num_retries=10' -m cp -v -- /b/cbuild/internal_master/buildbot_archive/gru-release/R57-9177.0.0/stripped-packages.tar gs://chromeos-moblab-inventec/gru-release/R57-9177.0.0/stripped-packages.tar
05:50:13: INFO: RunCommand: /b/cbuild/internal_master/.cache/common/gsutil_4.19.tar.gz/gsutil/gsutil -o 'Boto:num_retries=10' -m cp -v -- /b/cbuild/internal_master/buildbot_archive/gru-release/R57-9177.0.0/UPLOADED gs://chromeos-image-archive/gru-release/R57-9177.0.0/UPLOADED
05:50:18: INFO: RunCommand: /b/cbuild/internal_master/.cache/common/gsutil_4.19.tar.gz/gsutil/gsutil -o 'Boto:num_retries=10' -m cp -v -- /b/cbuild/internal_master/buildbot_archive/gru-release/R57-9177.0.0/UPLOADED gs://chromeos-moblab-rockchip/gru-release/R57-9177.0.0/UPLOADED
05:50:25: INFO: RunCommand: /b/cbuild/internal_master/.cache/common/gsutil_4.19.tar.gz/gsutil/gsutil -o 'Boto:num_retries=10' -m cp -v -- /b/cbuild/internal_master/buildbot_archive/gru-release/R57-9177.0.0/UPLOADED gs://chromeos-moblab-inventec/gru-release/R57-9177.0.0/UPLOADED

@@@STEP_FAILURE@@@
06:10:11: ERROR: <class 'chromite.lib.failures_lib.BuildScriptFailure'>: ./build_image failed (code=1)
Traceback (most recent call last):
  File "/b/cbuild/internal_master/chromite/lib/parallel.py", line 441, in _Run
    self._task(*self._task_args, **self._task_kwargs)
  File "/b/cbuild/internal_master/chromite/cbuildbot/stages/artifact_stages.py", line 205, in BuildAndArchiveFactoryImages
    alias = commands.BuildFactoryInstallImage(buildroot, board, extra_env)
  File "/b/cbuild/internal_master/chromite/cbuildbot/commands.py", line 1928, in BuildFactoryInstallImage
    enter_chroot=True)
  File "/b/cbuild/internal_master/chromite/cbuildbot/commands.py", line 139, in RunBuildScript
    raise failures_lib.BuildScriptFailure(ex, cmd[0])
  File "/b/cbuild/internal_master/chromite/cbuildbot/commands.py", line 139, in RunBuildScript
    raise failures_lib.BuildScriptFailure(ex, cmd[0])
BuildScriptFailure: ./build_image failed (code=1)

06:10:11: INFO: Running cidb query on pid 19861, repr(query) starts with <sqlalchemy.sql.expression.Update object at 0x3a80850>
06:10:16: INFO: cidb query succeeded after 1 retries
 

Another example (probably exactly the same thing):

https://uberchromegw.corp.google.com/i/chromeos/builders/sentry-release/builds/751/steps/Archive/logs/stdio

05:53:16: INFO: RunCommand: /b/cbuild/internal_master/.cache/common/gsutil_4.19.tar.gz/gsutil/gsutil -o 'Boto:num_retries=10' -m cp -v -- /b/cbuild/internal_master/buildbot_archive/sentry-release/R57-9177.0.0/stripped-packages.tar gs://chromeos-moblab-quanta/sentry-release/R57-9177.0.0/stripped-packages.tar
05:53:30: INFO: RunCommand: /b/cbuild/internal_master/.cache/common/gsutil_4.19.tar.gz/gsutil/gsutil -o 'Boto:num_retries=10' -m cp -v -- /b/cbuild/internal_master/buildbot_archive/sentry-release/R57-9177.0.0/stripped-packages.tar gs://chromeos-moblab-skhynix/sentry-release/R57-9177.0.0/stripped-packages.tar
05:53:40: INFO: RunCommand: /b/cbuild/internal_master/.cache/common/gsutil_4.19.tar.gz/gsutil/gsutil -o 'Boto:num_retries=10' -m cp -v -- /b/cbuild/internal_master/buildbot_archive/sentry-release/R57-9177.0.0/UPLOADED gs://chromeos-image-archive/sentry-release/R57-9177.0.0/UPLOADED
05:53:46: INFO: RunCommand: /b/cbuild/internal_master/.cache/common/gsutil_4.19.tar.gz/gsutil/gsutil -o 'Boto:num_retries=10' -m cp -v -- /b/cbuild/internal_master/buildbot_archive/sentry-release/R57-9177.0.0/UPLOADED gs://chromeos-moblab-intel/sentry-release/R57-9177.0.0/UPLOADED
05:53:53: INFO: RunCommand: /b/cbuild/internal_master/.cache/common/gsutil_4.19.tar.gz/gsutil/gsutil -o 'Boto:num_retries=10' -m cp -v -- /b/cbuild/internal_master/buildbot_archive/sentry-release/R57-9177.0.0/UPLOADED gs://chromeos-moblab-quanta/sentry-release/R57-9177.0.0/UPLOADED
05:54:00: INFO: RunCommand: /b/cbuild/internal_master/.cache/common/gsutil_4.19.tar.gz/gsutil/gsutil -o 'Boto:num_retries=10' -m cp -v -- /b/cbuild/internal_master/buildbot_archive/sentry-release/R57-9177.0.0/UPLOADED gs://chromeos-moblab-skhynix/sentry-release/R57-9177.0.0/UPLOADED

@@@STEP_FAILURE@@@
06:18:03: ERROR: <class 'chromite.lib.failures_lib.BuildScriptFailure'>: ./build_image failed (code=1)
Traceback (most recent call last):
  File "/b/cbuild/internal_master/chromite/lib/parallel.py", line 441, in _Run
    self._task(*self._task_args, **self._task_kwargs)
  File "/b/cbuild/internal_master/chromite/cbuildbot/stages/artifact_stages.py", line 205, in BuildAndArchiveFactoryImages
    alias = commands.BuildFactoryInstallImage(buildroot, board, extra_env)
  File "/b/cbuild/internal_master/chromite/cbuildbot/commands.py", line 1928, in BuildFactoryInstallImage
    enter_chroot=True)
  File "/b/cbuild/internal_master/chromite/cbuildbot/commands.py", line 139, in RunBuildScript
    raise failures_lib.BuildScriptFailure(ex, cmd[0])
  File "/b/cbuild/internal_master/chromite/cbuildbot/commands.py", line 139, in RunBuildScript
    raise failures_lib.BuildScriptFailure(ex, cmd[0])
BuildScriptFailure: ./build_image failed (code=1)

Cc: akes...@chromium.org
The function RunBuildScript in the stack above has this code:

    try:
      return runcmd(cmd, **kwargs)
    except cros_build_lib.RunCommandError as ex:
      # Print the original exception.
      logging.error('\n%s', ex)

      # Check whether a specific package failed. If so, wrap the exception
      # appropriately. These failures are usually caused by a recent CL, so we
      # don't ever treat these failures as flaky.
      if status_file is not None:
        status_file.seek(0)
        failed_packages = status_file.read().split()
        if failed_packages:
          raise failures_lib.PackageBuildFailure(ex, cmd[0], failed_packages)

      # Looks like a generic failure. Raise a BuildScriptFailure.
      raise failures_lib.BuildScriptFailure(ex, cmd[0])

It looks like it should be logging the original exception and then wrapping it in a BuildScriptFailure.  But I don't see the original exception anywhere in the log.
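Here is a minimal self-contained sketch (the class shapes are assumptions, not copied from failures_lib.py) of how this wrap-and-reraise pattern can hide the root cause when only the outer exception gets printed:

class RunCommandError(Exception):
    pass

class BuildScriptFailure(Exception):
    def __init__(self, exception, shortname):
        super(BuildScriptFailure, self).__init__()
        self.exception = exception  # the original error, kept but not shown
        self.shortname = shortname

    def __str__(self):
        # Only the short summary survives; the wrapped exception's text is
        # lost unless something logs self.exception explicitly.
        return '%s failed (code=1)' % self.shortname

try:
    try:
        raise RunCommandError('mkdir: cannot create directory: Permission denied')
    except RunCommandError as ex:
        raise BuildScriptFailure(ex, './build_image')
except BuildScriptFailure as failure:
    print(failure)  # prints "./build_image failed (code=1)"; root cause gone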

Interesting: in chromite/lib/failures_lib.py:

class BuildScriptFailure(StepFailure):
  """This exception is thrown when a build command failed.

  It is intended to provide a shorter summary of what command failed,
  for usage in failure messages from the Commit Queue, so as to ensure
  that developers aren't spammed with giant error messages when common
  commands (e.g. build_packages) fail.
  """

I tried git blame to understand these changes better, but the code has been moved around too much...

Anyway, this seems to be in conflict with the (perceived) logging strategy, which seems to be one of throwing *everything* into the logs.  I'm not sure which "developers" this refers to, but sheriffs already have to wade through huge logs, and filtering out failure information seems pointless.

Aviv, can you elaborate?


Labels: -Pri-3 Pri-2
Owner: dgarr...@chromium.org
I am seeing more of these in the latest canary builds.

Labels: -Pri-2 Pri-0
Labels: -Pri-0 Pri-2
Labels: -Pri-2 Pri-1
Also see issue 680849.  That bug is actually two bugs, I think.  One was fixed (the udev/udevadm problem), but the other (the same as this one) is still happening, although it was masked by the "all devservers are down" failures.
Cc: dgarr...@chromium.org
Labels: -Pri-1 Pri-0
Owner: dshi@chromium.org
I am seeing many canary failures in the Archive step with this error.

For instance:

https://uberchromegw.corp.google.com/i/chromeos/builders/cave-release/builds/768/steps/Archive/logs/stdio

https://uberchromegw.corp.google.com/i/chromeos/builders/terra-release/builds/771/steps/Archive/logs/stdio

I'll be gathering more stats, but right now we have 32 red canaries and I am guessing most of them are failing with this error.

Summary: archive step build_image exit status 1 on canaries (was: archive step failure should report root error)
29 of them:

     auron_paine/767   build_image
           celes/767   build_image
            cave/768   build_image
            lulu/768   build_image
          setzer/772   build_image
           kevin/776   build_image
             gru/802   build_image
             elm/769   build_image
          gandof/759   build_image
             oak/789   build_image
            relm/769   build_image
           kefka/771   build_image
      veyron_jaq/769   build_image
   veyron_mighty/769   build_image
           asuka/767   build_image
           terra/771   build_image
   veyron_minnie/769   build_image
          ultima/769   build_image
   veyron_speedy/767   build_image
           samus/3938  build_image
      auron_yuna/767   build_image
            reks/767   build_image
        caroline/320   build_image
          wizpig/775   build_image
          sentry/767   build_image
           edgar/771   build_image
           umaro/770   build_image
    veyron_jerry/768   build_image
           banon/767   build_image

Comment 10 by dshi@chromium.org, Jan 17 2017
I don't see a clear error from that build_image command line; the massive failures started around last Wednesday (1/11).  Changes in chromite around that time are:
https://chromium-review.googlesource.com/#/c/422958/
(this is not related to build_image)

https://chromium-review.googlesource.com/#/c/427382/
(only affects the Report stage, not Archive)

https://chromium-review.googlesource.com/#/c/425849/
(seems only related to paygen)

Has anyone tried running build_image in a local chroot?
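In case it helps, a minimal sketch of a local reproduction (the checkout path is a placeholder, and cros_sdk is assumed to be on PATH and to enter the chroot at src/scripts; adjust for your setup):

import subprocess

# Hypothetical checkout location; replace with your chromiumos root.
CHECKOUT = '/path/to/chromiumos'

# Enter the chroot and run the same script the Archive stage runs.
subprocess.check_call(
    ['cros_sdk', '--', './build_image', '--board=gru'],
    cwd=CHECKOUT)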
Comment 11 Deleted
Issue 681632 has been merged into this issue.
Comment 13 by dshi@chromium.org, Jan 17 2017
Cc: vapier@chromium.org
Cc: lhchavez@chromium.org mka@chromium.org
This is the failure:
mkdir: cannot create directory ‘/mnt/host/source/src/build/images/gru/R57-9177.0.0-a3/rootfs/opt/google/containers’: Permission denied

It was mentioned in issue 680849, but I don't think it was followed up on, as the focus was on the recovery image failure.
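For context, a rough illustration (the actual fix CL is internal, so this is only an assumed pattern, using the path from the error above): directories under the image's root-owned rootfs have to be created with elevated privileges, so a plain mkdir from the build scripts fails with Permission denied.

import subprocess

# Assumed illustration: paths inside the mounted rootfs are owned by root,
# so the directory must be created via sudo rather than a plain mkdir.
rootfs_dir = ('/mnt/host/source/src/build/images/gru/'
              'R57-9177.0.0-a3/rootfs/opt/google/containers')

subprocess.check_call(['sudo', 'mkdir', '-p', rootfs_dir])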
Summary: archive step build_image failure: mkdir: cannot create directory "/mnt/host/source/src/build/images/gru/R57-9177.0.0-a3/rootfs/opt/google/containers": Permission denied (was: archive step build_image exit status 1 on canaries)
Cc: dshi@chromium.org
Owner: domlasko...@chromium.org
domlaskowski@ created https://chrome-internal-review.googlesource.com/#/c/317535/ to fix that.
Should this be duped with https://bugs.chromium.org/p/chromium/issues/detail?id=676152, if this is the same failure?
Blocking: 676152
Comment 19 by josa...@google.com, Jan 18 2017
Labels: M-57 ReleaseBlock-Dev
Status: Assigned
Comment 20 by ketakid@google.com, Jan 18 2017
Let's wait and see how the morning canaries go by 2pm. We will close this out if we have all images by that build.
We have successful Archive steps!

https://chromegw.corp.google.com/i/chromeos/builders/oak-release/builds/793
https://chromegw.corp.google.com/i/chromeos/builders/elm-release/builds/773

(Never mind the VMTest sanity failures: "All devservers are currently down".)
Comment 22 by dshi@chromium.org, Jan 18 2017
We had a devserver downtime earlier (b/34389707). They are back online as of now.
Status: Fixed
Labels: VerifyIn-61