build_image failing again in canary archive step with cryptic error
Issue description

Dozens of dead canaries. I have some issues with this log file. The root cause, if present, is difficult to find. Also the code seems to have invented time travel, the time stamps are all over.

https://uberchromegw.corp.google.com/i/chromeos/builders/gru-release/builds/863/steps/Archive/logs/stdio

@@@BUILD_STEP@Archive@@@
************************************************************
@@@STEP_LINK@stdout-->stdio@https://luci-logdog.appspot.com/v/?s=chromeos%2Fbb%2Fchromeos%2Fgru-release%2F863%2F%2B%2Frecipes%2Fsteps%2FArchive%2F0%2Fstdout@@@
** Start Stage Archive - Mon, 06 Feb 2017 04:39:48 -0800 (PST)
**
** Archives build and test artifacts for developer consumption.
**
** Attributes:
** release_tag: The release tag. E.g. 2981.0.0
** version: The full version string, including the milestone.
** E.g. R26-2981.0.0-b123
************************************************************
04:39:48: INFO: Created cidb engine bot@173.194.81.53 for pid 30625
04:39:48: INFO: Running cidb query on pid 30625, repr(query) starts with <sqlalchemy.sql.expression.Update object at 0x7f0882ba7f90>
Preconditions for the stage successfully met. Beginning to execute stage...
04:39:48: INFO: Running cidb query on pid 30625, repr(query) starts with <sqlalchemy.sql.expression.Update object at 0x7f0882baff10>
04:39:55: INFO: RunCommand: /b/cbuild/internal_master/chromite/bin/cros_sdk 'USE=-cros-debug chrome_internal' 'PARALLEL_EMERGE_STATUS_FILE=/tmp/tmpFpPkKD' -- ./mod_image_for_recovery.sh '--board=gru' '--image=/mnt/host/source/src/build/images/gru/R58-9256.0.0/tmpTduJlU/chromiumos_base_image.bin' in /b/cbuild/internal_master
04:47:58: INFO: RunCommand: /b/cbuild/internal_master/chromite/bin/cros_sdk 'USE=-cros-debug chrome_internal' 'PARALLEL_EMERGE_STATUS_FILE=/tmp/tmpai4amL' -- ./build_image '--board=gru' --replace '--symlink=factory_shim' '--build_attempt=3' factory_install in /b/cbuild/internal_master
04:58:56: ERROR: return code: 1; command: /b/cbuild/internal_master/chromite/bin/cros_sdk 'USE=-cros-debug chrome_internal' 'PARALLEL_EMERGE_STATUS_FILE=/tmp/tmpai4amL' -- ./build_image '--board=gru' --replace '--symlink=factory_shim' '--build_attempt=3' factory_install
* Generating locale-archive: forcing # of jobs to 1
,
Feb 6 2017
There was some maintenance around 03 Feb 14:20, according to chromiumos-status.appspot.com:
akeshet Fri, 03 Feb 14:20 Tree is closed for maintenance (mass builder reimaging happening now)
akeshet Fri, 03 Feb 14:11 Tree is closed for maintenance (waterfall restart after current CQ run)
It may be related. +akeshet in case he has any clue.
,
Feb 6 2017
It failed to build the factory_install image. The other images are fine and were generated, e.g.: https://pantheon.corp.google.com/storage/browser/chromeos-image-archive/gru-release/R58-9250.0.0 But GoldenEye seems to treat it as a failed build and does not show the images at all, which makes this more serious.
,
Feb 6 2017
,
Feb 6 2017
Didn't mean to minus the Ccs. Added Christine for more comments on images in GoldenEye.
,
Feb 6 2017
Yes, the archive stage has failed on these builders, so nothing has been copied to the chromeos-releases bucket: e.g. https://pantheon.corp.google.com/storage/browser/chromeos-releases/canary-channel/samus/9256.0.0/?pli=1 vs. https://pantheon.corp.google.com/storage/browser/chromeos-releases/canary-channel/buddy/9256.0.0/?pli=1 or for the example above: https://pantheon.corp.google.com/storage/browser/chromeos-releases/canary-channel/gru/9250.0.0 That means there aren't any signed images or any payloads for these boards, only the unsigned image produced by the builders (still in chromeos-image-archive). I think we need to figure out what is making the factory_install image generation fail so that the archive stage works.
,
Feb 6 2017
Note that there are two bugs here. One is, why did build_image fail, and two, can we make it easier to tell why it failed from the logs. I think it makes sense to leave it as a single bug for now (maybe forever) but the two issues may need separate resolutions.
,
Feb 6 2017
If build_image is failing, this looks more like a sheriff issue than a deputy issue.
,
Feb 6 2017
#8 this is unclear. It's not failing for all builds, so it could be a flake, which could be due to the build_image code, but also to infra issues. :/
,
Feb 6 2017
Is there a recent build with an example of this? If so, please link to it.
,
Feb 6 2017
Added link to recent build of this failure: https://uberchromegw.corp.google.com/i/chromeos/builders/gru-release/builds/863.
,
Feb 6 2017
Re: #8: It seems to be pretty consistently those particular builds that are failing. Coincidentally, if I look at GoldenEye, I can see that these boards are all the boards that have an ARC container (publicly enabled or not).
,
Feb 6 2017
https://cros-goldeneye.corp.google.com/chromeos/console/listBuild?milestone=58#/details. Click open one of the statuses with a low success rate: all the builds with a missing signed image have an ARC version.
,
Feb 6 2017
I compared the log with the code that prints it. Take this one as an example: https://uberchromegw.corp.google.com/i/chromeos/builders/gru-release/builds/863/steps/Archive/logs/stdio

The code at line 220, which calls gconv_strip, still appears to have worked: https://cs.corp.google.com/chromeos_public/src/scripts/build_library/base_image_util.sh?type=cs&q=delete_prompt+package:%5Echromeos_public$&l=220

It generated this log output:
04:58:51: INFO: Searching for unused gconv files defined in /mnt/host/source/src/build/images/gru/R58-9256.0.0-a3/rootfs/usr/lib/gconv/gconv-modules
04:58:52: INFO: Will search for 1131 strings in 10 files
04:58:53: INFO: Done. Using 20 gconv modules. Removed 226 unused modules (17140.1 KiB) and 6 unused dependencies (928.0 KiB)

But the code at line 244, which calls insert_container_publickey.sh, seems not to have been reached: https://cs.corp.google.com/chromeos_public/src/scripts/build_library/base_image_util.sh?type=cs&q=delete_prompt+package:%5Echromeos_public$&l=244

The insert_container_publickey.sh script is supposed to log "Container verification key was installed. Do not forget to resign the image!" when done. https://cs.corp.google.com/chromeos_public/src/platform/vboot_reference/scripts/image_signing/insert_container_publickey.sh?q=insert_container_publickey&dr&l=45

However, this string did not show up in the log. Instead, an error string appeared:
Could not open /mnt/host/source/src/build/images/gru/R58-9256.0.0-a3/rootfs/opt/google/containers/android/system.raw.img, because No such file or directory

The recent change that added this insert_container_publickey.sh call is the most suspicious: https://chromium-review.googlesource.com/#/c/430830/

Added the author, dgreid@, to clarify whether this script works on a factory install image.
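The manual comparison in this comment (which expected messages appear in the log, and which error lines surround the gap) can be roughly automated. The following is a minimal sketch, not part of any existing tool: the function name `check_archive_log` and the idea of passing a saved copy of the stdio log are assumptions, while the marker strings are copied from the log excerpts quoted in this bug.

```shell
#!/bin/bash
# Sketch: report which expected build_image milestones appear in a saved log.
# check_archive_log is a hypothetical helper; the marker strings come from
# the log lines quoted in this bug.

check_archive_log() {
  local log="$1"
  local marker
  local markers=(
    "Searching for unused gconv files"
    "Container verification key was installed"
  )
  for marker in "${markers[@]}"; do
    if grep -qF -- "${marker}" "${log}"; then
      echo "reached: ${marker}"
    else
      echo "MISSING: ${marker}"
    fi
  done
  # Print (with line numbers) the lines most likely to carry the real failure.
  grep -nE 'ERROR:|Could not open' "${log}" || true
}
```

Run against a locally saved copy of the Archive step log, the first MISSING marker narrows down where build_image stopped, and the numbered error lines are easier to spot than in the full output.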
,
Feb 7 2017
It is not related to https://chromium-review.googlesource.com/#/c/430830/. The failure happens earlier, at line 232, which calls get_arc_build_info: https://cs.corp.google.com/chromeos_public/src/scripts/build_library/base_image_util.sh?type=cs&q=delete_prompt+package:%5Echromeos_public$&l=232

The changes are:
https://chromium-review.googlesource.com/#/c/433997/
https://chrome-internal-review.googlesource.com/c/321325/

That matches the error message:
Could not open /mnt/host/source/src/build/images/gru/R58-9256.0.0-a3/rootfs/opt/google/containers/android/system.raw.img, because No such file or directory
,
Feb 7 2017
there have been some chromite changes related to ARC, but not sure if Bernie's changes have landed yet
,
Feb 7 2017
The factory install shim reuses the same create_base_image() method. But some ARC++ logic doesn't apply to the factory install shim, like the suspicious calls in c#14 (insert_container_publickey) and c#15 (get_arc_build_info). We should either skip these calls in the factory install shim case, or create a separate create_factory_install_image().
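One way to express the "skip these calls" option is to guard the ARC steps on whether the rootfs actually contains the Android container image named in the error message. This is only a sketch under that assumption and is not what the eventual fix did: `has_arc_container` and `maybe_run_arc_steps` are hypothetical names, and the echo lines stand in for the real get_arc_build_info and insert_container_publickey.sh calls.

```shell
#!/bin/bash
# Sketch only: run ARC-specific steps only when the rootfs has an ARC
# container. Function names are hypothetical; the image path is the one
# from the "Could not open" error in this bug.

has_arc_container() {
  local root_fs_dir="$1"
  [[ -f "${root_fs_dir}/opt/google/containers/android/system.raw.img" ]]
}

maybe_run_arc_steps() {
  local root_fs_dir="$1"
  if ! has_arc_container "${root_fs_dir}"; then
    echo "no ARC container in ${root_fs_dir}; skipping ARC steps"
    return 0
  fi
  # Placeholders for the real calls discussed above.
  echo "would call get_arc_build_info"
  echo "would call insert_container_publickey.sh"
}
```

Guarding on the file rather than on the image type keeps the check correct for any image that lacks the container, not just factory_install.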
,
Feb 7 2017
This is the changelog for 9248.0.0 which is when the issue began: https://crosland.corp.google.com/log/9247.0.0..9248.0.0
,
Feb 7 2017
Re c#18: for an infra issue (or other non-image-related issue) like this one, looking at the changes between (build-1)..(build) is not enough. We should look at the changes between (second-latest-push-to-prod)..(latest-push-to-prod). However, there seems to be no easy way to get the build number of a push-to-prod, and the push-to-prod schedule doesn't align with build boundaries. Changes like the following are only pushed to the production servers on the regular push-to-prod schedule: https://chromium-review.googlesource.com/#/c/430830/ https://chromium-review.googlesource.com/#/c/433997/
,
Feb 7 2017
Re comment 16, these appear to be before my changes in 9250, I don't see anything obvious in the delta, maybe the new glib?
,
Feb 7 2017
This definitely looks like a failure of https://chrome-internal-review.googlesource.com/c/321325/ and dependent CLs. I think we should just revert them in the meantime
,
Feb 7 2017
,
Feb 7 2017
The following revision refers to this bug: https://chrome-internal.googlesource.com/chromeos/overlays/project-cheets-private/+/32c94a81a7b540972ecaa531d2b540721c2895c8

commit 32c94a81a7b540972ecaa531d2b540721c2895c8
Author: Wai-Hong Tam <waihong@google.com>
Date: Tue Feb 07 03:01:11 2017
,
Feb 7 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform/crosutils/+/d21e05fead7fd12ab58d43558903f9b7f0a4a3cd

commit d21e05fead7fd12ab58d43558903f9b7f0a4a3cd
Author: Wai-Hong Tam <waihong@google.com>
Date: Tue Feb 07 03:01:18 2017

Revert "Pass ARC release info to cros_set_lsb_release."

The change broke Canary builds.

CQ-DEPEND=CL:*325265
BUG=chromium:689072
TEST=build_image --board=gru --replace --symlink=factory_shim \
--build_attempt=3 factory_install

This reverts commit 180d3f8a79ae3dcf022e4ae51f3850bfd4be26d9.

Change-Id: Ieb08f353886632059201f11a4b41b7d99cd36182
Reviewed-on: https://chromium-review.googlesource.com/438771
Reviewed-by: Elijah Taylor <elijahtaylor@chromium.org>
Commit-Queue: Wai-Hong Tam <waihong@google.com>
Tested-by: Wai-Hong Tam <waihong@google.com>

[modify] https://crrev.com/d21e05fead7fd12ab58d43558903f9b7f0a4a3cd/build_library/base_image_util.sh
,
Feb 7 2017
Sorry for the breakage, and thanks for the reverts. I reproduced this issue locally with ./build_image factory_install, so I believe those reverts will fix the issue. I'm reassigning this issue to waihong@ as he made the reverts. I think we can mark it fixed after we verify that the release builders are back to green. I'll reland those patches soon.
,
Feb 7 2017
Chrome OS CQ is now failing with "ERROR: Could not determine Android SDK version": https://luci-milo.appspot.com/buildbot/chromeos/veyron_minnie-paladin/1594 Does it mean the following change https://chrome-internal-review.googlesource.com/c/322768/ also needs to be reverted for a while?
,
Feb 7 2017
Ah, yes, that's true. Sorry I forgot that those tests are in bvt-cq. I will make a revert.
,
Feb 7 2017
I created a revert: https://chrome-internal-review.googlesource.com/c/325328 I enqueued it to CQ, but maybe we can chump this change since CQ will fail for sure without it. I'll defer to sheriffs.
,
Feb 7 2017
The following revision refers to this bug: https://chrome-internal.googlesource.com/chromeos/autotest-cheets/+/08c9ceca9b10092eff27dabe7610edcad7cbbae0

commit 08c9ceca9b10092eff27dabe7610edcad7cbbae0
Author: Shuhei Takahashi <nya@google.com>
Date: Tue Feb 07 16:38:38 2017
,
Feb 7 2017
Both the issues (factory install image failed in Archive in Canary and "Could not determine Android SDK version" in CQ) are fixed and don't happen on recent builds.
,
Feb 8 2017
The following revision refers to this bug: https://chrome-internal.googlesource.com/chromeos/overlays/project-cheets-private/+/94eced08087144057bd19f78cabbd3b3a783ceb3

commit 94eced08087144057bd19f78cabbd3b3a783ceb3
Author: Shuhei Takahashi <nya@google.com>
Date: Wed Feb 08 06:27:03 2017
,
Feb 8 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform/crosutils/+/7a82c91c41d6cd35d76668523101a3e5413e0e42

commit 7a82c91c41d6cd35d76668523101a3e5413e0e42
Author: Shuhei Takahashi <nya@chromium.org>
Date: Wed Feb 08 06:27:03 2017

Reland: Pass ARC release info to cros_set_lsb_release.

The cause was identified and fixed in CL:*325403.

This reverts commit d21e05fead7fd12ab58d43558903f9b7f0a4a3cd.

Original change's description:
> Revert "Pass ARC release info to cros_set_lsb_release."
>
> The change broke Canary builds.
>
> CQ-DEPEND=CL:*325265
> BUG=chromium:689072
> TEST=build_image --board=gru --replace --symlink=factory_shim \
> --build_attempt=3 factory_install
>
> This reverts commit 180d3f8a79ae3dcf022e4ae51f3850bfd4be26d9.
>
> Change-Id: Ieb08f353886632059201f11a4b41b7d99cd36182
> Reviewed-on: https://chromium-review.googlesource.com/438771
> Reviewed-by: Elijah Taylor <elijahtaylor@chromium.org>
> Commit-Queue: Wai-Hong Tam <waihong@google.com>
> Tested-by: Wai-Hong Tam <waihong@google.com>

CQ-DEPEND=CL:*325403
BUG=b:34693882
TEST=build_image --board=samus-cheets
TEST=build_image --board=samus-cheets factory_install

Change-Id: I5bcdf360af6f6404420a5335c533ebe4cd69e456
Reviewed-on: https://chromium-review.googlesource.com/438905
Commit-Ready: Shuhei Takahashi <nya@chromium.org>
Tested-by: Shuhei Takahashi <nya@chromium.org>
Reviewed-by: Elijah Taylor <elijahtaylor@chromium.org>

[modify] https://crrev.com/7a82c91c41d6cd35d76668523101a3e5413e0e42/build_library/base_image_util.sh
,
Feb 14 2017
Postmortem questions: a) What was the root cause of this? b) If the root cause was a CL that was reverted, how did the CL manage to land before breaking the CQ? Was it chumped, or was there some kind of hole in the CQ testing?
,
Feb 14 2017
I don't think the CQ does the same archiving steps (creating factory install shim) as the canary?
,
Feb 14 2017
Hmm. In the weekly summary, some CQ failures were blamed on this bug. Was that blame incorrect?
,
Feb 14 2017
examples: https://luci-milo.appspot.com/buildbot/chromeos/veyron_speedy-paladin/4324 https://luci-milo.appspot.com/buildbot/chromeos/veyron_speedy-paladin/4321 Is this the wrong bug for those?
,
Feb 14 2017
I think that was an incomplete revert (mentioned above).
,
Feb 15 2017
#33: We had two problems:
1. canary builder breakage
2. CQ breakage

1 was caused by my patch, https://chrome-internal-review.googlesource.com/c/321325/, which did not consider the case of factory_install images.

2 was due to incomplete reverts. The following changes were the initial reverts:
https://chrome-internal-review.googlesource.com/c/325265/
https://chromium-review.googlesource.com/c/438771/
They broke the CQ. We actually also needed this revert:
https://chrome-internal-review.googlesource.com/c/325328/

We could have avoided these breakages if:
1. we had tested the patch with ./build_image factory_install
2. we had committed the reverts via the CQ
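The first lesson in the comment above (test with factory_install as well as the default image) could be captured in a small pre-upload helper. This is a hedged sketch, not an existing script: `build_all_image_types` and the `DRY_RUN` variable are hypothetical, and the two invocations simply mirror the TEST= lines of the reland commit. With DRY_RUN=1 (the default here) it only prints the commands, so it can be tried outside the chroot.

```shell
#!/bin/bash
# Sketch: build every image type a change could affect before uploading it.
# build_all_image_types and DRY_RUN are hypothetical; the build_image
# arguments follow the TEST= lines of the reland commit.

build_all_image_types() {
  local board="$1"
  local image_type
  # An empty image type means "let build_image pick its default".
  for image_type in "" "factory_install"; do
    local cmd=(./build_image "--board=${board}")
    if [[ -n "${image_type}" ]]; then
      cmd+=("${image_type}")
    fi
    if [[ "${DRY_RUN:-1}" == "1" ]]; then
      echo "${cmd[*]}"
    else
      "${cmd[@]}" || return 1
    fi
  done
}
```

For example, `DRY_RUN=1 build_all_image_types samus-cheets` lists the two commands; setting DRY_RUN=0 inside the SDK would actually run them and stop at the first failure.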
,
Feb 21 2017
,
Jul 17 2017
ChromeOS Infra P1 Bugscrub. P1 Bugs in this component should be important enough to get weekly status updates.
Is this already fixed? -> Fixed
Is this no longer relevant? -> Archived or WontFix
Is this not a P1, based on go/chromeos-infra-bug-slo rubric? -> lower priority.
Is this a Feature Request rather than a bug? Type -> Feature
Is this missing important information or scope needed to decide how to proceed? -> Ask question on bug, possibly reassign.
Does this bug have the wrong owner? -> reassign.
Bugs that remain in this state next week will be downgraded to P2.
,
Jul 24 2017
ChromeOS Infra P1 Bugscrub. Issue untouched in a week after previous message. Downgrading to P2.
,
Jun 8 2018
Hi, this bug has not been updated recently. Please acknowledge the bug and provide status within two weeks (6/22/2018), or the bug will be closed. Thank you.
Comment 1 by waihong@chromium.org
, Feb 6 2017