CQ should promote chroot snapshots to stable after 2 successful runs
Issue description

reef-paladin has been failing for the last three days (example: https://logs.chromium.org/v/?s=chromeos%2Fbb%2Fchromeos%2Freef-paladin%2F6522%2F%2B%2Frecipes%2Fsteps%2FBuildPackages%2F0%2Fstdout), with a number of packages failing to build with errors such as:

libpcre-8.41-r1: ./.libs/libpcrecpp.so: error: undefined reference to 'std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >::~basic_string()'
libpcre-8.41-r1: clang-7: error: linker command failed with exit code 1 (use -v to see invocation)
smartmontools-6.6-r1: utility.cpp:458: error: undefined reference to 'std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >::~basic_string()'
smartmontools-6.6-r1: clang-7: error: linker command failed with exit code 1 (use -v to see invocation)
protobuf-3.3.0: ./.libs/libprotoc.so: error: undefined reference to 'std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >::~basic_string()'
protobuf-3.3.0: clang-7: error: linker command failed with exit code 1 (use -v to see invocation)
opencv-2.3.0-r12: ../../../OpenCV-2.3.0/modules/stitching/main.cpp:123: error: undefined reference to 'std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >::~basic_string()'
opencv-2.3.0-r12: clang-7: error: linker command failed with exit code 1 (use -v to see invocation)

Assigning to Luis since https://chromium-review.googlesource.com/1168987 looks vaguely toolchain/stdlib related and seems to fit time wise. If this is something else, can you please help find the right owner for it?
,
Aug 14
Note that reef-paladin is experimental (and has been for a long time). Since reef-release is fine (https://cros-goldeneye.corp.google.com/chromeos/legoland/builderHistory?buildConfig=reef-release&buildBranch=master), it is probably a prebuilts issue. Can someone clobber the builder, dgarrett@ or vapier@?
,
Aug 14
I reinstanced it, but don't really expect that to help. Prebuilts are usually clobbered after every failed build.
,
Aug 14
I think there is something special about being marked experimental. Looking at the setup_board stage logs, it is still using an old chroot.
,
Aug 14
Oh... that sounds bad.
,
Aug 14
https://uberchromegw.corp.google.com/i/chromeos/builders/reef-paladin/builds/6545 is doing fine after reinstancing. Change the bug to reflect that CQ is not resetting chroots after failed builds for experimental builders.
,
Aug 14
,
Aug 14
,
Aug 14
Wait, how could it possibly be using an old chroot (#4) after reinstancing (#3)?
,
Aug 14
Currently only affecting experimental builds so setting this to P2.
,
Aug 14
Comment #4 was referring to the state before reinstancing.

Failing builds (failed in build_packages), where the InitSDK/setup_board stage reused the old chroot:
https://uberchromegw.corp.google.com/i/chromeos/builders/reef-paladin/builds/6543
https://uberchromegw.corp.google.com/i/chromeos/builders/reef-paladin/builds/6544

After reinstancing, a new chroot was used, so build_packages passed:
https://uberchromegw.corp.google.com/i/chromeos/builders/reef-paladin/builds/6545
,
Aug 14
Ah, thanks. I'll work on a fix now.
,
Aug 22
Looks like the issue is not limited to experimental builders but affects all CQ builders. Taking kevin-paladin as an example: it has been failing since build 5360, but the following runs are still using an old chroot.

First failing build: https://uberchromegw.corp.google.com/i/chromeos/builders/kevin-paladin/builds/5360

The Setup Board stage output shows that the next build, 5361, is still using the old chroot:
https://logs.chromium.org/v/?s=chromeos%2Fbb%2Fchromeos%2Fkevin-paladin%2F5361%2F%2B%2Frecipes%2Fsteps%2FSetupBoard%2F0%2Fstdout

15:55:21: INFO: RunCommand: /b/c/cbuild/repository/chromite/bin/cros_sdk 'PARALLEL_EMERGE_STATUS_FILE=/tmp/tmp3zGKw9' 'USE=chrome_internal' 'FEATURES=separatedebug' -- ./setup_board '--board=kevin' '--accept_licenses=@CHROMEOS' --skip_chroot_upgrade '--save_install_plan=/tmp/kevin_install_plan.2866425' in /b/c/cbuild/repository
15:55:21: NOTICE: /b/c/cbuild/repository/chroot.img is using 38 GiB more than needed. Running fstrim.
INFO : Selecting profile: /mnt/host/source/src/private-overlays/overlay-kevin-private/profiles/base for /build/kevin
INFO : Cross toolchain already up to date. Nothing to do.
WARNING : Board output directory '/build/kevin' already exists.
WARNING : Exiting early.
WARNING : Use --force to clobber the board root and start again.

Same story for the following builds, where an old chroot continues to be used:
https://uberchromegw.corp.google.com/i/chromeos/builders/kevin-paladin/builds/5361
https://uberchromegw.corp.google.com/i/chromeos/builders/kevin-paladin/builds/5362
https://uberchromegw.corp.google.com/i/chromeos/builders/kevin-paladin/builds/5363
,
Aug 22
Bump back to P1.
,
Aug 22
Explained offline: we always reuse known-good chroots on all builds, so seeing that the chroot already exists does not imply that there is a bug. This is achieved with a known-good filesystem snapshot, implemented with LVM2. In the kevin-paladin example, the chroot is from 5359 and has been since the builder started failing. When a build passes, LVM2 merges the filesystem delta from the build into the base snapshot, making a new known-good image.

Back to the original bug report: are we seeing actual consequences on the experimental builders that make you think that something about the chroot is bad? I ask because reef-paladin has passed recently without resetting the chroot: https://cros-goldeneye.corp.google.com/chromeos/legoland/builderHistory?buildConfig=reef-paladin&buildBranch=master
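For readers following along, here is a minimal sketch of the policy described in this comment, assuming a snapshot can be identified by the build number that produced it. The function name `next_known_good` is hypothetical and is not taken from chromite; the real mechanism is an LVM2 snapshot merge, which this only models.

```python
def next_known_good(known_good: str, build_id: str, passed: bool) -> str:
    """Return the label of the known-good chroot after one build.

    Pass: the build's filesystem delta is merged into the base snapshot,
    so this build's chroot becomes the new known-good image.
    Fail: the delta is dropped and the known-good image is unchanged.
    """
    return build_id if passed else known_good


# kevin-paladin as described in the thread: known-good is 5359 and the
# builds from 5360 onward fail, so the chroot stays at 5359.
known_good = "5359"
for build_id in ("5360", "5361", "5362", "5363"):
    known_good = next_known_good(known_good, build_id, passed=False)
print(known_good)
```

Under this model, seeing the same chroot across consecutive failing builds is the expected outcome rather than a sign of a missed reset.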
,
Aug 23
I've stared at this for hours and am pretty sure that there's no bug: there's no logic specific to experimental status as that level of detail is not made available to the code that manages the chroots. Please reopen if you feel that I've missed something.
,
Aug 23
Question.... at which point do we snapshot the chroot? IE: What do we reset to after a failure?
,
Aug 23
Explained in this design doc (which has been fully implemented): https://docs.google.com/document/d/1bPaB8ZzaCbghQYR3lv4eYTIQn6f-c0REEhCKa8GU0DA/edit
,
Aug 23
I don't see how it is a won't fix. Yes, the root cause has changed to snapshots, but the issues caused by a bad snapshot have not been addressed. If CQ is taking snapshots, the snapshots should have some sort of expiry or other logic that ignores the snapshots after some number of fails. The P0 bug in https://bugs.chromium.org/p/chromium/issues/detail?id=876634 was clearly a case of snapshots being incorrect or out of date.
,
Aug 23
> I don't see how it is a won't fix. Yes, the root cause has changed to snapshots but the issues caused by a bad snapshot have not been addressed.
>
> If CQ is taking snapshots, the snapshots should have sort of expiry or other logic that ignores the snapshots after some number of fails.

Implementing that kind of logic would be really hard and error prone. However, the opposite would be attainable: only promoting chroot snapshots after two successful runs. That would prevent the N+1 style breakage. Reopening this bug and retitling it to track.

> The P0 bug in https://bugs.chromium.org/p/chromium/issues/detail?id=876634 was clearly a case of snapshots being incorrect or out of date.

Yes, but ideally we'd stop bad CLs from breaking the chroots in the first place. However, I believe that the proposal above would be feasible and provide that safety net without too much performance impact.
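One way to read the "promote after two successful runs" proposal is sketched below. This is an illustration only, not the implementation: the class `ChrootSnapshots`, its fields, and the build labels are all hypothetical, and it assumes a snapshot is promoted to stable only once the next build has also passed on top of it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChrootSnapshots:
    stable: str                      # last promoted (known-good) snapshot
    candidate: Optional[str] = None  # passed once, not yet promoted

    def base_for_next_build(self) -> str:
        # Build on the candidate if there is one, otherwise on stable.
        return self.candidate or self.stable

    def record(self, build_id: str, passed: bool) -> None:
        if not passed:
            # Discard the unproven candidate; the next build uses stable.
            self.candidate = None
            return
        if self.candidate is not None:
            # A second consecutive pass proves the previous candidate.
            self.stable = self.candidate
        self.candidate = build_id


# N passes but leaves the chroot in a state that breaks N+1.
s = ChrootSnapshots(stable="N-1")
s.record("N", passed=True)     # N becomes a candidate, not yet stable
s.record("N+1", passed=False)  # candidate dropped, stable still N-1
print(s.base_for_next_build())
```

In this sketch a build that passes but leaves the chroot broken never reaches stable, because the following failure discards it before promotion.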
,
Aug 23
If we can improve chroot creation and setup_board performance, we could reset the chroot on every single build, giving us more reproducible results. One option is to publish pre-created chroot.img files that builders download and use. They could include additional setup steps, such as running setup_board for every board in advance from a clean tree. Builders then have to update the chroot and re-run setup_board in case of new changes, but don't have to start from scratch. We keep a few builders (probably release, full, and the new chroot.img publisher) that always start from scratch to make certain we can.
,
Dec 12
Comment 1 by jwer...@chromium.org
, Aug 14