chromiumos-sdk builder failing in SDKTest
Issue description
https://uberchromegw.corp.google.com/i/chromiumos/builders/chromiumos-sdk/builds/8085
INFO cros_sdk:make_chroot: Unpacking stage3...
INFO cros_sdk:make_chroot: Set timezone...
INFO cros_sdk:make_chroot: Adding user/group...
useradd: cannot create directory /home/chrome-bot
,
Aug 21 2017
It does look related. Debugging now.
,
Aug 21 2017
,
Aug 22 2017
I reverted my cl and the trybots failed with the same error. I'll try to bisect tomorrow.
,
Aug 22 2017
,
Aug 22 2017
Adding a little more info to the description. The 'cannot create directory' suggests to me that it might be something like a disk full error and re-imaging build112-m2 might fix this? A bisect might prove otherwise though, so please do try that, thanks!
,
Aug 22 2017
So far I've confirmed that abb3e376..9ec40dbe don't show the problem. I'll continue bisecting.
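For reference, bisecting the remaining range amounts to a binary search over the revisions. The following is a minimal sketch, not the procedure actually used here: `commits` is a hypothetical list of revisions ordered oldest to newest, and `is_bad` is a hypothetical callback that would run a chromiumos-sdk trybot (or a local build) at a revision and report whether it fails.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest commit for which is_bad() is True.

    Assumes commits are ordered oldest to newest, that the last one is bad,
    and that every commit after the first bad one is also bad.
    """
    if not commits:
        raise ValueError("empty commit range")
    lo, hi = 0, len(commits) - 1  # invariant: commits[hi] is bad
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # the first bad commit is at mid or earlier
        else:
            lo = mid + 1  # the first bad commit is after mid
    return commits[lo]
```

Each call to `is_bad` costs a full SDK build, so the point of the search is to keep the number of builds logarithmic in the size of the range.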
,
Aug 22 2017
For what it's worth, the space constraint is on a loopback image, not the builder.
chrome-bot@build112-m2:(Linux 14.04):~$ df -h
Filesystem      Size  Used Avail Use% Mounted on
udev             63G   12K   63G   1% /dev
tmpfs            13G  442M   13G   4% /run
/dev/sda4       3.3T  335G  2.8T  11% /
none            4.0K     0  4.0K   0% /sys/fs/cgroup
none            5.0M     0  5.0M   0% /run/lock
none             63G   48K   63G   1% /run/shm
none            100M     0  100M   0% /run/user
/dev/sda2       962M   37M  877M   4% /boot
,
Aug 22 2017
That builder being so different, it might be using a lot more space in the chroot than normally happens.
,
Aug 22 2017
> it might be using a lot more space in the chroot than normally happens
true, but fairly certain we aren't talking a scale of 100G's :)
,
Aug 23 2017
Manoj's bisect and trybot blame this CL: https://chromium-review.googlesource.com/c/chromiumos/chromite/+/625377
trybot (past the failing chroot creation): https://uberchromegw.corp.google.com/i/chromiumos.tryserver/builders/chromiumos-sdk/builds/2504
I'm not convinced that CL is to blame yet. One complicating factor is that jclinton@ notes that we've landed 10s of ebuilds since that CL that use the new behaviour, so we're not looking at a single revert here.
,
Aug 23 2017
+sheriffs.
,
Aug 23 2017
The error message in the bug submission isn't complete; the actual failure is:
Running ['/b/c/cbuild/repository/src/scripts/sdk_lib/make_chroot.sh', '--stage3_path', '/b/c/cbuild/repository/built-sdk.tar.xz', '--chroot', '/b/c/cbuild/repository/new-sdk-chroot', '--cache_dir', '/b/c/cbuild/repository/.cache', '--nousepkg'] failed!
@@@STEP_FAILURE@@@
That looks like it's trying to extract a binary stage3 tarball in a chroot.
,
Aug 23 2017
Downloaded cros-sdk-2017.08.17.102013.tar.xz from last successful run and re-ran the failing command at ToT: it succeeds with https://chromium-review.googlesource.com/c/chromiumos/chromite/+/625377 still included. So, the only way that https://chromium-review.googlesource.com/c/chromiumos/chromite/+/625377 could be implicated is if that CL somehow changes the output of the SDK tarball in a way that doesn't cause a build failure but subsequently causes a stage 3 bootstrap to fail.
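For anyone who wants to repeat that experiment, here is a rough sketch of rebuilding the failing invocation from the builder log so it can be pointed at a downloaded tarball. The helper names and the idea of parameterising the paths are assumptions for illustration; only the script path and flags are taken from the log, and the script may need the same privileges it has on the builder.

```python
import subprocess
from pathlib import Path
from typing import List


def make_chroot_argv(repo: Path, stage3: Path, chroot: Path) -> List[str]:
    """Build the make_chroot.sh argv with the same flags as in the builder log."""
    return [
        str(repo / "src/scripts/sdk_lib/make_chroot.sh"),
        "--stage3_path", str(stage3),
        "--chroot", str(chroot),
        "--cache_dir", str(repo / ".cache"),
        "--nousepkg",
    ]


def rerun_make_chroot(repo: Path, stage3: Path, chroot: Path) -> int:
    """Run the command in the foreground and return its exit status."""
    return subprocess.run(make_chroot_argv(repo, stage3, chroot)).returncode
```

Passing the tarball from the last successful run as `stage3` and then the tarball from a failing run would show whether the difference lies in the tarball contents rather than in make_chroot.sh itself.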
,
Aug 23 2017
,
Aug 23 2017
Hi Luis,
Please find an owner for this: the SDK build is broken with no way to reproduce and no current understanding of a root cause. Debugging requires knowledge of (and maybe new features in) the toolchain builder.
A few suggestions for follow-ups:
* More logs in the failing portion of build
* A way to reproduce the failure on eng workstations
* SDK Builder should be in CQ
,
Aug 24 2017
I don't think we can expect the toolchain team to fix all that immediately. Any chance you can generate a local build with a new checkout + 'cros_sdk --bootstrap'? I'm also beginning to think it will be easier to go with a revert (since we *do* have a highly likely target CL, even if it's tough to explain at this point), and then with lower priority we can figure out how to really understand the failure. Can we begin tracking down the 10s of ebuild changes that would need to be reverted? What would be needed there, Jason?
,
Aug 24 2017
First of all, the CL that is causing the trouble should be reverted ASAP. This builder in the waterfall has been broken for 5+ days. There is no fix in sight for this problem, and after reading all the info I have, I don't see a good reason not to revert. I don't see why reverting this CL would cause build issues if Chrome OS has lived with that problem for several years. When a change breaks the waterfall, we revert first and deal with the issue later. I think what jclinton is trying to fix is great, and it is great that he took this problem on, but his fix is having a bad side effect and that needs to be fixed before the CL can stick.
Second, Jason, you are the person who best understands the problem. After the analysis you did, you understand the interactions between DEPEND/RDEPEND much better than most of us. So, IMO, you are the best person to deal with this. You may need some help from some of the portage experts, and my team is willing to help where we can, but we are not portage experts. We already helped by finding the CL that caused the problem. So, I cannot own this bug.
,
Aug 24 2017
> revert first and deal with the issue later.
Agreed, that's where we're at now. The only problem is the side effects of any ebuilds that have changed since then to make use of jclinton's bugfix. We don't want to break the CQ any more there.
> So, I cannot own this bug.
Agreed. That's why I reassigned.
The revert passed the chromiumos-sdk trybot, so I think we could be OK...
...although pasted from chat, jclinton claimed:
"""
what will happen is that source builds on a number of packages (chromeos-bootimage, depthcharge, libpayload, coreboot, chromeos-device-firmware) will fail on a developer's workstation after the revert
"""
,
Aug 24 2017
Looking for CLs that depend on the bugfix, I looked through the most obvious suspects - CLs in chromiumos-overlay and portage-stable since Aug 18.
Regarding RDEPEND vs DEPEND lists, most don't make changes there, with the following exceptions:
- CL:565352: is a new ebuild for vm_tools; sets RDEPEND=DEPEND, so not an issue.
- CL:577766 adds the new dex2oatds to RDEPEND in target-chromium-os-sdk only; consistent with how other dependencies are specified, shouldn't cause regression.
- CL:614310 - more on it below.
What other potential telltale signs of ebuilds that rely on the bugfix should we be looking for?
CL:614310 "chromeos-ec: EC_FIRMWARE_UNIBUILD{,_FAKE} -> get_each_model_conf_value_set" adds unibuild? ( chromeos-base/chromeos-config ) to DEPEND only.
Will it break with the revert? Will later unibuild-themed CLs break? Can it be fixed by adding chromeos-config to RDEPEND?
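As a first-pass filter for that kind of review, the sketch below lists atoms that an ebuild names in DEPEND but not in RDEPEND. It is a rough text scan under stated assumptions, not a real ebuild parser: it only sees literal `DEPEND="..."` and `RDEPEND="..."` assignments at the start of a line, it does not expand variables such as `${RDEPEND}`, and it ignores anything inherited from eclasses. The function names are made up for this example.

```python
import re
from typing import Dict, Set

# Matches simple assignments such as DEPEND="..." or RDEPEND="..." at line start.
# The quoted body may span several lines.
_ASSIGN_RE = re.compile(r'^(DEPEND|RDEPEND)="([^"]*)"', re.MULTILINE)


def dependency_atoms(ebuild_text: str) -> Dict[str, Set[str]]:
    """Collect package atoms from literal DEPEND and RDEPEND assignments."""
    atoms: Dict[str, Set[str]] = {"DEPEND": set(), "RDEPEND": set()}
    for var, body in _ASSIGN_RE.findall(ebuild_text):
        # Treat tokens containing "/" as atoms; this skips USE conditionals
        # ("unibuild?"), parentheses, and unexpanded ${VARIABLES}.
        atoms[var].update(
            tok for tok in body.split() if "/" in tok and not tok.startswith("$")
        )
    return atoms


def build_only_atoms(ebuild_text: str) -> Set[str]:
    """Return atoms listed in DEPEND but not in RDEPEND."""
    atoms = dependency_atoms(ebuild_text)
    return atoms["DEPEND"] - atoms["RDEPEND"]
```

On an ebuild shaped like the CL:614310 change, this would flag `chromeos-base/chromeos-config` as build-only. A clean result does not prove an ebuild is unaffected by the revert.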
,
Aug 24 2017
Re: Comment 18: There seems to have been some miscommunication: we currently have no confidence that https://chromium-review.googlesource.com/c/chromiumos/chromite/+/625377 is implicated because no one has pointed to a root cause or a way to reproduce. As such, we don't have any working hypothesis for how it could be related: Portage isn't used at the stage that is failing at ToT. The line that is failing is http://cs/chromeos_public/src/scripts/sdk_lib/make_chroot.sh?l=181&rcl=2382b198a82bd13ff429b59288e1d89083d21c3d . This is also not a typical
However, since we are at the point of preferring taking any action over the correct action, I've submitted the revert. We should see an SDK build sometime tomorrow.
There will be fallout from the revert on CQ, though it may take a few days to manifest due to the nature of the parallel_emerge bug that this CL fixed. Only from-source builds are affected, thus only uprevs of interdependent packages will manifest the bug. Folks have traditionally used DEPEND/RDEPEND as a hack to work around that, though I had to submit this change because that doesn't always work: no ebuild is safe. I suspect that this has been a significant historical source of CQ flakiness.
Re: Comment 20: DEPEND/RDEPEND isn't a viable work-around for some versions of graph cycles and so also isn't a canary for which CLs we might see fail. At this point, we'll just have to wait for the failure reports to come in over the next few days.
,
Aug 24 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/chromite/+/95f30c12f2f0012aa5bc45ee05b1cdeae9c7b9ea
commit 95f30c12f2f0012aa5bc45ee05b1cdeae9c7b9ea
Author: Manoj Gupta <manojgupta@google.com>
Date: Thu Aug 24 06:57:00 2017
Revert "parallel_emerge: Work around Portage library bug with usepkg"
This reverts commit 14e53987cfbb1e99fa72f7517910acbb4e8ca5d4.
The revert is required to fix sdk builder breakage.
BUG= chromium:757824 , chromium:757147
TEST=Chromiumos-sdk builder works after reverting the CL.
TEST=https://uberchromegw.corp.google.com/i/chromiumos.tryserver/builders/chromiumos-sdk/builds/2504
Change-Id: Ifa65f82f9b39b27eb56b8152adf14103202a2b5f
Reviewed-on: https://chromium-review.googlesource.com/625377
Trybot-Ready: Brian Norris <briannorris@chromium.org>
Tested-by: Brian Norris <briannorris@chromium.org>
Reviewed-by: Jason Clinton <jclinton@chromium.org>
Commit-Queue: Jason Clinton <jclinton@chromium.org>
[modify] https://crrev.com/95f30c12f2f0012aa5bc45ee05b1cdeae9c7b9ea/scripts/parallel_emerge.py
,
Jan 22 2018
Comment 1 by ayatane@chromium.org
, Aug 21 2017