Starred by 2 users
Status: Fixed
Owner:
Closed: Aug 24
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 0
Type: Bug



chromiumos-sdk builder failing in SDKTest
Project Member Reported by manojgupta@chromium.org, Aug 19
https://uberchromegw.corp.google.com/i/chromiumos/builders/chromiumos-sdk/builds/8085

INFO    cros_sdk:make_chroot: Unpacking stage3...
INFO    cros_sdk:make_chroot: Set timezone...
INFO    cros_sdk:make_chroot: Adding user/group...
useradd: cannot create directory /home/chrome-bot


 
Cc: pprabhu@chromium.org dgarr...@chromium.org bmgordon@chromium.org
This CL seems suspicious: https://chromium-review.googlesource.com/c/chromiumos/chromite/+/614761

It touches cros_sdk and landed just after the last passing chromiumos-sdk build

+bmgordon

+pprabhu current deputy for monitoring
It does look related.  Debugging now.
Owner: bmgordon@chromium.org
Status: Assigned
I reverted my cl and the trybots failed with the same error.  I'll try to bisect tomorrow.
Cc: alanjones@google.com vapier@chromium.org
 Issue 757824  has been merged into this issue.
Cc: steve...@chromium.org
Summary: chromiumos-sdk builder failing in SDKTest (was: chromiumos-sdk builder failing)
Adding a little more info to the description.

The 'cannot create directory' suggests to me that it might be something like a disk full error and re-imaging build112-m2 might fix this?

A bisect might prove otherwise though, so please do try that, thanks!

So far I've confirmed that abb3e376..9ec40dbe don't show the problem.  I'll continue bisecting.
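
The bisect described here is a binary search over the commit range. A minimal Python sketch, assuming a linear history ordered oldest to newest, a known-good first commit, and a known-bad last commit. `is_good` is a hypothetical callback (for example, one that launches a trybot and reports whether SDKTest passes); it is not part of chromite:

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str],
                     is_good: Callable[[str], bool]) -> str:
    """Return the first commit for which is_good() returns False.

    Assumes commits[0] is known good, commits[-1] is known bad, and every
    commit after the first bad one is also bad.
    """
    if len(commits) < 2:
        raise ValueError("need at least one good and one bad commit")
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(commits[mid]):
            lo = mid  # failure was introduced after mid
        else:
            hi = mid  # failure was introduced at or before mid
    return commits[hi]
```

Each probe halves the remaining range, so a range of N commits needs roughly log2(N) trybot runs.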
For what it's worth, the space constraint is on a loopback image, not the builder.

chrome-bot@build112-m2:(Linux 14.04):~$ df -h
Filesystem      Size  Used Avail Use% Mounted on
udev             63G   12K   63G   1% /dev
tmpfs            13G  442M   13G   4% /run
/dev/sda4       3.3T  335G  2.8T  11% /
none            4.0K     0  4.0K   0% /sys/fs/cgroup
none            5.0M     0  5.0M   0% /run/lock
none             63G   48K   63G   1% /run/shm
none            100M     0  100M   0% /run/user
/dev/sda2       962M   37M  877M   4% /boot
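
Because the space constraint is on the loopback image rather than the builder's root filesystem, free space has to be checked at the path where that image is mounted, not at `/`. A minimal sketch using the Python standard library; the thread does not name the mount point, so the path argument is left to the caller:

```python
import shutil


def free_gib(path: str) -> float:
    """Return the free space, in GiB, of the filesystem containing path."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return usage.free / 2**30
```

Calling `free_gib()` on the chroot's mount point while `make_chroot.sh` runs would show whether the image, rather than `/dev/sda4`, is filling up.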

That builder being so different, it might be using a lot more space in the chroot than normally happens.
> it might be using a lot more space in the chroot than normally happens

true, but fairly certain we aren't talking a scale of 100G's :)
Cc: manojgupta@chromium.org jclinton@chromium.org
Labels: -Pri-1 Pri-0
Manoj's bisect and trybot blame this CL:
https://chromium-review.googlesource.com/c/chromiumos/chromite/+/625377
trybot (past the failing chroot creation): https://uberchromegw.corp.google.com/i/chromiumos.tryserver/builders/chromiumos-sdk/builds/2504

I'm not convinced that CL is to blame, yet.
One complicating factor is that jclinton@ notes that we've landed 10s of ebuilds since that CL that use the new behaviour, so we're not looking at a single revert here.
Cc: apronin@chromium.org briannorris@chromium.org
+sheriffs.

The error message in the bug submission isn't complete; the actual failure is:

Running ['/b/c/cbuild/repository/src/scripts/sdk_lib/make_chroot.sh', '--stage3_path', '/b/c/cbuild/repository/built-sdk.tar.xz', '--chroot', '/b/c/cbuild/repository/new-sdk-chroot', '--cache_dir', '/b/c/cbuild/repository/.cache', '--nousepkg'] failed!

@@@STEP_FAILURE@@@

That looks like it's trying to extract a binary stage3 tarball in a chroot.

Downloaded cros-sdk-2017.08.17.102013.tar.xz from last successful run and re-ran the failing command at ToT: it succeeds with https://chromium-review.googlesource.com/c/chromiumos/chromite/+/625377 still included. So, the only way that https://chromium-review.googlesource.com/c/chromiumos/chromite/+/625377 could be implicated is if that CL somehow changes the output of the SDK tarball in a way that doesn't cause a build failure but subsequently causes a stage 3 bootstrap to fail.
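
The reproduction attempt above amounts to re-running the logged `make_chroot.sh` invocation against a previously built SDK tarball. A sketch of assembling that command in Python: the flags are copied from the failure log, while the helper name and the choice to parameterize the paths are assumptions made for illustration:

```python
import os
from typing import List


def make_chroot_cmd(repo_root: str, stage3_tarball: str) -> List[str]:
    """Build the make_chroot.sh argument list seen in the failure log.

    repo_root is the checkout root (/b/c/cbuild/repository on the builder);
    stage3_tarball is the SDK tarball to unpack.
    """
    return [
        os.path.join(repo_root, "src/scripts/sdk_lib/make_chroot.sh"),
        "--stage3_path", stage3_tarball,
        "--chroot", os.path.join(repo_root, "new-sdk-chroot"),
        "--cache_dir", os.path.join(repo_root, ".cache"),
        "--nousepkg",
    ]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` to compare a known-good tarball with the one produced by the failing build.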
Owner: jclinton@chromium.org
Owner: llozano@chromium.org
Hi Luis,

Please find an owner for this: the SDK build is broken with no way to reproduce and no current understanding of a root cause. Debugging requires knowledge of (and maybe new features in) the toolchain builder.

A few suggestions for follow-ups:
* More logs in the failing portion of build
* A way to reproduce the failure on eng workstations
* SDK Builder should be in CQ

Owner: jclinton@chromium.org
I don't think we can expect the toolchain team to fix all that immediately.

Any chance you can generate a local build with a new checkout + 'cros_sdk --bootstrap'?

I'm also beginning to think it will be easier to go with a revert (since we *do* have a highly likely target CL, even if it's tough to explain at this point), and then with lower priority we can figure out how to really understand the failure.

Can we begin tracking down the 10s of ebuild changes that would need to be reverted? What would be needed there, Jason?
Cc: keta...@chromium.org akes...@chromium.org josa...@chromium.org bhthompson@chromium.org
First of all, the CL that is causing the trouble should be reverted ASAP. This builder in the waterfall has been broken for 5+ days.
There is no fix in sight for this problem, and after reading all the info I have, I don't see a good reason not to revert. I don't see why reverting this CL would cause build issues if Chrome OS has lived with that problem for several years.
When a change breaks the waterfall, we revert first and deal with the issue later.

I think what jclinton is trying to fix is great and it is great that he took this problem on but his fix is having a bad side effect and that needs to be fixed before the CL can stick.

Second, Jason, you are the person who best understands the problem. After the analysis you did, you understand the interactions between DEPEND/RDEPEND much better than most of us. So, IMO, you are the best person to deal with this. You may need some help from some of the portage experts, and my team is willing to help where we can, but we are not portage experts. We already helped find the CL that caused the problem.

So, I cannot own this bug. 




> revert first and deal with the issue later.

Agreed, that's where we're at now. The only problem is the side effects of any ebuilds that have changed since then to make use of jclinton's bugfix. We don't want to break the CQ any more there.

> So, I cannot own this bug.

Agreed. That's why I reassigned.

The revert passed the chromiumos-sdk trybot, so I think we could be OK...

...although pasted from chat, jclinton claimed:

"""
what will happen is that source builds on a number of packages (chromeos-bootimage, depthcharge, libpayload, coreboot, chromeos-device-firmware) will fail on a developer's workstation after the revert
"""
Looking for CLs that depend on the bugfix, I looked through the most obvious suspects - CLs in chromiumos-overlay and portage-stable since Aug 18.
Regarding RDEPEND vs DEPEND lists, most don't make changes there, with the following exceptions:
 - CL:565352: is a new ebuild for vm_tools; sets RDEPEND=DEPEND, so not an issue.
 - CL:577766 adds the new dex2oatds to RDEPEND in target-chromium-os-sdk only; consistent with how other dependencies are specified, shouldn't cause regression.
 - CL:614310 - more on it below.
What other potential telltale signs of ebuilds that rely on the bugfix should we be looking for?

CL:614310 "chromeos-ec: EC_FIRMWARE_UNIBUILD{,_FAKE} -> get_each_model_conf_value_set" adds unibuild? ( chromeos-base/chromeos-config ) to DEPEND only.
Will it break with the revert? Will later unibuild-themed CLs break? Can it be fixed by adding chromeos-config to RDEPEND?
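
One way to widen the search for ebuilds like CL:614310 is to list packages that appear in DEPEND but not in RDEPEND. A naive Python sketch, assuming both variables are set with a single quoted assignment. Real ebuilds are bash and may use `+=`, eclasses, or other expansions, so this can miss cases; the function name is hypothetical. Per the later comments in this thread, a DEPEND-only entry is only a hint, not proof that an ebuild relies on the bugfix:

```python
import re
from typing import Dict, Set

# Simple quoted assignments only, e.g. DEPEND="..." possibly spanning lines.
_ASSIGN_RE = re.compile(r'^(DEPEND|RDEPEND)="(.*?)"', re.MULTILINE | re.DOTALL)
# category/package, optionally with a trailing version; operators are skipped.
_ATOM_RE = re.compile(r'[A-Za-z0-9+_.-]+/[A-Za-z0-9+_.-]+')


def depend_only_atoms(ebuild_text: str) -> Set[str]:
    """Return package atoms listed in DEPEND but absent from RDEPEND."""
    values: Dict[str, str] = {"DEPEND": "", "RDEPEND": ""}
    for name, body in _ASSIGN_RE.findall(ebuild_text):
        values[name] = body
    depend = set(_ATOM_RE.findall(values["DEPEND"]))
    rdepend = set(_ATOM_RE.findall(values["RDEPEND"]))
    # RDEPEND="${DEPEND}" (as in the vm_tools case) inherits everything.
    if "${DEPEND}" in values["RDEPEND"]:
        rdepend |= depend
    return depend - rdepend
```

USE-conditional groups such as `unibuild? ( ... )` are flattened, so the result names the package regardless of which USE flag guards it.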
Status: Fixed
Re: Comment 18: There seems to have been some miscommunication: we currently have no confidence that https://chromium-review.googlesource.com/c/chromiumos/chromite/+/625377 is implicated because no one has pointed to a root cause or a way to reproduce. As such, we don't have any working hypothesis for how it could be related: Portage isn't used at the stage that is failing at ToT. The line that is failing is http://cs/chromeos_public/src/scripts/sdk_lib/make_chroot.sh?l=181&rcl=2382b198a82bd13ff429b59288e1d89083d21c3d . This is also not a typical

However, since we are at the point of preferring taking any action over the correct action, I've submitted the revert. We should see an SDK build sometime tomorrow.

There will be fallout from the revert on the CQ, though it may take a few days to manifest due to the nature of the parallel_emerge bug that this CL fixed. Only from-source builds are affected, so only uprevs of interdependent packages will manifest the bug. Folks have traditionally used DEPEND/RDEPEND as a hack to work around that, though I had to submit this change because that doesn't always work: no ebuild is safe. I suspect that this has been a significant historical source of CQ flakiness.

Re: Comment 20: DEPEND/RDEPEND isn't a viable workaround for some kinds of graph cycles, so it also isn't a canary for which CLs we might see fail. At this point, we'll just have to wait for the failure reports to come in over the next few days.

Project Member Comment 22 by bugdroid1@chromium.org, Aug 24
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/95f30c12f2f0012aa5bc45ee05b1cdeae9c7b9ea

commit 95f30c12f2f0012aa5bc45ee05b1cdeae9c7b9ea
Author: Manoj Gupta <manojgupta@google.com>
Date: Thu Aug 24 06:57:00 2017

Revert "parallel_emerge: Work around Portage library bug with usepkg"

This reverts commit 14e53987cfbb1e99fa72f7517910acbb4e8ca5d4.
The revert is required to fix sdk builder breakage.

BUG= chromium:757824 ,  chromium:757147 
TEST=Chromiumos-sdk builder works after reverting the CL.
TEST=https://uberchromegw.corp.google.com/i/chromiumos.tryserver/builders/chromiumos-sdk/builds/2504

Change-Id: Ifa65f82f9b39b27eb56b8152adf14103202a2b5f
Reviewed-on: https://chromium-review.googlesource.com/625377
Trybot-Ready: Brian Norris <briannorris@chromium.org>
Tested-by: Brian Norris <briannorris@chromium.org>
Reviewed-by: Jason Clinton <jclinton@chromium.org>
Commit-Queue: Jason Clinton <jclinton@chromium.org>

[modify] https://crrev.com/95f30c12f2f0012aa5bc45ee05b1cdeae9c7b9ea/scripts/parallel_emerge.py
