
Issue 793540

Starred by 7 users

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug




coreboot-sdk is slow to build | InitSDK step suddenly taking nearly an hour on some builders

Project Member Reported by akes...@chromium.org, Dec 9 2017

Issue description

http://shortn/_aLJCJQQ5kn

This is a major speed degradation to the CQ.
 
Cc: rspangler@chromium.org x...@chromium.org davidri...@chromium.org
Owner: pgeorgi@chromium.org
Status: Assigned (was: Untriaged)
Any idea why this package got slow?
Summary: coreboot-sdk is slow to build | InitSDK step suddenly taking nearly an hour on some builders (was: InitSDK step suddenly taking nearly an hour on some builders)
The package is always slow to build, but it isn't updated very often.
Labels: -Pri-1 Pri-2
metrics from OP indicate this is no longer a problem.

pgeorgi@ was there actually a new change that initiated this rebuild? I couldn't find it.
https://chromium-review.googlesource.com/c/chromiumos/third_party/coreboot/+/816945 was added a bit before this bug was opened. So I guess that's the culprit?

note to self: when upstreaming, batch changes to util/crossgcc until there's something interesting, to reduce the number of coreboot-sdk rebuilds.
> There was multiple builds in a row which were slow.

Presumably until a green run that included the prebuilt.

> Is it needed on the master?

Yes and no. The master needs a working chroot to run BinhostTest, so it needs to run InitSDK. I bet the master never actually needs coreboot-sdk, though. We'd need to make the concept of a "working chroot" smarter to avoid this slowdown on the master (but I don't see the point, because all the slaves slow down by about the same amount anyway).
Well this will result in a double slowdown -- slow down for the master, and then again for the slave.
> Well this will result in a double slowdown -- slow down for the master, and then again for the slave.

Slaves are already launched by the time master runs initsdk, so these are parallel poles.
Ah yes, my bad.  I got confused and was just thinking of the Commit Queue Completion blocking.
Here's another case of InitSDK being too slow on what should be unrelated changes:
https://chromium-review.googlesource.com/c/chromiumos/third_party/autotest/+/789842
https://luci-milo.appspot.com/buildbot/chromiumos.tryserver/pre_cq/70158

(I'm not sure if this is from Coreboot SDK changes, but I'm pointing out that these changes can have effects across all developers if there are slow pre-CQ build times and multiple pre-CQ failures with unrelated changes).
Cc: nxia@chromium.org

Comment 17 by lannm@google.com, Jan 17 2018

To summarize: building coreboot-sdk is always slow, but that doesn't usually matter as it is rarely changed. It happens to be a bigger problem now because we haven't had a successful build since the last change to coreboot-sdk to get a new prebuilt.

What is the request here? To find a way to avoid building coreboot-sdk in InitSDK?
From what I can see, "emerge coreboot-sdk" builds a compiler, which isn't particularly quick even on a z840. I'm wondering if something else changed that caused it to somehow not do that before?
also to add to #18 -- coreboot-sdk is sequentially building a toolchain for each architecture and that is currently the following:


        local architectures=(
                i386-elf
                x86_64-elf
                arm-eabi
                aarch64-elf
                mipsel-elf
                nds32le-elf
        )

so all of those toolchains are being built sequentially.  I'm not sure if we really need all of those?
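
As a rough illustration of why a sequential per-architecture build adds up, and of what building only a needed subset might save, here is a minimal Python sketch. The architecture list is the one quoted above; the per-toolchain build time and the "needed" set are made-up placeholders, not measurements from these builders or a statement of which toolchains are actually required.

```python
# Architectures listed in the coreboot-sdk snippet above.
ARCHITECTURES = [
    "i386-elf",
    "x86_64-elf",
    "arm-eabi",
    "aarch64-elf",
    "mipsel-elf",
    "nds32le-elf",
]


def sequential_minutes(archs, minutes_per_toolchain):
    """Total wall time when toolchains are built one after another."""
    return len(archs) * minutes_per_toolchain


def select_needed(archs, needed):
    """Keep only the architectures that are needed, preserving order."""
    return [arch for arch in archs if arch in needed]


# Hypothetical numbers, for illustration only.
ASSUMED_MINUTES_PER_TOOLCHAIN = 8
ASSUMED_NEEDED = {"i386-elf", "x86_64-elf", "arm-eabi", "aarch64-elf"}

all_cost = sequential_minutes(ARCHITECTURES, ASSUMED_MINUTES_PER_TOOLCHAIN)
subset = select_needed(ARCHITECTURES, ASSUMED_NEEDED)
subset_cost = sequential_minutes(subset, ASSUMED_MINUTES_PER_TOOLCHAIN)
print(f"all six: {all_cost} min, needed subset: {subset_cost} min")
```

Under these assumed numbers the cost scales linearly with the number of toolchains, so dropping two of six would cut roughly a third of the build time.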

I did find when InitSDK was fast, it didn't even emerge coreboot-sdk, so I don't think we can blame coreboot-sdk itself for the slowness, but rather the decision to emerge it in the first place.

InitSDK does this:
01:18:03: INFO: RunCommand: /b/c/cbuild/repository/chromite/bin/cros_sdk --nouse-image 'USE=chrome_internal' 'FEATURES=separatedebug' -- ./run_chroot_version_hooks in /b/c/cbuild/repository

so I'm trying to figure out why we used to not build this package as a result but always do it now

Is it because the CQ isn't green and hasn't had a proper uprev in 24+ hours, so there are no prebuilts?
re #20 - From what I can see the difference may be whether or not it decides to create a new chroot.  

In cases where it's fast -- there's no output -- recent Example from the master:

build 17504 has a fast InitSDK  - 6 sec:
https://logs.chromium.org/v/?s=chromeos%2Fbb%2Fchromeos%2Fmaster-paladin%2F17504%2F%2B%2Frecipes%2Fsteps%2FInitSDK%2F0%2Fstdout

In cases where it's slow -- it reports that it needs to recreate the chroot

build 17505 and later were slow at over an hour:
https://logs.chromium.org/v/?s=chromeos%2Fbb%2Fchromeos%2Fmaster-paladin%2F17505%2F%2B%2Frecipes%2Fsteps%2FInitSDK%2F0%2Fstdout

it starts by downloading a sdk tarball and then re-creates the chroot and the long pole in that process is building coreboot-sdk

That seems to be true even back when this was reported in December.
master-paladin build 17142 was quick with no output:
https://logs.chromium.org/v/?s=chromeos%2Fbb%2Fchromeos%2Fmaster-paladin%2F17142%2F%2B%2Frecipes%2Fsteps%2FInitSDK%2F0%2Fstdout

but subsequent builds 17143, 17144, 17145 were slow and all re-created the chroot.  17146 went back to being fast again and didn't re-create a chroot.


so I'm trying to figure out why it sometimes decides to re-create the chroot but not do it other times

 
chroot might get blown away after failed builds?
re #22 --  that's an interesting idea, and I think there could be something to that

I looked at some recent master paladin builds to see:
https://luci-milo.appspot.com/buildbot/chromeos/master-paladin/17497 
 - build passed
 - InitSDK took 10 minutes 
 -- previous build succeeded
 -- did create chroot

https://luci-milo.appspot.com/buildbot/chromeos/master-paladin/17498
 - build failed
 - Init SDK took 5 seconds
 -- previous build succeeded
 -- did not recreate chroot

https://luci-milo.appspot.com/buildbot/chromeos/master-paladin/17499
 - build failed
 - Init SDK took 5 minutes
 -- previous build failed
 -- did recreate chroot

https://luci-milo.appspot.com/buildbot/chromeos/master-paladin/17500 
 - build passed
 - Init SDK took 5 minutes
 -- previous build failed
 -- did recreate chroot

https://luci-milo.appspot.com/buildbot/chromeos/master-paladin/17501
 - build passed
 - Init SDK took 5 seconds
 -- previous build succeeded
 -- did not recreate chroot

In this sample, it looks like whenever the previous builds failed it will re-create the chroot.  However, sometimes even when the previous build succeeds it still recreates the chroot, like in 17497.

Also interesting in here is that there are some cases where it recreates the chroot but doesn't end up emerging coreboot-sdk, and so InitSDK only takes a few minutes rather than 45 minutes or more.

So I need to dig further into how it's deciding whether to re-create the chroot and then how it's deciding what to build at that point.  I'm guessing there might be some relationship with the sdk builder?
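
The five-build sample above can be tabulated to check the "previous build failed" hypothesis. This is only a sketch over the observations listed in this comment; the field names are made up for illustration.

```python
from collections import namedtuple

Build = namedtuple("Build", "number previous_failed recreated_chroot")

# Observations copied from the master-paladin sample above.
SAMPLE = [
    Build(17497, previous_failed=False, recreated_chroot=True),
    Build(17498, previous_failed=False, recreated_chroot=False),
    Build(17499, previous_failed=True, recreated_chroot=True),
    Build(17500, previous_failed=True, recreated_chroot=True),
    Build(17501, previous_failed=False, recreated_chroot=False),
]

# Did every build that followed a failure re-create the chroot?
after_failure = [b for b in SAMPLE if b.previous_failed]
always_recreated = all(b.recreated_chroot for b in after_failure)

# Builds that re-created the chroot even though the previous build passed.
unexplained = [
    b.number for b in SAMPLE if b.recreated_chroot and not b.previous_failed
]

print("failure always followed by re-creation:", always_recreated)
print("re-created without a preceding failure:", unexplained)
```

In this small sample a preceding failure is sufficient for re-creation but not necessary, which is why 17497 still needs a separate explanation.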

Cc: -la...@chromium.org cmt...@chromium.org dgarr...@chromium.org llozano@chromium.org
+dgarrett (build), cmtice, llozano (sdk)
Cc: la...@chromium.org

Comment 26 by lannm@google.com, Jan 18 2018

A more dramatic fix for this would be to move firmware builds out of the normal build flow (which would mean we wouldn't need to build coreboot-sdk at all). I know this has been proposed in the past and even prototyped; not sure why it hasn't moved forward.
We do blow away the chroot on builders after a failed build, as well as when changing branches between builds. This is by design, since a failed build can (and has) left a chroot in a state that causes future builds to fail.
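
A minimal sketch of that rule, encoding only the two conditions stated above. The function name and the branch names in the usage lines are hypothetical, and the real builder logic may check more than this.

```python
def should_wipe_chroot(previous_build_failed, previous_branch, current_branch):
    """Return True when the chroot should be deleted before the next build.

    Models only the two triggers described in this comment: a failed
    previous build, or a branch change between builds.
    """
    return previous_build_failed or previous_branch != current_branch


# Hypothetical branch names, for illustration only.
print(should_wipe_chroot(True, "master", "master"))
print(should_wipe_chroot(False, "master", "master"))
print(should_wipe_chroot(False, "some-release-branch", "master"))
```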

The chroot.img work is supposed to allow a fast rollback to a "clean" chroot, but I don't think it was ever rolled out because it had its own problems that were never fully resolved.

Regarding #26, I would love to see firmware pulled out into its own thing, but nobody has stepped up to own that.

A doc was sent out to build a whole new infrastructure, which my team was fine with, but not willing to own. Nobody else has stepped up.

I also think we could work something out that would fit into our existing system, but we'd still need firmware team (or someone) to own the firmware specific parts. I'm happy to help you work out what the boundaries should look like and how it should work.

ok to answer my own question about why coreboot-sdk is getting built sometimes -- the coreboot-sdk ebuild got revved on Jan 17 and previously on Dec 8.  After it's uprevved the sdk builder needs to produce a new stage3 tarball with the new version, but any builds that happen in-between will have to build that package from source.

I think the sdk builder has now built a stage3 tarball with the latest version so this should quiet down now.
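
The window described above can be modelled in a few lines. This is a simplified sketch, not the actual portage or cros_sdk logic, and the version strings are placeholders.

```python
def install_action(ebuild_version, prebuilt_versions):
    """Pick how coreboot-sdk would end up in a freshly created chroot.

    Simplified model: a prebuilt is usable only if it matches the
    current ebuild version exactly; otherwise the package is compiled.
    """
    if ebuild_version in prebuilt_versions:
        return "use prebuilt"
    return "build from source"


# Hypothetical version strings, for illustration only.
before_uprev = install_action("r1", {"r1"})
during_lag = install_action("r2", {"r1"})        # uprevved, sdk builder not done yet
after_sdk_build = install_action("r2", {"r1", "r2"})

print(before_uprev, "|", during_lag, "|", after_sdk_build)
```

Only the middle case hits the slow path, which matches the pattern of a burst of slow InitSDK runs after each uprev that then quiets down.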

maybe there are less extreme ways to alleviate the problem than making a whole new infrastructure?  Could we separate out some of these toolchains so we only build the ones that we need or build them on demand?


The chromiumos sdk builder takes > 12 hours, so it only updates 1-2 times a day. That can be a lot of lag.

Comment 31 by lannm@google.com, Jan 18 2018

Cc: martinroth@chromium.org
+martinroth, who has some ideas to mitigate this issue.

Comment 32 by lannm@google.com, Jan 23 2018

Cc: vapier@chromium.org
Issue 803107 has been merged into this issue.
Cc: adurbin@chromium.org
Can we delay the uprev while still getting a pre-built done?
Would it help to set up a separate build job that only builds the sdk? That way we wouldn't keep the binpkg from being generated until a lot of unrelated packages build successfully.
Cc: chingcodes@chromium.org
#26

I did some initial work to break out a separate firmware build CI. Enabling the coreboot-sdk in cros was a major blocker before. Seems like a good time to revisit.
Labels: -Restrict-View-Google OS-Chrome

Comment 38 by nxia@chromium.org, Jun 8 2018

Cc: -nxia@chromium.org
