coreboot-sdk is slow to build | InitSDK step suddenly taking nearly an hour on some builders
Issue description
http://shortn/_aLJCJQQ5kn
This is a major speed degradation to the CQ.
,
Dec 9 2017
,
Dec 9 2017
https://logs.chromium.org/v/?s=chromeos%2Fbb%2Fchromeos%2Fgale-paladin%2F4707%2F%2B%2Frecipes%2Fsteps%2FInitSDK%2F0%2Fstdout suggests to me that coreboot-sdk became a very long pole.
,
Dec 9 2017
Any idea why this package got slow?
,
Dec 9 2017
,
Dec 11 2017
The package is always slow to build, but it isn't updated very often.
,
Dec 11 2017
Metrics from the OP indicate this is no longer a problem. pgeorgi@, was there actually a new change that initiated this rebuild? I couldn't find it.
,
Dec 11 2017
https://chromium-review.googlesource.com/c/chromiumos/third_party/coreboot/+/816945 was added a bit before this bug was opened. So I guess that's the culprit?
Note to self: when upstreaming, batch changes to util/crossgcc until there's something interesting, to reduce the number of coreboot-sdk rebuilds.
,
Dec 11 2017
There were multiple builds in a row which were slow. Also, it is holding up the master build:
https://viceroy.corp.google.com/chromeos/build_details?build_config=master-paladin&build_number=17143
https://viceroy.corp.google.com/chromeos/build_details?build_config=master-paladin&build_number=17144
https://viceroy.corp.google.com/chromeos/build_details?build_id=2112626
Is it needed on the master?
,
Dec 11 2017
> There were multiple builds in a row which were slow.
Presumably until a green run that included the prebuilt.
> Is it needed on the master?
Yes and no. The master needs a working chroot to run BinhostTest, so it needs to run InitSDK. That said, I bet the master never actually needs the coreboot-sdk. We'd need to make the concept of "working chroot" smarter to avoid this slowdown on the master (but I don't see the point, because all the slaves slow down by about the same amount anyway).
,
Dec 11 2017
Well this will result in a double slowdown -- slow down for the master, and then again for the slave.
,
Dec 11 2017
> Well this will result in a double slowdown -- slow down for the master, and then again for the slave.
Slaves are already launched by the time the master runs InitSDK, so these are parallel poles.
,
Dec 11 2017
Ah yes, my bad. I got confused and was just thinking of the Commit Queue Completion blocking.
,
Dec 14 2017
Here's another case of InitSDK being too slow on what should be unrelated changes: https://chromium-review.googlesource.com/c/chromiumos/third_party/autotest/+/789842 https://luci-milo.appspot.com/buildbot/chromiumos.tryserver/pre_cq/70158 (I'm not sure if this is from Coreboot SDK changes, but I'm pointing out that these changes can have effects across all developers if there are slow pre-CQ build times and multiple pre-CQ failures with unrelated changes).
,
Jan 17 2018
,
Jan 17 2018
Upping to P1. This is back, hitting the pre-cq and causing initsdk to take 45+ minutes. https://pcon.corp.google.com/p#chrome-infra/queryplayground?yAxisMin=0&yAxisMin2=0&oldHeatmap=false&query=CAEY9wOKAbABCpwBCA96lwEKlAEqGgoLbWV0cmljOm5hbWUSCwgBIgdJbml0U0RLKiUKE21ldHJpYzpidWlsZF9jb25maWcSDggCIgouKi1wYWxhZGluMk8KGW1vbmFyY2guYWNxdWlzaXRpb25zLlRhc2sSMgowL2Nocm9tZS9pbmZyYS9jaHJvbWVvcy9jYnVpbGRib3Qvc3RhZ2UvZHVyYXRpb25zIICYmrwEKgIgATCAwN3uwQKSAQwKBggBGKCNBhIAYAA&names=Query%201&duration=1w https://luci-milo.appspot.com/buildbot/chromiumos.tryserver/no_vmtest_pre_cq/208194#
,
Jan 17 2018
To summarize: building coreboot-sdk is always slow, but that doesn't usually matter as it is rarely changed. It happens to be a bigger problem now because we haven't had a successful build since the last change to coreboot-sdk to get a new prebuilt. What is the request here? To find a way to avoid building coreboot-sdk in InitSDK?
,
Jan 18 2018
From what I can see, "emerge coreboot-sdk" builds a compiler, which isn't particularly quick even on a z840. I'm wondering if something else changed that caused it to somehow not do that before?
,
Jan 18 2018
Also, to add to #18 -- coreboot-sdk is sequentially building a toolchain for each architecture, and that is currently the following:
local architectures=(
    i386-elf
    x86_64-elf
    arm-eabi
    aarch64-elf
    mipsel-elf
    nds32le-elf
)
So all of those toolchains are being built sequentially. I'm not sure if we really need all of those?
I did find that when InitSDK was fast, it didn't even emerge coreboot-sdk, so I don't think we can blame coreboot-sdk itself for the slowness, but rather the decision to emerge it in the first place.
InitSDK does this:
01:18:03: INFO: RunCommand: /b/c/cbuild/repository/chromite/bin/cros_sdk --nouse-image 'USE=chrome_internal' 'FEATURES=separatedebug' -- ./run_chroot_version_hooks in /b/c/cbuild/repository
So I'm trying to figure out why we used to not build this package as a result, but always do it now.
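To make the "built sequentially" point concrete, here is a rough sketch of that loop shape. It is not the real ebuild logic: build_toolchain and build_selected are hypothetical stand-ins, and the architecture list is flattened into a space-separated string so the sketch runs under plain sh. Passing a subset of targets shows what building only the needed toolchains could look like.

```shell
# Architecture list copied from the comment above (flattened to a string).
ARCHITECTURES="i386-elf x86_64-elf arm-eabi aarch64-elf mipsel-elf nds32le-elf"

# Hypothetical placeholder for the real cross-compiler build of one target.
build_toolchain() {
  echo "building toolchain for $1"
}

# Hypothetical helper: builds the given targets one after another.
# With no arguments it falls back to the full list, so the total time
# is the sum of all six toolchain builds.
build_selected() {
  if [ "$#" -eq 0 ]; then
    set -- $ARCHITECTURES
  fi
  for arch in "$@"; do
    build_toolchain "$arch"
  done
}
```

Calling build_selected with no arguments walks all six targets in order; build_selected arm-eabi aarch64-elf would walk only those two.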
,
Jan 18 2018
Is it because the CQ isn't green and we haven't had a proper uprev in 24+ hours, so there are no prebuilts?
,
Jan 18 2018
re #20 - From what I can see, the difference may be whether or not it decides to create a new chroot.
In cases where it's fast, there's no output. Recent example from the master: build 17504 has a fast InitSDK (6 sec):
https://logs.chromium.org/v/?s=chromeos%2Fbb%2Fchromeos%2Fmaster-paladin%2F17504%2F%2B%2Frecipes%2Fsteps%2FInitSDK%2F0%2Fstdout
In cases where it's slow, it reports that it needs to recreate the chroot. Builds 17505 and later were slow at over an hour:
https://logs.chromium.org/v/?s=chromeos%2Fbb%2Fchromeos%2Fmaster-paladin%2F17505%2F%2B%2Frecipes%2Fsteps%2FInitSDK%2F0%2Fstdout
It starts by downloading an SDK tarball and then re-creates the chroot, and the long pole in that process is building coreboot-sdk.
That seems to be true even back when this was reported in December. master-paladin build 17142 was quick with no output:
https://logs.chromium.org/v/?s=chromeos%2Fbb%2Fchromeos%2Fmaster-paladin%2F17142%2F%2B%2Frecipes%2Fsteps%2FInitSDK%2F0%2Fstdout
but subsequent builds 17143, 17144, 17145 were slow and all re-created the chroot. 17146 went back to being fast again and didn't re-create a chroot.
So I'm trying to figure out why it sometimes decides to re-create the chroot but not other times.
,
Jan 18 2018
chroot might get blown away after failed builds?
,
Jan 18 2018
re #22 -- that's an interesting idea, and I think there could be something to that. I looked at some recent master-paladin builds to see:
https://luci-milo.appspot.com/buildbot/chromeos/master-paladin/17497 - build passed - InitSDK took 10 minutes -- previous build succeeded -- did create chroot
https://luci-milo.appspot.com/buildbot/chromeos/master-paladin/17498 - build failed - InitSDK took 5 seconds -- previous build succeeded -- did not recreate chroot
https://luci-milo.appspot.com/buildbot/chromeos/master-paladin/17499 - build failed - InitSDK took 5 minutes -- previous build failed -- did recreate chroot
https://luci-milo.appspot.com/buildbot/chromeos/master-paladin/17500 - build passed - InitSDK took 5 minutes -- previous build failed -- did recreate chroot
https://luci-milo.appspot.com/buildbot/chromeos/master-paladin/17501 - build passed - InitSDK took 5 seconds -- previous build succeeded -- did not recreate chroot
In this sample, it looks like whenever the previous build failed it will re-create the chroot. However, sometimes even when the previous build succeeds it still recreates the chroot, like in 17497.
Also interesting here is that there are some cases where it recreates the chroot but doesn't end up emerging coreboot-sdk, and so InitSDK only takes a few minutes rather than 45 minutes or more.
So I need to dig further into how it's deciding whether to re-create the chroot and then how it's deciding what to build at that point. I'm guessing there might be some relationship with the sdk builder?
,
Jan 18 2018
+dgarrett (build), cmtice, llozano (sdk)
,
Jan 18 2018
,
Jan 18 2018
A more dramatic fix for this would be to move firmware builds out of the normal build flow (which would mean we wouldn't need to build coreboot-sdk at all). I know this has been proposed in the past and even prototyped; not sure why it hasn't moved forward.
,
Jan 18 2018
We do blow away the chroot on builders after a failed build, as well as when changing branches between builds. This is by design, since a failed build can leave (and has left) a chroot in a state that causes future builds to fail. The chroot.img work is supposed to allow a fast rollback to a "clean" chroot, but I don't think it was ever rolled out because it had its own problems that were never fully resolved.
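As a rough illustration of the two triggers just described (a failed previous build, or a branch change between builds), the policy could be sketched as below. The function name, arguments, and status strings are hypothetical and not taken from chromite, and other triggers may exist that this sketch does not cover.

```shell
# Hypothetical helper, not the actual builder code.
# Usage: should_recreate_chroot PREV_BUILD_STATUS PREV_BRANCH CURRENT_BRANCH
should_recreate_chroot() {
  prev_status=$1
  prev_branch=$2
  cur_branch=$3
  # A failed build may have left the chroot in a bad state.
  if [ "$prev_status" != "passed" ]; then
    echo yes
    return 0
  fi
  # Changing branches between builds also discards the chroot.
  if [ "$prev_branch" != "$cur_branch" ]; then
    echo yes
    return 0
  fi
  echo no
}
```

For example, should_recreate_chroot failed master master prints yes, which matches the builds that followed a failure in the sample earlier in the thread.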
,
Jan 18 2018
Regarding #26, I would love to see firmware pulled out into its own thing, but nobody has stepped up to own that. A doc was sent out to build a whole new infrastructure, which my team was fine with, but not willing to own. Nobody else has stepped up. I also think we could work something out that would fit into our existing system, but we'd still need the firmware team (or someone) to own the firmware-specific parts. I'm happy to help you work out what the boundaries should look like and how it should work.
,
Jan 18 2018
OK, to answer my own question about why coreboot-sdk is getting built sometimes -- the coreboot-sdk ebuild got revved on Jan 17 and previously on Dec 8. After it's uprevved, the sdk builder needs to produce a new stage3 tarball with the new version, but any builds that happen in between will have to build that package from source. I think the sdk builder has now built a stage3 tarball with the latest version, so this should quiet down now.
Maybe there are less extreme ways to alleviate the problem than making a whole new infrastructure? Could we separate out some of these toolchains so we only build the ones that we need, or build them on demand?
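The window described above can be pictured with a small sketch: a build falls back to compiling coreboot-sdk from source whenever the version in the latest SDK tarball lags the version in the tree. The function name and the version strings in the usage note are made up for illustration; this is not how portage or cros_sdk actually makes the decision.

```shell
# Hypothetical helper, for illustration only.
# Usage: coreboot_sdk_source TARBALL_VERSION EBUILD_VERSION
coreboot_sdk_source() {
  tarball_version=$1   # version baked into the latest SDK tarball
  ebuild_version=$2    # version currently checked in to the tree
  if [ "$tarball_version" = "$ebuild_version" ]; then
    echo "use-prebuilt"
  else
    # The ebuild was uprevved but no new tarball exists yet,
    # so every chroot re-creation pays the full build cost.
    echo "build-from-source"
  fi
}
```

With made-up versions, coreboot_sdk_source r1 r2 prints build-from-source, and coreboot_sdk_source r2 r2 prints use-prebuilt once the sdk builder has caught up.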
,
Jan 18 2018
The chromiumos sdk builder takes > 12 hours, so it only updates 1-2 times a day. That can be a lot of lag.
,
Jan 18 2018
+martinroth, who has some ideas to mitigate this issue.
,
Jan 23 2018
,
Jan 23 2018
,
Jan 23 2018
Can we delay the uprev while still getting a pre-built done?
,
Jan 23 2018
Would it help to set up a separate build job that only builds the sdk? That way the binpkg wouldn't have to wait for a lot of unrelated packages to build successfully before it gets generated.
,
Jan 23 2018
Re #26: I did some initial work to break out a separate firmware build CI. Enabling the coreboot-sdk in cros was a major blocker before. Seems like a good time to revisit.
,
May 30 2018
,
Jun 8 2018
Comment 1 by akes...@chromium.org
, Dec 9 2017