Release builders failing in Archive due to kernel image too large
Issue description: I have found that recent peppy and falco tests failed in the Archive stage with the error: ERROR: sys-kernel/chromeos-kernel-3_8-3.8.11-r750::chromiumos failed (install phase): * Kernel image is larger than 8 MB. The issue first occurred on Friday (11/09) morning (starting around 10:31) on peppy-release. The report is linked here: https://luci-logdog.appspot.com/logs/chromeos/buildbucket/cr-buildbucket.appspot.com/8930071116954244048/+/steps/Archive/0/stdout There's a chance that more boards are affected by this.
,
Nov 12
,
Nov 12
,
Nov 12
dgarrett@ Is duping to BUG 898576 correct? That bug is related to coral/unibuilds HW tests. But here we have a build-time failure on peppy/falco because of a larger-than-expected kernel image size.
,
Nov 12
I'm sorry, I jumped in too quickly. You are correct.
,
Nov 12
Here are all the release builders affected: falco, falco-li, leon, mccloud, monroe, panther, peppy, tricky, wolf, zako. The check that is failing is in src/third_party/chromiumos-overlay/eclass/cros-kernel2.eclass:cros-kernel2_src_install() where we restrict the kernel size based on the kernel version number. All the above boards are running 3.8.11 so they are limited to just 8 MiB. I'm confused about whether this is actually correct, because AFAICT they all inherit the same disk layout from scripts/build_library/legacy_disk_layout.json which gives them a 16 MiB space for their kernel images, and they don't modify that. That aside, I think my plan is to revert whatever commit increased the kernel size so we can get the builds working again, and then we can investigate increasing the allowed kernel size as a separate task. If that can't happen for some reason the author of the CL that added size will have to find another way to make space (prune down the kernel config, etc.)
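As a rough illustration of the kind of check described above, here is a minimal shell sketch that compares an image's size against a limit in MiB. The function name `check_kernel_size` and its arguments are hypothetical; this is not the code in cros-kernel2.eclass, and the 8 MiB default is just the figure quoted in this thread for 3.8.11.

```shell
#!/bin/sh
# Hypothetical helper, NOT the real cros-kernel2.eclass logic.
# Usage: check_kernel_size <image-path> [limit-in-MiB]
check_kernel_size() {
    image=$1
    limit_mib=${2:-8}   # assumption: 8 MiB, the limit quoted for 3.8.11
    limit_bytes=$((limit_mib * 1024 * 1024))

    # wc -c reports the file size in bytes; bail out if the file is unreadable.
    size_bytes=$(wc -c < "$image") || return 2
    size_bytes=$((size_bytes))   # normalize any padding wc adds

    if [ "$size_bytes" -gt "$limit_bytes" ]; then
        echo "too large: $size_bytes bytes > $limit_bytes bytes"
        return 1
    fi
    echo "ok: $size_bytes bytes, $((limit_bytes - size_bytes)) bytes of headroom"
}
```

Running it against a locally built image (path depends on your build) would show how close the kernel is to the limit before the eclass check fires.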
,
Nov 12
If memory serves... Some boards have a kernel size restriction because of a signature verification bug in firmware. Since recovery runs from RO firmware, the bug couldn't be fixed in RW firmware without larger kernels breaking the recovery process.
,
Nov 12
Do these kernels use AFDO and are they built with clang? If so, passing the -fprofile-sample-accurate cflag might grant some size savings, at some performance cost. (Handwavy use of 'some' since the size savings are a function of how much code the profile considers to be hot, and the perf loss is a function of how accurate the profiles are.)
,
Nov 12
Nope, the 3.8.11 kernel is built with gcc. From the build command: make -j32 ... CC=x86_64-cros-linux-gnu-gcc ... So far I haven't been able to figure out what's causing the difference in build size. The last good build was R72-11245.0.0 and the first bad was R72-11246.0.0. The kernel code for the v3.8.11 kernel we're building is identical. Looking at the changelog for those builds: https://crosland.corp.google.com/log/11245.0.0..11246.0.0 I haven't been able to find anything obvious that would change the kernel size. Most likely candidates at this point are a build tool change that increased the binary size, or a change in some non-kernel repo that pulled in new code (new USE flags?). I've been able to reproduce this locally with tip-of-tree, and am currently getting a repo with R72-11245.0.0 that I can upgrade to R72-11246.0.0 so I can debug what the difference is. I'll continue work tomorrow, unless another Sheriff has time to resolve it before then. Here's a command to reproduce the issue locally: USE="fbconsole vtconsole recovery_ramfs tpm i2cdev vfat -kernel_afdo" emerge-falco sys-kernel/chromeos-kernel-3_8
,
Nov 13
,
Nov 13
iiuc, the recovery kernel doesn't use kernel modules which is why we enable a bunch of USE flags to build it in. i don't think we can drop vfat support. that gets into issues where we might use the efi (vfat) formatted partition for saving logs. do we still need fbconsole/vtconsole ? we've switched to frecon/kms now haven't we ? i wonder what compression algos we're using for the kernel currently. is it `xz -9` (which i assume would produce the smallest results) ? can you try enabling USE=kernel_compress_xz ?
,
Nov 13
We do use the vfat partition to save recovery logs. That was at the request of the Support Ninjas, a while back.
,
Nov 13
This doesn't drop vfat support. It's in all kernel configs as a module. We're just moving it from being a builtin to a module.
,
Nov 13
i think you missed the first part:
> iiuc, the recovery kernel doesn't use kernel modules which is why we enable a bunch of USE flags to build it in.

no modules are available to the recovery kernel which is why we have these USE flags to build them in. you can verify by looking at /lib/ in the root of the recovery initramfs
,
Nov 13
I'm pretty sure we do have a large part of the kernel configured as modules when making a recovery image? In looking at the kernel build that happens for a recovery image kernel I see modules being compiled, i.e.: LD [M] drivers/usb/serial/sierra.ko and installed: INSTALL drivers/usb/serial/sierra.ko I'm not sure how to easily look at the /lib/ dir in the recovery initramfs - can you point me in the right direction? In any case, using kernel_compress_xz seems like a much better way to go. It shrinks the kernel image significantly (it looks like around 1 MiB?!), giving us a lot of headroom moving forward. I've verified that the new recovery image using XZ for the kernel works fine. I'll update the CL. Thanks for the help.
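To get a rough feel for how much a different compressor might save, one could compare compressed sizes of the same file by hand. This is only a sketch: `compressed_size` is a hypothetical helper, and the real kernel build picks its compressor through the kernel config rather than by piping a file through a command-line tool, so the numbers will not match the build output exactly.

```shell
#!/bin/sh
# Hypothetical helper: print how many bytes <file> occupies after
# "<tool> -9" compression. <tool> must accept -9 and -c (gzip, xz, bzip2 do).
# Usage: compressed_size <tool> <file>
compressed_size() {
    tool=$1
    file=$2
    # -c writes the compressed stream to stdout so the input file is untouched.
    n=$("$tool" -9 -c "$file" | wc -c) || return 1
    echo $((n))
}

# Example (the file name is a placeholder for an uncompressed kernel binary):
#   compressed_size gzip vmlinux.bin
#   compressed_size xz   vmlinux.bin
```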
,
Nov 13
the modules are built, but they aren't actually installed into the initramfs. if they were, we probably would have blown our storage budget long ago.

you can conceptually verify this based on the fact:
- we emerge chromeos-initramfs first which generates the initramfs
- we emerge the custom kernel telling it to use that custom initramfs
- we install that kernel binary directly
- our build doesn't repack the initramfs to include the freshly compiled kernel modules

you can verify this locally by:
- USE="fbconsole vtconsole recovery_ramfs tpm i2cdev vfat -kernel_afdo" emerge-$BOARD chromeos-initramfs
- look at `xzcat /build/$BOARD/var/lib/initramfs/recovery_ramfs.cpio.xz | cpio -itv` output and see no kernel modules

i don't have an incantation offhand to extract the initramfs from the recovery kernel on disk, but you should boot a recovery image with debugging enabled to get a shell and then look at the limited initramfs environment.
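The local check described above can be made a little more mechanical by counting `.ko` entries in the archive listing instead of reading it by eye. A minimal sketch, assuming the listing arrives on stdin with one path per line; `count_kernel_modules` is a hypothetical name, not an existing tool.

```shell
#!/bin/sh
# Hypothetical helper: count entries ending in ".ko" in a file listing
# read from stdin (one path per line).
count_kernel_modules() {
    # grep -c prints the match count; it exits non-zero when the count is 0,
    # so "|| true" keeps the function's status at success in that case.
    grep -c '\.ko$' || true
}

# Example, using the initramfs path from the comment above:
#   xzcat /build/$BOARD/var/lib/initramfs/recovery_ramfs.cpio.xz \
#       | cpio -it | count_kernel_modules
```

A result of 0 would be consistent with the claim that no modules are packed into the recovery initramfs.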
,
Nov 14
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform/crosutils/+/ce04776969521fc248f81ad690fc7ae47bff8ec1

commit ce04776969521fc248f81ad690fc7ae47bff8ec1
Author: Ross Zwisler <zwisler@google.com>
Date: Wed Nov 14 20:50:21 2018

mod_image_for_recovery: use XZ kernel compression

For some reason recent builds have a slightly larger kernel size which is causing recovery kernel creation to fail on v3.8.11 based kernels. No changes have been made recently to the v3.8.11 codebase itself, so this size increase is due to something else (toolchain differences, etc.).

Work around this by enabling XZ compression for the kernel. This ends up saving us around 1 MiB, giving us plenty of headroom.

BUG= chromium:904473
TEST=built recovery image for falco to validate size, built and tested recovery image for octopus to validate that it still works correctly.

Suggested-by: Mike Frysinger <vapier@chromium.org>
Change-Id: I9b60e368bfd293d363312c6c56827d53f5064b87
Signed-off-by: Ross Zwisler <zwisler@google.com>
Reviewed-on: https://chromium-review.googlesource.com/c/1334434
Tested-by: Ross Zwisler <zwisler@chromium.org>
Reviewed-by: Mike Frysinger <vapier@chromium.org>
Commit-Queue: Ross Zwisler <zwisler@chromium.org>

[modify] https://crrev.com/ce04776969521fc248f81ad690fc7ae47bff8ec1/mod_image_for_recovery.sh
,
Nov 26
Comment 1 by jclinton@chromium.org, Nov 12
Owner: tcwang@chromium.org
Status: Assigned (was: Untriaged)