
Issue 904473


Issue metadata

Status: Verified
Merged: issue 898576
Owner:
Closed: Nov 26
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug




Release builders failing in Archive due to kernel image too large

Project Member Reported by tcwang@chromium.org, Nov 12

Issue description

I found that recent peppy and falco tests failed in the Archive stage with this error:

ERROR: sys-kernel/chromeos-kernel-3_8-3.8.11-r750::chromiumos failed (install phase):
 *   Kernel image is larger than 8 MB.

I observed that the issue started occurring on Friday (11/09) morning (around 10:31) on peppy-release. The report is linked here: https://luci-logdog.appspot.com/logs/chromeos/buildbucket/cr-buildbucket.appspot.com/8930071116954244048/+/steps/Archive/0/stdout

There's a chance that more boards are affected by this.

 
Labels: -Pri-0 Pri-1
Owner: tcwang@chromium.org
Status: Assigned (was: Untriaged)
Owner: zwisler@chromium.org
Mergedinto: 898576
Status: Duplicate (was: Assigned)
dgarrett@ Is duping to bug 898576 correct?

That bug is related to coral/unibuilds HW tests. But here we have a build-time failure on peppy/falco because of a larger-than-expected kernel image size.
Status: Assigned (was: Duplicate)
I'm sorry, I jumped in too quickly. You are correct.
Here are all the release builders affected:

falco, falco-li, leon, mccloud, monroe, panther, peppy, tricky, wolf, zako.

The check that is failing is in src/third_party/chromiumos-overlay/eclass/cros-kernel2.eclass:cros-kernel2_src_install() where we restrict the kernel size based on the kernel version number.  All the above boards are running 3.8.11 so they are limited to just 8 MiB.

I'm confused about whether this is actually correct, because AFAICT they all inherit the same disk layout from scripts/build_library/legacy_disk_layout.json which gives them a 16 MiB space for their kernel images, and they don't modify that.

That aside, my plan is to revert whatever commit increased the kernel size so we can get the builds working again, and then investigate increasing the allowed kernel size as a separate task.  If that can't happen for some reason, the author of the CL that added the size will have to find another way to make space (prune down the kernel config, etc.).
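
For readers unfamiliar with this kind of install-phase check, here is a minimal sketch of a size limit test in shell. It is not the code from cros-kernel2.eclass: the function name check_kernel_size and the way the limit is passed in are assumptions made for illustration, and only the 8 MiB figure and the error wording come from the thread.

```shell
#!/bin/bash
# Hypothetical sketch of a kernel image size check.
# NOT the real cros-kernel2_src_install() logic; names are illustrative.
check_kernel_size() {
  local image="$1"
  local limit_mib="${2:-8}"   # 8 MiB is the limit quoted for 3.8.11 boards
  local limit_bytes=$(( limit_mib * 1024 * 1024 ))
  local size_bytes

  if [[ ! -r "${image}" ]]; then
    echo "Cannot read ${image}" >&2
    return 2
  fi

  # Wrap wc in arithmetic expansion to strip any padding it prints.
  size_bytes=$(( $(wc -c < "${image}") ))

  if (( size_bytes > limit_bytes )); then
    echo "Kernel image is larger than ${limit_mib} MB (${size_bytes} bytes)." >&2
    return 1
  fi
  echo "OK: ${size_bytes} bytes <= ${limit_bytes} bytes"
}
```

Calling it as `check_kernel_size /path/to/vmlinuz 8` (the path is a placeholder) returns non-zero when the image exceeds the limit, which is the kind of condition that fails the install phase here.
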
If memory serves...

Some boards have a kernel size restriction because of a signature verification bug in firmware. Since recovery runs from RO firmware, the bug couldn't be fixed in RW firmware without larger kernels breaking the recovery process.
Do these kernels use AFDO and are they built with clang?

If so, passing the -fprofile-sample-accurate cflag might grant some size savings, at some performance cost.

(Handwavy use of 'some' since the size savings are a function of how much code the profile considers to be hot, and the perf loss is a function of how accurate the profiles are.)
Nope, the 3.8.11 kernel is built with gcc.  From the build command:

make -j32 ... CC=x86_64-cros-linux-gnu-gcc ...

So far I haven't been able to figure out what's causing the difference in build size.  The last good build was R72-11245.0.0 and the first bad was R72-11246.0.0.  The kernel code for the v3.8.11 kernel we're building is identical.

Looking at the changelog for those builds:

https://crosland.corp.google.com/log/11245.0.0..11246.0.0

I haven't been able to find anything obvious that would change the kernel size.  Most likely candidates at this point are a build tool change that increased the binary size, or a change in some non-kernel repo that pulled in new code (new USE flags?).

I've been able to reproduce this locally with tip-of-tree, and am currently getting a repo with R72-11245.0.0 that I can upgrade to R72-11246.0.0 so I can debug what the difference is.  I'll continue work tomorrow, unless another Sheriff has time to resolve it before then.

Here's a command to reproduce the issue locally:

USE="fbconsole vtconsole recovery_ramfs tpm i2cdev vfat -kernel_afdo" emerge-falco sys-kernel/chromeos-kernel-3_8 
Cc: vapier@chromium.org
iiuc, the recovery kernel doesn't use kernel modules which is why we enable a bunch of USE flags to build it in.

i don't think we can drop vfat support.  that gets into issues where we might use the efi (vfat) formatted partition for saving logs.

do we still need fbconsole/vtconsole ?  we've switched to frecon/kms now haven't we ?

i wonder what compression algos we're using for the kernel currently.  is it `xz -9` (which i assume would produce the smallest results) ?  can you try enabling USE=kernel_compress_xz ?
We do use the vfat partition to save recovery logs. That was at the request of the Support Ninjas, a while back.
This doesn't drop vfat support.  It's in all kernel configs as a module.  We're just moving it from being a builtin to a module.
i think you missed the first part:
> iiuc, the recovery kernel doesn't use kernel modules which is why we enable a bunch of USE flags to build it in.

no modules are available to the recovery kernel which is why we have these USE flags to build them in.

you can verify by looking at /lib/ in the root of the recovery initramfs
I'm pretty sure we do have a large part of the kernel configured as modules when making a recovery image?  In looking at the kernel build that happens for a recovery image kernel I see modules being compiled, i.e.:

  LD [M]  drivers/usb/serial/sierra.ko

and installed:

  INSTALL drivers/usb/serial/sierra.ko

I'm not sure how to easily look at the /lib/ dir in the recovery initramfs - can you point me in the right direction?

In any case, using kernel_compress_xz seems like a much better way to go.  It shrinks the kernel image significantly (it looks like around 1 MiB?!), giving us a lot of headroom moving forward.

I've verified that the new recovery image using XZ for the kernel works fine.  I'll update the CL.

Thanks for the help.
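
To get a rough feel for the kind of saving XZ can give, one can compress the same file with gzip and xz and compare the sizes. The helper below is only a sketch: compare_compression is a made-up name, the thread does not say which compressor was the previous default (gzip is assumed here), and the kernel build applies its own compressor options, so the numbers will not match the real image sizes exactly.

```shell
#!/bin/bash
# Hypothetical helper: report the size of a file after gzip -9 and xz -9.
# Assumes gzip was the previous compressor, which the thread does not confirm.
compare_compression() {
  local input="$1"
  local gz_bytes xz_bytes

  # Compress to stdout and count bytes; the input file is left untouched.
  gz_bytes=$(( $(gzip -9 -c "${input}" | wc -c) ))
  xz_bytes=$(( $(xz -9 -c "${input}" | wc -c) ))

  echo "gzip: ${gz_bytes} bytes"
  echo "xz:   ${xz_bytes} bytes"
}
```

Run it against an uncompressed kernel binary from the build tree, for example `compare_compression /path/to/vmlinux.bin` (placeholder path).
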
the modules are built, but they aren't actually installed into the initramfs.  if they were, we probably would have blown our storage budget long ago.

you can conceptually verify this based on the fact:
- we emerge chromeos-initramfs first which generates the initramfs
- we emerge the custom kernel telling it to use that custom initramfs
- we install that kernel binary directly
- our build doesn't repack the initramfs to include the freshly compiled kernel modules

you can verify this locally by:
- USE="fbconsole vtconsole recovery_ramfs tpm i2cdev vfat -kernel_afdo" emerge-$BOARD chromeos-initramfs
- look at `xzcat /build/$BOARD/var/lib/initramfs/recovery_ramfs.cpio.xz | cpio -itv` output and see no kernel modules
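
The check in the steps above can be turned into a count rather than an eyeball scan. This is a small sketch, not part of the build scripts: count_modules is a hypothetical helper that reads a file listing on stdin and counts entries that look like kernel modules (names ending in .ko, optionally compressed).

```shell
#!/bin/bash
# Hypothetical helper: count kernel module entries in a file listing on stdin.
# Matches names ending in .ko, .ko.gz or .ko.xz.
count_modules() {
  # grep -c prints 0 but exits non-zero when nothing matches; mask that.
  grep -cE '\.ko(\.(gz|xz))?$' || true
}
```

Feed it the initramfs listing, e.g. `xzcat /build/$BOARD/var/lib/initramfs/recovery_ramfs.cpio.xz | cpio -it | count_modules`. If the initramfs really contains no modules, as described above, the count should be 0.
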

i don't have an incantation offhand to extract the initramfs from the recovery kernel on disk, but you should boot a recovery image with debugging enabled to get a shell and then look at the limited initramfs environment.
Project Member

Comment 17 by bugdroid1@chromium.org, Nov 14

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform/crosutils/+/ce04776969521fc248f81ad690fc7ae47bff8ec1

commit ce04776969521fc248f81ad690fc7ae47bff8ec1
Author: Ross Zwisler <zwisler@google.com>
Date: Wed Nov 14 20:50:21 2018

mod_image_for_recovery: use XZ kernel compression

For some reason recent builds have a slightly larger kernel size which is
causing recovery kernel creation to fail on v3.8.11 based kernels.  No
changes have been made recently to the v3.8.11 codebase itself, so this
size increase is due to something else (toolchain differences, etc.).

Work around this by enabling XZ compression for the kernel.  This ends up
saving us around 1 MiB, giving us plenty of headroom.

BUG= chromium:904473 
TEST=built recovery image for falco to validate size, built and tested
recovery image for octopus to validate that it still works correctly.

Suggested-by: Mike Frysinger <vapier@chromium.org>
Change-Id: I9b60e368bfd293d363312c6c56827d53f5064b87
Signed-off-by: Ross Zwisler <zwisler@google.com>
Reviewed-on: https://chromium-review.googlesource.com/c/1334434
Tested-by: Ross Zwisler <zwisler@chromium.org>
Reviewed-by: Mike Frysinger <vapier@chromium.org>
Commit-Queue: Ross Zwisler <zwisler@chromium.org>

[modify] https://crrev.com/ce04776969521fc248f81ad690fc7ae47bff8ec1/mod_image_for_recovery.sh

Status: Verified (was: Assigned)
