
Issue 769824


Issue metadata

Status: Fixed
Owner:
Closed: Oct 2017
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 3
Type: Bug




kallsyms fails building emerge-arm64-generic chromeos-kernel-4_12

Project Member Reported by groeck@chromium.org, Sep 28 2017

Issue description

"emerge-arm64-generic chromeos-kernel-4_12" fails with the following error.

kallsyms failure: relative symbol value 0xffffff8008081000 out of range in relative mode

Analysis shows bad symbols in the symbol table (System.map).

000000000000000e n __efistub_$d
000000000000000e n __efistub_$d
000000000000000e n __efistub_$d
000000000000000e n __efistub_$d
...

This is what the symbols should look like (generated with a working toolchain):
ffffff9009185bf0 t __efistub_$d
ffffff9009185e90 t __efistub_$d
ffffff9009185f38 t __efistub_$d
ffffff90091860f8 t __efistub_$d
ffffff9009186230 t __efistub_$d
ffffff90091863e8 t __efistub_$d
...

Due to the bad symbol table values, the relative address base is set to 0x0e (instead of something like 0xffffff90080bc000, i.e. _head), and all real symbols are considered to be out of range.
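The failure mode can be sketched in a few lines. This is a simplified model, not the kallsyms source: it assumes the relative base is the lowest symbol address and that every offset from that base must fit in an unsigned 32-bit value. The function names are hypothetical; the addresses are taken from this report.

```python
# Simplified model of relative-mode symbol addressing.
# Assumption: base = lowest symbol address, offsets must fit in 32 unsigned bits.
UINT32_MAX = 0xFFFFFFFF


def relative_base(addresses):
    """Return the lowest address, used here as the relative base."""
    return min(addresses)


def out_of_range(addresses):
    """Return the addresses whose offset from the base does not fit in 32 bits."""
    base = relative_base(addresses)
    return [a for a in addresses if a - base > UINT32_MAX]


# Addresses as produced by the working toolchain (_head plus two stub symbols).
good = [0xffffff90080bc000, 0xffffff9009185bf0, 0xffffff9009185e90]
# The same table with one bogus 0x0e entry, as seen in the broken System.map.
bad = [0x0e] + good

print(hex(relative_base(good)), out_of_range(good))
print(hex(relative_base(bad)), [hex(a) for a in out_of_range(bad)])
```

With the single 0x0e entry present, the base drops to 0x0e and every real kernel address lands far outside the 32-bit window, which matches the "out of range in relative mode" error.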

Toolchain versions:
x86_64-cros-linux-gnu-gcc.real (4.9.2_cos_gg_4.9.2-r164-0c5a656a1322e137fa4a251f2ccc6c4022918c0a_4.9.2-r164) 4.9.x 20150123 (prerelease)

GNU gold (binutils-2.27.0-r3-85fafaf039799ebc8053bf36ce1c6e6df7adbbec_cos_gg 2.27.0.20170315) 1.12

 

Comment 1 by groeck@chromium.org, Sep 28 2017

gold should not be used for the kernel. Is that the issue here?
Labels: OS-Chrome

Comment 4 by groeck@chromium.org, Sep 28 2017

FWIW, gold _is_ used for kernel builds. I had to explicitly disable it for my x86_64 test builds because the version in our toolchain has the bug described in https://bugzilla.kernel.org/show_bug.cgi?id=187841.
That is not the issue here, though. I tried to disable gold for arm64, but the result was the same.

I don't think we have had any major toolchain changes recently. Does this issue repro with clang as well?

Comment 6 by groeck@chromium.org, Sep 28 2017

"USE=clang emerge-arm64-generic chromeos-kernel-4_12" tells me:

die "Clang is not yet supported for ${ARCH}"

The problem may have existed before; symbol handling was updated in kernel 4.12 and now uses relative addressing to reduce image size. This triggered a number of problems with binutils, such as the gold issue referenced in #4.

For reference, the "working" toolchain is:
aarch64-linux-gcc.br_real (Buildroot 2015.11.1-00010-g77c236c) 5.2.0
GNU ld (GNU Binutils) 2.25.1

Comment 7 by cmt...@chromium.org, Sep 28 2017

This looks similar to another bug I saw (and fixed) back in May; let me take a closer look at this bug.

Comment 8 by cmt...@chromium.org, Sep 28 2017

Owner: groeck@chromium.org
Yes, this is pretty much the same bug.  The fix is very easy.  The file to be patched is ~/trunk/src/third_party/kernel/v4.12/scripts/kallsyms.c.

The fix is:

diff --git a/scripts/kallsyms.c b/scripts/kallsyms.c
index 5d554419170b..dc91c2e31f00 100644
--- a/scripts/kallsyms.c
+++ b/scripts/kallsyms.c
@@ -158,7 +158,7 @@ static int read_symbol(FILE *in, struct sym_entry *s)
        else if (str[0] == '$')
                return -1;
        /* exclude debugging symbols */
-       else if (stype == 'N')
+       else if (toupper(stype) == 'N')
                return -1;
 
        /* include the type field in the symbol name, so that it 

Someone on the kernel team should probably make this change.
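As an illustration of what the one-line change above does, here is a minimal sketch of a case-insensitive type filter over nm-style lines. It is not the kallsyms code; `parse_nm_line` and `is_debug_symbol` are hypothetical helpers, and the input lines are copied from the issue description.

```python
def parse_nm_line(line):
    """Split an nm/System.map line into (address, type, name)."""
    addr, stype, name = line.split(None, 2)
    return int(addr, 16), stype, name.strip()


def is_debug_symbol(stype):
    # Same intent as toupper(stype) == 'N': treat both 'N' and 'n' as debug symbols.
    return stype.upper() == "N"


lines = [
    "000000000000000e n __efistub_$d",
    "ffffff9009185bf0 t __efistub_$d",
]

# Keep only the symbols that are not classified as debug symbols.
kept = [parse_nm_line(l) for l in lines if not is_debug_symbol(parse_nm_line(l)[1])]
print(kept)
```

A check that only compares against uppercase 'N' would let the 0x0e entry through, which is the behaviour the patch is meant to change.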

Comment 9 by groeck@chromium.org, Sep 28 2017

#8: Please provide background. This patch is not in the upstream kernel, suggesting that it either was not submitted or that it is not really a kernel problem. I am especially curious to understand why no one outside Google seems to experience this problem.

This was encountered by a ChromeOS Partner trying to build the 4.9 kernel using the ChromeOS toolchain (aarch64). (see https://b.corp.google.com/issues/36661449).

The problem appears to be that the toolchain sets the symbol type of "__efistub_$d" to "n" instead of "N" (which is what kallsyms appears to expect).

In the bug referenced above, an alternate fix was to change the compilation from using the -frecord-gcc-switches flag to -fno-record-gcc-switches.

I tried to do that with this bug, but I could not make that work this time around.

#9: We add compiler options in our compiler wrappers that are not enabled by default; -frecord-gcc-switches is one of them.
This CL was supposed to remove the -frecord-gcc-switches flag, but it looks like it didn't:
https://chromium-review.googlesource.com/c/chromiumos/overlays/chromiumos-overlay/+/536233 
I may have made an error earlier when testing the flags solution; I am trying it again to see whether it will really work in this case.
-frecord-gcc-switches is a valid flag, so if things need updating to support it, that should happen as well.
#11: I guess that explains why I could not disable it on the command line.

#14: Agreed. -frecord-gcc-switches generates a section named .GCC.command.line, which is ignored by modpost. I had, however, observed some other errors associated with .GCC.command.line when trying to build x86_64 images without "-g"; see chromium:769037 for details. Also, using my own toolchain, adding "-frecord-gcc-switches" to the build flags works just fine. So there must be something else.

I don't mind adding the patch from #8 to the kernel if necessary, i.e. if the problem can be encountered by others as well. However, at this point I am concerned that it just papers over some other problem.

So maybe it is the interaction with other flags we add (i.e. more than one added flag is needed to reproduce the problem).

From here: https://cs.corp.google.com/chromeos_public/src/third_party/chromiumos-overlay/sys-devel/gcc/files/sysroot_wrapper.hardened.body

You can see we add:
FLAGS_TO_ADD = set(['-fstack-protector-strong', '-fPIE', '-pie',
                    '-D_FORTIFY_SOURCE=2',
                    '-fno-omit-frame-pointer',
                   ])
gcc_flags = ['-frecord-gcc-switches',
             '-fno-reorder-blocks-and-partition',
             '-Wno-unused-local-typedefs',
             '-Wno-maybe-uninitialized',
            ]

So it may be the combination with -fPIE?

You can see the exact command line if you add the wrapper option -print-cmdline.





Ah! and for ARM we add -mthumb. Maybe that is the one ...
FLAGS_TO_ADD.add('-mthumb')
#17: This was arm64, which does not know about -mthumb. I'll do some experiments; the problem is that the flags are added through the back door, meaning I can't just use KCFLAGS with another toolchain to get the same results. This only happens with efi symbols, meaning there is something different in that directory / build. I'll try to create a front end for my self-built toolchain and replicate the problem that way.

I finished re-testing with -fno-record-gcc-switches, and the package built just fine (for arm64-generic).
-frecord-gcc-switches alone is sufficient to trigger the problem. Reproduced with gcc 5.2.0/binutils 2.25.1 and gcc 7.2.0/binutils 2.9 (both built using buildroot) by adding -frecord-gcc-switches to KCFLAGS. I thought I had tested that before and it didn't fail, but I must have done something wrong.
This is only seen on little-endian arm64 images with CONFIG_EFI enabled. It appears that some directory-specific combination of compiler flags in efi/libstub triggers the problem.

The following kernel change "fixes" the problem for me.
diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
index e078390ba477..a67429940c19 100644
--- a/drivers/firmware/efi/libstub/Makefile
+++ b/drivers/firmware/efi/libstub/Makefile
@@ -10,7 +10,7 @@ cflags-$(CONFIG_X86)          += -m$(BITS) -D__KERNEL__ -O2 \
                                   -fPIC -fno-strict-aliasing -mno-red-zone \
                                   -mno-mmx -mno-sse
 
-cflags-$(CONFIG_ARM64)         := $(subst -pg,,$(KBUILD_CFLAGS)) -fpie
+cflags-$(CONFIG_ARM64)         := $(subst -pg,,$(subst -frecord-gcc-switches,,$(KBUILD_CFLAGS))) -fpie
 cflags-$(CONFIG_ARM)           := $(subst -pg,,$(KBUILD_CFLAGS)) \
                                   -fno-builtin -fpic -mno-single-pic-base

I'll discuss with upstream.
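For readers less familiar with make, the nested $(subst ...) in the change above is a plain textual replacement. A rough Python equivalent follows; the `subst` helper and the sample KBUILD_CFLAGS value are hypothetical and only illustrate the filtering, not the real kernel flags.

```python
def subst(old, new, text):
    """Rough equivalent of GNU make's $(subst old,new,text): substring replace."""
    return text.replace(old, new)


# Hypothetical flag string, for illustration only.
kbuild_cflags = "-O2 -pg -frecord-gcc-switches -fno-omit-frame-pointer"

# Mirrors: $(subst -pg,,$(subst -frecord-gcc-switches,,$(KBUILD_CFLAGS))) -fpie
cflags_arm64 = subst("-pg", "", subst("-frecord-gcc-switches", "", kbuild_cflags)) + " -fpie"

print(cflags_arm64.split())
```

Because the replacement is purely textual, it only removes the flag when it appears with exactly this spelling in KBUILD_CFLAGS.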

Status: Assigned (was: Untriaged)
arm64 builds with "-frecord-gcc-switches" generate symbols named "$d". Those are filtered by kallsyms.

The problem is that efi is built with "--prefix-symbols=__efistub_". As a result, all $d symbols are converted to __efistub_$d. kallsyms does not recognize "__efistub_$d" and does not filter them. This means that the combination of "--prefix-symbols" and "-frecord-gcc-switches" is toxic for arm64 kernel builds.
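Why the prefix defeats the filter can be shown with a small sketch. The name test below is an assumption, loosely modeled on a mapping-symbol check ("$a", "$d", "$t", "$x", optionally followed by a dot suffix); it is not copied from kallsyms. `prefix_symbol` is a hypothetical stand-in for what "--prefix-symbols" does to each name.

```python
def is_arm_mapping_symbol(name):
    """Assumed rule: '$' + one of a/x/t/d, then end of name or a '.suffix'."""
    return (
        len(name) >= 2
        and name[0] == "$"
        and name[1] in "axtd"
        and (len(name) == 2 or name[2] == ".")
    )


def prefix_symbol(name, prefix="__efistub_"):
    """Stand-in for the renaming done by --prefix-symbols=<prefix>."""
    return prefix + name


plain = "$d"
prefixed = prefix_symbol(plain)

print(plain, is_arm_mapping_symbol(plain))
print(prefixed, is_arm_mapping_symbol(prefixed))
```

A filter keyed on the name starting with "$" cannot match once a prefix has been prepended, so the symbol has to be caught some other way, for example by its type.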

Either "-frecord-gcc-switches" will have to be filtered out if "--prefix-symbols" is enabled, or kallsyms will need to handle symbols of type 'n'. We'll see what upstream has to say.

Follow-up on #4: For some reason, it appears that gold is no longer used for kernel builds, and I no longer have to disable it. No idea what changed.

I think the confusion is because we have a different default linker for aarch64:

$ i686-pc-linux-gnu-ld -v
GNU gold (binutils-2.27.0-r4-53dd00a1a34ebf5251f6210d778768b4157c5e11_cos_gg 2.27.0.20170315) 1.12

$ x86_64-cros-linux-gnu-ld -v
GNU gold (binutils-2.27.0-r4-53dd00a1a34ebf5251f6210d778768b4157c5e11_cos_gg 2.27.0.20170315) 1.12

$ armv7a-cros-linux-gnueabi-ld -v
GNU gold (binutils-2.27.0-r4-53dd00a1a34ebf5251f6210d778768b4157c5e11_cos_gg 2.27.0.20170315) 1.12

$ aarch64-cros-linux-gnu-ld -v
GNU ld (binutils-2.27.0-r4-53dd00a1a34ebf5251f6210d778768b4157c5e11_cos_gg) 2.27.0.20170315

i.e. default for aarch64 is ld.bfd, and for i686/x86_64/armv7a it is ld.gold.

Unless the kernel builds select a linker explicitly with the -fuse-ld flag, the default linker is used.
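To check which linker a given cross prefix defaults to, the first words of the "ld -v" output are enough. A minimal sketch, assuming the banner formats shown above ("GNU gold (...)" versus "GNU ld (...)"); `linker_flavor` is a hypothetical helper, not part of any toolchain.

```python
def linker_flavor(version_line):
    """Classify an 'ld -v' banner as 'gold', 'bfd', or 'unknown'."""
    banner = version_line.split("(")[0].strip()
    if banner == "GNU gold":
        return "gold"
    if banner == "GNU ld":
        return "bfd"
    return "unknown"


# Banners copied from the output quoted in this thread.
x86_64 = "GNU gold (binutils-2.27.0-r4-53dd00a1a34ebf5251f6210d778768b4157c5e11_cos_gg 2.27.0.20170315) 1.12"
aarch64 = "GNU ld (binutils-2.27.0-r4-53dd00a1a34ebf5251f6210d778768b4157c5e11_cos_gg) 2.27.0.20170315"

print(linker_flavor(x86_64), linker_flavor(aarch64))
```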

Project Member

Comment 25 by bugdroid1@chromium.org, Oct 5 2017

Labels: merge-merged-chromeos-4.12
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/a8369d79f01483c4e7634f814c7bbc6b018e78ff

commit a8369d79f01483c4e7634f814c7bbc6b018e78ff
Author: Guenter Roeck <linux@roeck-us.net>
Date: Thu Oct 05 12:17:03 2017

FROMLIST: scripts/kallsyms: Ignore symbol type 'n'

gcc on aarch64 may emit symbols of type 'n' if the kernel is built with
'-frecord-gcc-switches'. In most cases, those symbols are reported
with nm as
	000000000000000e n $d
and with objdump as
	0000000000000000 l    d  .GCC.command.line	0000000000000000 .GCC.command.line
	000000000000000e l       .GCC.command.line	0000000000000000 $d

Those symbols are detected in is_arm_mapping_symbol() and ignored. However,
if "--prefix-symbols=<prefix>" is configured as well, the situation is
different. For example, in efi/libstub, arm64 images are built with
	'--prefix-alloc-sections=.init --prefix-symbols=__efistub_'.
In combination with '-frecord-gcc-switches', the symbols are now reported
by nm as:
	000000000000000e n __efistub_$d
and by objdump as:
	0000000000000000 l    d  .GCC.command.line	0000000000000000 .GCC.command.line
	000000000000000e l       .GCC.command.line	0000000000000000 __efistub_$d

Those symbols are no longer ignored and included in the base address
calculation. This results in a base address of 000000000000000e, which
in turn causes kallsyms to abort with
    kallsyms failure:
	relative symbol value 0xffffff900800a000 out of range in relative mode

The problem is seen in little endian arm64 builds with CONFIG_EFI enabled
and with '-frecord-gcc-switches' set in KCFLAGS.

Explicitly ignore symbols of type 'n' since those are clearly debug
symbols.

BUG=chromium:722580, chromium:769824 
TEST=Little endian arm64 build with efi enabled

Change-Id: Icd624f8f27d8ab11f9393c8fe477be587a83b61b
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Guenter Roeck <groeck@chromium.org>
(am from https://patchwork.kernel.org/patch/9985205/)
Reviewed-on: https://chromium-review.googlesource.com/699661
Reviewed-by: Caroline Tice <cmtice@chromium.org>

[modify] https://crrev.com/a8369d79f01483c4e7634f814c7bbc6b018e78ff/scripts/kallsyms.c

Project Member

Comment 26 by bugdroid1@chromium.org, Oct 5 2017

Labels: merge-merged-chromeos-4.4
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/d52ea05f93e2853d7430763b35ecb70b6db224d2

commit d52ea05f93e2853d7430763b35ecb70b6db224d2
Author: Guenter Roeck <linux@roeck-us.net>
Date: Thu Oct 05 12:17:00 2017

FROMLIST: scripts/kallsyms: Ignore symbol type 'n'

gcc on aarch64 may emit symbols of type 'n' if the kernel is built with
'-frecord-gcc-switches'. In most cases, those symbols are reported
with nm as
	000000000000000e n $d
and with objdump as
	0000000000000000 l    d  .GCC.command.line	0000000000000000 .GCC.command.line
	000000000000000e l       .GCC.command.line	0000000000000000 $d

Those symbols are detected in is_arm_mapping_symbol() and ignored. However,
if "--prefix-symbols=<prefix>" is configured as well, the situation is
different. For example, in efi/libstub, arm64 images are built with
	'--prefix-alloc-sections=.init --prefix-symbols=__efistub_'.
In combination with '-frecord-gcc-switches', the symbols are now reported
by nm as:
	000000000000000e n __efistub_$d
and by objdump as:
	0000000000000000 l    d  .GCC.command.line	0000000000000000 .GCC.command.line
	000000000000000e l       .GCC.command.line	0000000000000000 __efistub_$d

Those symbols are no longer ignored and included in the base address
calculation. This results in a base address of 000000000000000e, which
in turn causes kallsyms to abort with
    kallsyms failure:
	relative symbol value 0xffffff900800a000 out of range in relative mode

The problem is seen in little endian arm64 builds with CONFIG_EFI enabled
and with '-frecord-gcc-switches' set in KCFLAGS.

Explicitly ignore symbols of type 'n' since those are clearly debug
symbols.

BUG=chromium:722580, chromium:769824 
TEST=Little endian arm64 build with efi enabled

Change-Id: Icd624f8f27d8ab11f9393c8fe477be587a83b61b
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Guenter Roeck <groeck@chromium.org>
(am from https://patchwork.kernel.org/patch/9985205/)
Reviewed-on: https://chromium-review.googlesource.com/701496

[modify] https://crrev.com/d52ea05f93e2853d7430763b35ecb70b6db224d2/scripts/kallsyms.c

Components: OS>Kernel
Status: Fixed (was: Assigned)
Can't really blame the toolchain for this problem.

Comment 28 by dchan@chromium.org, Jan 22 2018

Status: Archived (was: Fixed)

Comment 29 by dchan@chromium.org, Jan 23 2018

Status: Fixed (was: Archived)
