New issue
Advanced search Search tips

Issue 738283 link

Starred by 1 user

Issue metadata

Status: Verified
Owner:
Closed: Jul 2017
Cc:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug

Blocking:
issue 711461



Sign in to add a comment

3.18 x86_64 kernels fail to boot with binutils 2.27

Project Member Reported by manojgupta@chromium.org, Jun 30 2017

Issue description

Tested on Caroline board.
USB image fails to boot and is stuck at a blank screen. No ssh etc can be done to machine.

If chromeos-kernel is built with older binutils+gcc, booting/installation etc works fine
 
Blockedon: -711461
Blocking: 711461
Disassembly shows a newer relocation R_X86_64_REX_GOTPCRELX used. With binutils 2.25, R_X86_64_GOTPCREL was generated.
Bisection shows arch/x86/boot/compressed/cmdline.o as the first bad file.
Probably more files x86/boot/ have problems with binutils 2.27
Bisecting on bintuls show that CL that adds support for R_X86_64_REX_GOTPCRELX is converting R_X86_64_GOTPCREL to R_X86_64_REX_GOTPCRELX.

commit 56ceb5b5405af23eddd12e12d8ba849010120324
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Thu Oct 22 04:49:20 2015 -0700

    Add R_X86_64_[REX_]GOTPCRELX support to gas and ld
    
    This patch adds support for the R_X86_64_GOTPCRELX and
    R_X86_64_REX_GOTPCRELX relocations proposed in

Forcing assembler to not generate R_X86_64_REX_GOTPCRELX did not help :(.

The other change is related to nops:

New:
2e2:	0f 1f 40 00          	nopl   0x0(%rax)
 2e6:	66 2e 0f 1f 84 00 00 00 00 00 	nopw   %cs:0x0(%rax,%rax,1)

Old:
66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 	data16 data16 data16 data16 nopw %cs:0x0(%rax,%rax,1)

This corresponds to the following lines in asm generated by compiler:
        .p2align 5,,20
        .p2align 3


The following commit changed the nop behavior. The revert is not clean. Will try to do a manual merge and check if the issue is fixed.

commit 80b8656cbaaf09b685c2f3c9dd96f61274ed7fb7
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Fri Mar 20 04:39:04 2015 -0700

    Limit multi-byte nop instructions to 10 bytes
    
    There is no performance advantage to use multi-byte nop instructions
    greater than 10 bytes.  This patch limits multi-byte nop instructions
    to 10 bytes.  Since there is only one way to encode multi-byte nop
    instructions now, it also removed redundant nop tests.
    
    gas/
    
            * config/tc-i386.c (i386_align_code): Limit multi-byte nop
            instructions to 10 bytes.

Reverting nop change didn't help. Verified that the assembly of arch/x86/boot/* identical but image still does not boot.
Doing another bisection round to check if more object files appear to be problematic.
Looking at the asm files in x86/boot, only notable difference is in x86/boot/header.o

Old:
0000000000000065 <die>:
  65:	f4                   	hlt    
  66:	e9                   	.byte 0xe9
	...
	67: R_X86_64_PC16	die-0x2 

New:
0000000000000065 <die>:
  65:	f4                   	hlt    
  66:	eb fd                	jmp    65 <die> // Infinite loop
This change is the suspect. It applies a patch to kernel which we'll also need to apply. 
Testing if this fixes the issue.

commit 8dcea93252a9ea7dff57e85220a719e2a5e8ab41
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Fri May 15 03:17:31 2015 -0700

    Add -mshared option to x86 ELF assembler
    
    This patch adds -mshared option to x86 ELF assembler.  By default,
    assembler will optimize out non-PLT relocations against defined non-weak
    global branch targets with default visibility.  The -mshared option tells
    the assembler to generate code which may go into a shared library
    where all non-weak global branch targets with default visibility can
    be preempted.  The resulting code is slightly bigger.  This option
    only affects the handling of branch instructions.
    
    This Linux kernel patch is needed to create a working x86 Linux kernel if
    it hasn't been applied:
    
    diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
    index ae6588b..b91a00c 100644
    --- a/arch/x86/kernel/head_64.S
    +++ b/arch/x86/kernel/head_64.S
    @@ -339,8 +339,8 @@ early_idt_handlers:
            i = i + 1
            .endr
    
    -/* This is global to keep gas from relaxing the jumps */
    -ENTRY(early_idt_handler)
    +/* This is weak to keep gas from relaxing the jumps */
    +WEAK(early_idt_handler)
            cld
    
            cmpl $2,(%rsp)          # X86_TRAP_NMI

Verified that just forcing -mshared in assembler fixed the kernel boot issue.
https://chromium-review.googlesource.com/553544
Tested image: https://uberchromegw.corp.google.com/i/chromiumos.tryserver/builders/release/builds/12575


Cc: diand...@chromium.org
Cc: groeck@chromium.org sonnyrao@chromium.org
Verified that the following patches for kernel 3.10, 3.1 and 3.18 fixes the boot issue.
https://chromium-review.googlesource.com/c/565908
https://chromium-review.googlesource.com/c/565355
https://chromium-review.googlesource.com/c/565354

Devices with 3.8 kernel didn't have the boot issue in the first place.
Should we send them in, then?
For patch to 3.14 kernel:
  samus-release with binutils-2.27: https://uberchromegw.corp.google.com/i/chromiumos.tryserver/builders/release/builds/12655
  Note that the device does not boot with the new image.
  samus-release with binutils-2.27 and this patch applied: https://uberchromegw.corp.google.com/i/chromiumos.tryserver/builders/release/builds/12656
  Device boots fine and runs tests.

For patch to 3.18 kernel:
  caroline-release with binutils-2.27: https://uberchromegw.corp.google.com/i/chromiumos.tryserver/builders/release/builds/12653
  Note that the device does not boot with the new image.
  caroline-release with binutils-2.27 and this patch applied: https://uberchromegw.corp.google.com/i/chromiumos.tryserver/builders/release/builds/12658
  Device boots fine and runs tests.

I could not find a board to reproduce the issue or test the patch to 3.10 kernel.

I think we've moved all x86 systems off of 3.10, so that's why there's no board to reproduce the issue there.

The changes look like they're all +2'ed so I think we should land them
OK, marking ready.  I'll land the 3.10 since we think it's good and it seems better not to leave 3.10 broken.
based on the public list, the only devices still using 3.10 are nvidia SoCs, and iiuc that's more because we haven't been able to get them to help with updating to a newer release.
http://dev.chromium.org/chromium-os/developer-information-for-chrome-os-devices

all the 3.10 systems pulled up to 4.4
Project Member

Comment 18 by bugdroid1@chromium.org, Jul 13 2017

Labels: merge-merged-chromeos-3.14
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/781bec55d5355d81d1d0792fec9a5b106d57115b

commit 781bec55d5355d81d1d0792fec9a5b106d57115b
Author: Andy Lutomirski <luto@kernel.org>
Date: Thu Jul 13 00:55:51 2017

BACKPORT: x86/asm/irq: Stop relying on magic JMP behavior for early_idt_handlers

commit 425be5679fd292a3c36cb1fe423086708a99f11a upstream.

The early_idt_handlers asm code generates an array of entry
points spaced nine bytes apart.  It's not really clear from that
code or from the places that reference it what's going on, and
the code only works in the first place because GAS never
generates two-byte JMP instructions when jumping to global
labels.

Clean up the code to generate the correct array stride (member size)
explicitly. This should be considerably more robust against
screw-ups, as GAS will warn if a .fill directive has a negative
count.  Using '. =' to advance would have been even more robust
(it would generate an actual error if it tried to move
backwards), but it would pad with nulls, confusing anyone who
tries to disassemble the code.  The new scheme should be much
clearer to future readers.

While we're at it, improve the comments and rename the array and
common code.

Binutils may start relaxing jumps to non-weak labels.  If so,
this change will fix our build, and we may need to backport this
change.

Before, on x86_64:

  0000000000000000 <early_idt_handlers>:
     0:   6a 00                   pushq  $0x0
     2:   6a 00                   pushq  $0x0
     4:   e9 00 00 00 00          jmpq   9 <early_idt_handlers+0x9>
                          5: R_X86_64_PC32        early_idt_handler-0x4
  ...
    48:   66 90                   xchg   %ax,%ax
    4a:   6a 08                   pushq  $0x8
    4c:   e9 00 00 00 00          jmpq   51 <early_idt_handlers+0x51>
                          4d: R_X86_64_PC32       early_idt_handler-0x4
  ...
   117:   6a 00                   pushq  $0x0
   119:   6a 1f                   pushq  $0x1f
   11b:   e9 00 00 00 00          jmpq   120 <early_idt_handler>
                          11c: R_X86_64_PC32      early_idt_handler-0x4

After:

  0000000000000000 <early_idt_handler_array>:
     0:   6a 00                   pushq  $0x0
     2:   6a 00                   pushq  $0x0
     4:   e9 14 01 00 00          jmpq   11d <early_idt_handler_common>
  ...
    48:   6a 08                   pushq  $0x8
    4a:   e9 d1 00 00 00          jmpq   120 <early_idt_handler_common>
    4f:   cc                      int3
    50:   cc                      int3
  ...
   117:   6a 00                   pushq  $0x0
   119:   6a 1f                   pushq  $0x1f
   11b:   eb 03                   jmp    120 <early_idt_handler_common>
   11d:   cc                      int3
   11e:   cc                      int3
   11f:   cc                      int3

BUG= chromium:738283 
BUG= chromium:711461 
TEST=caroline boots with new binutils deployed.
TEST=No issues seen in toolchain team rotating builders.

dianders NOTE: Picked from v3.14 linuxstable instead of mainline Linux
to avoid conflicts.

Change-Id: I6a4c70aad54fe92564db055f4df28d0ccb21a66d
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Binutils <binutils@sourceware.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/ac027962af343b0c599cbfcf50b945ad2ef3d7a8.1432336324.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
(cherry picked from commit 510458cb7da6205526445fcba0ea6fae7641698c)
Reviewed-on: https://chromium-review.googlesource.com/565355
Tested-by: Manoj Gupta <manojgupta@chromium.org>
Reviewed-by: Guenter Roeck <groeck@chromium.org>

[modify] https://crrev.com/781bec55d5355d81d1d0792fec9a5b106d57115b/arch/x86/kernel/head_32.S
[modify] https://crrev.com/781bec55d5355d81d1d0792fec9a5b106d57115b/arch/x86/include/asm/segment.h
[modify] https://crrev.com/781bec55d5355d81d1d0792fec9a5b106d57115b/arch/x86/kernel/head64.c
[modify] https://crrev.com/781bec55d5355d81d1d0792fec9a5b106d57115b/arch/x86/kernel/head_64.S

Project Member

Comment 19 by bugdroid1@chromium.org, Jul 13 2017

Labels: merge-merged-chromeos-3.10
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/e838c0f9413bef9a550b70eead41c4e5f76fbfa3

commit e838c0f9413bef9a550b70eead41c4e5f76fbfa3
Author: Andy Lutomirski <luto@kernel.org>
Date: Thu Jul 13 00:55:53 2017

BACKPORT: x86/asm/irq: Stop relying on magic JMP behavior for early_idt_handlers

commit 425be5679fd292a3c36cb1fe423086708a99f11a upstream.

The early_idt_handlers asm code generates an array of entry
points spaced nine bytes apart.  It's not really clear from that
code or from the places that reference it what's going on, and
the code only works in the first place because GAS never
generates two-byte JMP instructions when jumping to global
labels.

Clean up the code to generate the correct array stride (member size)
explicitly. This should be considerably more robust against
screw-ups, as GAS will warn if a .fill directive has a negative
count.  Using '. =' to advance would have been even more robust
(it would generate an actual error if it tried to move
backwards), but it would pad with nulls, confusing anyone who
tries to disassemble the code.  The new scheme should be much
clearer to future readers.

While we're at it, improve the comments and rename the array and
common code.

Binutils may start relaxing jumps to non-weak labels.  If so,
this change will fix our build, and we may need to backport this
change.

Before, on x86_64:

  0000000000000000 <early_idt_handlers>:
     0:   6a 00                   pushq  $0x0
     2:   6a 00                   pushq  $0x0
     4:   e9 00 00 00 00          jmpq   9 <early_idt_handlers+0x9>
                          5: R_X86_64_PC32        early_idt_handler-0x4
  ...
    48:   66 90                   xchg   %ax,%ax
    4a:   6a 08                   pushq  $0x8
    4c:   e9 00 00 00 00          jmpq   51 <early_idt_handlers+0x51>
                          4d: R_X86_64_PC32       early_idt_handler-0x4
  ...
   117:   6a 00                   pushq  $0x0
   119:   6a 1f                   pushq  $0x1f
   11b:   e9 00 00 00 00          jmpq   120 <early_idt_handler>
                          11c: R_X86_64_PC32      early_idt_handler-0x4

After:

  0000000000000000 <early_idt_handler_array>:
     0:   6a 00                   pushq  $0x0
     2:   6a 00                   pushq  $0x0
     4:   e9 14 01 00 00          jmpq   11d <early_idt_handler_common>
  ...
    48:   6a 08                   pushq  $0x8
    4a:   e9 d1 00 00 00          jmpq   120 <early_idt_handler_common>
    4f:   cc                      int3
    50:   cc                      int3
  ...
   117:   6a 00                   pushq  $0x0
   119:   6a 1f                   pushq  $0x1f
   11b:   eb 03                   jmp    120 <early_idt_handler_common>
   11d:   cc                      int3
   11e:   cc                      int3
   11f:   cc                      int3

BUG= chromium:738283 
BUG= chromium:711461 
TEST=caroline boots with new binutils deployed.
TEST=No issues seen in toolchain team rotating builders.

dianders NOTE: Picked from v3.10 linuxstable instead of mainline Linux
to avoid conflicts.

Change-Id: I6a4c70aad54fe92564db055f4df28d0ccb21a66d
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Binutils <binutils@sourceware.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/ac027962af343b0c599cbfcf50b945ad2ef3d7a8.1432336324.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
(cherry picked from commit 4800af91229e06e9d8517a6961f5b5d304b7e9bf)
Reviewed-on: https://chromium-review.googlesource.com/565908
Tested-by: Manoj Gupta <manojgupta@chromium.org>
Reviewed-by: Guenter Roeck <groeck@chromium.org>

[modify] https://crrev.com/e838c0f9413bef9a550b70eead41c4e5f76fbfa3/arch/x86/kernel/head_32.S
[modify] https://crrev.com/e838c0f9413bef9a550b70eead41c4e5f76fbfa3/arch/x86/include/asm/segment.h
[modify] https://crrev.com/e838c0f9413bef9a550b70eead41c4e5f76fbfa3/arch/x86/kernel/head64.c
[modify] https://crrev.com/e838c0f9413bef9a550b70eead41c4e5f76fbfa3/arch/x86/kernel/head_64.S

Project Member

Comment 20 by bugdroid1@chromium.org, Jul 14 2017

Labels: merge-merged-chromeos-3.18
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/92d776e4ce619c2b3a5100d4aaeb7d028f0a9046

commit 92d776e4ce619c2b3a5100d4aaeb7d028f0a9046
Author: Andy Lutomirski <luto@kernel.org>
Date: Fri Jul 14 02:46:05 2017

BACKPORT: x86/asm/irq: Stop relying on magic JMP behavior for early_idt_handlers

[ Upstream commit 425be5679fd292a3c36cb1fe423086708a99f11a ]

The early_idt_handlers asm code generates an array of entry
points spaced nine bytes apart.  It's not really clear from that
code or from the places that reference it what's going on, and
the code only works in the first place because GAS never
generates two-byte JMP instructions when jumping to global
labels.

Clean up the code to generate the correct array stride (member size)
explicitly. This should be considerably more robust against
screw-ups, as GAS will warn if a .fill directive has a negative
count.  Using '. =' to advance would have been even more robust
(it would generate an actual error if it tried to move
backwards), but it would pad with nulls, confusing anyone who
tries to disassemble the code.  The new scheme should be much
clearer to future readers.

While we're at it, improve the comments and rename the array and
common code.

Binutils may start relaxing jumps to non-weak labels.  If so,
this change will fix our build, and we may need to backport this
change.

Before, on x86_64:

  0000000000000000 <early_idt_handlers>:
     0:   6a 00                   pushq  $0x0
     2:   6a 00                   pushq  $0x0
     4:   e9 00 00 00 00          jmpq   9 <early_idt_handlers+0x9>
                          5: R_X86_64_PC32        early_idt_handler-0x4
  ...
    48:   66 90                   xchg   %ax,%ax
    4a:   6a 08                   pushq  $0x8
    4c:   e9 00 00 00 00          jmpq   51 <early_idt_handlers+0x51>
                          4d: R_X86_64_PC32       early_idt_handler-0x4
  ...
   117:   6a 00                   pushq  $0x0
   119:   6a 1f                   pushq  $0x1f
   11b:   e9 00 00 00 00          jmpq   120 <early_idt_handler>
                          11c: R_X86_64_PC32      early_idt_handler-0x4

After:

  0000000000000000 <early_idt_handler_array>:
     0:   6a 00                   pushq  $0x0
     2:   6a 00                   pushq  $0x0
     4:   e9 14 01 00 00          jmpq   11d <early_idt_handler_common>
  ...
    48:   6a 08                   pushq  $0x8
    4a:   e9 d1 00 00 00          jmpq   120 <early_idt_handler_common>
    4f:   cc                      int3
    50:   cc                      int3
  ...
   117:   6a 00                   pushq  $0x0
   119:   6a 1f                   pushq  $0x1f
   11b:   eb 03                   jmp    120 <early_idt_handler_common>
   11d:   cc                      int3
   11e:   cc                      int3
   11f:   cc                      int3

BUG= chromium:738283 
BUG= chromium:711461 
TEST=caroline boots with new binutils deployed.
TEST=No issues seen in toolchain team rotating builders.

dianders NOTE: Picked from v3.18 linuxstable instead of mainline Linux
to avoid conflicts.

Change-Id: I6a4c70aad54fe92564db055f4df28d0ccb21a66d
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Binutils <binutils@sourceware.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/ac027962af343b0c599cbfcf50b945ad2ef3d7a8.1432336324.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
(cherry picked from commit b28283637b3bd5dc31c0995602b86513ebbaeba7)
Reviewed-on: https://chromium-review.googlesource.com/565354
Commit-Ready: Manoj Gupta <manojgupta@chromium.org>
Tested-by: Manoj Gupta <manojgupta@chromium.org>
Reviewed-by: Guenter Roeck <groeck@chromium.org>

[modify] https://crrev.com/92d776e4ce619c2b3a5100d4aaeb7d028f0a9046/arch/x86/kernel/head_32.S
[modify] https://crrev.com/92d776e4ce619c2b3a5100d4aaeb7d028f0a9046/arch/x86/include/asm/segment.h
[modify] https://crrev.com/92d776e4ce619c2b3a5100d4aaeb7d028f0a9046/arch/x86/kernel/head64.c
[modify] https://crrev.com/92d776e4ce619c2b3a5100d4aaeb7d028f0a9046/arch/x86/kernel/head_64.S

Status: Fixed (was: Untriaged)
Fixed then, I guess?  Did you ever confirm for via disassembly that kernel 3.8 is for sure not affected?
Have not confirmed yet.
We have decided to disable the optimization in the binutils-2.27 assembler that triggered this issue in the first place, so we know this won't bite us again.
Status: Verified (was: Fixed)
Closing. Please reopen it if its not fixed. Thanks!

Sign in to add a comment