New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 737659 link

Starred by 3 users

Issue metadata

Status: Verified
Owner:
Closed: Nov 2017
Cc:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug



Sign in to add a comment

clang: x86: double faults at boot with kernels >= v4.6

Project Member Reported by mka@chromium.org, Jun 28 2017

Issue description

clang built x86 kernels >= v4.6 crash reproducibly at boot time with a double fault:

[    3.151592] PANIC: double fault, error_code: 0x0
[    3.160258] CPU: 1 PID: 234 Comm: frecon Not tainted 4.12.0-rc6-00022-ga47cb87565a7 #63
[    3.169203] Hardware name: GOOGLE Squawks, BIOS Google_Squawks.5216.152.76 03/04/2016
[    3.177961] task: ffff880075c2de00 task.stack: ffffc9000088c000
[    3.177969] RIP: 0010:drm_mode_setcrtc+0x336/0x532
[    3.177975] RSP: 0018:0000000000000000 EFLAGS: 00010206
[    3.177978] RAX: 00005622c671cd60 RBX: 0000000000000000 RCX: 0000000000000008
[    3.177979] RDX: 0000000000000001 RSI: ffffc9000088fcf0 RDI: ffffffff813869f3
[    3.177980] RBP: ffffc9000088fd70 R08: 00000000014000c0 R09: 0000000000000308
[    3.177982] R10: 0000000000000300 R11: 0000000000000556 R12: 0000000000000000
[    3.177983] R13: ffff880077a25000 R14: ffffc9000088fdc0 R15: 0000000000000000
[    3.177985] FS:  00007f0be70d2740(0000) GS:ffff88007ad00000(0000) knlGS:0000000000000000
[    3.177987] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    3.177988] CR2: fffffffffffffff8 CR3: 0000000076fa8000 CR4: 00000000001006e0
[    3.177989] Call Trace:
[    3.177992] Code: 24 18 48 8d 74 24 68 41 8b 56 08 39 d3 0f 83 8a 00 00 00 4c 63 fb 4a 83 24 f8 00 4a 8d 04 bd 00 00 00 00 49 03 06 48 8b 64 24 28 <e8> 15 a5 f0 ff 49 89 d4 48 89 64 24 28 85 c0 0f 85 a3 00 00 00 
[    3.178033] Kernel panic - not syncing: Machine halted.
[    3.178036] CPU: 1 PID: 234 Comm: frecon Not tainted 4.12.0-rc6-00022-ga47cb87565a7 #63
[    3.178037] Hardware name: GOOGLE Squawks, BIOS Google_Squawks.5216.152.76 03/04/2016
[    3.178038] Call Trace:
[    3.178039]  <#DF>
[    3.178045]  __dump_stack+0x1d/0x23
[    3.178050]  ? df_debug+0x31/0x31
[    3.178053]  ? do_double_fault+0xe0/0x103
[    3.178055]  </#DF>
[    3.184590] Kernel Offset: disabled
[    3.329644] ACPI MEMORY or I/O RESET_REG.

The following upstream patch causes/exposes the problem:

commit f05058c4d652b619adfda6c78d8f5b341169c264
Author: Chris J Arges <chris.j.arges@canonical.com>
Date:   Thu Jan 21 16:49:25 2016 -0600

    x86/uaccess: Add stack frame output operand in get_user() inline asm
    
    Numerous 'call without frame pointer save/setup' warnings are introduced
    by stacktool because of functions using the get_user() macro. Bad stack
    traces could occur due to lack of or misplacement of stack frame setup
    code.
    
    This patch forces a stack frame to be created before the inline asm code
    if CONFIG_FRAME_POINTER is enabled by listing the stack pointer as an
    output operand for the get_user() inline assembly statement.

diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index a4a30e4b2d34..9bbb3b2d0372 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -179,10 +179,11 @@ __typeof__(__builtin_choose_expr(sizeof(x) > sizeof(0UL), 0ULL, 0UL))
 ({                                                                     \
        int __ret_gu;                                                   \
        register __inttype(*(ptr)) __val_gu asm("%"_ASM_DX);            \
+       register void *__sp asm(_ASM_SP);                               \
        __chk_user_ptr(ptr);                                            \
        might_fault();                                                  \
-       asm volatile("call __get_user_%P3"                              \
-                    : "=a" (__ret_gu), "=r" (__val_gu)                 \
+       asm volatile("call __get_user_%P4"                              \
+                    : "=a" (__ret_gu), "=r" (__val_gu), "+r" (__sp)    \
                     : "0" (ptr), "i" (sizeof(*(ptr))));                \


With this patch reverted the system boots, however a similar crash is observed at reboot. Probably this is due to similar patches in other areas of the kernel.

The double fault occurs when drm_mode_setcrtc() calls get_user():

http://elixir.free-electrons.com/linux/v4.12-rc6/source/drivers/gpu/drm/drm_crtc.c#L678


The code generated by clang without the above patch is:

                        if (get_user(out_id, &set_connectors_ptr[i])) {
ffffffff81386955:       4a 8d 04 bd 00 00 00    lea    0x0(,%r15,4),%rax
ffffffff8138695c:       00 
ffffffff8138695d:       49 03 06                add    (%r14),%rax
ffffffff81386960:       e8 2b a5 f0 ff          callq  ffffffff81290e90 <__get_user_4>


And with the patch:

                        if (get_user(out_id, &set_connectors_ptr[i])) {
ffffffff81386a56:       4a 8d 04 bd 00 00 00    lea    0x0(,%r15,4),%rax
ffffffff81386a5d:       00 
ffffffff81386a5e:       49 03 06                add    (%r14),%rax
ffffffff81386a61:       48 8b 64 24 28          mov    0x28(%rsp),%rsp
ffffffff81386a66:       e8 15 a5 f0 ff          callq  ffffffff81290f80 <__get_user_4>


With gcc the effect of the patch is not apparent. The only difference in drm_mode_setcrtc() are different addresses for other calls (see attached dumps)
 
gcc_no_patch.dump
18.7 KB Download
gcc_with_patch.dump
18.7 KB Download

Comment 1 by mka@chromium.org, Jun 28 2017

md@ commented that he heard that the patch is bogus and doesn't work as intended even with gcc, and that it can probably be reverted.

This may be correct, however the crash on reboot without the patch indicates there is still something wrong stackwise. Interestingly the same function drm_mode_setcrtc() is involved in the crash:

[   19.715638] WARN_ON(pipe != PIPE_A && pipe != PIPE_B)
[   19.715681] ------------[ cut here ]------------
[   19.726670] WARNING: CPU: 0 PID: 1209 at /mnt/host/source/src/third_party/kernel/v4.4/drivers/gpu/drm/i915/intel_panel.c:771 vlv_disable_backlight+0x8e/0x92
[   19.742346] Modules linked in: uinput i2c_dev rtc_cmos bluetooth snd_hda_intel ecdh_generic snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_timer fuse cfg80211 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables r8152 mii
[   19.765255] CPU: 0 PID: 1209 Comm: DrmThread Not tainted 4.12.0-rc6-00021-g7ce09f1ed647 #77
[   19.774622] Hardware name: GOOGLE Squawks, BIOS Google_Squawks.5216.152.76 03/04/2016
[   19.783399] task: ffff880074d069c0 task.stack: ffffc9000057c000
[   19.790056] RIP: 0010:vlv_disable_backlight+0x8e/0x92
[   19.795710] RSP: 0018:ffffc9000057f7cd EFLAGS: 00010296
[   19.801574] RAX: 0000000000000029 RBX: 00000000ffffffff RCX: 0000000000000001
[   19.809560] RDX: 0000000000000000 RSI: 0000000000000086 RDI: ffffffff81c42150
[   19.817559] RBP: ffffc9000057f7e7 R08: 0000000000000029 R09: ffffffffffffffff
[   19.825567] R10: ffffffff819d2050 R11: 000000fd00000000 R12: ffff880077a88000
[   19.833574] R13: ffff880077ab9000 R14: ffff880077a88000 R15: ffff880077a85800
[   19.841559] FS:  00007f6abe5f0700(0000) GS:ffff88007ac00000(0000) knlGS:0000000000000000
[   19.850642] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   19.857071] CR2: 0000254947e20018 CR3: 0000000072640000 CR4: 00000000001006f0
[   19.865074] Call Trace:
[   19.867832] BUG: stack guard page was hit at ffffc90000580000 (stack is ffffc9000057c000..ffffc9000057ffff)
[   19.878725] kernel stack overflow (page fault): 0000 [#1] SMP
[   19.885146] Modules linked in: uinput i2c_dev rtc_cmos bluetooth snd_hda_intel ecdh_generic snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_timer fuse cfg80211 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables r8152 mii
[   19.908007] CPU: 0 PID: 1209 Comm: DrmThread Not tainted 4.12.0-rc6-00021-g7ce09f1ed647 #77
[   19.917340] Hardware name: GOOGLE Squawks, BIOS Google_Squawks.5216.152.76 03/04/2016
[   19.926091] task: ffff880074d069c0 task.stack: ffffc9000057c000
[   19.932709] RIP: 0010:show_trace_log_lvl+0x113/0x244
[   19.938254] RSP: 0018:ffffc9000057f430 EFLAGS: 00010283
[   19.944091] RAX: 0000000000000000 RBX: ffffc9000057fffd RCX: ffffffff81d7ce00
[   19.952056] RDX: ffffffff81cf0120 RSI: 0000000000000002 RDI: 000000002b00007f
[   19.960021] RBP: ffffc9000057f507 R08: 0000000200000000 R09: 00000000ffff0a30
[   19.967993] R10: 00000000ffff0a00 R11: 0000000000000000 R12: 000000002b00007f
[   19.975965] R13: 0000000000000000 R14: 0000000000000000 R15: ffff880074d069c0
[   19.983939] FS:  00007f6abe5f0700(0000) GS:ffff88007ac00000(0000) knlGS:0000000000000000
[   19.992981] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   19.999391] CR2: ffffc90000580000 CR3: 0000000072640000 CR4: 00000000001006f0
[   20.007363] Call Trace:
[   20.010097]  ? vlv_disable_backlight+0x8e/0x92
[   20.015060]  ? __warn+0xb5/0xf3
[   20.018567]  ? do_trap+0x12b/0x183
[   20.022363]  ? do_error_trap+0xb8/0xc7
[   20.026549]  ? vlv_disable_backlight+0x8e/0x92
[   20.031514]  ? invalid_op+0x18/0x20
[   20.035409]  ? vlv_disable_backlight+0x8e/0x92
[   20.040372]  ? intel_edp_backlight_off+0x42/0x45
[   20.045529]  ? intel_disable_dp+0x67/0x96
[   20.050007]  ? intel_encoders_disable+0x63/0x7c
[   20.055067]  ? i9xx_crtc_disable+0x56/0x338
[   20.059738]  ? intel_atomic_commit_tail+0x10a/0x111d
[   20.065285]  ? update_cfs_rq_load_avg+0x225/0x23a
[   20.070539]  ? update_load_avg+0x1a4/0x31e
[   20.075113]  ? pick_next_task_fair+0x360/0x381
[   20.080077]  ? finish_task_switch+0xfd/0x18d
[   20.084846]  ? __schedule+0x3e7/0x484
[   20.088935]  ? intel_atomic_commit+0x3ff/0x404
[   20.093899]  ? drm_mode_setcrtc+0x3d2/0x51b
[   20.098572]  ? avc_has_extended_perms+0x49/0x318
[   20.103728]  ? drm_ioctl+0x28b/0x31f
[   20.107721]  ? drm_crtc_check_viewport+0x8d/0x8d
[   20.112879]  ? do_vfs_ioctl+0x3a7/0x56f
[   20.117163]  ? do_syscall_64+0x59/0x68
[   20.121350]  ? entry_SYSCALL64_slow_path+0x25/0x25
[   20.126700] Code: 92 f9 ff ff 49 89 c4 4d 85 e4 74 15 48 c7 c7 bf b9 95 81 31 c0 48 8b 34 24 4c 89 e2 e8 1e 7c 06 00 4c 89 64 24 10 e9 e5 00 00 00 <4c> 8b 23 83 7c 24 50 00 74 13 4c 8b ac 24 98 00 00 00 4d 85 ed 
[   20.147848] RIP: show_trace_log_lvl+0x113/0x244 RSP: ffffc9000057f430
[   20.155045] ---[ end trace 1a54997dba9c57b5 ]---
[   20.160202] Kernel panic - not syncing: Fatal exception
[   20.166046] Kernel Offset: disabled
[   20.169943] ACPI MEMORY or I/O RESET_REG.

The warning and the corresponding stack trace is also generated with gcc. Actually it seems the stack trace causes the stack guard page to be hit. For debugging I added a few dump_stack() calls to the function, a clang kernel crashes when it hits the first call at the entry of the function:

[    3.218386] MKA: drm_mode_setcrtc entry
[    3.226251] CPU: 0 PID: 247 Comm: frecon Not tainted 4.12.0-rc6-00021-g7ce09f1ed647-dirty #82
[    3.235780] Hardware name: GOOGLE Squawks, BIOS Google_Squawks.5216.152.76 03/04/2016
[    3.244532] Call Trace:
[    3.247268]  __dump_stack+0x1d/0x23
[    3.251168]  ? drm_atomic_get_property+0x229/0x351
[    3.256523]  ? vfs_ioctl+0x1c/0x35
[    3.260325] BUG: stack guard page was hit at ffffc900009a8000 (stack is ffffc900009a4000..ffffc900009a7fff)
[    3.269454] EXT4-fs (mmcblk0p1): re-mounted. Opts: commit=600,data=ordered
[    3.278895] kernel stack overflow (page fault): 0000 [#1] SMP
[    3.285316] Modules linked in:
[    3.288726] CPU: 0 PID: 247 Comm: frecon Not tainted 4.12.0-rc6-00021-g7ce09f1ed647-dirty #82
[    3.298254] Hardware name: GOOGLE Squawks, BIOS Google_Squawks.5216.152.76 03/04/2016
[    3.307004] task: ffff8800782f69c0 task.stack: ffffc900009a4000
[    3.313621] RIP: 0010:show_trace_log_lvl+0x113/0x244
[    3.319165] RSP: 0018:ffffc900009a7b50 EFLAGS: 00010083
[    3.325001] RAX: 0000000000000000 RBX: ffffc900009a7ffe RCX: ffffffff81d7ce00
[    3.332974] RDX: ffffffff81cf0120 RSI: 0000000000000086 RDI: 00000000002b0000
[    3.340948] RBP: ffffc900009a7c24 R08: 0000000200000000 R09: 00000000ffff0a30
[    3.348921] R10: 00000000ffff0a00 R11: 0000000000000000 R12: 00000000002b0000
[    3.356893] R13: ffffc900009a7c50 R14: 0000000000000000 R15: ffff8800782f69c0
[    3.364867] FS:  00007f47bf5ff740(0000) GS:ffff88007ac00000(0000) knlGS:0000000000000000
[    3.373900] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    3.380318] CR2: ffffc900009a8000 CR3: 0000000076f86000 CR4: 00000000001006f0
[    3.388290] Call Trace:
[    3.391023]  ? dump_stack+0x4a/0x68
[    3.394917]  ? dump_stack+0x4a/0x68
[    3.398812]  ? drm_mode_setcrtc+0x3d/0x5b8
[    3.403386]  ? __radix_tree_lookup+0x48/0xaf
[    3.408155]  ? ww_mutex_lock+0x1b/0x87
[    3.412342]  ? avc_has_extended_perms+0x49/0x318
[    3.417498]  ? drm_ioctl+0x28b/0x31f
[    3.421490]  ? drm_crtc_check_viewport+0x8d/0x8d
[    3.426649]  ? do_vfs_ioctl+0x3a7/0x56f
[    3.430931]  ? entry_SYSCALL_64_fastpath+0x13/0x94
[    3.436280] Code: 92 f9 ff ff 49 89 c4 4d 85 e4 74 15 48 c7 c7 ff b9 95 81 31 c0 48 8b 34 24 4c 89 e2 e8 1e 7c 06 00 4c 89 64 24 10 e9 e5 00 00 00 <4c> 8b 23 83 7c 24 50 00 74 13 4c 8b ac 24 98 00 00 00 4d 85 ed 
[    3.457429] RIP: show_trace_log_lvl+0x113/0x244 RSP: ffffc900009a7b50
[    3.464627] ---[ end trace e15374cc90b25a5f ]---
[    3.469783] Kernel panic - not syncing: Fatal exception
[    3.475626] Kernel Offset: disabled
[    3.479521] ACPI MEMORY or I/O RESET_REG.
Labels: OS-Chrome
Hi Matthias,
Can you provide me instructions to build with gcc and clang so that I can look at drm_mode_setcrtc() 

Comment 3 by mka@chromium.org, Jun 29 2017

To build a kernel with the initial problem:

cros_workon-squawks start chromeos-kernel-4_4

cd src/third_party/kernel/v4.4
git fetch cros refs/sandbox/mka/dbg/b737659
git checkout -b dbg_b737659 FETCH_HEAD

USE="-tpm clang pcserial tty_console_ttyS2" FEATURES="noclean" cros_workon_make --board=squawks --install chromeos-kernel-4_4

I guess one of the first questions to clarify is whether the offending patch does actually anything useful with gcc.


For the crash on reboot I found that it is related with another patch (1959a60182f4 "x86/dumpstack: Pin the target stack when dumping it"). I suggest to focus in this bug on the initial issue and open a separate bug for the other one.


Hi Matthias, I assume this is not critical since it is P3. Can you wait for a couple of  weeks or do you need someone to take a look soon?

Comment 5 by mka@chromium.org, Jun 30 2017

Labels: -Pri-3 Pri-2
It is definitely not critical, since it doesn't affect any kernel version we currently use. I would like to make progress, but it can wait for a couple of weeks. I changed the priority to P2 to reflect that it has a certain importance even though it is not critical.

Comment 6 by mka@chromium.org, Jun 30 2017

Created  Issue 738553  for the crash on reboot/backtrace.

Comment 7 by mka@chromium.org, Jul 12 2017

I submitted a patch upstream proposing to revert f05058c4d652 ("x86/uaccess: Add stack frame output operand in get_user() inline asm"), one of the authors replied:

  The original commit probably should have clarified:

   " ... forces a stack frame *if it doesn't already exist*."

  In *most* cases it will have no effect, as you saw, because users of
  get_user() tend to do other function calls beforehand, so they will have
  already saved the frame pointer before calling it.

  However, that isn't always the case.  We found that certain configs
  change GCC's behavior such that, for certain get_user() call sites, the
  containing function doesn't saved the frame pointer before inserting
  get_user()'s inline asm.

  GCC completely ignores inline asm, so it has no idea that it has a call
  instruction in it.  So in general, *any* inline asm with a call
  instruction needs this constraint, to force the frame pointer to be
  saved, if it hasn't already.

  This is admittedly an awkward way of achieving this goal, but it's the
  only way I know how to do it with GCC.

Comment 8 by mka@chromium.org, Jul 13 2017

I shared disassemblies and a crash log with Josh Poimboeuf, his analysis is:

  Here's the reason for the double fault.  First it puts zero on the stack
  at offset -0x58:

  > ffffffff81367616:     31 c0                   xor    %eax,%eax
  > ffffffff81367618:     48 89 45 c8             mov    %rax,-0x38(%rbp)
  > ffffffff8136761c:     45 31 ff                xor    %r15d,%r15d
  > ffffffff8136761f:     48 89 45 a8             mov    %rax,-0x58(%rbp)

  Then, later, it copies that zeroed word from the stack to RSP:

  > ffffffff81367874:     48 8b 65 a8             mov    -0x58(%rbp),%rsp

  Then it double faults because the call instruction tries to write RIP on
  the stack, but RSP is zero:

  > ffffffff81367878:     e8 73 26 f1 ff          callq  ffffffff81279ef0 <__get_user_4>

  Then clang tries to put RSP's value on the stack, at the same stack slot
  where the original zero was stored (though it never reaches this point):

  > ffffffff8136787d:     49 89 d4                mov    %rdx,%r12
  > ffffffff81367880:     48 89 65 a8             mov    %rsp,-0x58(%rbp)

  The panic is consistent with the above.  RIP points to the call
  instruction, RSP is zero:

  > [    3.798722] PANIC: double fault, error_code: 0x0
  > [    3.807387] CPU: 1 PID: 605 Comm: frecon Not tainted 4.12.0-00023-g711d82c128ff #107
  > [    3.816040] Hardware name: GOOGLE Squawks, BIOS Google_Squawks.5216.152.76 03/04/2016
  > [    3.824792] task: ffff880075b92f00 task.stack: ffffc90000d6c000
  > [    3.829599] EXT4-fs (mmcblk0p1): re-mounted. Opts: commit=600,data=ordered
  > [    3.839092] RIP: 0010:drm_mode_setcrtc+0x328/0x51f
  > [    3.844443] RSP: 0018:0000000000000000 EFLAGS: 00010206
  > [    3.850280] RAX: 0000559e707c4d60 RBX: 0000000000000000 RCX: 0000000000000008
  > [    3.858253] RDX: 0000000000000001 RSI: ffffc90000d6fcc8 RDI: ffffffff81367805
  > [    3.866225] RBP: ffffc90000d6fd90 R08: 00000000014000c0 R09: 0000000000000308
  > [    3.874199] R10: 0000000000000300 R11: 0000000000000556 R12: 0000000000000000
  > [    3.882163] R13: ffff880077a25000 R14: ffffc90000d6fdd0 R15: 0000000000000000
  > [    3.890136] FS:  00007fb2dbd62740(0000) GS:ffff88007ad00000(0000) knlGS:0000000000000000
  > [    3.899177] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  > [    3.905595] CR2: fffffffffffffff8 CR3: 000000007596b000 CR4: 00000000001006e0
  > [    3.913568] Call Trace:
  > [    3.916296] Code: b8 48 8d b5 38 ff ff ff 41 8b 56 08 39 d3 0f 83 85 00 00 00 4c 63 fb 4a 83 24 f8 00 4a 8d 04 bd 00 00 00 00 49 03 06 48 8b 65 a8 <e8> 73 26 f1 ff 49 89 d4 48 89 65 a8 85 c0 0f 85 a0 00 00 00 ba
  > [    3.937440] Kernel panic - not syncing: Machine halted.
  > [    3.943268] CPU: 1 PID: 605 Comm: frecon Not tainted 4.12.0-00023-g711d82c128ff #107
  > [    3.951921] Hardware name: GOOGLE Squawks, BIOS Google_Squawks.5216.152.76 03/04/2016
  > [    3.960671] Call Trace:
  > [    3.963398]  <#DF>
  > [    3.965637]  __dump_stack+0x19/0x1b
  > [    3.969531]  dump_stack+0x42/0x60

  clang is obviously getting confused by the RSP output constraint.  I
  think it tries to take the constraint literally, since it takes RSP as
  an output from the inline asm and stores it on the stack.  However, that
  behavior doesn't really make sense for a "register" variable.  It also
  doesn't explain why it's zeroing the register out first.


Link with the discussion: https://patchwork.kernel.org/patch/9837437/ (currently down)
with_f05058c4d652.dis
36.2 KB Download
without_f05058c4d652.dis
36.0 KB Download
kernel_crash.log
9.6 KB View Download

Comment 9 by mka@chromium.org, Jul 13 2017

Cc: manojgupta@chromium.org
Owner: mka@chromium.org
Status: Assigned (was: Untriaged)
From the discussion:

  I think there are two separate issues here.

  1) The first issue is whether it's supported behavior to specify RSP as
     an output constraint in order to force GCC to create a stack frame.
     As far as I know, this is a quirk of GCC, and not really considered
     defined behavior.

     However, the idea was suggested by some GCC developers:

       https://gcc.gnu.org/ml/gcc/2015-07/msg00079.html

     So at least it seems to be endorsed by GCC to some degree.  If you
     need details on why it works, that thread has the details.

  2) The second issue is whether clang should corrupt RSP.  I don't see a
     reason for clang to do that.  IMO, when using a local register
     variable as an input or output to inline asm, the compiler should
     leave the contents of the register alone.

     FWIW, my reading of the GCC manual seems to support that:

       https://gcc.gnu.org/onlinedocs/gcc/Local-Register-Variables.html#Local-Register-Variables

Comment 10 by mka@chromium.org, Jul 13 2017

An upstream solution is underway, it will probably look similar to this:

diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 11433f9..21f0c39 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -166,12 +166,12 @@ __typeof__(__builtin_choose_expr(sizeof(x) > sizeof(0UL), 0ULL, 0UL))
 ({                                                                     \
        int __ret_gu;                                                   \
        register __inttype(*(ptr)) __val_gu asm("%"_ASM_DX);            \
-       register void *__sp asm(_ASM_SP);                               \
        __chk_user_ptr(ptr);                                            \
        might_fault();                                                  \
-       asm volatile("call __get_user_%P4"                              \
-                    : "=a" (__ret_gu), "=r" (__val_gu), "+r" (__sp)    \
-                    : "0" (ptr), "i" (sizeof(*(ptr))));                \
+       asm volatile("call __get_user_%P3"                              \
+                    : "=a" (__ret_gu), "=r" (__val_gu)                 \
+                    : "0" (ptr), "i" (sizeof(*(ptr)))                  \
+                    : "sp");                                           \

Comment 11 by mka@chromium.org, Jul 19 2017

Owner: manojgupta@chromium.org
It seems there will be no upstream solution after all:

  After doing some testing, I don't think this approach is going to work
  after all.  In addition to forcing the stack frame, it also causes GCC
  to add an unnecessary extra instruction to the epilogue of each affected
  function:

    lea    -0x10(%rbp),%rsp

  We shouldn't be inserting extra instructions like that.  I also don't
  think the other suggestion of turning the '__sp' register variable into
  a global variable is a very good solution either, as that just wastes
  memory for no reason.

  It would be nice if both compilers could agree on a way to support this.


Manoj, can you discuss this with LLVM upstream?
Sure, Can you provide a short summary of the problem?

Comment 13 by mka@chromium.org, Jul 19 2017

The following upstream kernel commit intends to forces a stack frame before inline assembly code if it doesn't already exist:


commit f05058c4d652b619adfda6c78d8f5b341169c264
Author: Chris J Arges <chris.j.arges@canonical.com>
Date:   Thu Jan 21 16:49:25 2016 -0600

    x86/uaccess: Add stack frame output operand in get_user() inline asm
    
    Numerous 'call without frame pointer save/setup' warnings are introduced
    by stacktool because of functions using the get_user() macro. Bad stack
    traces could occur due to lack of or misplacement of stack frame setup
    code.
    
    This patch forces a stack frame to be created before the inline asm code
    if CONFIG_FRAME_POINTER is enabled by listing the stack pointer as an
    output operand for the get_user() inline assembly statement.

diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index a4a30e4b2d34..9bbb3b2d0372 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -179,10 +179,11 @@ __typeof__(__builtin_choose_expr(sizeof(x) > sizeof(0UL), 0ULL, 0UL))
 ({                                                                     \
        int __ret_gu;                                                   \
        register __inttype(*(ptr)) __val_gu asm("%"_ASM_DX);            \
+       register void *__sp asm(_ASM_SP);                               \
        __chk_user_ptr(ptr);                                            \
        might_fault();                                                  \
-       asm volatile("call __get_user_%P3"                              \
-                    : "=a" (__ret_gu), "=r" (__val_gu)                 \
+       asm volatile("call __get_user_%P4"                              \
+                    : "=a" (__ret_gu), "=r" (__val_gu), "+r" (__sp)    \
                     : "0" (ptr), "i" (sizeof(*(ptr))));                \


This works properly with gcc, however in kernels built with clang RSP gets corrupted (more details in #8), apparently clang gets confused by the use of RSP in an outpout constraint. It seems worth to clarify the points brought up in #9.

Comment 14 by mka@chromium.org, Jul 24 2017

From the upstream discussion:

  It seems clang treats local named register almost the same as ordinary
  local variables.
  The only difference is that before reading the register variable clang
  puts variable's value into the specified register.

  So clang just assigns stack slot for the variable __sp where it's
  going to keep variable's value.
  But since __sp is unitialized (we haven't assign anything to it), the
  value of the __sp is some garbage from stack.
  inline asm specifies __sp as input, so clang assumes that it have to
  load __sp into 'rsp' because inline asm is going to use
  it. And it just loads garbage from stack into 'rsp'

  In fact, such behavior (I mean storing the value on stack and loading
  into reg before the use) is very useful.
  Clang's behavior allows to keep the value assigned to the
  call-clobbered register across the function calls.

  Unlike clang, gcc assigns value to the register right away and doesn't
  store the value anywhere else. So if the reg is
  call clobbered register you have to be absolutely sure that there is
  no subsequent function call that might clobber the register.

  E.g. see some real examples
  https://patchwork.kernel.org/patch/4111971/ or 98d4ded60bda("msm: scm:
  Fix improper register assignment").
  These bugs shouldn't happen with clang.

  But the global named register works slightly differently in clang. For
  the global, the value is just the value of the register itself,
  whatever it is. Read/write from global named register is just like
  direct read/write to the register
Upstream bug comment from Eli Friedman 2017-07-24 12:33:47 PDT

Consider the following:

void a() {
  register int a asm("sp") = 0;
  asm volatile("nop":"+r"(a));
}

In this case, both gcc and clang zero out "sp".

If you don't initialize the variable, you're basically asking the compiler to put uninitialized data into rsp.  If you're lucky, the compiler realizes that putting uninitialized data into rsp is a no-op, and therefore does nothing... but if you're unlucky, the compiler shoves some other unrelated value into rsp, and it explodes (which is what is happening here).

I think the right approach here is to propose some well-defined mechanism for getting the result you want... and then maybe add a hack to clang to map this particular construct to the same mechanism.
Discussed with wmi@ but not there is no clear solution to this. The patch is a gcc hack and just happens to work.
Will be sending to llvm dev list if someone can suggest an alternative (but honestly don't really expect anything since the underlying issue is a gcc hack). 

Comment 17 by mka@chromium.org, Jul 26 2017

Thanks, if no compatible solution can be found it might be possible to convince upstream to use #ifdef to have different code for gcc and clang.

There was also discussion on a possible solution in the upstream thread, but as of now it doesn't appear to be convincing.
Cc: glider@chromium.org
mka@ Can you try the following as suggested in https://bugs.llvm.org/show_bug.cgi?id=33913#c3 by glider@

+
+register unsigned long int __sp asm(_ASM_SP);
+
 #define get_user(x, ptr)						\
 ({									\
 	int __ret_gu;							\
 	register __inttype(*(ptr)) __val_gu asm("%"_ASM_DX);		\
-	register void *__sp asm(_ASM_SP);				\
 	__chk_user_ptr(ptr);						\
 	might_fault();							\
 	asm volatile("call __get_user_%P4"				\
Cc: dvyukov@chromium.org
I'm going to send the aforementioned patch upstream, but according to https://github.com/torvalds/linux/commit/dff38e3e93bbc10653a232f68077e5d031624464 "LTO in gcc 4.6/47. has trouble with global register variables."
This may or may not be a showstopper, we'll see.

Comment 20 by mka@chromium.org, Aug 17 2017

A patch like the one from #18 has been suggested earlier in the upstream discussion (https://lkml.org/lkml/2017/7/13/709), but wasn't found convincing enough to pursue. Josh Poimboeuf is currently working on a possible solution: https://lkml.org/lkml/2017/7/28/775
Thanks for pointing this out!
We'll just sit and wait for the news from him then.

Comment 22 by mka@chromium.org, Aug 31 2017

Josh Poimboeuf posted a first version of his patch set:

https://lkml.org/lkml/2017/8/31/513

It probably won't go through as is, let's see how it unfolds.

One point raised be Linus Torvalds:

>  (a) approach clang about their obvious bug (a compiler that clobbers
> %rsp because we mark it as in/out is clearly buggy)

manojgupta@, can you bring this up with clang?

Will do!
mka@ Do you know the status of this? I believe Josh was making a change to make things work with both clang/gcc.
I believe this has been landed a while ago.

Comment 26 by mka@chromium.org, Nov 21 2017

Status: Verified (was: Assigned)
Sorry, I missed this. Yes, this has been fixed upstream.

Sign in to add a comment