New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 729743 link

Starred by 1 user

Issue metadata

Status: Verified
Owner:
Last visit > 30 days ago
Closed: Jun 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug

Blocking:
issue 702741



Sign in to add a comment

clang: Stack corruption in cts_cbc_decrypt () on x86

Project Member Reported by mka@chromium.org, Jun 5 2017

Issue description

Kernel crashes have been observed while running audio tests on squawks to sanity check clang kernel builds:

[   76.080624] general protection fault: 0000 [#1] PREEMPT SMP 
[   76.090008] gsmi: Log Shutdown Reason 0x03
[   76.094589] Modules linked in: uinput snd_soc_sst_baytrail_pcm snd_soc_sst_ipc snd_soc_sst_dsp snd_soc_sst_byt_max98090_mach snd_soc_max98090 snd_intel_sst_acpi snd_intel_sst_core snd_soc_sst_mfld_platform snd_hda_codec_hdmi snd_soc_sst_acpi snd_soc_sst_match uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat zram bluetooth xt_mark bridge stp llc fuse ip6table_filter iwlmvm iwl7000_mac80211 r8152 mii iwlwifi cfg80211 joydev
[   76.150758] CPU: 0 PID: 4408 Comm: chrome Not tainted 4.4.64 #13
[   76.157478] Hardware name: GOOGLE Squawks, BIOS Google_Squawks.5216.152.76 03/04/2016
[   76.166242] task: ffff880054da0000 ti: ffff880071bc0000 task.ti: ffff880071bc0000
[   76.174616] RIP: 0010:[<ffffffff8133b3d3>]  [<ffffffff8133b3d3>] crypto_cbc_decrypt+0x223/0x230
[   76.184363] RSP: 0018:ffff880071bc3990  EFLAGS: 00010296
[   76.190304] RAX: 0000000000000000 RBX: 03ffffffffffffc0 RCX: ffffffff81f8c700
[   76.198289] RDX: ffffffff81ef41e0 RSI: 0000000000000096 RDI: 00007fff30fee520
[   76.206273] RBP: ffff880071bc3ac8 R08: ffffffff820e5b10 R09: 0000000000000000
[   76.214258] R10: 000000000000000a R11: 0000000000000000 R12: 00000000000009c8
[   76.222243] R13: ffff880071bc39d8 R14: 0000000000000010 R15: ffff880071bc39f8
[   76.230229] FS:  000078557d777780(0000) GS:ffff88007ac00000(0000) knlGS:0000000000000000
[   76.239283] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   76.245710] CR2: 00001acf04864000 CR3: 00000000759ea000 CR4: 00000000001006f0
[   76.253694] Stack:
[   76.255938]  5362f7f3029a2bd1 274d9d2ae0c435c4 ffff00ff81059fc0 c2951cd40eecda4c
[   76.264233]  ffff880000001994 94a5aba0ac90920e d389dcda441a53eb a5dc8a924a717072
[   76.272532]  000000000000122c a5dc8a924a717072 000000000000122c c2951cd40eecda4c
[   76.280828]  ffff880000001994 ffff880073d72800 ffff880071bc3b68 ffff880071bc39b8
[   76.289123]  ffff880071bc3b38 00000010930d299f ffff880071bc3998 ffff880000000020
[   76.297422]  0000000000000010 0000001071bc3b88 ffff88005f881a80 ffff880071bc39d8
[   76.305719]  ffff880000000600 ffffea0001c6f0c2 0000001000000990 0000000000000000
[   76.314005]  0000000000000000 ffffea0001c6f0c2 00000010000009c8 0000000000000000
[   76.322301]  0000000000000000 00000000930d299f ffff88005fa8e2f0 0000000000000020
[   76.330599]  ffff880071bc3b38 0000000000000010 ffff880071bc3b68 ffff880071bc3b28
[   76.338897]  ffffffff8133b8f1 ffff880071bc3b88 ffff88005f881a80 ffff880071bc3ba8
[   76.347195]  0000000000000600 00000000930d299f ffff880071bc3b68 ffff88005845d800
[   76.355495]  ffff880071bc3b88 ffff880071bc3c48 ffff880071bc3c58 ffff880071bc3b58
[   76.363781]  ffffffff813326ca ffff88005fa8e2a0 ffff880071bc3ba8 ffff880000000600
[   76.372076]  00000000930d299f ffff880071bc3c28 ffffffff812c5708 ffffea00017ea382
[   76.380370]  0000004000000c60 0000000000000000 0000000000000000 ffffea0001b5f342
[   76.388665] Call Trace:
[   76.391403]  [<ffffffff8133b8f1>] crypto_cts_decrypt+0xd1/0x100
[   76.398030]  [<ffffffff813326ca>] async_decrypt+0x4a/0x70
[   76.404073]  [<ffffffff812c5708>] _ext4_fname_disk_to_usr+0x1a8/0x390
[   76.411283]  [<ffffffff8132821c>] ? chromiumos_get_symlink_traversal_policy+0x15c/0x180
[   76.420244]  [<ffffffff812993de>] ext4_encrypted_follow_link+0x11e/0x190
[   76.427747]  [<ffffffff811fae70>] trailing_symlink+0x140/0x310
[   76.434273]  [<ffffffff811f5636>] path_openat+0xbf6/0x1490
[   76.440413]  [<ffffffff811bd7e7>] ? do_wp_page+0x577/0x650
[   76.446551]  [<ffffffff811f4996>] do_filp_open+0x86/0x130
[   76.452593]  [<ffffffff8195d7ef>] ? _raw_spin_unlock+0x1f/0x60
[   76.459120]  [<ffffffff81208fc9>] ? __alloc_fd+0x139/0x150
[   76.465259]  [<ffffffff811e6697>] do_sys_open+0x157/0x2a0
[   76.471299]  [<ffffffff811e6803>] SyS_open+0x23/0x30
[   76.476853]  [<ffffffff8195e061>] entry_SYSCALL_64_fastpath+0x1c/0x74
[   76.484061] Code: c0 cc ff e8 c0 ab 03 00 65 48 8b 04 25 28 00 00 00 48 3b 45 d0 75 15 44 89 e8 48 81 c4 b8 00 00 00 5b 41 5c 41 5d 41 5e 41 5f 5d <c3> e8 97 55 d4 ff 0f 1f 80 00 00 00 00 0f 1f 44 00 00 55 48 89 
[   76.505560] RIP  [<ffffffff8133b3d3>] crypto_cbc_decrypt+0x223/0x230
[   76.512676]  RSP <ffff880071bc3990>
[   76.519859] ---[ end trace ee099423df31c359 ]---
[   76.540514] Kernel panic - not syncing: Fatal exception
[   76.546392] Kernel Offset: disabled
[   76.550431] gsmi: Log Shutdown Reason 0x02
[   76.561837] ACPI MEMORY or I/O RESET_REG.


The exception occurs when returning from crypto_cbc_encrypt():

static int crypto_cbc_decrypt(struct blkcipher_desc *desc,
                              struct scatterlist *dst, struct scatterlist *src,
                              unsigned int nbytes)
{
...
ffffffff8133b3d3:       c3                      retq 


The value at the top of the stack (0x5362f7f3029a2bd1) is not a valid kernel code address. When crypto_cts_decrypt() is entered the location on the stack contains a valid return address, however it is overwritten during the execution of aes_decrypt() (see attached log).

The issue occurs with the image trybot-squawks-paladin/R60-9592.0.0-b2921 , which was built through 'cbuildbot --remote -g 513163 -g 513142 -g 513462 squawks-paladin'.

With this build it is easily reproducible:

cd /usr/local/autotest/cros/; python -c "import common; from autotest_lib.client.common_lib.cros import chrome; c = chrome.Chrome()"

The code path is executed multiple times (also without re-launching Chrome) and not every execution leads to corruption, which makes debugging tedious. I will try to isolate the issue further and ideally reproduce it without Chrome.

 
stack_corruption.log
37.7 KB View Download

Comment 1 by mka@chromium.org, Jun 5 2017

Description: Show this description

Comment 2 by mka@chromium.org, Jun 6 2017

The crash can also be reproduced by logging in with a user account.

The file corresponding to the SyS_open() call is /home/chronos/user/log/chrome , it does not exist without logging in or restarting Chrome through Python.

No success so far with triggering the issue without involving Chrome, like opening/creating the file for read/write.

Comment 3 by mka@chromium.org, Jun 7 2017

Summary: clang: Stack corruption in crypto_cts_decrypt() on x86 (was: clang: Stack corruption in aes_decrypt() on x86)
The trouble begins in crypto_cbc_encrypt() (https://chromium.googlesource.com/chromiumos/third_party/kernel/+/release-R60-9592.B-chromeos-4.4/crypto/cts.c#166)

The array d is located on the stack right next to the return address. Using the addresses of the trace in the description d is located at 0xffff880071bc3998 and the return address at 0xffff880071bc3990.

The scatter gather code (sg_*() functions) uses the macro offset_in_page() to determine the offset of an address within a page:

#define offset_in_page(p)	((unsigned long)(p) & ~PAGE_MASK)

With 4k pages PAGE_MASK is 0xfffffffffffff000, and ~PAGE_MASK 0xfff.

For some reason clang applies a mask of 0xff0 in offset_in_page() (at least in the context of crypto_cbc_encrypt()), which clears the 4 least significant bits and initializes sgdst with the stack location of the return address, instead of the address of d. As a result the return address is overwritten with the output of the decryption.

Assembly generated by clang:

sg_set_page(sg, virt_to_page(buf), buflen, offset_in_page(buf));
ffffffff8133bed4:       48 c1 ee 06             shr    $0x6,%rsi
ffffffff8133bed8:       48 21 de                and    %rbx,%rsi
ffffffff8133bedb:       4c 01 ce                add    %r9,%rsi
ffffffff8133bede:       44 89 f8                mov    %r15d,%eax
ffffffff8133bee1:       25 f0 0f 00 00          and    $0xff0,%eax

And gcc:

sg_set_page(sg, virt_to_page(buf), buflen, offset_in_page(buf));
ffffffff8131e29f:       44 89 d9                mov    %r11d,%ecx
ffffffff8131e2a2:       81 e1 ff 0f 00 00       and    $0xfff,%ecx
ffffffff8131e2a8:       4c 89 c0                mov    %r8,%rax
ffffffff8131e2ab:       4d 01 cb                add    %r9,%r11
ffffffff8131e2ae:       48 0f 42 05 5a 6d af    cmovb  0xaf6d5a(%rip),%rax   

Comment 4 by mka@chromium.org, Jun 7 2017

Cc: manojgupta@chromium.org mka@chromium.org
Owner: llozano@chromium.org
Status: Assigned (was: Started)
Luis, this is blocking the switch to clang, could someone from your team have a look at it?

For if the toolchain team just wants to compile crypto/cts.c and not build an  entire kernel:

# use attached .config
cp .config src/third_party/kernel/v4.4
cd src/third_party/kernel/v4.4
make oldconfig
make CC=x86_64-cros-linux-gnu-clang crypto/cts.o
.config
138 KB Download
Is p a pointer type? 
Can you try to replace ((unsigned long)(p) & ~PAGE_MASK) by ((uintptr_t)(p) & ~PAGE_MASK) and see if it helps?

Comment 6 by mka@chromium.org, Jun 7 2017

Summary: clang: Stack corruption in cts_cbc_decrypt () on x86 (was: clang: Stack corruption in crypto_cts_decrypt() on x86)
In this specific instance p is a pointer type, however the macro is widely used in the kernel and other data types may be used.

Using uintptr_t instead of unsigned long doesn't make a difference.
Looking at it.
Thanks, I am looking into it. 

Comment 9 by mka@chromium.org, Jun 7 2017

Thanks for having a look!

fyi, on arm64 the correct bitmask is used:

sg_set_page(sg, virt_to_page(buf), buflen, offset_in_page(buf));
ffffffc0004e32ac:       d346fd0b        lsr     x11, x8, #6
ffffffc0004e32b0:       8b08030c        add     x12, x24, x8
ffffffc0004e32b4:       8b080388        add     x8, x28, x8
ffffffc0004e32b8:       8b0d018c        add     x12, x12, x13
ffffffc0004e32bc:       8b0d0108        add     x8, x8, x13
ffffffc0004e32c0:       d2b8000d        mov     x13, #0xc0000000                // #3221225472
ffffffc0004e32c4:       f2dff7ad        movk    x13, #0xffbd, lsl #32
ffffffc0004e32c8:       9268856b        and     x11, x11, #0x3ffffffff000000
ffffffc0004e32cc:       f2ffffed        movk    x13, #0xffff, lsl #48
ffffffc0004e32d0:       d346fd8c        lsr     x12, x12, #6
ffffffc0004e32d4:       d346fd08        lsr     x8, x8, #6
ffffffc0004e32d8:       cb0b01ab        sub     x11, x13, x11
ffffffc0004e32dc:       927acd8c        and     x12, x12, #0x3ffffffffffffc0
ffffffc0004e32e0:       927acd08        and     x8, x8, #0x3ffffffffffffc0
ffffffc0004e32e4:       8b0c016c        add     x12, x11, x12
ffffffc0004e32e8:       8b080168        add     x8, x11, x8
ffffffc0004e32ec:       12002f0b        and     w11, w24, #0xfff
Owner: manojgupta@chromium.org
assigned to Manoj,

Does this only happen with O2?

Comment 11 by mka@chromium.org, Jun 7 2017

The same occurs with -Oz.

Building an entire kernel without optimizations fails, the assembly for  crypto/cts.c uses 0xfff as mask in this case. For -O1 0xfff is used in one instance and 0xff0 in another ...

A kernel built with -O1 does not crash, however a debug trace indicates that a wrong offset is used:

[   36.930458] MKA: cts_cbc_decrypt(): d = ffff88006f94f988, offset in page = 0980

I suppose the stack layout is different with -O1, which saves the return address from being overwritten.

Comment 12 by mka@chromium.org, Jun 8 2017

Could it be that clang is making assumptions about d being 64-bit aligned?

An experiment of isolating the code from the kernel context seems to point in this direction:


#include <stdio.h>

#define PAGE_MASK 0xfffffffffffff000

#define offset_in_page(p)((unsigned long)(p) & ~PAGE_MASK)

void f(int bsize) {
  char a[bsize];
  char c;

  printf("offset %p = 0x%04x\n", a, (unsigned int)offset_in_page(a));
  printf("offset %p = 0x%04x\n", &c, (unsigned int)offset_in_page(&c));
}

int main(int argc, char *argv[])
{
  f(16);
  f(3);
}


The assembly generated for the print statements is:

  printf("offset %p = 0x%04x\n", a, (unsigned int)offset_in_page(a));
  4007c6:       89 f2                   mov    %esi,%edx
  4007c8:       81 e2 f0 0f 00 00       and    $0xff0,%edx
  4007ce:       bf 7c 09 40 00          mov    $0x40097c,%edi
  4007d3:       31 c0                   xor    %eax,%eax
  4007d5:       e8 46 fe ff ff          callq  400620 <printf@plt>
  4007da:       48 8d 75 f7             lea    -0x9(%rbp),%rsi
  printf("offset %p = 0x%04x\n", &c, (unsigned int)offset_in_page(&c));
  4007de:       89 f2                   mov    %esi,%edx
  4007e0:       81 e2 ff 0f 00 00       and    $0xfff,%edx
  4007e6:       bf 7c 09 40 00          mov    $0x40097c,%edi
  4007eb:       31 c0                   xor    %eax,%eax
  4007ed:       e8 2e fe ff ff          callq  400620 <printf@plt>


A mask of 0xff0 is used for the dynamic array and 0xfff for the character type.


For compilation most of the flags of a kernel build were used:

x86_64-cros-linux-gnu-clang -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -Werror-implicit-function-declaration -Wno-format-security -std=gnu89 -fno-PIE -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx -m64   -mno-80387    -mtune=generic -mno-red-zone -funit-at-a-time  -DCONFIG_AS_CFI=1 -DCONFIG_AS_CFI_SIGNAL_FRAME=1 -DCONFIG_AS_CFI_SECTIONS=1 -DCONFIG_AS_FXSAVEQ=1 -DCONFIG_AS_SSSE3=1 -DCONFIG_AS_CRC32=1 -DCONFIG_AS_AVX=1 -DCONFIG_AS_AVX2=1 -DCONFIG_AS_SHA1_NI=1 -DCONFIG_AS_SHA256_NI=1 -pipe -Wno-sign-compare -fno-asynchronous-unwind-tables     -O2 -Wframe-larger-than=2048 -Wno-unused-variable -Wno-format-invalid-specifier -Wno-gnu -Wno-address-of-packed-member -Wno-duplicate-decl-specifier -Wno-unused-function -Wno-tautological-compare  -no-integrated-as -fno-omit-frame-pointer -fno-optimize-sibling-calls  -g -pg  -mfentry -DCC_USING_FENTRY -Werror -Wdeclaration-after-statement -Wno-pointer-sign -fno-strict-overflow  -Werror=implicit-int -Werror=strict-prototypes -Werror=date-time -Wno-initializer-overrides -Wno-unused-value -Wno-format -Wno-sign-compare -Wno-format-zero-length -Wno-uninitialized -fstack-protector-strong /tmp/offset_in_page.c

Its a bit difficult to track because of heavy inlining but initial setup of offset_in_page looks ok.

define internal fastcc void @sg_set_buf(%struct.scatterlist* %sg, i8* %buf, i32 %buflen) unnamed_addr #7 !dbg !3967 {
entry:
  call void @llvm.dbg.value(metadata %struct.scatterlist* %sg, i64 0, metadata !3971, metadata !3974), !dbg !3975
  call void @llvm.dbg.value(metadata i8* %buf, i64 0, metadata !3972, metadata !3974), !dbg !3976
  call void @llvm.dbg.value(metadata i32 %buflen, i64 0, metadata !3973, metadata !3974), !dbg !3977
  %0 = ptrtoint i8* %buf to i64, !dbg !3978
  %call = call fastcc i64 @__phys_addr_nodebug(i64 %0) #9, !dbg !3978
  %shr = lshr i64 %call, 12, !dbg !3978
  %add.ptr = getelementptr %struct.page, %struct.page* inttoptr (i64 -24189255811072 to %struct.page*), i64 %shr, !dbg !3978
  %1 = trunc i64 %0 to i32, !dbg !3979 // convert buf to int32
  %conv = and i32 %1, 4095, !dbg !3979 // 4095 = 0xfff
  call fastcc void @sg_set_page(%struct.scatterlist* %sg, %struct.page* %add.ptr, i32 %buflen, i32 %conv) #9, !dbg !3980
  ret void, !dbg !3981
}
llvm is actually expecting the buf argument to have 16 byte alignment. 
With 16 byte alignment, lower 4 bits are expected to be zero.

But the function is not getting the 16 byte alignment at runtime for the variable-length arrays? 

%vla = alloca i8, i64 %1, align 16, !dbg !4016 //  u8 tmp[bsize]

%vla3 = alloca i8, i64 %5, align 16, !dbg !4029 //  u8 s[bsize * 2]

  call fastcc void @sg_set_buf(%struct.scatterlist* %arrayidx, i8* nonnull %vla3, i32 %call) #9, !dbg !4052 // buf arg => vla3
  %arrayidx10 = getelementptr inbounds [1 x %struct.scatterlist], [1 x %struct.scatterlist]* %sgdst, i64 0, i64 0, !dbg !4053
  call fastcc void @sg_set_buf(%struct.scatterlist* %arrayidx10, i8* nonnull %vla, i32 %call) #9, !dbg !4054 // buf arg => vla


Exact command lines used:

/usr/bin/clang --sysroot=/usr/x86_64-cros-linux-gnu -Qunused-arguments -D_FORTIFY_SOURCE=2 -fno-stack-protector -fno-omit-frame-pointer -Wp,-MD,crypto/.cts.o.d -nostdinc -isystem /usr/lib64/clang/5.0.0/include -I./arch/x86/include -Iarch/x86/include/generated/uapi -Iarch/x86/include/generated -Iinclude -I./arch/x86/include/uapi -Iarch/x86/include/generated/uapi -I./include/uapi -Iinclude/generated/uapi -include ./include/linux/kconfig.h -D__KERNEL__ -Qunused-arguments -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -Werror-implicit-function-declaration -Wno-format-security -std=gnu89 -fno-PIE -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx -m64 -mno-80387 -mtune=generic -mno-red-zone -mcmodel=kernel -funit-at-a-time -DCONFIG_AS_CFI=1 -DCONFIG_AS_CFI_SIGNAL_FRAME=1 -DCONFIG_AS_CFI_SECTIONS=1 -DCONFIG_AS_FXSAVEQ=1 -DCONFIG_AS_SSSE3=1 -DCONFIG_AS_CRC32=1 -DCONFIG_AS_AVX=1 -DCONFIG_AS_AVX2=1 -DCONFIG_AS_SHA1_NI=1 -DCONFIG_AS_SHA256_NI=1 -pipe -Wno-sign-compare -fno-asynchronous-unwind-tables -O2 --param=allow-store-data-races=0 -Wframe-larger-than=2048 -fstack-protector-strong -Wno-unused-variable -Wno-format-invalid-specifier -Wno-gnu -Wno-address-of-packed-member -Wno-duplicate-decl-specifier -Wno-tautological-compare -mno-global-merge -no-integrated-as -fno-omit-frame-pointer -fno-optimize-sibling-calls -g -pg -mfentry -DCC_USING_FENTRY -Werror -Wdeclaration-after-statement -Wno-pointer-sign -fno-strict-overflow -Werror=implicit-int -Werror=strict-prototypes -Werror=date-time -Wno-initializer-overrides -Wno-unused-value -Wno-format -Wno-sign-compare -Wno-format-zero-length -Wno-uninitialized -DKBUILD_STR(s)=#s -DKBUILD_BASENAME=KBUILD_STR(cts) -DKBUILD_MODNAME=KBUILD_STR(cts) -c -o crypto/cts.o crypto/cts.c -v -B/usr/libexec/gcc/x86_64-cros-linux-gnu -target x86_64-cros-linux-gnu

 "/usr/bin/clang-5.0" "-cc1" "-triple" "x86_64-cros-linux-gnu" "-S" "-disable-free" "-main-file-name" "cts.c" "-mrelocation-model" "static" "-mthread-model" "posix" "-mllvm" "-warn-stack-size=2048" "-mdisable-fp-elim" "-relaxed-aliasing" "-mdisable-tail-calls" "-fmath-errno" "-masm-verbose" "-no-integrated-as" "-mconstructor-aliases" "-fuse-init-array" "-mcode-model" "kernel" "-target-cpu" "x86-64" "-target-feature" "-sse" "-target-feature" "-mmx" "-target-feature" "-sse2" "-target-feature" "-3dnow" "-target-feature" "-avx" "-target-feature" "-x87" "-disable-red-zone" "-dwarf-column-info" "-debug-info-kind=limited" "-dwarf-version=4" "-debugger-tuning=gdb" "-coverage-notes-file" "/home/manojgupta/trunk/src/third_party/kernel/v4.4/crypto/cts.gcno" "-nostdsysteminc" "-nobuiltininc" "-resource-dir" "/usr/lib64/clang/5.0.0" "-isystem" "/usr/lib64/clang/5.0.0/include" "-include" "./include/linux/kconfig.h" "-D" "_FORTIFY_SOURCE=2" "-I" "./arch/x86/include" "-I" "arch/x86/include/generated/uapi" "-I" "arch/x86/include/generated" "-I" "include" "-I" "./arch/x86/include/uapi" "-I" "arch/x86/include/generated/uapi" "-I" "./include/uapi" "-I" "include/generated/uapi" "-D" "__KERNEL__" "-D" "CONFIG_AS_CFI=1" "-D" "CONFIG_AS_CFI_SIGNAL_FRAME=1" "-D" "CONFIG_AS_CFI_SECTIONS=1" "-D" "CONFIG_AS_FXSAVEQ=1" "-D" "CONFIG_AS_SSSE3=1" "-D" "CONFIG_AS_CRC32=1" "-D" "CONFIG_AS_AVX=1" "-D" "CONFIG_AS_AVX2=1" "-D" "CONFIG_AS_SHA1_NI=1" "-D" "CONFIG_AS_SHA256_NI=1" "-D" "CC_USING_FENTRY" "-D" "KBUILD_STR(s)=#s" "-D" "KBUILD_BASENAME=KBUILD_STR(cts)" "-D" "KBUILD_MODNAME=KBUILD_STR(cts)" "-isysroot" "../../../../../../../usr/x86_64-cros-linux-gnu" "-O2" "-Wall" "-Wundef" "-Wstrict-prototypes" "-Wno-trigraphs" "-Werror-implicit-function-declaration" "-Wno-format-security" "-Wno-sign-compare" "-Wno-unused-variable" "-Wno-format-invalid-specifier" "-Wno-gnu" "-Wno-address-of-packed-member" "-Wno-duplicate-decl-specifier" "-Wno-tautological-compare" "-Werror" "-Wdeclaration-after-statement" "-Wno-pointer-sign" "-Werror=implicit-int" "-Werror=strict-prototypes" "-Werror=date-time" "-Wno-initializer-overrides" "-Wno-unused-value" "-Wno-format" "-Wno-sign-compare" "-Wno-format-zero-length" "-Wno-uninitialized" "-std=gnu89" "-fno-dwarf-directory-asm" "-fdebug-compilation-dir" "/home/manojgupta/trunk/src/third_party/kernel/v4.4" "-ferror-limit" "19" "-fmessage-length" "0" "-pg" "-mfentry" "-fwrapv" "-stack-protector" "2" "-fobjc-runtime=gcc" "-fno-common" "-fdiagnostics-show-option" "-fcolor-diagnostics" "-vectorize-loops" "-vectorize-slp" "-o" "/tmp/cts-db0497.s" "-x" "c" "crypto/cts.c"

Complete optimization log using -debug -print-after-all is at https://drive.google.com/a/google.com/file/d/0B6pPpjh2SnF_cDMwUHRnMWhmenM/view?usp=sharing
cts.i
1.2 MB Download
cts.ll
653 KB Download

Comment 16 by mka@chromium.org, Jun 8 2017

On my build all the variable length arrays are on an 8 byte boundary:

[  435.270725] MKA: cts_cbc_decrypt(): tmp = ffff88005485f9d8, iv = ffff88005485f9c8, s = ffff88005485f9a8, d = ffff88005485f988
mka@ My understanding is clang is expecting a stack alignment of 16 for the function cts_cbc_encrypt. Can you check if that is the case.

From comment #3, it appears that the frame is aligned at 8 bytes, not 16.

Comment 18 by mka@chromium.org, Jun 8 2017

In different logs (e.g. #1) the frame pointer (RBP) is aligned at 8 bytes.

I noticed that for x86 the kernel changes the stack alignment for gcc:

-mpreferred-stack-boundary=2

My understanding is that the corresponding option for clang would be -mstack-alignment , since it isn't specified the default should be used.
clang default stack-alignment is 4 => 16 bytes. Which is probably why it is expecting that frame is 16 byte aligned. 
I talked to Mi Wei (wei@) and he told me that whole program should use same alignment.
So I assume there is some code which does not adjust stack to be 16 byte aligned before calling another function (e.g. asm code).

So can you pass -mstack-alignment=2 and see if it fixes the crash.

Comment 20 by mka@chromium.org, Jun 9 2017

I didn't encounter the crash with -mstack-alignment=3, which is what the kernel uses for x86_64. However with this build the arrays are aligned at 16 bytes, which means the corruption wouldn't occur in this case even if the underlying problem was still present.

I experimented a bit to gain more confidence that this really fixes the issue, adding a few more local variables to the function, including variable length arrays with smaller size (bsize = 16 in our case):

u8 a1[bsize / 4], a2[bsize / 8];
u8 c;

and then logged the variable addresses and offset_in_page():

MKA: cts_cbc_decrypt(): a1 = ffff88006667f960, a2 = ffff88006667f950, c = ffff88006667fa27, off a1 = 0x0960, off a2 = 0x0950, off c = 0x0a27

Even the smaller variable length arrays are aligned at 16 bytes, which I suppose is due to:

"An array uses the same alignment as its elements, except that a local or global array variable of length at least 16 bytes or a C99 variable-length array variable always has alignment of at least 16 bytes"

System V Application Binary Interface, AMD64 Architecture Processor Supplement (Draft Version 0.3), 3.1.2 Data Representation, Aggregates and Unions
https://software.intel.com/sites/default/files/article/402129/mpx-linux64-abi.pdf

On the other hand the variable c is byte aligned and the offset is calculated correctly.
Yes, the arrays will always be 16 byte aligned. 

What changes with mstack-alignment is that clang understands that frame is now 8 byte aligned. So it will adjust stack to be 16 bytes aligned before allocating the arrays.

The default in clang is 16 bytes (mstack-alignment=4). So clang did not adjust stack before allocating the arrays in ctc_cbc_encrypt in the original code.

Comment 22 by mka@chromium.org, Jun 9 2017

Thanks for the clarification!

For reference, from the commit message of the patch that added -mpreferred-stack-boundary=3 (d9b0cde91c60 "x86-64, gcc: Use -mpreferred-stack-boundary=3 if supported"):

"On x86-64, the standard ABI requires alignment to 16 bytes.  However, this is not actually necessary in the kernel (we don't do SSE except in very controlled ways); and furthermore, the standard kernel entry on x86-64 actually leaves the stack on an odd 8-byte boundary, which means that gcc will generate extra instructions to keep the stack *mis*aligned!"
Labels: Build-Toolchain
Owner: mka@chromium.org
Assign back to mka@
Project Member

Comment 24 by bugdroid1@chromium.org, Jun 28 2017

Labels: merge-merged-chromeos-4.4
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/018001c788210b897af3b98ed364fb981ce0ca7f

commit 018001c788210b897af3b98ed364fb981ce0ca7f
Author: Matthias Kaehlcke <mka@chromium.org>
Date: Wed Jun 28 00:56:07 2017

FROMGIT: kbuild: Add __cc-option macro

cc-option uses KBUILD_CFLAGS and KBUILD_CPPFLAGS when it determines
whether an option is supported or not. This is fine for options used to
build the kernel itself, however some components like the x86 boot code
use a different set of flags.

Add the new macro __cc-option which is a more generic version of
cc-option with additional parameters. One parameter is the compiler
with which the check should be performed, the other the compiler options
to be used instead KBUILD_C*FLAGS.

Refactor cc-option and hostcc-option to use __cc-option and move
hostcc-option to scripts/Kbuild.include.

BUG= chromium:729743 , chromium:702741
TEST=build for squawks

Conflicts:
	scripts/Kbuild.include

Change-Id: I8beb1fa33e746538a8196858b63ffc80935d47b3
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Suggested-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Michal Marek <mmarek@suse.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
(cherry picked from git.kernel.org masahiroy/linux-kbuild kbuild
commit 9f3f1fd299768782465cb32cdf0dd4528d11f26b)
Reviewed-on: https://chromium-review.googlesource.com/548977
Reviewed-by: Guenter Roeck <groeck@chromium.org>

[modify] https://crrev.com/018001c788210b897af3b98ed364fb981ce0ca7f/Makefile
[modify] https://crrev.com/018001c788210b897af3b98ed364fb981ce0ca7f/scripts/Kbuild.include

Project Member

Comment 25 by bugdroid1@chromium.org, Jun 28 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/bd7655327c49c51102c63fa2f2a66959a2034830

commit bd7655327c49c51102c63fa2f2a66959a2034830
Author: Matthias Kaehlcke <mka@chromium.org>
Date: Wed Jun 28 00:56:08 2017

FROMGIT: x86/build: Use __cc-option for boot code compiler options

cc-option is used to enable compiler options for the boot code if they
are available. The macro uses KBUILD_CFLAGS and KBUILD_CPPFLAGS for the
check, however these flags aren't used to build the boot code, in
consequence cc-option can yield wrong results. For example
-mpreferred-stack-boundary=2 is never set with a 64-bit compiler,
since the setting is only valid for 16 and 32-bit binaries. This
is also the case for 32-bit kernel builds, because the option -m32 is
added to KBUILD_CFLAGS after the assignment of REALMODE_CFLAGS.

Use __cc-option instead of cc-option for the boot mode options.
The macro receives the compiler options as parameter instead of using
KBUILD_C*FLAGS, for the boot code we pass REALMODE_CFLAGS.

Also use separate statements for the __cc-option checks instead
of performing them in the initial assignment of REALMODE_CFLAGS since
the variable is an input of the macro.

BUG= chromium:729743 , chromium:702741
TEST=build for squawks

Change-Id: I9bbee118f9d04bd9cc8bb1c104b097967eee8c82
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
(cherry picked from git.kernel.org masahiroy/linux-kbuild kbuild
commit 032a2c4f65a2f81c93e161a11197ba19bc14a909)
Reviewed-on: https://chromium-review.googlesource.com/548978
Reviewed-by: Guenter Roeck <groeck@chromium.org>

[modify] https://crrev.com/bd7655327c49c51102c63fa2f2a66959a2034830/arch/x86/Makefile

Project Member

Comment 26 by bugdroid1@chromium.org, Jun 28 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/cf18a5ec40a8ae8350beae8141a84f49f6210d9f

commit cf18a5ec40a8ae8350beae8141a84f49f6210d9f
Author: Matthias Kaehlcke <mka@chromium.org>
Date: Wed Jun 28 00:56:10 2017

FROMGIT: x86/build: Specify stack alignment for clang

For gcc stack alignment is configured with -mpreferred-stack-boundary=N,
clang has the option -mstack-alignment=N for that purpose. Use the same
alignment as with gcc.

If the alignment is not specified clang assumes an alignment of
16 bytes, as required by the standard ABI. However as mentioned in
d9b0cde91c60 ("x86-64, gcc: Use -mpreferred-stack-boundary=3 if
supported") the standard kernel entry on x86-64 leaves the stack
on an 8-byte boundary, as a consequence clang will keep the stack
misaligned.

BUG= chromium:729743 , chromium:702741
TEST=build for squawks with clang
  boot DUT
  login with user account
    => login ok, system does not crash

Change-Id: Ie10c6a2f533c846a5c386590ecfb519cf3ea71c1
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
(cherry picked from git.kernel.org masahiroy/linux-kbuild kbuild
commit d77698df39a512911586834d303275ea5fda74d0)
Reviewed-on: https://chromium-review.googlesource.com/548979
Reviewed-by: Guenter Roeck <groeck@chromium.org>

[modify] https://crrev.com/cf18a5ec40a8ae8350beae8141a84f49f6210d9f/arch/x86/Makefile

Comment 27 by mka@chromium.org, Jun 28 2017

Status: Verified (was: Assigned)

Sign in to add a comment