Issue metadata
Sign in to add a comment
|
clang: implement -fno-delete-null-pointer-checks |
||||||||||||||||||||||||
Issue description
The upstream kernel recently enabled the ORC unwinder by default for x86, this unwinder is also used by the CrOS v4.14 kernel.
When building with clang and the ORC unwinder enabled objtool generates a series of warnings:
arch/x86/mm/pti.o: warning: objtool: pti_init() falls through to next
function pti_user_pagetable_walk_pmd()
s/debugfs/file.o: warning: objtool: full_proxy_llseek() falls through to next function full_proxy_read()
fs/debugfs/file.o: warning: objtool: full_proxy_read() falls through to next function full_proxy_write()
fs/debugfs/file.o: warning: objtool: full_proxy_write() falls through to next function full_proxy_poll()
fs/debugfs/file.o: warning: objtool: full_proxy_poll() falls through to next function full_proxy_unlocked_ioctl()
fs/debugfs/file.o: warning: objtool: full_proxy_unlocked_ioctl() falls
through to next function fops_u8_open()
The author of the unwinder investigated these warnings and came to the conclusion that clang is the culprit:
Ok, I did a little more digging. Surprise, these objtool warnings are
actually correct. Somehow the WARN macros are causing Clang to produce
bad code. In some cases, Clang is assuming that a WARN is "noreturn",
i.e. that the UD2 doesn't return from the exception.
As an example, here's the full_proxy_llseek() function and its compiled
code:
#define FULL_PROXY_FUNC(name, ret_type, filp, proto, args) \
static ret_type full_proxy_ ## name(proto) \
{ \
struct dentry *dentry = F_DENTRY(filp); \
const struct file_operations *real_fops; \
ret_type r; \
\
r = debugfs_file_get(dentry); \
if (unlikely(r)) \
return r; \
real_fops = debugfs_real_fops(filp); \
r = real_fops->name(args); \
debugfs_file_put(dentry); \
return r; \
}
FULL_PROXY_FUNC(llseek, loff_t, filp,
PROTO(struct file *filp, loff_t offset, int whence),
ARGS(filp, offset, whence));
0000000000000ca0 <full_proxy_llseek>:
ca0: 55 push %rbp
ca1: 41 57 push %r15
ca3: 41 56 push %r14
ca5: 53 push %rbx
ca6: 48 83 ec 10 sub $0x10,%rsp
caa: 41 89 d6 mov %edx,%r14d
cad: 49 89 f7 mov %rsi,%r15
cb0: 48 89 fd mov %rdi,%rbp
cb3: 65 48 8b 04 25 28 00 mov %gs:0x28,%rax
cba: 00 00
cbc: 48 89 44 24 08 mov %rax,0x8(%rsp)
cc1: 48 8b 5d 18 mov 0x18(%rbp),%rbx
cc5: 48 89 df mov %rbx,%rdi
cc8: e8 00 00 00 00 callq ccd <full_proxy_llseek+0x2d>
cc9: R_X86_64_PC32 debugfs_file_get-0x4
ccd: 85 c0 test %eax,%eax
ccf: 75 65 jne d36 <full_proxy_llseek+0x96>
cd1: 48 8b 45 18 mov 0x18(%rbp),%rax
cd5: 48 8b 40 78 mov 0x78(%rax),%rax
cd9: a8 01 test $0x1,%al
cdb: 75 63 jne d40 <full_proxy_llseek+0xa0>
...
d40: 0f 0b ud2
d42: 0f 1f 40 00 nopl 0x0(%rax)
d46: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
d4d: 00 00 00
0000000000000d50 <full_proxy_read>:
After the debugfs_file_get() call, the debugfs_real_fops() call is
inlined. Here it is:
const struct file_operations *debugfs_real_fops(const struct file *filp)
{
struct debugfs_fsdata *fsd = F_DENTRY(filp)->d_fsdata;
if ((unsigned long)fsd & DEBUGFS_FSDATA_IS_REAL_FOPS_BIT) {
/*
* Urgh, we've been called w/o a protecting
* debugfs_file_get().
*/
WARN_ON(1);
return NULL;
}
return fsd->real_fops;
}
The WARN_ON() is the UD2 at offset d40. Notice that, as objtool found,
it just falls through to the next function instead of "returning" to
full_proxy_llseek().
At first I thought this might be some kind of "optimization", since, if
you look closely, you'll see that the warning can never happen in this
situation. But no, I downloaded the Clang binary, changed the
if ((unsigned long)fsd & DEBUGFS_FSDATA_IS_REAL_FOPS_BIT) /* DEBUGFS_FSDATA_IS_REAL_FOPS_BIT == 1 */
to
if ((unsigned long)fsd & 0x2)
and still got the same issue.
Then I figured that Clang must be peeking into the inline asm, and is
assuming that UD2 is a dead end. But no, I changed ASM_UD2 to a 2-byte
NOP, and got the same issue.
So unless I'm missing some "undefined behavior", this looks like a Clang bug to me. Somehow it doesn't like the WARN macros (but apparently only in a small number of cases).
https://lkml.org/lkml/2018/3/20/980
To reproduce:
# cd to kernel git repo
git checkout v4.15
make CC=clang defconfig
make CC=clang fs/debugfs/
,
Mar 23 2018
The preprocessed file for Linux v4.15 is attached. The command line is: clang -Wp,-MD,fs/debugfs/.file.o.d -nostdinc -isystem /usr/lib64/clang/7.0.0/include -I./arch/x86/include -I./arch/x86/include/generated -I./include -I./arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I./include/uapi -I./include/generated/uapi -include ./include/linux/kconfig.h -D__KERNEL__ -Qunused-arguments -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -fshort-wchar -Werror-implicit-function-declaration -Wno-format-security -std=gnu89 -fno-PIE -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx -m64 -mno-80387 -mstack-alignment=8 -mtune=generic -mno-red-zone -mcmodel=kernel -funit-at-a-time -DCONFIG_AS_CFI=1 -DCONFIG_AS_CFI_SIGNAL_FRAME=1 -DCONFIG_AS_CFI_SECTIONS=1 -DCONFIG_AS_FXSAVEQ=1 -DCONFIG_AS_SSSE3=1 -DCONFIG_AS_CRC32=1 -DCONFIG_AS_AVX=1 -DCONFIG_AS_AVX2=1 -DCONFIG_AS_AVX512=1 -DCONFIG_AS_SHA1_NI=1 -DCONFIG_AS_SHA256_NI=1 -pipe -Wno-sign-compare -fno-asynchronous-unwind-tables -O2 -Wframe-larger-than=2048 -fno-stack-protector -Wno-unused-variable -Wno-format-invalid-specifier -Wno-gnu -Wno-address-of-packed-member -Wno-tautological-compare -mno-global-merge -no-integrated-as -fomit-frame-pointer -Wdeclaration-after-statement -Wno-pointer-sign -fno-strict-overflow -fno-stack-check -Werror=implicit-int -Werror=strict-prototypes -Werror=date-time -Werror=incompatible-pointer-types -Wno-initializer-overrides -Wno-unused-value -Wno-format -Wno-sign-compare -Wno-format-zero-length -Wno-uninitialized -save-temps -Wno-constant-logical-operand -DKBUILD_BASENAME='"file"' -DKBUILD_MODNAME='"debugfs"' -c -o fs/debugfs/file.o fs/debugfs/file.c
,
Mar 26 2018
Matthias, can you upload the .i file?
,
Mar 26 2018
file.i is attached
,
Mar 27 2018
Here is my brief summary (overlapping with existing observations):
1. debugfs_real_fops is inlined.
2. There are two paths in debugfs_real_fops, one that return null, other doesn't.
3. There is a pointer dereference after call to debugfs_real_fops:
real_fops = debugfs_real_fops(filp); \
r = real_fops->name(args); // Always dereferences the real_fops pointer
4. Dereferencing null pointer is undefined behavior. So llvm optimizer happily drops the code path that returns null. Since the asm is marked is volatile, only the return statement gets killed.
Changing r = real_fops->name(args); to if (real_fops) r = real_fops->name(args) should fix it.
,
Mar 28 2018
Thanks Manoj, I sent a patch upstream: https://patchwork.kernel.org/patch/10311923/
,
Apr 4 2018
This seems to be another case of clang not having -fno-delete-null-pointer-checks: https://lkml.org/lkml/2018/3/28/499 https://bugs.llvm.org/show_bug.cgi?id=9251 An earlier issue that was encountered and worked around: https://groups.google.com/a/google.com/forum/#!msg/c-toolchain-team/B3quWYKCSsk/KsoPrDj1BwAJ
,
Apr 4 2018
from Linus Torvalds:
Note that we explicitly use "-fno-delete-null-pointer-checks" because
we do *not* want NULL pointer check removal even in the case of a bug.
See commit a3ca86aea507 ("Add '-fno-delete-null-pointer-checks' to gcc
CFLAGS") for the reason: we had buggy code that accessed a pointer
before the NULL pointer check, but the bug was "benign" as long as the
compiler didn't actually remove the check.
And note how the buggy code *looked* safe. It was doing the right
check, it was just that the check was hidden and disabled by another
bug.
Removing the NULL pointer check turned a benign bug into a trivially
exploitable one by just mapping user space data at NULL (which avoided
the kernel oops, and then made the kernel use the user value!).
Now, admittedly we have a ton of other security features in place to
avoid these kinds of issues - SMAP helps on the hardware side, and not
allowing users to mmap() NULL in the first place helps with most
distributions, but the point remains: the kernel generally really
doesn't want optimizations that are perhaps allowed by the standard,
but that result in code generation that doesn't match the source code.
The NULL pointer removal is one such thing: don't remove checks in the
kernel based on "standard says". It's ok to do optimizations that are
based on "hardware does the exact same thing", but not on "the
standard says this is undefined so we can remove it".
https://lkml.org/lkml/2018/4/4/601
I think we need to prioritize this, for our own sake and to avoid it being brought up repeatedly as argument that "clang isn't ready to build the kernel".
,
Apr 30 2018
Matthias, I have a working patch for this now and would like to test it before sending to upstream for review. How can I repro the objtool warnings? Also, are there any test cases that I can use.
,
Apr 30 2018
To reproduce: git checkout v4.16 make CC=clang defconfig make CC=clang arch/x86/mm/pti.o fs/debugfs/file.o I don't know about existing test cases, maybe gcc has some?
,
Jun 7 2018
Manoj has posted the clang & llvm patches for review: https://reviews.llvm.org/D47894 https://reviews.llvm.org/D47895
,
Jun 14 2018
mka@ Can you try the latest llvm changes (not yet commited upstream) at https://chromium-review.googlesource.com/c/chromiumos/overlays/chromiumos-overlay/+/1037776 Note: I built amd64-generic 4.14 kernel locally and saw one objtool warning: arch/x86/kernel/reboot.o: warning: objtool: native_machine_emergency_restart() falls through to next function machine_power_off() I did some investigation to verify this is not caused by any null checks removal. My findings are: native_machine_emergency_restart function has an infinite loop at the end of the function. Because of infinite loop, it is not possible for the function to return. Therefore, llvm optimizer omitted return statements at the end of this function. Which makes objtool think that this function is doing a fall through to next function. I am not sure how to fix this warning in objtool so ptal.
,
Jun 15 2018
Does marking native_machine_emergency_restart() __attribute__((noreturn)) help?
,
Jun 15 2018
#13: __attribute__((noreturn)) / __noreturn seemed promising, but it doesn't help. The warning doesn't translate to a build error and as long as we know why it is generated it shouldn't be a big issue if it isn't fixed. In any case I'll reach out Josh, maybe he has some more ideas (though this brings us into 'you can't test this with upstream' territory ...)
,
Jun 15 2018
For the record: when trying to describe a repro case for upstream I found that the warning is not issued for a defconfig. Apparently the return statement is only optimized away with -Oz / CONFIG_CC_OPTIMIZE_FOR_SIZE=y.
,
Jun 15 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/chromite/+/72963164aedad8501dcfb9b88bdc9ae566c99fbe commit 72963164aedad8501dcfb9b88bdc9ae566c99fbe Author: Manoj Gupta <manojgupta@chromium.org> Date: Fri Jun 15 21:13:06 2018 Revert "chrome-sdk: Remove absolute path from simple chrome build using goma" This reverts commit 402795d084bfc112e1a47a401887aeec106238c2. Reason for revert: Caused https://bugs.chromium.org/p/chromium/issues/detail?id=824548 BUG= chromium:824548 TEST=TestSimpleChromeWorkflow made progress with tryjobs. No objcopy errors were seen. Original change's description: > chrome-sdk: Remove absolute path from simple chrome build using goma > > This patch relativizes absolute path in compiler flags. > > This is a preparation CL in chromite side for cache sharing between > goma users. Cache sharing will improve build time of simple chrome for > developers. > > While we don't use -MD, relativeness of sysroot does not affect > .d/.o/.dwo files. > > Bug: 846610 > Change-Id: Ib816c7df87c69d15aec1f83cb2709ca12eb3a449 > Reviewed-on: https://chromium-review.googlesource.com/1073132 > Commit-Ready: Takuto Ikuta <tikuta@chromium.org> > Tested-by: Takuto Ikuta <tikuta@chromium.org> > Reviewed-by: Mike Frysinger <vapier@chromium.org> Bug: 846610 Change-Id: If480937acd85abf6f702fade205545acd01e15e5 Reviewed-on: https://chromium-review.googlesource.com/1102918 Commit-Queue: Manoj Gupta <manojgupta@chromium.org> Tested-by: Manoj Gupta <manojgupta@chromium.org> Reviewed-by: Steven Bennetts <stevenjb@chromium.org> [modify] https://crrev.com/72963164aedad8501dcfb9b88bdc9ae566c99fbe/cli/cros/cros_chrome_sdk.py
,
Jun 16 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/0496b4e506eb35f69d49007198fa1e666d5646df commit 0496b4e506eb35f69d49007198fa1e666d5646df Author: Chromite Chromium Autoroll <chromite-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com> Date: Sat Jun 16 15:52:18 2018 Roll src/third_party/chromite 3a51f2a80a38..ae92f3c98fbe (10 commits) https://chromium.googlesource.com/chromiumos/chromite.git/+log/3a51f2a80a38..ae92f3c98fbe git log 3a51f2a80a38..ae92f3c98fbe --date=short --no-merges --format='%ad %ae %s' 2018-06-16 chrome-bot@chromium.org Update config settings by config-updater. 2018-06-16 dgarrett@google.com chromeos_config: Require all non-slaves to be important. 2018-06-15 lannm@chromium.org Annotate ScheduleSlavesStage with scheduled build links 2018-06-15 dgarrett@google.com generic_stages: Stop tagging skipped stages. 2018-06-15 dgarrett@google.com chromeos_config: Switch full builders to manifest_version. 2018-06-15 manojgupta@chromium.org Revert "chrome-sdk: Remove absolute path from simple chrome build using goma" 2018-06-15 xixuan@chromium.org SkylabHWTest: Swtich to use nyan_blaze release image to test. 2018-06-15 vapier@chromium.org cros lint: ignore initial UTF-8 BOM in JSON files 2018-06-15 ayatane@chromium.org chromeos-infra-go: Disable tests 2018-06-15 chrome-bot@chromium.org Update config settings by config-updater. Created with: gclient setdep -r src/third_party/chromite@ae92f3c98fbe The AutoRoll server is located here: https://chromite-chromium-roll.skia.org Documentation for the AutoRoller is here: https://skia.googlesource.com/buildbot/+/master/autoroll/README.md If the roll is causing failures, please contact the current sheriff, who should be CC'd on the roll, and stop the roller if necessary. BUG=chromium:None,chromium:852600,chromium:852916,chromium:851183,chromium:824548,chromium:846610,chromium:845314,chromium:None,chromium:852633 TBR=chrome-os-gardeners@chromium.org Change-Id: Id646fa70fc50cd3fd10506b67812fb6782bfdf36 Reviewed-on: https://chromium-review.googlesource.com/1103501 Reviewed-by: Chromite Chromium Autoroll <chromite-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com> Commit-Queue: Chromite Chromium Autoroll <chromite-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com> Cr-Commit-Position: refs/heads/master@{#567895} [modify] https://crrev.com/0496b4e506eb35f69d49007198fa1e666d5646df/DEPS
,
Jul 10
Update: Llvm part (https://reviews.llvm.org/D47895) has been merged upstream. The corresponding clang changes (https://reviews.llvm.org/D47894) still need to be reviewed upstream.
,
Jul 10
,
Jul 17
mka@: Can you test support for -fno-delete-null-pointer-checks that has been accepted upstream ? CL: https://chromium-review.googlesource.com/c/chromiumos/overlays/chromiumos-overlay/+/1037776 To build llvm: $ USE="llvm-next" sudo emerge llvm
,
Jul 19
It's not clear to me how to test this properly, since it doesn't address a specific (known) issue. The best coverage is probably provided by the automated tests of the CQ, etc I verified that a v4.18-rc2 kernel boots on kevin (arm64) and pyro (x86). Some hackery was needed due to the missing asm-goto support and other clang incompatiblities. Kevin didn't boot fully up to the console due to the lack of graphics support and init dependencies. objtool still generates a bunch of warnings: fs/debugfs/file.o: warning: objtool: full_proxy_read()+0xa1: return with modified stack frame fs/debugfs/file.o: warning: objtool: full_proxy_read()+0x0: stack state mismatch: cfa1=7+56 cfa2=7+8 fs/debugfs/file.o: warning: objtool: full_proxy_poll()+0x92: return with modified stack frame fs/debugfs/file.o: warning: objtool: full_proxy_poll()+0x0: stack state mismatch: cfa1=7+64 cfa2=7+8 fs/debugfs/file.o: warning: objtool: fops_u8_open()+0x15: sibling call from callable instruction with modified stack frame fs/debugfs/file.o: warning: objtool: fops_u8_open()+0x0: stack state mismatch: cfa1=7+56 cfa2=7+8 If I recall correctly most or all of them went away with an earlier WIP patch from Manoj. I suppose we still see them now because only the LLVM part was merged and the clang changes are still outstanding.
,
Jul 19
I have merged both clang + llvm parts upstream and https://chromium-review.googlesource.com/c/chromiumos/overlays/chromiumos-overlay/+/1037776 includes all the changes. We need to understand these objtool warnings, what does stack state mismatch mean?
,
Jul 19
file: warning: objtool: func()+0x5c: stack state mismatch The instruction's frame pointer state is inconsistent, depending on which execution path was taken to reach the instruction. Make sure that, when CONFIG_FRAME_POINTER is enabled, the function pushes and sets up the frame pointer (for x86_64, this means rbp) at the beginning of the function and pops it at the end of the function. Also make sure that no other code in the function touches the frame pointer. Another possibility is that the code has some asm or inline asm which does some unusual things to the stack or the frame pointer. In such cases it's probably appropriate to use the unwind hint macros in asm/unwind_hints.h.
,
Jul 19
Let me also double-check that really the clang version with the fix was used. I switched back and forth between different issues in the past days and I'm now not 100% sure if my USE="llvm-next" LLVM build really included CL:1037776.
,
Jul 19
My bad, no more warnings after re-emerging LLVM, sorry for the noise. I'll also re-test kevin and pyro with upstream.
,
Jul 20
Submitted https://reviews.llvm.org/rL337433 for clang changes. Will try to cherry-pick them to Chrome OS soon.
,
Jul 31
For record, these are the set of CLs I submitted for -fno-delete-null-pointer-checks: https://reviews.llvm.org/rL336613 https://reviews.llvm.org/rL337433 https://reviews.llvm.org/rL337742 https://reviews.llvm.org/rL338292
,
Sep 28
The compiler with support for this feature is live now, closing it finally!
,
Sep 28
woohoo! |
|||||||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||||||
Comment 1 by manojgupta@chromium.org
, Mar 23 2018