New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 824548 link

Starred by 2 users

Issue metadata

Status: Verified
Owner:
Closed: Sep 28
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug
Build-Toolchain


Show other hotlists

Hotlists containing this issue:
Hotlist-1


Sign in to add a comment

clang: implement -fno-delete-null-pointer-checks

Project Member Reported by mka@chromium.org, Mar 21 2018

Issue description

The upstream kernel recently enabled the ORC unwinder by default for x86, this unwinder is also used by the CrOS v4.14 kernel.

When building with clang and the ORC unwinder enabled objtool generates a series of warnings:

arch/x86/mm/pti.o: warning: objtool: pti_init() falls through to next
function pti_user_pagetable_walk_pmd()
s/debugfs/file.o: warning: objtool: full_proxy_llseek() falls through to next function full_proxy_read()
fs/debugfs/file.o: warning: objtool: full_proxy_read() falls through to next function full_proxy_write()
fs/debugfs/file.o: warning: objtool: full_proxy_write() falls through to next function full_proxy_poll()
fs/debugfs/file.o: warning: objtool: full_proxy_poll() falls through to next function full_proxy_unlocked_ioctl()
fs/debugfs/file.o: warning: objtool: full_proxy_unlocked_ioctl() falls
through to next function fops_u8_open()

The author of the unwinder investigated these warnings and came to the conclusion that clang is the culprit:


Ok, I did a little more digging.  Surprise, these objtool warnings are
actually correct.  Somehow the WARN macros are causing Clang to produce
bad code.  In some cases, Clang is assuming that a WARN is "noreturn",
i.e. that the UD2 doesn't return from the exception.

As an example, here's the full_proxy_llseek() function and its compiled
code:

#define FULL_PROXY_FUNC(name, ret_type, filp, proto, args)		\
static ret_type full_proxy_ ## name(proto)				\
{									\
	struct dentry *dentry = F_DENTRY(filp);			\
	const struct file_operations *real_fops;			\
	ret_type r;							\
									\
	r = debugfs_file_get(dentry);					\
	if (unlikely(r))						\
		return r;						\
	real_fops = debugfs_real_fops(filp);				\
	r = real_fops->name(args);					\
	debugfs_file_put(dentry);					\
	return r;							\
}

FULL_PROXY_FUNC(llseek, loff_t, filp,
		PROTO(struct file *filp, loff_t offset, int whence),
		ARGS(filp, offset, whence));


0000000000000ca0 <full_proxy_llseek>:
     ca0:       55                      push   %rbp
     ca1:       41 57                   push   %r15
     ca3:       41 56                   push   %r14
     ca5:       53                      push   %rbx
     ca6:       48 83 ec 10             sub    $0x10,%rsp
     caa:       41 89 d6                mov    %edx,%r14d
     cad:       49 89 f7                mov    %rsi,%r15
     cb0:       48 89 fd                mov    %rdi,%rbp
     cb3:       65 48 8b 04 25 28 00    mov    %gs:0x28,%rax
     cba:       00 00
     cbc:       48 89 44 24 08          mov    %rax,0x8(%rsp)
     cc1:       48 8b 5d 18             mov    0x18(%rbp),%rbx
     cc5:       48 89 df                mov    %rbx,%rdi
     cc8:       e8 00 00 00 00          callq  ccd <full_proxy_llseek+0x2d>
                        cc9: R_X86_64_PC32      debugfs_file_get-0x4
     ccd:       85 c0                   test   %eax,%eax
     ccf:       75 65                   jne    d36 <full_proxy_llseek+0x96>
     cd1:       48 8b 45 18             mov    0x18(%rbp),%rax
     cd5:       48 8b 40 78             mov    0x78(%rax),%rax
     cd9:       a8 01                   test   $0x1,%al
     cdb:       75 63                   jne    d40 <full_proxy_llseek+0xa0>
     ...
     d40:       0f 0b                   ud2
     d42:       0f 1f 40 00             nopl   0x0(%rax)
     d46:       66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
     d4d:       00 00 00

0000000000000d50 <full_proxy_read>:


After the debugfs_file_get() call, the debugfs_real_fops() call is
inlined.  Here it is:

const struct file_operations *debugfs_real_fops(const struct file *filp)
{
	struct debugfs_fsdata *fsd = F_DENTRY(filp)->d_fsdata;

	if ((unsigned long)fsd & DEBUGFS_FSDATA_IS_REAL_FOPS_BIT) {
		/*
		 * Urgh, we've been called w/o a protecting
		 * debugfs_file_get().
		 */
		WARN_ON(1);
		return NULL;
	}

	return fsd->real_fops;
}

The WARN_ON() is the UD2 at offset d40.  Notice that, as objtool found,
it just falls through to the next function instead of "returning" to
full_proxy_llseek().


At first I thought this might be some kind of "optimization", since, if
you look closely, you'll see that the warning can never happen in this
situation.  But no, I downloaded the Clang binary, changed the 

  if ((unsigned long)fsd & DEBUGFS_FSDATA_IS_REAL_FOPS_BIT) /* DEBUGFS_FSDATA_IS_REAL_FOPS_BIT == 1 */

to

  if ((unsigned long)fsd & 0x2)

and still got the same issue.


Then I figured that Clang must be peeking into the inline asm, and is
assuming that UD2 is a dead end.  But no, I changed ASM_UD2 to a 2-byte
NOP, and got the same issue.


So unless I'm missing some "undefined behavior", this looks like a Clang bug to me.  Somehow it doesn't like the WARN macros (but apparently only in a small number of cases).

https://lkml.org/lkml/2018/3/20/980


To reproduce:

# cd to kernel git repo
git checkout v4.15
make CC=clang defconfig
make CC=clang fs/debugfs/

 
Matthias, as usual, can you please provide me with a pre-processed file and the clang command line used?
(Use clang option -save-temps to get the pre processed file)

Comment 2 by mka@chromium.org, Mar 23 2018

The preprocessed file for Linux v4.15 is attached.

The command line is:

clang -Wp,-MD,fs/debugfs/.file.o.d  -nostdinc -isystem /usr/lib64/clang/7.0.0/include -I./arch/x86/include -I./arch/x86/include/generated  -I./include -I./arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I./include/uapi -I./include/generated/uapi -include ./include/linux/kconfig.h -D__KERNEL__ -Qunused-arguments -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -fshort-wchar -Werror-implicit-function-declaration -Wno-format-security -std=gnu89 -fno-PIE -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx -m64 -mno-80387 -mstack-alignment=8 -mtune=generic -mno-red-zone -mcmodel=kernel -funit-at-a-time -DCONFIG_AS_CFI=1 -DCONFIG_AS_CFI_SIGNAL_FRAME=1 -DCONFIG_AS_CFI_SECTIONS=1 -DCONFIG_AS_FXSAVEQ=1 -DCONFIG_AS_SSSE3=1 -DCONFIG_AS_CRC32=1 -DCONFIG_AS_AVX=1 -DCONFIG_AS_AVX2=1 -DCONFIG_AS_AVX512=1 -DCONFIG_AS_SHA1_NI=1 -DCONFIG_AS_SHA256_NI=1 -pipe -Wno-sign-compare -fno-asynchronous-unwind-tables -O2 -Wframe-larger-than=2048 -fno-stack-protector -Wno-unused-variable -Wno-format-invalid-specifier -Wno-gnu -Wno-address-of-packed-member -Wno-tautological-compare -mno-global-merge -no-integrated-as -fomit-frame-pointer -Wdeclaration-after-statement -Wno-pointer-sign -fno-strict-overflow -fno-stack-check -Werror=implicit-int -Werror=strict-prototypes -Werror=date-time -Werror=incompatible-pointer-types -Wno-initializer-overrides -Wno-unused-value -Wno-format -Wno-sign-compare -Wno-format-zero-length -Wno-uninitialized -save-temps -Wno-constant-logical-operand    -DKBUILD_BASENAME='"file"'  -DKBUILD_MODNAME='"debugfs"' -c -o fs/debugfs/file.o fs/debugfs/file.c
file.s
98.4 KB Download
Matthias, can you upload the .i file?

Comment 4 by mka@chromium.org, Mar 26 2018

file.i is attached
file.i
1.3 MB Download
Here is my brief summary (overlapping with existing observations):
1. debugfs_real_fops is inlined.
2. There are two paths in debugfs_real_fops, one that return null, other doesn't.
3. There is a pointer dereference after call to debugfs_real_fops: 
      real_fops = debugfs_real_fops(filp);				\
      r = real_fops->name(args); // Always dereferences the real_fops pointer
4. Dereferencing null pointer is undefined behavior. So llvm optimizer happily drops the code path that returns null. Since the asm is marked is volatile, only the return statement gets killed.

Changing r = real_fops->name(args); to if (real_fops) r = real_fops->name(args) should fix it.

Comment 6 by mka@chromium.org, Mar 28 2018

Thanks Manoj, I sent a patch upstream: https://patchwork.kernel.org/patch/10311923/

Comment 7 by mka@chromium.org, Apr 4 2018

Summary: clang: implement -fno-delete-null-pointer-checks (was: clang: objtool warnings when ORC unwinder is enabled)
This seems to be another case of clang not having -fno-delete-null-pointer-checks:

https://lkml.org/lkml/2018/3/28/499

https://bugs.llvm.org/show_bug.cgi?id=9251

An earlier issue that was encountered and worked around:

https://groups.google.com/a/google.com/forum/#!msg/c-toolchain-team/B3quWYKCSsk/KsoPrDj1BwAJ

Comment 8 by mka@chromium.org, Apr 4 2018

from Linus Torvalds: 

Note that we explicitly use "-fno-delete-null-pointer-checks" because
we do *not* want NULL pointer check removal even in the case of a bug.

See commit a3ca86aea507 ("Add '-fno-delete-null-pointer-checks' to gcc
CFLAGS") for the reason: we had buggy code that accessed a pointer
before the NULL pointer check, but the bug was "benign" as long as the
compiler didn't actually remove the check.

And note how the buggy code *looked* safe. It was doing the right
check, it was just that the check was hidden and disabled by another
bug.

Removing the NULL pointer check turned a benign bug into a trivially
exploitable one by just mapping user space data at NULL (which avoided
the kernel oops, and then made the kernel use the user value!).

Now, admittedly we have a ton of other security features in place to
avoid these kinds of issues - SMAP helps on the hardware side, and not
allowing users to mmap() NULL in the first place helps with most
distributions, but the point remains: the kernel generally really
doesn't want optimizations that are perhaps allowed by the standard,
but that result in code generation that doesn't match the source code.

The NULL pointer removal is one such thing: don't remove checks in the
kernel based on "standard says". It's ok to do optimizations that are
based on "hardware does the exact same thing", but not on "the
standard says this is undefined so we can remove it".

https://lkml.org/lkml/2018/4/4/601


I think we need to prioritize this, for our own sake and to avoid it being brought up repeatedly as argument that "clang isn't ready to build the kernel".
Matthias,
I have a working patch for this now and would like to test it before sending to upstream for review.
How can I repro the objtool warnings? Also, are there any test cases that I can use.

Comment 10 by mka@google.com, Apr 30 2018

To reproduce:

git checkout v4.16
make CC=clang defconfig
make CC=clang arch/x86/mm/pti.o fs/debugfs/file.o

I don't know about existing test cases, maybe gcc has some?
Manoj has posted the clang & llvm patches for review:
https://reviews.llvm.org/D47894
https://reviews.llvm.org/D47895
mka@ Can you try the latest llvm changes (not yet commited upstream) at https://chromium-review.googlesource.com/c/chromiumos/overlays/chromiumos-overlay/+/1037776

Note: I built amd64-generic 4.14 kernel locally and saw one objtool warning:
arch/x86/kernel/reboot.o: warning: objtool: native_machine_emergency_restart() falls through to next function machine_power_off()

I did some investigation to verify this is not caused by any null checks removal. 
My findings are:
native_machine_emergency_restart function has an infinite loop at the end of the function.
Because of infinite loop, it is not possible for the function to return.
Therefore, llvm optimizer omitted return statements at the end of this function. Which makes objtool think that this function is doing a fall through to next function.

I am not sure how to fix this warning in objtool so ptal.
Does marking native_machine_emergency_restart() __attribute__((noreturn)) help?

Comment 14 by mka@chromium.org, Jun 15 2018

#13: __attribute__((noreturn)) / __noreturn seemed promising, but it doesn't help.

The warning doesn't translate to a build error and as long as we know why it is generated it shouldn't be a big issue if it isn't fixed. In any case I'll reach out Josh, maybe he has some more ideas (though this brings us into 'you can't test this with upstream' territory ...) 

Comment 15 by mka@chromium.org, Jun 15 2018

For the record: when trying to describe a repro case for upstream I found that the warning is not issued for a defconfig. Apparently the return statement is only optimized away with -Oz / CONFIG_CC_OPTIMIZE_FOR_SIZE=y.
Project Member

Comment 16 by bugdroid1@chromium.org, Jun 15 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/72963164aedad8501dcfb9b88bdc9ae566c99fbe

commit 72963164aedad8501dcfb9b88bdc9ae566c99fbe
Author: Manoj Gupta <manojgupta@chromium.org>
Date: Fri Jun 15 21:13:06 2018

Revert "chrome-sdk: Remove absolute path from simple chrome build using goma"

This reverts commit 402795d084bfc112e1a47a401887aeec106238c2.

Reason for revert: Caused https://bugs.chromium.org/p/chromium/issues/detail?id=824548

BUG= chromium:824548 
TEST=TestSimpleChromeWorkflow made progress with tryjobs. No objcopy errors were seen.

Original change's description:
> chrome-sdk: Remove absolute path from simple chrome build using goma
>
> This patch relativizes absolute path in compiler flags.
>
> This is a preparation CL in chromite side for cache sharing between
> goma users. Cache sharing will improve build time of simple chrome for
> developers.
>
> While we don't use -MD, relativeness of sysroot does not affect
> .d/.o/.dwo files.
>
> Bug: 846610
> Change-Id: Ib816c7df87c69d15aec1f83cb2709ca12eb3a449
> Reviewed-on: https://chromium-review.googlesource.com/1073132
> Commit-Ready: Takuto Ikuta <tikuta@chromium.org>
> Tested-by: Takuto Ikuta <tikuta@chromium.org>
> Reviewed-by: Mike Frysinger <vapier@chromium.org>

Bug: 846610
Change-Id: If480937acd85abf6f702fade205545acd01e15e5
Reviewed-on: https://chromium-review.googlesource.com/1102918
Commit-Queue: Manoj Gupta <manojgupta@chromium.org>
Tested-by: Manoj Gupta <manojgupta@chromium.org>
Reviewed-by: Steven Bennetts <stevenjb@chromium.org>

[modify] https://crrev.com/72963164aedad8501dcfb9b88bdc9ae566c99fbe/cli/cros/cros_chrome_sdk.py

Project Member

Comment 17 by bugdroid1@chromium.org, Jun 16 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/0496b4e506eb35f69d49007198fa1e666d5646df

commit 0496b4e506eb35f69d49007198fa1e666d5646df
Author: Chromite Chromium Autoroll <chromite-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Date: Sat Jun 16 15:52:18 2018

Roll src/third_party/chromite 3a51f2a80a38..ae92f3c98fbe (10 commits)

https://chromium.googlesource.com/chromiumos/chromite.git/+log/3a51f2a80a38..ae92f3c98fbe


git log 3a51f2a80a38..ae92f3c98fbe --date=short --no-merges --format='%ad %ae %s'
2018-06-16 chrome-bot@chromium.org Update config settings by config-updater.
2018-06-16 dgarrett@google.com chromeos_config: Require all non-slaves to be important.
2018-06-15 lannm@chromium.org Annotate ScheduleSlavesStage with scheduled build links
2018-06-15 dgarrett@google.com generic_stages: Stop tagging skipped stages.
2018-06-15 dgarrett@google.com chromeos_config: Switch full builders to manifest_version.
2018-06-15 manojgupta@chromium.org Revert "chrome-sdk: Remove absolute path from simple chrome build using goma"
2018-06-15 xixuan@chromium.org SkylabHWTest: Swtich to use nyan_blaze release image to test.
2018-06-15 vapier@chromium.org cros lint: ignore initial UTF-8 BOM in JSON files
2018-06-15 ayatane@chromium.org chromeos-infra-go: Disable tests
2018-06-15 chrome-bot@chromium.org Update config settings by config-updater.


Created with:
  gclient setdep -r src/third_party/chromite@ae92f3c98fbe

The AutoRoll server is located here: https://chromite-chromium-roll.skia.org

Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+/master/autoroll/README.md

If the roll is causing failures, please contact the current sheriff, who should
be CC'd on the roll, and stop the roller if necessary.



BUG=chromium:None,chromium:852600,chromium:852916,chromium:851183,chromium:824548,chromium:846610,chromium:845314,chromium:None,chromium:852633
TBR=chrome-os-gardeners@chromium.org

Change-Id: Id646fa70fc50cd3fd10506b67812fb6782bfdf36
Reviewed-on: https://chromium-review.googlesource.com/1103501
Reviewed-by: Chromite Chromium Autoroll <chromite-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Commit-Queue: Chromite Chromium Autoroll <chromite-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Cr-Commit-Position: refs/heads/master@{#567895}
[modify] https://crrev.com/0496b4e506eb35f69d49007198fa1e666d5646df/DEPS

Update:
Llvm part (https://reviews.llvm.org/D47895) has been merged upstream.
The corresponding clang changes (https://reviews.llvm.org/D47894) still need to be reviewed upstream.
Cc: llozano@chromium.org mka@chromium.org
 Issue 742607  has been merged into this issue.
mka@: Can you test support for -fno-delete-null-pointer-checks that has been accepted upstream ?
CL: https://chromium-review.googlesource.com/c/chromiumos/overlays/chromiumos-overlay/+/1037776

To build llvm: $ USE="llvm-next" sudo emerge llvm

It's not clear to me how to test this properly, since it doesn't address a specific (known) issue. The best coverage is probably provided by the automated tests of the CQ, etc

I verified that a v4.18-rc2 kernel boots on kevin (arm64) and pyro (x86). Some hackery was needed due to the missing asm-goto support and other clang incompatiblities. Kevin didn't boot fully up to the console due to the lack of graphics support and init dependencies.

objtool still generates a bunch of warnings:

fs/debugfs/file.o: warning: objtool: full_proxy_read()+0xa1: return with modified stack frame
fs/debugfs/file.o: warning: objtool: full_proxy_read()+0x0: stack state mismatch: cfa1=7+56 cfa2=7+8
fs/debugfs/file.o: warning: objtool: full_proxy_poll()+0x92: return with modified stack frame
fs/debugfs/file.o: warning: objtool: full_proxy_poll()+0x0: stack state mismatch: cfa1=7+64 cfa2=7+8
fs/debugfs/file.o: warning: objtool: fops_u8_open()+0x15: sibling call from callable instruction with modified stack frame
fs/debugfs/file.o: warning: objtool: fops_u8_open()+0x0: stack state mismatch: cfa1=7+56 cfa2=7+8

If I recall correctly most or all of them went away with an earlier WIP patch from Manoj. I suppose we still see them now because only the LLVM part was merged and the clang changes are still outstanding.
I have merged both clang + llvm parts upstream and https://chromium-review.googlesource.com/c/chromiumos/overlays/chromiumos-overlay/+/1037776 includes all the changes.
We need to understand these objtool warnings, what does stack state mismatch mean?
file: warning: objtool: func()+0x5c: stack state mismatch

   The instruction's frame pointer state is inconsistent, depending on
   which execution path was taken to reach the instruction.

   Make sure that, when CONFIG_FRAME_POINTER is enabled, the function
   pushes and sets up the frame pointer (for x86_64, this means rbp) at
   the beginning of the function and pops it at the end of the function.
   Also make sure that no other code in the function touches the frame
   pointer.

   Another possibility is that the code has some asm or inline asm which
   does some unusual things to the stack or the frame pointer.  In such
   cases it's probably appropriate to use the unwind hint macros in
   asm/unwind_hints.h.
Let me also double-check that really the clang version with the fix was used. I switched back and forth between different issues in the past days and I'm now not 100% sure if my USE="llvm-next" LLVM build really included CL:1037776.
My bad, no more warnings after re-emerging LLVM, sorry for the noise. I'll also re-test kevin and pyro with upstream.
Submitted https://reviews.llvm.org/rL337433 for clang changes.

Will try to cherry-pick them to Chrome OS soon.
For record, these are the set of CLs I submitted for -fno-delete-null-pointer-checks:

https://reviews.llvm.org/rL336613
https://reviews.llvm.org/rL337433
https://reviews.llvm.org/rL337742
https://reviews.llvm.org/rL338292

Comment 28 Deleted

Status: Verified (was: Assigned)
The compiler with support for this feature is live now, closing it finally!
woohoo!

Sign in to add a comment