New issue
Advanced search Search tips

Issue 734256 link

Starred by 1 user

Issue metadata

Status: Verified
Owner:
Closed: Jun 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 3
Type: Bug



Sign in to add a comment

clang: Newer kernels crash in function tracer initialization

Project Member Reported by mka@chromium.org, Jun 16 2017

Issue description

4.12-rc1 crashes on Kevin during early kernel initialization. The crash occurs in the ftrace initialization:

ftrace_process_locs()
  ftrace_update_code()
    ftrace_code_disable()
      ftrace_make_nop()
        ftrace_modify_code()

http://elixir.free-electrons.com/linux/v4.12-rc1/source/arch/arm64/kernel/ftrace.c#L25

ftrace_code_disable() is called > 29000x without problem, then the system suddenly resets. No messages are available since the serial console and ramoops aren't initialized yet.

ftrace_modify_code() live patches the kernel code, replacing a call to mcount() with a nop. In the last successful call 'pc' actually corresponds to an mcount call:

[    0.000000] MKA: pc = ffffff8008776674 old = 0000000097e4704f, new = 00000000d503201f

ffffff8008776674:       97e4704f        bl      ffffff80080927b0 <_mcount>


In the next call 'pc' is an address in .rodata:

[    0.000000] MKA: pc = ffffff8008863280 old = 0000000097e0bd4c, new = 00000000d503201f

ffffff8008863270 r rodata
ffffff8008863278 R lkdtm_rodata_do_nothing
ffffff800886328c r test_text


A kernel build with clang has over 2000 entries more to patch than gcc:

gcc:
[    0.000000] MKA: start: ffffff8008c8c748, end: ffffff8008cc4fe0, count: 28947

clang:
[    0.000000] MKA: start: ffffff8008c8c740, end: ffffff8008cc9640, count: 31200

The significantly higher number is suspicious and is likely related, since the problem occurs in the higher addresses.

The list of all calls to mcount() is assembled by http://elixir.free-electrons.com/linux/v4.12-rc5/source/scripts/recordmcount.pl and made accessible to the kernel through __start_mcount_loc and __stop_mcount_loc.

The issue might be related with this clang problem reported in 2011:

"Another problem is with -pg, which enables instrumentation code for function calls in GCC, and is used when building Ftrace. For inline functions, the calls to mcount() get added multiple times, both when the code is generated and when it is expanded inline. The no_instrument_function attribute is not properly propagated to inline functions"

https://lwn.net/Articles/441018/
 

Comment 1 by mka@chromium.org, Jun 17 2017

The problem doesn't seem to be directly related with the higher number of patched calls (though it's still odd). The 'address in .rodata actually corresponds to a function, however it is a special one:

"This includes functions that are meant to live entirely in .rodata (via objcopy tricks), to validate the non-executability of .rodata."

http://elixir.free-electrons.com/linux/v4.12-rc5/source/drivers/misc/lkdtm_rodata.c

The problem is that lkdtm_rodata_do_nothing() ends up in the list of code to patch. The reset does not occur when the "driver" is disabled.

The kernel includes a patch to avoid the function from being included in the list of traceable functions:

  lkdtm: Mark lkdtm_rodata_do_nothing() notrace

    
  lkdtm_rodata_do_nothing() is an empty function which is generated in
  order to test the non-executability of rodata.
    
  Currently if function tracing is enabled then an mcount callsite will be
  generated for lkdtm_rodata_do_nothing(), and it will appear in the list
  of available functions for function tracing (available_filter_functions).

  c012268b37db6b10b59dac9b7f45956cb9a8bcb2

However this doesn't seem to work as intended with clang.

The attribute 'notrace' is defined as:

#define notrace __attribute__((no_instrument_function))

I believe that  the inlining bug is fixed (https://bugs.llvm.org//show_bug.cgi?id=28660) 
https://reviews.llvm.org/D22825
https://reviews.llvm.org/D22666

I'll look into __attribute__((no_instrument_function)) asupport.
Have a patch for __attribute__((no_instrument_function)) with -pg. Will be sending for review soon.
Sent patch for review at https://reviews.llvm.org/D34357

Comment 5 by mka@chromium.org, Jun 19 2017

Thanks manojgupta@!
Owner: manojgupta@chromium.org

Comment 7 by mka@chromium.org, Jun 20 2017

Status: Verified (was: Unconfirmed)
Verified that the compiler change fixes the crash.

It also brings the number of mcount() calls to patch down to roughly the same number as with gcc:

[    0.000000] MKA: ftrace_process_locs: count = 29566

Sign in to add a comment