clang: Newer kernels crash in function tracer initialization |
|||
Issue description
4.12-rc1 crashes on Kevin during early kernel initialization. The crash occurs in the ftrace initialization:
ftrace_process_locs()
ftrace_update_code()
ftrace_code_disable()
ftrace_make_nop()
ftrace_modify_code()
http://elixir.free-electrons.com/linux/v4.12-rc1/source/arch/arm64/kernel/ftrace.c#L25
ftrace_code_disable() is called > 29000x without problem, then the system suddenly resets. No messages are available since the serial console and ramoops aren't initialized yet.
ftrace_modify_code() live patches the kernel code, replacing a call to mcount() with a nop. In the last successful call 'pc' actually corresponds to an mcount call:
[ 0.000000] MKA: pc = ffffff8008776674 old = 0000000097e4704f, new = 00000000d503201f
ffffff8008776674: 97e4704f bl ffffff80080927b0 <_mcount>
In the next call 'pc' is an address in .rodata:
[ 0.000000] MKA: pc = ffffff8008863280 old = 0000000097e0bd4c, new = 00000000d503201f
ffffff8008863270 r rodata
ffffff8008863278 R lkdtm_rodata_do_nothing
ffffff800886328c r test_text
A kernel build with clang has over 2000 entries more to patch than gcc:
gcc:
[ 0.000000] MKA: start: ffffff8008c8c748, end: ffffff8008cc4fe0, count: 28947
clang:
[ 0.000000] MKA: start: ffffff8008c8c740, end: ffffff8008cc9640, count: 31200
The significantly higher number is suspicious and is likely related, since the problem occurs in the higher addresses.
The list of all calls to mcount() is assembled by http://elixir.free-electrons.com/linux/v4.12-rc5/source/scripts/recordmcount.pl and made accessible to the kernel through __start_mcount_loc and __stop_mcount_loc.
The issue might be related with this clang problem reported in 2011:
"Another problem is with -pg, which enables instrumentation code for function calls in GCC, and is used when building Ftrace. For inline functions, the calls to mcount() get added multiple times, both when the code is generated and when it is expanded inline. The no_instrument_function attribute is not properly propagated to inline functions"
https://lwn.net/Articles/441018/
,
Jun 17 2017
I believe that the inlining bug is fixed (https://bugs.llvm.org//show_bug.cgi?id=28660) https://reviews.llvm.org/D22825 https://reviews.llvm.org/D22666 I'll look into __attribute__((no_instrument_function)) asupport.
,
Jun 17 2017
Have a patch for __attribute__((no_instrument_function)) with -pg. Will be sending for review soon.
,
Jun 19 2017
Sent patch for review at https://reviews.llvm.org/D34357
,
Jun 19 2017
Thanks manojgupta@!
,
Jun 19 2017
,
Jun 20 2017
Verified that the compiler change fixes the crash. It also brings the number of mcount() calls to patch down to roughly the same number as with gcc: [ 0.000000] MKA: ftrace_process_locs: count = 29566 |
|||
►
Sign in to add a comment |
|||
Comment 1 by mka@chromium.org
, Jun 17 2017