New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 617730 link

Starred by 1 user

Issue metadata

Status: WontFix
Owner:
Last visit 15 days ago
Closed: May 2018
Cc:
EstimatedDays: ----
NextAction: ----
OS: Linux
Pri: 2
Type: ----

Blocking:
issue 624362



Sign in to add a comment

TraceStackFramePointers should check stack

Project Member Reported by dskiba@chromium.org, Jun 6 2016

Issue description

TraceStackFramePointers relies on heuristics to check whether stack frame pointer is inside the stack (i.e. safe to read). For Android we had to implement precise checks because heuristics are not enough in some cases (see  crbug.com/602701#c18 ).

We should try do something similar on Linux.

Things I've already tried (see crrev.com/1975393002):

1. Use pthread_getattr_np() to get stack attributes. Issues:

* Sometimes fails for the main thread because sandbox prevents reading proc/maps.
* Calls sched_getaffinity, which is blocked by the sandbox.
* Allocates memory, which causes it to reenter itself and deadlock (only seen on main thread).

All those issues can be worked around:

* Use __libc_stack_end for the main thread to avoid errors and the deadlocks.
* Punch a hole in the sandbox to allow sched_getaffinity for profiling builds.

This works, but is fragile - any thread that calls pthread_getattr_np() before
doing any allocation will deadlock.

2. Use mincore() to detect unmapped pages. Issues:

* Requires hole in a sandbox, since by default mincore syscall is banned.
* Stacks for threads go one after another without unmapped pages in between:

7f4299c1f000-7f429a41f000 rw-p 00000000 00:00 0 [stack:109410]
7f429a41f000-7f429a420000 ---p 00000000 00:00 0
7f429a420000-7f429ac20000 rw-p 00000000 00:00 0 [stack:109287]
7f429ac20000-7f429ac21000 ---p 00000000 00:00 0
7f429ac21000-7f429b421000 rw-p 00000000 00:00 0 [stack:109286]

So not only can't we reliably determine where stack ends, but we can also give unaccessible guard page in between (---p) a pass, and crash later trying to read it.
 
Specifying url to open in the command line together with startup tracing crashes browser process with ~50% chance:

out-linux/Release/chrome --enable-heap-profiling=native --user-data-dir=/tmp/fresh-profile --no-sandbox --trace-config-file=trace.config "cnn.com"

Crash happens on the main thread:

#2  0x00007f3c499512ca in base::debug::TraceStackFramePointers (out_trace=<optimized out>, max_depth=<optimized out>, 
    skip_initial=<optimized out>) at ../../base/debug/stack_trace.cc:145
#3  0x00007f3c499a566b in base::trace_event::AllocationContextTracker::GetContextSnapshot (this=0x7f3c47686380)
    at ../../base/trace_event/heap_profiler_allocation_context_tracker.cc:214
#4  0x00007f3c499c3787 in base::trace_event::MallocDumpProvider::InsertAllocation (this=0x1ded293a3730, address=0x7f3c476a46c0, 
    size=568) at ../../base/trace_event/malloc_dump_provider.cc:235
#5  0x00007f3c499c3869 in base::trace_event::(anonymous namespace)::HookAlloc (self=<optimized out>, size=568)
    at ../../base/trace_event/malloc_dump_provider.cc:38
#6  0x00007f3c4836ea9e in ShimMalloc (size=size@entry=568) at ../../base/allocator/allocator_shim.cc:174
#7  0x00007f3c412dd37d in __fopen_internal (filename=0x7ffd0181df36 "/usr/local/google/home/dskiba/.Xauthority", 
    mode=0x7f3c3e107af7 "rb", is32=1) at iofopen.c:73
#8  0x00007f3c3e1072b7 in XauGetBestAuthByAddr () from /usr/lib/x86_64-linux-gnu/libXau.so.6
#9  0x00007f3c403651b4 in ?? () from /usr/lib/x86_64-linux-gnu/libxcb.so.1
#10 0x00007f3c403653bd in ?? () from /usr/lib/x86_64-linux-gnu/libxcb.so.1
#11 0x00007f3c40364fe4 in xcb_connect_to_display_with_auth_info () from /usr/lib/x86_64-linux-gnu/libxcb.so.1
#12 0x00007f3c2577048a in pa_client_conf_from_x11 () from /usr/lib/x86_64-linux-gnu/pulseaudio/libpulsecommon-4.0.so
#13 0x00007f3c25bab8a5 in pa_context_new_with_proplist () from /usr/lib/x86_64-linux-gnu/libpulse.so.0
#14 0x00007f3c484e4596 in media::AudioManagerPulse::InitPulse (this=0x1ded29358360)
    at ../../media/audio/pulse/audio_manager_pulse.cc:257
#15 0x00007f3c484dec70 in media::CreateAudioManager (task_runner=..., worker_task_runner=..., audio_log_factory=
    0x7f3c4e889100 <(anonymous namespace)::g_media_internals+8>) at ../../media/audio/linux/audio_manager_linux.cc:50
#16 0x00007f3c484d29a7 in media::AudioManager::Create (task_runner=..., worker_task_runner=..., 
    audio_log_factory=0xffffffffffffffff) at ../../media/audio/audio_manager.cc:335
#17 0x00007f3c485fdbc0 in content::BrowserMainLoop::CreateAudioManager (this=0x7f3c476b6a80)
    at ../../content/browser/browser_main_loop.cc:1519
#18 0x00007f3c485fb5d4 in content::BrowserMainLoop::BrowserThreadsStarted (this=0x7f3c476b6a80)
    at ../../content/browser/browser_main_loop.cc:1271
#19 0x00007f3c488a3507 in Run (this=<optimized out>) at ../../base/callback.h:389
---Type <return> to continue, or q <return> to quit---
#20 content::StartupTaskRunner::RunAllTasksNow (this=<optimized out>) at ../../content/browser/startup_task_runner.cc:45
#21 0x00007f3c485fa859 in content::BrowserMainLoop::CreateStartupTasks (this=0x7f3c476b6a80)
    at ../../content/browser/browser_main_loop.cc:833
#22 0x00007f3c485febcf in content::BrowserMainRunnerImpl::Initialize (this=0x7f3c4769af80, parameters=...)
    at ../../content/browser/browser_main_runner.cc:140
#23 0x00007f3c485f7e78 in content::BrowserMain (parameters=...) at ../../content/browser/browser_main.cc:42
#24 0x00007f3c495da4c2 in content::ContentMainRunnerImpl::Run (this=0x7f3c476972d0)
    at ../../content/app/content_main_runner.cc:785
#25 0x00007f3c495d8f96 in content::ContentMain (params=...) at ../../content/app/content_main.cc:20
#26 0x00007f3c4836d52a in ChromeMain (argc=6, argv=0x7ffd0181c238) at ../../chrome/app/chrome_main.cc:85
#27 0x00007f3c41290f45 in __libc_start_main (main=0x7f3c4836d4e0 <main(int, char const**)>, argc=6, argv=0x7ffd0181c238, 
    init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffd0181c228) at libc-start.c:287
#28 0x00007f3c4836d3fd in _start ()

We ended up with sp value of 0x7ffd0181df36, which is 7430 bytes beyond the end of the stack (0x7ffd0181c230).

Values of sp that we've found before that one:

0x7ffd0181a470
0x7ffd0181a8b0 (+1088)
0x7ffd0181ac00 (+848)
0x7ffd0181ac30 (+48)
0x7ffd0181ac60 (+48)
0x7ffd0181df36 (+13014)  --> #8 XauGetBestAuthByAddr

So some system library built without frame pointers called us with $rbp that happened to look like a valid stack pointer to our heuristics. We've seen (and fixed) this for Android before ( crbug.com/602701#c24 ).

Comment 2 by dskiba@chromium.org, Aug 10 2016

Idea: we don't need strict stack end for the purpose of reliable stack unwinding, it's enough to know some address at which we should stop. So we could interpose pthread_create, provide our own thread function, get stack address there, put it in pthread specific, and call user provided thread function. TraceStackFramePointers() would then retrieve the address from the pthread specific.

We already have a solution for the main thread on Linux (__libc_stack_end), and this trick might solve remaining non main thread cases.
Project Member

Comment 3 by bugdroid1@chromium.org, Aug 30 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/2ff54586e0102659c1034d58269ac376d1aacc20

commit 2ff54586e0102659c1034d58269ac376d1aacc20
Author: dskiba <dskiba@google.com>
Date: Tue Aug 30 03:57:43 2016

Check stack pointer to be inside the stack when unwinding on Linux.

On Linux unwinding stack using frame pointers relies on heuristics, which
sometimes fail to catch invalid stack pointers. However it's hard to get
stack boundaries for all threads on Linux, see the bug.

This CL checks stack pointers against real stack boundary for the main thread
only, which is enough to fix recently discovered crash ( crbug.com/617730#c1 ).

BUG= 617730 

Review-Url: https://codereview.chromium.org/2203053003
Cr-Commit-Position: refs/heads/master@{#415065}

[modify] https://crrev.com/2ff54586e0102659c1034d58269ac376d1aacc20/base/debug/stack_trace.cc

Comment 4 by dskiba@chromium.org, Nov 28 2016

Blocking: 624362

Comment 5 by dskiba@chromium.org, May 16 2018

Status: WontFix (was: Untriaged)
We now switched to memlog, which can do sampling. With sampling it does a lot less stack unwinding, so actually just using standard drwarf unwinding is fine.

Sign in to add a comment