New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 624362 link

Starred by 2 users

Issue metadata

Status: WontFix
Owner:
Last visit 15 days ago
Closed: May 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug

Blocked on:
issue 617730

Blocking:
issue 698266



Sign in to add a comment

Some traces from external libraries are missing with --enable-heap-profiling=native

Project Member Reported by primiano@chromium.org, Jun 29 2016

Issue description

Looks like something is missing from TraceStackFramePointers. Not sure if this is libx11 missing frame pointers or what. Would be nice to figure out. 

Copying from internal thread with tasak@
------
I've finished investigating why dropped.
This is a bug of base::debug::TraceStackFramePointers. in base/debug/stack_trace.cc.

I replaced the function with backtrace() defined in execinfo.h and ran chrome with --enable-heap-profiling=native (... it's too slow...)

So I saw:
[Thead: CrBrowserMain]
  _start
    __libc_start_main
     ChromeMain
      ...
       StartupBrowserCreatorImpl::Launch
        ...
         StartupBrowserCreatorImpl::ProcessSpecifiedURLs
          StartupBrowserCreatorImpl::OpenTabsInBrowser
           content::Navigate
            content::WebContents::Create
             content::WebContentsImpl::CreateWithOpener
              views::Widget::Init
               views::DesktopNativeWidgetAura::InitNativeWidget
                ....
                 ui:SkBitmapToXcusorImageCreate
                  XCursorImageCreate

The XCursorImageCreate was reported as "unknown". The information about who invoked was dropped. However, now we have entire stacktraces from _start to XCursorImageCreate.
-----

 

Comment 1 by dskiba@chromium.org, Jun 29 2016

Yeah, this is expected. Libraries on Linux don't have frame pointers, so TraceStackFramePointers() can't unwind into them. This is a tradeoff - as tasak@ discovered backtrace() is too slow.

I guess we can try scanning the stack to find first valid frame (which will be in Chrome's code) and unwind from there. That would get us something like this:

[Thead: CrBrowserMain]
  _start
    __libc_start_main
     ChromeMain
      ...
       StartupBrowserCreatorImpl::Launch
        ...
         StartupBrowserCreatorImpl::ProcessSpecifiedURLs
          StartupBrowserCreatorImpl::OpenTabsInBrowser
           content::Navigate
            content::WebContents::Create
             content::WebContentsImpl::CreateWithOpener
              views::Widget::Init
               views::DesktopNativeWidgetAura::InitNativeWidget
                ....
                 ui:SkBitmapToXcusorImageCreate
                  <libx11.so>
                   XLastFunctionFromTheLibrary
                    malloc
                     ShimXXX
                      InsertAllocation

I.e. we would skip all frames inside libx11, except for the last one (which called malloc).

Cc: primiano@chromium.org
I'd say that it's rather important to fix this bug to explain the 30% of "unknown" objects in the native-stack profiler.

Do we have any plan to fix this?

Owner: dskiba@chromium.org
Status: Assigned (was: Untriaged)
Discussed offline. Our plan is to do the following two things.

a) Integrate the task categorization with the native-stack profiler. This will give us mostly precise results.

b) Experiment with fully unwinding frame pointers of third-party libraries.

a) is blocked by tracing v2. b) is not blocked. Dimitry is planning to try b).

Comment 4 by dskiba@chromium.org, Aug 22 2016

Status: Started (was: Assigned)
Got it working on Linux, will submit CL soon.

But on Android it's harder, because on the main thread we're always called from JNI, and we always try to scan the stack, and we never succeed, only find some garbage like two last frames in the following stack:

...
08-22 12:40:13.418 10670 10670 E chromium: #16 0xd573ee6b /data/app/com.google.android.apps.chrome-3/lib/arm/libchrome.so+0x0051ce6b
08-22 12:40:13.418 10670 10670 E chromium: #17 0xd55063e3 /data/app/com.google.android.apps.chrome-3/lib/arm/libchrome.so+0x002e43e3
08-22 12:40:13.418 10670 10670 E chromium: #18 0xe101303e /data/app/com.google.android.apps.chrome-3/oat/arm/base.odex+0x00aa503e
08-22 12:40:13.418 10670 10670 E chromium: #19 0xebad806f /system/usr/icu/icudt55l.dat+0x0001906f
08-22 12:40:13.418 10670 10670 E chromium: #20 0xebad8073 /system/usr/icu/icudt55l.dat+0x00019073

So, should I try to make it work on Android? I have some ideas, but there will be runtime cost...
Cc: tasak@google.com
tasak@: Would you re-try the native stack profiler once dskiba@ lands the CL for Linux?

Comment 6 by dskiba@chromium.org, Aug 24 2016

Actually, I would appreciate it if you could check it before it lands.

The CL is here: https://codereview.chromium.org/2276813002, but you need to apply parent CL first, so:

git cl patch 2203053003
git cl patch 2276813002

tasak@: Please check it today so that dskiba@ can land it.


Comment 8 by tasak@google.com, Aug 25 2016

Sorry. I'm still working in progress.

Comment 9 by tasak@google.com, Aug 29 2016

So would you give me an advice about how to see this CL?

I did:
- I prepared 2 chrome (enable_profiling=true), i.e. one was trunk and the other was patched trunk.
- For each chrome, I captured trace.json by using chrome://tracing, and ran symbolize_trace to solve pc:address.
- I reopened 2 trace.json (by using chrome://tracing), and compared.

So ... both trace.json have [unknown] and have stacktraces containing  </lib/x86_64-linux-gpu/libglib-2.0.so.0.4002.0>.

I think, probably I made some mistakes...

So the CL that is being reviewed scans 512 bytes of stack to find frames once we stumble on a bad frame. Here are allocations from CrBrowserMain from three different unwinding modes:

No Scanning (i.e. without the CL):

88.5 KiB   <[stack]>
36.3 KiB   <unnamed>
1.4 KiB    cairo_debug_reset_static_data
1.5 KiB    _XUpdateAtomCache
74.6 KiB   <unresolved>
2.2 KiB    char* std::string::_S_construct<char const*>(char const*, char const*, std::allocator<char> const&, std::forward_iterator_tag)
2.2 KiB    ShimCppNew
3.4 KiB    ValidateChunkAMD64
4.0 KiB    gpu::GpuControlList::LoadList(std::string const&, gpu::GpuControlList::OsFilter)
742.6 KiB  g_main_context_dispatch
4.4 KiB    PrefRegistrySimple::RegisterBooleanPref(std::string const&, bool)
5.6 KiB    dlopen_doit
6.0 KiB    ft_mem_qrealloc
6.5 KiB    __GI__dl_allocate_tls
7.2 KiB    pa_xmalloc0
7.9 KiB    _XEnq
19.4 KiB   ft_mem_qalloc
19.7 KiB   _dl_new_object
28.5 KiB   </lib/x86_64-linux-gnu/libglib-2.0.so.0.4002.0>
32.2 KiB   XcursorImageCreate
40.5 KiB   g_malloc
43.7 KiB   g_realloc
60.3 KiB   g_malloc0
354.9 KiB  std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&)
3.4 MiB    __libc_start_main
132.0 KiB  <other>

Scanning 512 bytes:

4.5 KiB    <unresolved>
2.0 KiB    g_object_new
7.9 KiB    <unnamed>
4.1 KiB    ft_mem_qrealloc
11.4 KiB   ft_mem_qalloc
14.0 KiB   g_malloc
19.7 KiB   _dl_new_object
30.3 KiB   g_realloc
51.2 KiB   g_malloc0
121.0 KiB  </lib/x86_64-linux-gnu/libglib-2.0.so.0.4002.0>
4.7 MiB    __libc_start_main
4.4 KiB    <other>

Scanning 8192 bytes:

3.1 KiB    ft_mem_qrealloc
5.1 MiB    __libc_start_main

So while scanning 512 bytes clearly improves situation, it looks like scanning 8 KiB gives better results. However, there is probably performance penalty to that, I'll measure it and come back here.
Wait I'm just confused about one thing: why scanning more (8k instead of 512 bytes) gives you *less* frames?

SO far the 512bytes output seems the most realistic, but at this point I'm curious why increasing to 8k makes such a big difference.
Ah, I forgot to mention that those are top-level entries for CrBrowserMain. So ideally we should see a single entry there, __libc_start_main, but if we fail to unwind to that frame and stop somewhere else, that frame ends in the top-level.

512 vs 8192 means that 3rd party libraries that we call (and which then call us back) are sometimes using more than 512 bytes of stack.
Ahh I see, all that stuff happened in 3rd party libraries then. I though it was chrome frames.
OK, I timed TraceStackFramePointers() by adding a histogram to AllocationContextTracker::GetContextSnapshot(). Increasing scan area to 8192 didn't affect timing at all, so I'm bumping it. Overall, scanning costs us ~100 nanoseconds (+25%) per call.

No scanning:
Histogram: A-TraceStackFramePointers(ns) recorded 2158421 samples, average = 394.5
0      ... 
114    O                                                                         (1 = 0.0%) {0.0%}
135    -------------------------------------------------------------O            (252458 = 11.7%) {0.0%}
160    ------------------------------------------------------O                   (223723 = 10.4%) {11.7%}
190    -----------------------------------------------------------------O        (270543 = 12.5%) {22.1%}
226    -----------------------------------------O                                (170961 = 7.9%) {34.6%}
268    --------------------------------O                                         (134364 = 6.2%) {42.5%}
318    ------------------------------O                                           (125542 = 5.8%) {48.7%}
378    ---------------------------------------------------O                      (213111 = 9.9%) {54.6%}
449    ------------------------------------------------O                         (198925 = 9.2%) {64.4%}
533    ------------------------------------------------------------------------O (299522 = 13.9%) {73.6%}
633    ----------------------------------------O                                 (168217 = 7.8%) {87.5%}
752    -----------O                                                              (45935 = 2.1%) {95.3%}
894    -------O                                                                  (29470 = 1.4%) {97.4%}
1062   ---O                                                                      (10969 = 0.5%) {98.8%}
1262   -O                                                                        (5809 = 0.3%) {99.3%}
1500   -O                                                                        (4622 = 0.2%) {99.6%}
1782   -O                                                                        (2382 = 0.1%) {99.8%}
2117   O                                                                         (1014 = 0.0%) {99.9%}
2516   O                                                                         (180 = 0.0%) {100.0%}
2990   O                                                                         (57 = 0.0%) {100.0%}
3553   O                                                                         (23 = 0.0%) {100.0%}
4222   O                                                                         (36 = 0.0%) {100.0%}
5017   O                                                                         (40 = 0.0%) {100.0%}
5961   O                                                                         (47 = 0.0%) {100.0%}
7083   O                                                                         (50 = 0.0%) {100.0%}
8416   O                                                                         (34 = 0.0%) {100.0%}
10000  O  

Scan area = 512:
Histogram: A-TraceStackFramePointers(ns) recorded 1515524 samples, average = 517.8
0      ... 
135    ----------------------------------------------------O                     (117984 = 7.8%) {0.0%}
160    --------------------------------------O                                   (85123 = 5.6%) {7.8%}
190    ---------------------------------------------------O                      (114894 = 7.6%) {13.4%}
226    ---------------------------------------------------------O                (128872 = 8.5%) {21.0%}
268    ----------------------------------------------O                           (104038 = 6.9%) {29.5%}
318    ------------------------------------------------O                         (108763 = 7.2%) {36.4%}
378    ---------------------------------------------------------------------O    (156053 = 10.3%) {43.5%}
449    --------------------------------------------------------O                 (125668 = 8.3%) {53.8%}
533    ------------------------------------------------------------------------O (163028 = 10.8%) {62.1%}
633    ---------------------------------------O                                  (87784 = 5.8%) {72.9%}
752    ------------------------------------------------O                         (107630 = 7.1%) {78.7%}
894    --------------------------------------------------------O                 (127631 = 8.4%) {85.8%}
1062   --------------------O                                                     (44649 = 2.9%) {94.2%}
1262   --------O                                                                 (19123 = 1.3%) {97.1%}
1500   ------O                                                                   (14293 = 0.9%) {98.4%}
1782   --O                                                                       (5573 = 0.4%) {99.3%}
2117   -O                                                                        (2067 = 0.1%) {99.7%}
2516   -O                                                                        (1364 = 0.1%) {99.8%}
2990   O                                                                         (391 = 0.0%) {99.9%}
3553   O                                                                         (116 = 0.0%) {100.0%}
4222   O                                                                         (46 = 0.0%) {100.0%}
5017   O                                                                         (21 = 0.0%) {100.0%}
5961   O                                                                         (33 = 0.0%) {100.0%}
7083   O                                                                         (40 = 0.0%) {100.0%}
8416   O                                                                         (23 = 0.0%) {100.0%}
10000  O  

Scan area = 8192:
Histogram: A-TraceStackFramePointers(ns) recorded 1960225 samples, average = 501.2
0      ... 
135    -----------------------------------------------O                          (129803 = 6.6%) {0.0%}
160    ------------------------------------------------------------------------O (200897 = 10.2%) {6.6%}
190    ----------------------------------------------------------------------O   (194457 = 9.9%) {16.9%}
226    -----------------------------------------------O                          (131631 = 6.7%) {26.8%}
268    ---------------------------------------------O                            (125332 = 6.4%) {33.5%}
318    -------------------------------------------------O                        (135780 = 6.9%) {39.9%}
378    -------------------------------------------------------------------O      (187126 = 9.5%) {46.8%}
449    -----------------------------------------------------------O              (165970 = 8.5%) {56.4%}
533    ---------------------------------------------------------------------O    (191590 = 9.8%) {64.8%}
633    ----------------------------------------O                                 (110963 = 5.7%) {74.6%}
752    -----------------------------------------O                                (114912 = 5.9%) {80.3%}
894    --------------------------------------------------------O                 (156789 = 8.0%) {86.1%}
1062   ----------------------O                                                   (60539 = 3.1%) {94.1%}
1262   --------O                                                                 (23371 = 1.2%) {97.2%}
1500   ------O                                                                   (16938 = 0.9%) {98.4%}
1782   ---O                                                                      (7795 = 0.4%) {99.3%}
2117   -O                                                                        (2843 = 0.1%) {99.7%}
2516   -O                                                                        (1744 = 0.1%) {99.8%}
2990   O                                                                         (695 = 0.0%) {99.9%}
3553   O                                                                         (285 = 0.0%) {99.9%}
4222   O                                                                         (75 = 0.0%) {100.0%}
5017   O                                                                         (65 = 0.0%) {100.0%}
5961   O                                                                         (61 = 0.0%) {100.0%}
7083   O                                                                         (64 = 0.0%) {100.0%}
8416   O                                                                         (44 = 0.0%) {100.0%}
10000  O  

Comment 15 by tasak@google.com, Sep 8 2016

I built the patched chrome kMaxTraces=162, and ran telemetry with enable-heap-profiling=native.
It ran much faster than chrome using backtrace() for stacktrace.


awesome, thanks for the thorough data Dmitry. +1 for the bump
Project Member

Comment 17 by bugdroid1@chromium.org, Oct 25 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/a8c951effbf899ae4951c67d606a634905f88ab4

commit a8c951effbf899ae4951c67d606a634905f88ab4
Author: dskiba <dskiba@google.com>
Date: Tue Oct 25 23:39:11 2016

Link stack frames of JNI stubs to JNI callbacks.

Chrome's native heap profiler relies on stack frame pointers to unwind. That
is fast, but requires that all code in a call chain is built with frame pointer
support. Which is not the case for Java / JNI code on Android, so unwinding
stops prematurely once it crosses JNI boundary.

This CL exploits the fact that both JNI stubs (which call into Java) and JNI
callbacks (which are called by Java) are generated. It changes JNI generator
to save stack frame pointer in a JNI stub, and later link saved frame pointer
to a frame pointer of a JNI callback. That repairs broken stack frame chain
and allows unwinder to yield complete trace.

Changes in this CL are active only for profiling builds (ones that build with
enable_profiling=true). Otherwise JNI generator produces exactly the same code
as before (modulo removed empty lines).

Example of a complete trace made possible by this CL:

<.../base.odex>
Java_org_chromium_base_SystemMessageHandler_nativeDoRunLoopOnce
base::MessageLoop::DoWork
base::MessageLoop::DeferOrRunPendingTask
base::MessageLoop::RunTask
base::debug::TaskAnnotator::RunTask
cc::SingleThreadProxy::BeginMainFrame
cc::SingleThreadProxy::DoBeginMainFrame
content::CompositorImpl::UpdateLayerTreeHost..............................#4
<.../base.odex>
Java_org_chromium_chrome_..._ToolbarSceneLayer_nativeUpdateToolbarLayer...#3
android::ToolbarSceneLayer::UpdateToolbarLayer
android::ToolbarLayer::PushResource
ui::ResourceManagerImpl::GetResource......................................#2
<.../base.odex>
Java_org_chromium_ui_resources_ResourceManager_nativeOnResourceReady......#1
ui::ResourceManagerImpl::OnResourceReady
gfx::CreateSkBitmapFromJavaBitmap
...more frames from Chrome...

Previously unwinding would've stopped at #1. Code added by JNI generator links
frames #1->#2 and #3->#4.

BUG= 624362 

Review-Url: https://codereview.chromium.org/2361353002
Cr-Commit-Position: refs/heads/master@{#427534}

[modify] https://crrev.com/a8c951effbf899ae4951c67d606a634905f88ab4/base/android/jni_android.cc
[modify] https://crrev.com/a8c951effbf899ae4951c67d606a634905f88ab4/base/android/jni_android.h
[modify] https://crrev.com/a8c951effbf899ae4951c67d606a634905f88ab4/base/android/jni_generator/golden_sample_for_tests_jni.h
[modify] https://crrev.com/a8c951effbf899ae4951c67d606a634905f88ab4/base/android/jni_generator/jni_generator.py
[modify] https://crrev.com/a8c951effbf899ae4951c67d606a634905f88ab4/base/android/jni_generator/jni_generator_tests.py
[modify] https://crrev.com/a8c951effbf899ae4951c67d606a634905f88ab4/base/android/jni_generator/testCalledByNatives.golden
[modify] https://crrev.com/a8c951effbf899ae4951c67d606a634905f88ab4/base/android/jni_generator/testConstantsFromJavaP.golden
[modify] https://crrev.com/a8c951effbf899ae4951c67d606a634905f88ab4/base/android/jni_generator/testFromJavaP.golden
[modify] https://crrev.com/a8c951effbf899ae4951c67d606a634905f88ab4/base/android/jni_generator/testFromJavaPGenerics.golden
[modify] https://crrev.com/a8c951effbf899ae4951c67d606a634905f88ab4/base/android/jni_generator/testMultipleJNIAdditionalImport.golden
[modify] https://crrev.com/a8c951effbf899ae4951c67d606a634905f88ab4/base/android/jni_generator/testNativeExportsOnlyOption.golden
[modify] https://crrev.com/a8c951effbf899ae4951c67d606a634905f88ab4/base/android/jni_generator/testSingleJNIAdditionalImport.golden
[modify] https://crrev.com/a8c951effbf899ae4951c67d606a634905f88ab4/base/debug/stack_trace.cc
[modify] https://crrev.com/a8c951effbf899ae4951c67d606a634905f88ab4/base/debug/stack_trace.h
[modify] https://crrev.com/a8c951effbf899ae4951c67d606a634905f88ab4/build/config/android/rules.gni

triaging pass: I think this has been fixed after all the changes to the stack scanner right?
Blockedon: 617730
There is one case still not fixed: stack scanning on Linux on non-main threads. This is  because we still don't have a good way to get stack size / address for those threads, see  issue 617730 .
Components: Internals>Instrumentation>Memory
Blocking: 698266
Status: WontFix (was: Started)
Chrome switched to memlog for native heap profiling, and memlog has sampling mode which unwinds way less, making perf acceptable even with CFI unwinding (not to mention that on Android we have optimized CFI).

Sign in to add a comment