New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 755225 link

Starred by 1 user

Issue metadata

Status: Assigned
Owner:
Cc:
EstimatedDays: ----
NextAction: ----
OS: Android
Pri: 2
Type: Bug



Sign in to add a comment

Refactor third_party/android_platform/development/scripts/stack

Project Member Reported by hzl@chromium.org, Aug 14 2017

Issue description

Refactor third_party/android_platform/development/scripts/stack. For example the script should have a flag that when set as true, will print out logs that have not been used for symbolization. Currently the script will just ignore these information and extract only the 'useful' log for symbolization.
 
Project Member

Comment 1 by bugdroid1@chromium.org, May 7 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/503c69a42c3e6de5ddeb879b6107a74f485b683b

commit 503c69a42c3e6de5ddeb879b6107a74f485b683b
Author: David 'Digit' Turner <digit@google.com>
Date: Mon May 07 16:06:59 2018

android: elf_symbolizer.py: Add WaitForIdle() method.

This method of ELFSymbolizer can be used to wait for the
completion of all async tasks, then allow additional ones
to be performed after that.

This is unlike the Join(), which waits for completion then
terminates all addr2line processes immediately, making any
future call to SymbolizeAsync() crash at runtime.

BUG=755225
R=pasko@chromium.org, lizeb@chromium.org, agrieve@chromium.org,jbudorick@chromium.org

Change-Id: Iea50c5e06bf495390dc1bc50a81151e069841175
Reviewed-on: https://chromium-review.googlesource.com/1046846
Reviewed-by: agrieve <agrieve@chromium.org>
Commit-Queue: David Turner <digit@chromium.org>
Cr-Commit-Position: refs/heads/master@{#556459}
[modify] https://crrev.com/503c69a42c3e6de5ddeb879b6107a74f485b683b/build/android/pylib/symbols/elf_symbolizer.py
[modify] https://crrev.com/503c69a42c3e6de5ddeb879b6107a74f485b683b/build/android/pylib/symbols/elf_symbolizer_unittest.py

Project Member

Comment 2 by bugdroid1@chromium.org, Jul 18

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/591e5064254750adbe421d3d1395ad47099833e5

commit 591e5064254750adbe421d3d1395ad47099833e5
Author: David 'Digit' Turner <digit@google.com>
Date: Wed Jul 18 20:38:15 2018

android: apk_native_libs.py: New Python helper functions.

This CL introduces a new Python script providing various
functions to make it easier to parse Android APK files
for symbolization, i.e.:

- ApkReader: convenience class to read the content of an APK,
  especially information related to uncompressed native shared
  libraries. Mostly useful because it can be easily mocked
  for unit-testing purpose.

- ApkNativeLibrariesMap: convenience class to build a map
  of native shared libraries contained in an APK.

- ApkLibraryPathTranslator: convenience class used to translate
  on-device APK file paths and offsets (as found in stack traces
  or tombstones) into a virtual device library path + offset.

+ Add unit-tests for all features.

+ Add a small apk_lib_dump.py script used to dump the list of
  uncompressed native libraries inside an APK and their file
  start/end offsets and sizes. This is mostly useful for debugging
  symbolization issues.

NOTE: This is a partial rewrite of the corresponding code from
      the following patch, which shows how this will be used in
      the future to better symbolize tombstones:

    https://chromium-review.googlesource.com/c/chromium/src/+/1034054

BUG=755225
R=agrieve@chromium.org, jduborick@chromium.org, pasko@chromium.org, lizeb@chromium.org

Change-Id: If71e0048fa88c859e5f256d8c960134b6b88d06f
Reviewed-on: https://chromium-review.googlesource.com/1047211
Commit-Queue: David Turner <digit@chromium.org>
Reviewed-by: Egor Pasko <pasko@chromium.org>
Reviewed-by: agrieve <agrieve@chromium.org>
Cr-Commit-Position: refs/heads/master@{#576193}
[modify] https://crrev.com/591e5064254750adbe421d3d1395ad47099833e5/build/android/PRESUBMIT.py
[add] https://crrev.com/591e5064254750adbe421d3d1395ad47099833e5/build/android/pylib/symbols/apk_lib_dump.py
[add] https://crrev.com/591e5064254750adbe421d3d1395ad47099833e5/build/android/pylib/symbols/apk_native_libs.py
[add] https://crrev.com/591e5064254750adbe421d3d1395ad47099833e5/build/android/pylib/symbols/apk_native_libs_unittest.py

Project Member

Comment 3 by bugdroid1@chromium.org, Jul 24

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/a6c94dee3a6c28d629639302a72d7d4971d102ce

commit a6c94dee3a6c28d629639302a72d7d4971d102ce
Author: David 'Digit' Turner <digit@google.com>
Date: Tue Jul 24 16:34:06 2018

android: Add pylib/symbols/symbol_utils.py

This CL introduces a new Python script providing various
functions to deal with symbolization of crash back traces,
stack traces, and memory map sections, as they appear in
logcat and tombstone files on Android.

They also rely on the classes introduced by the previous CL at [1]

- HostLibraryFind() is used to locate unstripped version of
  native libraries on the host, based on their device path
  as it appears in tombstones / logcat.

- SymbolResolver, and its derived classes, are used to turn
  a library (path, offset) into a symbol info string.

- MemoryMap is used to model the memory map as it appears
  in a tombstone file, which can be used to symbolize
  random addresses that appear in the stack.

- BackTraceTranslator is used to parse and symbolize
  back traces as they appear in logcat and tombstones.

- StackTranslator is used to parse and symbolize
  stacks as they appear in tombstones.

BUG=755225
R=agrieve@chromium.org, jbudorick@chromium.org, pasko@chromium.org, lizeb@chromium.org

[1] https://chromium-review.googlesource.com/c/chromium/src/+/1047211

Change-Id: Ie6d5e8ad395c8953fc6c88948a761ecd2fda8092
Reviewed-on: https://chromium-review.googlesource.com/1138323
Commit-Queue: David Turner <digit@chromium.org>
Reviewed-by: agrieve <agrieve@chromium.org>
Cr-Commit-Position: refs/heads/master@{#577575}
[modify] https://crrev.com/a6c94dee3a6c28d629639302a72d7d4971d102ce/build/android/PRESUBMIT.py
[add] https://crrev.com/a6c94dee3a6c28d629639302a72d7d4971d102ce/build/android/pylib/symbols/symbol_utils.py
[add] https://crrev.com/a6c94dee3a6c28d629639302a72d7d4971d102ce/build/android/pylib/symbols/symbol_utils_unittest.py

Cc: -hzl@chromium.org agrieve@chromium.org
Owner: digit@chromium.org
Status: Assigned (was: Untriaged)
Assigning to digit - clearly working on this.

Note: there's an existing script here that hzl@ wrote at some point that ultimately addresses this bug:
https://cs.chromium.org/chromium/src/build/android/pylib/android/logcat_symbolizer.py

I think digit's work has more been to improve quality of stacks (e.g. they generally don't work for monochrome)
Cc: lincolnfrog@chromium.org cjgrant@chromium.org
@digit:  It looks like monochrome stack traces don't decode properly.  However, it doesn't seem like the script is at fault.  I induced the same crash in ChromePublic.apk, and MonochromePublic.apk.

It looks like there may be two issues:

First, it looks like the script can't properly decode symbols.  A dump like this:  

09-20 15:29:52.042  3198  3198 F DEBUG   : Abort message: '[FATAL:autocomplete_controller_android.cc(146)] Check failed: false. 
09-20 15:29:52.042  3198  3198 F DEBUG   : #00 0xc9b2faab /data/app/org.chromium.chrome-Y56YdyZkZ-edGnUBOySeJg==/base.apk+0x03bddaab
09-20 15:29:52.042  3198  3198 F DEBUG   : #01 0xcb96a603 /data/app/org.chromium.chrome-Y56YdyZkZ-edGnUBOySeJg==/base.apk+0x05a18603
09-20 15:29:52.042  3198  3198 F DEBUG   : #02 0xcb96ab29 /data/app/org.chromium.chrome-Y56YdyZkZ-edGnUBOySeJg==/base.apk+0x05a18b29
09-20 15:29:52.042  3198  3198 F DEBUG   : #03 0xcb76d339 /data/app/org.chromium.chrome-Y56YdyZkZ-edGnUBOySeJg==/base.apk+0x0581b339

Generates script output like this, where it can't seem to identify the library to look into:

DEBUG:root:Found trace line: 09-20 15:29:52.042  3198  3198 F DEBUG   : #00 0xc9b2faab /data/app/org.chromium.chrome-Y56YdyZkZ-edGnUBOySeJg==/base.apk+0x03bddaab
DEBUG:root:Identified lib: /data/app/org.chromium.chrome-Y56YdyZkZ-edGnUBOySeJg==/base.apk
DEBUG:root:TranslateLibPath: lib=/data/app/org.chromium.chrome-Y56YdyZkZ-edGnUBOySeJg==/base.apk library_name=base.apk
DEBUG:root:GetCandidates: prefiltered candidates = [u'/usr/local/google/work/chromium/src/out/Debug/lib.unstripped/base.apk', u'/usr/local/google/work/chromium/src/out/Debug/./base.apk']
DEBUG:root:TranslateLibPath: candidate_libraries=[]

But, even if that worked, it seems that the Android log itself isn't outputting proper backtraces for Monochrome.  I induced the same crash in Chrome and Monochrome, and got both a complete stack for Chrome, and a 2-line bogus stack for Monochrome (the Chrome stack does decode properly as well).

Chrome log dump:

09-20 15:23:54.510  2816  2816 F DEBUG   : backtrace:
09-20 15:23:54.510  2816  2816 F DEBUG   :     #00 pc 0001a772  /system/lib/libc.so (abort+63)
09-20 15:23:54.510  2816  2816 F DEBUG   :     #01 pc 02b21bd7  /data/app/org.chromium.chrome-X-yRPzkcdVpnrHo2IG-LWw==/lib/arm/libchrome.so (offset 0xee7000)
09-20 15:23:54.510  2816  2816 F DEBUG   :     #02 pc 02acd123  /data/app/org.chromium.chrome-X-yRPzkcdVpnrHo2IG-LWw==/lib/arm/libchrome.so (offset 0xee7000)
09-20 15:23:54.510  2816  2816 F DEBUG   :     #03 pc 02a4754d  /data/app/org.chromium.chrome-X-yRPzkcdVpnrHo2IG-LWw==/lib/arm/libchrome.so (offset 0xee7000)
09-20 15:23:54.510  2816  2816 F DEBUG   :     #04 pc 02a47a73  /data/app/org.chromium.chrome-X-yRPzkcdVpnrHo2IG-LWw==/lib/arm/libchrome.so (offset 0xee7000)
09-20 15:23:54.510  2816  2816 F DEBUG   :     #05 pc 03b738c1  /data/app/org.chromium.chrome-X-yRPzkcdVpnrHo2IG-LWw==/lib/arm/libchrome.so (offset 0xee7000)
09-20 15:23:54.510  2816  2816 F DEBUG   :     #06 pc 0396087d  /data/app/org.chromium.chrome-X-yRPzkcdVpnrHo2IG-LWw==/lib/arm/libchrome.so (offset 0xee7000)
09-20 15:23:54.511  2816  2816 F DEBUG   :     #07 pc 02a479c9  /data/app/org.chromium.chrome-X-yRPzkcdVpnrHo2IG-LWw==/lib/arm/libchrome.so (offset 0xee7000)
09-20 15:23:54.511  2816  2816 F DEBUG   :     #08 pc 02a46911  /data/app/org.chromium.chrome-X-yRPzkcdVpnrHo2IG-LWw==/lib/arm/libchrome.so (offset 0xee7000)
09-20 15:23:54.511  2816  2816 F DEBUG   :     #09 pc 000809ad  /data/app/org.chromium.chrome-X-yRPzkcdVpnrHo2IG-LWw==/oat/arm/base.odex (offset 0x58000)

Same crash in Monochrome:

09-20 15:29:52.043  3198  3198 F DEBUG   : backtrace:
09-20 15:29:52.043  3198  3198 F DEBUG   :     #00 pc 0001a772  /system/lib/libc.so (abort+63)
09-20 15:29:52.043  3198  3198 F DEBUG   :     #01 pc 01a89e87  /data/app/org.chromium.chrome-Y56YdyZkZ-edGnUBOySeJg==/base.apk (offset 0x21a9000)

Ie, the script wouldn't have anything to go on in the Monochrome case.

Are either of these issues understood, and/or considered part of the work planned?

I can file separate bugs if these are considered separate issues.



Cc: -lincolnfrog@chromium.org digit@chromium.org
Owner: mattcary@chromium.org
Re-assigning to Matt, who is going to work on this for the Chrome "Make it  better" week. Yeah!
Status update:

crrev.com/c/1249102 is out for review. This appears to correctly symbolize for monochrome, as well as chrome on M. However, the backtrace is usually very shallow (only containing one frame from chrome). Any addresses in the stack dump are (usually) correctly symbolized, so one can often piece together a plausible stack trace manually.

Getting substantially better backtraces from tombstones is probably not possible but I'm happy to dig further into particular use cases to see if we can do better.
I tried out digit's script, using that new script and a tombstone manually pulled from a Monochrome-loaded device.  As you said Matt, the stack has few frames (in this case, 2 frames), and they do decode properly.  Excellent.

Matt or digit@, I guess the next question is, why do tombstones contain abbreviated stacks, whereas pre-Monochrome yields a complete stack (or something much closer)?  Is that a known problem?


For monochrome, we don't use the link register, so absent stack unwinding information, android's tombstoned can't figure out what the stack is.

I would have thought that would be true pre-monochrome as well, but if there is more stack I guess it's not the case. Maybe digit@ knows.

But, to answer your question, I don't think there is much to do post-monochrome about getting a deeper stack.

The output of the new script seems to do an okay job of desymbolizing things in the stack part of the output (what appears after the backtrace). By manually inspecting the symbols one can sometimes get an a idea of a couple more stack frames. But, unfortunately, tombstones don't dump enough of the stack to get much deeper.

There's a change underway right now to switch from breakpad -> crashpad, which might affect how this works.

At least for breakpad, so long as the signal handler is registered before the crash, it should be capturing a stack dumps, which can then be unwound by minidump_stackwalk tool. Might be worth looking into why this isn't happening, or just wait for the crashpad switch-over to happen and see if anything's different.
Cc: jperaza@chromium.org
+jperaza might know whether crashpad affects this.
Crashpad shouldn't generally affect how this works, but:

1. Is there a reason we need to do this in addition to Crashpad? Breakpad only produced minidumps under certain conditions: user consent or an extra flag was required. Crashpad always produces minidumps (with user consent only affecting minidump upload). As agrieve mentioned, minidump_stackwalk can be used on those minidumps (from either Breakpad or Crashpad) to produce stack traces. Crashpad+minidump_stackwalk shouldn't have a problem not using the link register on monochrome, since it can use debug information to aid unwinding.

2. Breakpad restored the signal handler which triggers debuggerd tombstones. Currently, Crashpad restores old signal handlers for the browser (needed for WebView), but not child processes. This is easy to do if we want to continue generating tombstones. We do have a preference to not restore old crash signal handlers without reason since we can't necessarily trust them to e.g. actually crash the process.
Labels: OS-Android
I don't know cjgrant@'s use case. I agree with jperaza@, though: we can't do much without debugging information, and even if we did the herculean task of parsing debug unwind tables and applying to the tombstone, I think the amount of stack in a tombstone is too small to have anything useful.

So unless there's a use case where only tombstones exist, we should just delete all of the tombstone-specific scripts and make the crashpad/breakpad cases easier to use. 
Since there's been a lack of comments/interest, I think it will be best to just move to breakpad/crashpad and deprecate all of the tombstone-specific, for all the reasons mentioned above.

Whether or not the tombstone scripts should just be removed is another question. On the one hand, they are occasionally useful. On the other hand, they are a maintenance burden, and often a false path for people to stumble upon. I guess I'd lean towards removing them, but I won't do anything until I hear other opinions.
sgtm to first pursue a crashpad/breakpad-based approach. It might make sense to first wait for the crashpad transition to happen (seems very close), and then look at ensuring that crash stacks show up in logcats (or in some other form), such that our tools (test_runner & wrapper script logcat wrappers) can automatically process them.
Make sense, thanks Andrew.
Labels: DevX
Cc: torne@chromium.org
I think it *is* valuable to have the android debuggerd output actually be meaningful, even if we spend little/no effort maintaining the actual symbolisation script. I had a look at this problem while helping someone with another issue and the problem with tombstones printed from load-from-APK use cases is a regression (this used to work), and the specific thing that's wrong is that debuggerd is calculating the wrong base address for the library: it's just subtracting the base address of the mapping from the absolute addresses, which leaves the resulting address off by exactly the file's offset within the APK. This causes garbage stacks because it can't look up the right unwind info as a result.

debuggerd is explicitly supposed to be able to handle this (the offset printed in the unwind output is supposed to be *that* offset, but it's not, it's some other number), and if it was broken in general then android hopefully would've noticed now that ~all preinstalled applications load their .so files from their APK?

I wonder if this regressed when we switched to ldd?

So, yeah, I think it's worth figuring out why debuggerd fails to do the right thing and taking some action to fix it, regardless of whether we maintain the forked stack script or not, because 1) debuggerd can actually unwind accurately through system libraries in most cases, which means it can produce better stacks than breakpad/crashpad when non-chromium code is involved and 2) if the bug is actually in debuggerd it might affect someone else too :)
Cc: p...@chromium.org
Labels: -Pri-3 Pri-2
lld looks like the cause.  I forced compiler.gni's use_lld variable to false, and my stack traces become sane (no missing frames, and existing scripts decode addresses properly).

Funny, the CL to move to lld states "This CL ... is a likely culprit for any issues involving crash dumping or symbolization on Android."

https://chromium-review.googlesource.com/872225
I wonder if the platform had some kind of special handling for the old separate relocation packer behaviour (where the binary would end up with a nonzero load bias) and this is now counterproductive when using lld which handles the relocation packing itself and doesn't touch the load bias?

Forcing relocation packing off in https://cs.chromium.org/chromium/src/chrome/android/chrome_common_shared_library.gni?rcl=0bc7090c097058780157f186cee11b060743ceb1&l=83 may work around it if that's the case (and would narrow down the problem a bit). Just disabling use_lld is *also* turning off relocation packing, because the old non-lld based version has been removed afaik, so you're conflating two things there :|
Note that the behavior of debuggerd changes between M and O. In local testing, I got different behavior with how the library base address was incorporated into the stack traces depending on which device I was using (all with the default value of use_lld).

It's straightforward to heuristically determine what debuggerd did (see the above-mentioned crrev.com/c/1249102), but it sounds like from the rest of the discussion on this bug that it's not worth maintaining, especially since it would require figuring out the differences in debuggerd behavior across multiple android versions.

I can get that CL into submitable shape if there's interest, but it still doesn't sound like it's worth the ongoing maintenance burden.
FWIW I've just re-uploaded a rebased version of the CL.
I've filed  issue 919499  to explicitly track broken debuggerd unwinding.

Sign in to add a comment