Refactor third_party/android_platform/development/scripts/stack |
|||||||||
Issue descriptionRefactor third_party/android_platform/development/scripts/stack. For example the script should have a flag that when set as true, will print out logs that have not been used for symbolization. Currently the script will just ignore these information and extract only the 'useful' log for symbolization.
,
Jul 18
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/591e5064254750adbe421d3d1395ad47099833e5 commit 591e5064254750adbe421d3d1395ad47099833e5 Author: David 'Digit' Turner <digit@google.com> Date: Wed Jul 18 20:38:15 2018 android: apk_native_libs.py: New Python helper functions. This CL introduces a new Python script providing various functions to make it easier to parse Android APK files for symbolization, i.e.: - ApkReader: convenience class to read the content of an APK, especially information related to uncompressed native shared libraries. Mostly useful because it can be easily mocked for unit-testing purpose. - ApkNativeLibrariesMap: convenience class to build a map of native shared libraries contained in an APK. - ApkLibraryPathTranslator: convenience class used to translate on-device APK file paths and offsets (as found in stack traces or tombstones) into a virtual device library path + offset. + Add unit-tests for all features. + Add a small apk_lib_dump.py script used to dump the list of uncompressed native libraries inside an APK and their file start/end offsets and sizes. This is mostly useful for debugging symbolization issues. NOTE: This is a partial rewrite of the corresponding code from the following patch, which shows how this will be used in the future to better symbolize tombstones: https://chromium-review.googlesource.com/c/chromium/src/+/1034054 BUG=755225 R=agrieve@chromium.org, jduborick@chromium.org, pasko@chromium.org, lizeb@chromium.org Change-Id: If71e0048fa88c859e5f256d8c960134b6b88d06f Reviewed-on: https://chromium-review.googlesource.com/1047211 Commit-Queue: David Turner <digit@chromium.org> Reviewed-by: Egor Pasko <pasko@chromium.org> Reviewed-by: agrieve <agrieve@chromium.org> Cr-Commit-Position: refs/heads/master@{#576193} [modify] https://crrev.com/591e5064254750adbe421d3d1395ad47099833e5/build/android/PRESUBMIT.py [add] https://crrev.com/591e5064254750adbe421d3d1395ad47099833e5/build/android/pylib/symbols/apk_lib_dump.py [add] https://crrev.com/591e5064254750adbe421d3d1395ad47099833e5/build/android/pylib/symbols/apk_native_libs.py [add] https://crrev.com/591e5064254750adbe421d3d1395ad47099833e5/build/android/pylib/symbols/apk_native_libs_unittest.py
,
Jul 24
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/a6c94dee3a6c28d629639302a72d7d4971d102ce commit a6c94dee3a6c28d629639302a72d7d4971d102ce Author: David 'Digit' Turner <digit@google.com> Date: Tue Jul 24 16:34:06 2018 android: Add pylib/symbols/symbol_utils.py This CL introduces a new Python script providing various functions to deal with symbolization of crash back traces, stack traces, and memory map sections, as they appear in logcat and tombstone files on Android. They also rely on the classes introduced by the previous CL at [1] - HostLibraryFind() is used to locate unstripped version of native libraries on the host, based on their device path as it appears in tombstones / logcat. - SymbolResolver, and its derived classes, are used to turn a library (path, offset) into a symbol info string. - MemoryMap is used to model the memory map as it appears in a tombstone file, which can be used to symbolize random addresses that appear in the stack. - BackTraceTranslator is used to parse and symbolize back traces as they appear in logcat and tombstones. - StackTranslator is used to parse and symbolize stacks as they appear in tombstones. BUG=755225 R=agrieve@chromium.org, jbudorick@chromium.org, pasko@chromium.org, lizeb@chromium.org [1] https://chromium-review.googlesource.com/c/chromium/src/+/1047211 Change-Id: Ie6d5e8ad395c8953fc6c88948a761ecd2fda8092 Reviewed-on: https://chromium-review.googlesource.com/1138323 Commit-Queue: David Turner <digit@chromium.org> Reviewed-by: agrieve <agrieve@chromium.org> Cr-Commit-Position: refs/heads/master@{#577575} [modify] https://crrev.com/a6c94dee3a6c28d629639302a72d7d4971d102ce/build/android/PRESUBMIT.py [add] https://crrev.com/a6c94dee3a6c28d629639302a72d7d4971d102ce/build/android/pylib/symbols/symbol_utils.py [add] https://crrev.com/a6c94dee3a6c28d629639302a72d7d4971d102ce/build/android/pylib/symbols/symbol_utils_unittest.py
,
Sep 20
Assigning to digit - clearly working on this. Note: there's an existing script here that hzl@ wrote at some point that ultimately addresses this bug: https://cs.chromium.org/chromium/src/build/android/pylib/android/logcat_symbolizer.py I think digit's work has more been to improve quality of stacks (e.g. they generally don't work for monochrome)
,
Sep 20
@digit: It looks like monochrome stack traces don't decode properly. However, it doesn't seem like the script is at fault. I induced the same crash in ChromePublic.apk, and MonochromePublic.apk. It looks like there may be two issues: First, it looks like the script can't properly decode symbols. A dump like this: 09-20 15:29:52.042 3198 3198 F DEBUG : Abort message: '[FATAL:autocomplete_controller_android.cc(146)] Check failed: false. 09-20 15:29:52.042 3198 3198 F DEBUG : #00 0xc9b2faab /data/app/org.chromium.chrome-Y56YdyZkZ-edGnUBOySeJg==/base.apk+0x03bddaab 09-20 15:29:52.042 3198 3198 F DEBUG : #01 0xcb96a603 /data/app/org.chromium.chrome-Y56YdyZkZ-edGnUBOySeJg==/base.apk+0x05a18603 09-20 15:29:52.042 3198 3198 F DEBUG : #02 0xcb96ab29 /data/app/org.chromium.chrome-Y56YdyZkZ-edGnUBOySeJg==/base.apk+0x05a18b29 09-20 15:29:52.042 3198 3198 F DEBUG : #03 0xcb76d339 /data/app/org.chromium.chrome-Y56YdyZkZ-edGnUBOySeJg==/base.apk+0x0581b339 Generates script output like this, where it can't seem to identify the library to look into: DEBUG:root:Found trace line: 09-20 15:29:52.042 3198 3198 F DEBUG : #00 0xc9b2faab /data/app/org.chromium.chrome-Y56YdyZkZ-edGnUBOySeJg==/base.apk+0x03bddaab DEBUG:root:Identified lib: /data/app/org.chromium.chrome-Y56YdyZkZ-edGnUBOySeJg==/base.apk DEBUG:root:TranslateLibPath: lib=/data/app/org.chromium.chrome-Y56YdyZkZ-edGnUBOySeJg==/base.apk library_name=base.apk DEBUG:root:GetCandidates: prefiltered candidates = [u'/usr/local/google/work/chromium/src/out/Debug/lib.unstripped/base.apk', u'/usr/local/google/work/chromium/src/out/Debug/./base.apk'] DEBUG:root:TranslateLibPath: candidate_libraries=[] But, even if that worked, it seems that the Android log itself isn't outputting proper backtraces for Monochrome. I induced the same crash in Chrome and Monochrome, and got both a complete stack for Chrome, and a 2-line bogus stack for Monochrome (the Chrome stack does decode properly as well). Chrome log dump: 09-20 15:23:54.510 2816 2816 F DEBUG : backtrace: 09-20 15:23:54.510 2816 2816 F DEBUG : #00 pc 0001a772 /system/lib/libc.so (abort+63) 09-20 15:23:54.510 2816 2816 F DEBUG : #01 pc 02b21bd7 /data/app/org.chromium.chrome-X-yRPzkcdVpnrHo2IG-LWw==/lib/arm/libchrome.so (offset 0xee7000) 09-20 15:23:54.510 2816 2816 F DEBUG : #02 pc 02acd123 /data/app/org.chromium.chrome-X-yRPzkcdVpnrHo2IG-LWw==/lib/arm/libchrome.so (offset 0xee7000) 09-20 15:23:54.510 2816 2816 F DEBUG : #03 pc 02a4754d /data/app/org.chromium.chrome-X-yRPzkcdVpnrHo2IG-LWw==/lib/arm/libchrome.so (offset 0xee7000) 09-20 15:23:54.510 2816 2816 F DEBUG : #04 pc 02a47a73 /data/app/org.chromium.chrome-X-yRPzkcdVpnrHo2IG-LWw==/lib/arm/libchrome.so (offset 0xee7000) 09-20 15:23:54.510 2816 2816 F DEBUG : #05 pc 03b738c1 /data/app/org.chromium.chrome-X-yRPzkcdVpnrHo2IG-LWw==/lib/arm/libchrome.so (offset 0xee7000) 09-20 15:23:54.510 2816 2816 F DEBUG : #06 pc 0396087d /data/app/org.chromium.chrome-X-yRPzkcdVpnrHo2IG-LWw==/lib/arm/libchrome.so (offset 0xee7000) 09-20 15:23:54.511 2816 2816 F DEBUG : #07 pc 02a479c9 /data/app/org.chromium.chrome-X-yRPzkcdVpnrHo2IG-LWw==/lib/arm/libchrome.so (offset 0xee7000) 09-20 15:23:54.511 2816 2816 F DEBUG : #08 pc 02a46911 /data/app/org.chromium.chrome-X-yRPzkcdVpnrHo2IG-LWw==/lib/arm/libchrome.so (offset 0xee7000) 09-20 15:23:54.511 2816 2816 F DEBUG : #09 pc 000809ad /data/app/org.chromium.chrome-X-yRPzkcdVpnrHo2IG-LWw==/oat/arm/base.odex (offset 0x58000) Same crash in Monochrome: 09-20 15:29:52.043 3198 3198 F DEBUG : backtrace: 09-20 15:29:52.043 3198 3198 F DEBUG : #00 pc 0001a772 /system/lib/libc.so (abort+63) 09-20 15:29:52.043 3198 3198 F DEBUG : #01 pc 01a89e87 /data/app/org.chromium.chrome-Y56YdyZkZ-edGnUBOySeJg==/base.apk (offset 0x21a9000) Ie, the script wouldn't have anything to go on in the Monochrome case. Are either of these issues understood, and/or considered part of the work planned? I can file separate bugs if these are considered separate issues.
,
Sep 21
Re-assigning to Matt, who is going to work on this for the Chrome "Make it better" week. Yeah!
,
Oct 3
Status update: crrev.com/c/1249102 is out for review. This appears to correctly symbolize for monochrome, as well as chrome on M. However, the backtrace is usually very shallow (only containing one frame from chrome). Any addresses in the stack dump are (usually) correctly symbolized, so one can often piece together a plausible stack trace manually. Getting substantially better backtraces from tombstones is probably not possible but I'm happy to dig further into particular use cases to see if we can do better.
,
Oct 10
I tried out digit's script, using that new script and a tombstone manually pulled from a Monochrome-loaded device. As you said Matt, the stack has few frames (in this case, 2 frames), and they do decode properly. Excellent. Matt or digit@, I guess the next question is, why do tombstones contain abbreviated stacks, whereas pre-Monochrome yields a complete stack (or something much closer)? Is that a known problem?
,
Oct 10
For monochrome, we don't use the link register, so absent stack unwinding information, android's tombstoned can't figure out what the stack is. I would have thought that would be true pre-monochrome as well, but if there is more stack I guess it's not the case. Maybe digit@ knows. But, to answer your question, I don't think there is much to do post-monochrome about getting a deeper stack. The output of the new script seems to do an okay job of desymbolizing things in the stack part of the output (what appears after the backtrace). By manually inspecting the symbols one can sometimes get an a idea of a couple more stack frames. But, unfortunately, tombstones don't dump enough of the stack to get much deeper.
,
Oct 10
There's a change underway right now to switch from breakpad -> crashpad, which might affect how this works. At least for breakpad, so long as the signal handler is registered before the crash, it should be capturing a stack dumps, which can then be unwound by minidump_stackwalk tool. Might be worth looking into why this isn't happening, or just wait for the crashpad switch-over to happen and see if anything's different.
,
Oct 10
+jperaza might know whether crashpad affects this.
,
Oct 10
Crashpad shouldn't generally affect how this works, but: 1. Is there a reason we need to do this in addition to Crashpad? Breakpad only produced minidumps under certain conditions: user consent or an extra flag was required. Crashpad always produces minidumps (with user consent only affecting minidump upload). As agrieve mentioned, minidump_stackwalk can be used on those minidumps (from either Breakpad or Crashpad) to produce stack traces. Crashpad+minidump_stackwalk shouldn't have a problem not using the link register on monochrome, since it can use debug information to aid unwinding. 2. Breakpad restored the signal handler which triggers debuggerd tombstones. Currently, Crashpad restores old signal handlers for the browser (needed for WebView), but not child processes. This is easy to do if we want to continue generating tombstones. We do have a preference to not restore old crash signal handlers without reason since we can't necessarily trust them to e.g. actually crash the process.
,
Oct 16
I don't know cjgrant@'s use case. I agree with jperaza@, though: we can't do much without debugging information, and even if we did the herculean task of parsing debug unwind tables and applying to the tombstone, I think the amount of stack in a tombstone is too small to have anything useful. So unless there's a use case where only tombstones exist, we should just delete all of the tombstone-specific scripts and make the crashpad/breakpad cases easier to use.
,
Nov 13
Since there's been a lack of comments/interest, I think it will be best to just move to breakpad/crashpad and deprecate all of the tombstone-specific, for all the reasons mentioned above. Whether or not the tombstone scripts should just be removed is another question. On the one hand, they are occasionally useful. On the other hand, they are a maintenance burden, and often a false path for people to stumble upon. I guess I'd lean towards removing them, but I won't do anything until I hear other opinions.
,
Nov 19
sgtm to first pursue a crashpad/breakpad-based approach. It might make sense to first wait for the crashpad transition to happen (seems very close), and then look at ensuring that crash stacks show up in logcats (or in some other form), such that our tools (test_runner & wrapper script logcat wrappers) can automatically process them.
,
Nov 20
Make sense, thanks Andrew.
,
Dec 21
,
Jan 2
I think it *is* valuable to have the android debuggerd output actually be meaningful, even if we spend little/no effort maintaining the actual symbolisation script. I had a look at this problem while helping someone with another issue and the problem with tombstones printed from load-from-APK use cases is a regression (this used to work), and the specific thing that's wrong is that debuggerd is calculating the wrong base address for the library: it's just subtracting the base address of the mapping from the absolute addresses, which leaves the resulting address off by exactly the file's offset within the APK. This causes garbage stacks because it can't look up the right unwind info as a result. debuggerd is explicitly supposed to be able to handle this (the offset printed in the unwind output is supposed to be *that* offset, but it's not, it's some other number), and if it was broken in general then android hopefully would've noticed now that ~all preinstalled applications load their .so files from their APK? I wonder if this regressed when we switched to ldd? So, yeah, I think it's worth figuring out why debuggerd fails to do the right thing and taking some action to fix it, regardless of whether we maintain the forked stack script or not, because 1) debuggerd can actually unwind accurately through system libraries in most cases, which means it can produce better stacks than breakpad/crashpad when non-chromium code is involved and 2) if the bug is actually in debuggerd it might affect someone else too :)
,
Jan 3
lld looks like the cause. I forced compiler.gni's use_lld variable to false, and my stack traces become sane (no missing frames, and existing scripts decode addresses properly). Funny, the CL to move to lld states "This CL ... is a likely culprit for any issues involving crash dumping or symbolization on Android." https://chromium-review.googlesource.com/872225
,
Jan 3
I wonder if the platform had some kind of special handling for the old separate relocation packer behaviour (where the binary would end up with a nonzero load bias) and this is now counterproductive when using lld which handles the relocation packing itself and doesn't touch the load bias? Forcing relocation packing off in https://cs.chromium.org/chromium/src/chrome/android/chrome_common_shared_library.gni?rcl=0bc7090c097058780157f186cee11b060743ceb1&l=83 may work around it if that's the case (and would narrow down the problem a bit). Just disabling use_lld is *also* turning off relocation packing, because the old non-lld based version has been removed afaik, so you're conflating two things there :|
,
Jan 7
Note that the behavior of debuggerd changes between M and O. In local testing, I got different behavior with how the library base address was incorporated into the stack traces depending on which device I was using (all with the default value of use_lld). It's straightforward to heuristically determine what debuggerd did (see the above-mentioned crrev.com/c/1249102), but it sounds like from the rest of the discussion on this bug that it's not worth maintaining, especially since it would require figuring out the differences in debuggerd behavior across multiple android versions. I can get that CL into submitable shape if there's interest, but it still doesn't sound like it's worth the ongoing maintenance burden.
,
Jan 7
FWIW I've just re-uploaded a rebased version of the CL.
,
Jan 7
I've filed issue 919499 to explicitly track broken debuggerd unwinding. |
|||||||||
►
Sign in to add a comment |
|||||||||
Comment 1 by bugdroid1@chromium.org
, May 7 2018