Improve android breakpad minidumps analysis, include executable module flag.
Reported by
my...@yandex-team.ru,
May 30 2016
|
|||
Issue descriptionIssue: android minidump stack analysis can be inaccurate. Steps to reproduce: take a minidump and run minidump_stackwalk on it. Expect: some sane stack trace, that can help to investigate the crash. Observe: sometimes it looks like libdvm.so <addr> icudtl.dat <addr> <end of stack> While analyzing android crash minidumps, minidump_stackwalk falls back to stack scan (https://chromium.googlesource.com/breakpad/breakpad/+/master/src/processor/stackwalker_arm.cc#264) It seems to be quite a simple algorithm, though it can be inaccurate if the stack memory contains an address of non-executable module (like icu data blob, or any other mmap-ed file). Obviously, you cannot jump to a no-exec address. Also, since microdumps don't contain no-exec modules, this leads to micro/mini stackwalk inconsistency, when different tools produce different stacks on the same crash micro/mini dump. My proposal is to add executable info to minidumps (MDRawModule structure has some reserved fields we can use for backwards compatibility), and then exclude it from stack scan analysis (here: https://chromium.googlesource.com/breakpad/breakpad/+/master/src/processor/stackwalker.cc#279). I have a CL ready, please tell what you think of it, and I can roll it if the idea looks good to you.
,
Jun 6 2016
,
Jun 6 2016
> While analyzing android crash minidumps, minidump_stackwalk falls back to stack scan Can I ask you in which cases? Do you see that only when the top of the stack contains libs (or non-chrome) functions? If that happens when the top of the stack contains chrome frames that is really bad and means that something in the CFI unwinder is going wrong (or you don't have the right symbols with cfi) > It seems to be quite a simple algorithm, though it can be inaccurate Yes, I generally don't trust stack scanning % few cases in which it historically always worked (e.g. to unwind the abort -> pthread_kill -> raise sequence of CHECKs) > Also, since microdumps don't contain no-exec modules, this leads to micro/mini stackwalk inconsistency, I never checked the code, out of curiosity why is that? Does the stackscan need the non-exec modules for its logic? Is it to figure out stack boundaries? > My proposal is to add executable info to minidumps (MDRawModule structure has some reserved fields we can use for backwards compatibility), and then exclude it from stack scan analysis Very likely that will reduce the quality of minidumps. I agree that consistency would be good, but I seriously fear that this will destroy all our crash reports that end up in libc, which today relies on stack scanning on arm. Do you know which modules specifically are required by stackscan? Is it just a matter of having the [stack:] mmaps? If so I'd be much more in favor of adding those to microdumps.
,
Jun 9 2016
> Can I ask you in which cases? Do you see that only when the top of the stack contains libs (or non-chrome) functions? In my case, it was SIGABRT in libdvm.so (which we don't have symbols for), caused by use-after-free on jni local ref. Since non-exec modules are included in minidumps, stack scan encountered something that looked like an address in icudtl.dat, and placed it as a second step in the stack: 0 libdvm.so libdvm.so@0x45dd0 1 icudtl.dat icudtl.dat@0x597e26 <end> Apparently, it was not a genuine stack. In case of microdumps, where non-exec modules are excluded during its creation, stack scan was lucky enough to unroll it until my browser main library, giving a valid place where the bug was. > I never checked the code, out of curiosity why is that? Microdumps filter out non-exec modules on creation (https://goo.gl/hMPuEo), minidumps do not. So module ranges map on unwinding is different, stack scannig of microdumps selects only from exec modules (since its the only thing it has), while scanning of minidumps selects from various icu files, fonts, resource paks, etc, that were written to minidumps from /proc/pid/maps. > Does the stackscan need the non-exec modules for its logic It is exactly the problem; it does not, but uses them nevertheless. My idea is to forbid them as a valid next step of a stack. Here, https://goo.gl/zXIhtN, we can say that, if module is not executable - it is not a valid address. Since there's no info about module exec status, we will need to add it to minidumps. Also, since the main change is on stackwalk side, we can always revert the change, and rescan dumps as before. We have this change in production for a week, and didn't encounter any serious problems - CHECK and DVM abort stacks work fine. Can you say, which crash cases do I need to check as well for possible regression? I've found some strange thing, some stacks point on data@app@<appid>-1@base.apk@classes.dex, but this module is marked as non-executable on proc/maps. Is it some special case on Android, where mmaped non-exec code could actually be executed? Or is it some ART optimization? If the next stack frame was genuinely in such module, my idea of excluding it will fail. Or is it just a broken dump? Also, I have a ticket in breakpad tracker, (which is being ignored) https://bugs.chromium.org/p/google-breakpad/issues/detail?id=700 Do you have such a problem in chrome? ModernLinker we use for 6+ is the same as chrome's, proxying the call to dlopen_ext.
,
Jun 9 2017
Issue has not been modified or commented on in the last 365 days, please re-open or file a new bug if this is still an issue. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot |
|||
►
Sign in to add a comment |
|||
Comment 1 by rsgav...@chromium.org
, Jun 3 2016