
Issue 744956

Starred by 2 users

Issue metadata

Status: WontFix
Owner: ----
Closed: Sep 27
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Android , Chrome
Pri: 1
Type: Bug




dump_syms or minidump_stackwalk confused by clang's DWARF4 debug data when targeting arm

Project Member Reported by boliu@chromium.org, Jul 17 2017

Issue description

I monitor a few release check failures on Android, all related to GPU, and noticed that on canary one particular signature is completely missing, which is too good to be true. Note this is *after* issue 735027 was fixed.

The crash is crbug.com/680777 for this release check: https://cs.chromium.org/chromium/src/content/browser/renderer_host/compositor_impl_android.cc?rcl=a313c233fa036a1439c203b25d0377d1578dab75&l=714

It's usually among the top crashes in beta/stable, but moves around more in canary because there can be other crashes.

I looked around, and turns out the crash got this signature, which is pretty generic and useless:
https://crash.corp.google.com/browse?q=product.name%3D%27Chrome_Android%27%20AND%20product.version%3D%2761.0.3157.3%27%20AND%20custom_data.ChromeCrashProto.channel%3D%27canary%27%20AND%20custom_data.ChromeCrashProto.magic_signature_1.name%3D%27%5BAssert%5D%20logging%3A%3ALogMessage%3A%3A~LogMessage%27&ignore_case=false&enable_rewrite=true&omit_field_name=&omit_field_value=&omit_field_opt=%3D

If you check the reports with logs, at least some of them have "[FATAL:compositor_impl_android.cc(715)] Timed out waiting for GPU channel.", corresponding to the release check.

But this is a check failure, it should have a mostly clean symbolized stack near the top at least, like in m60. I don't really know if it's clang related, but there was still a clean decoding up until the last build that switched to clang (61.0.3124.3).

Also I don't really know if this affects any other crashes, and I don't really know much about symbolization itself. So just reporting this here..
 
Cc: rsesek@chromium.org
Even before reading the bug, the cc-list gave me strong hints that this was another can of worms :P
I should set up better gmail labels

TL;DR
This seems to be because LOG(FATAL) != CHECK(false): it goes into libc's abort().
Not sure how much this is related to the clang switch.

Part 1: I never realized that CHECK(false) !== LOG(FATAL) on official builds
----------------------------------------------------------------------------
I generally see a problem here, which seems unrelated with the clang switch.
In the past a combination of thakis, myself and other people fixed logging.h and made official CHECK be like this:
#define CHECK(condition) UNLIKELY(!(condition)) ? IMMEDIATE_CRASH() : EAT_STREAM_PARAMETERS

This guarantees that a failed CHECK is just a trapping assembly instruction within libchrome.so and doesn't jump to other libraries.
On Android, jumping into other libraries causes wrong symbolization for reasons thoroughly explained in Issue 480835.

I never realized until now that the same does NOT apply to LOG(FATAL).
Conceptually LOG(FATAL) is a CHECK(false).
In practice, however, LOG(FATAL) goes through all LogMessage logic that ultimately [1] ends up with the old BreakDebugger() which goes into [2] which in turn calls abort() which then jumps to libc.so. And when we don't have symbols for libc.so (we index only some popular Android images), boom, auf wiedersehen to all these crash reports.

Having identified one of the many possible causes of this problem, I am not 100% sure what the best way to fix it is.
One thing I'd do is to replace that void DebugBreak()
from: if (!BeingDebugged()) abort();
to: if (!BeingDebugged()) IMMEDIATE_CRASH(); (which in turn expands into a direct asm instruction like ud2/bkpt etc).

This has a bunch of side effects:
- will completely change all the signatures for LOG(FATAL).
- the signature will likely point to ~LogMessage -> BreakDebugger, unless we blacklist them in the crash/ processor.

But perhaps they are already busted. do we care? can we have somebody from stability to comment here?
(amineer@, yfriedman@ I am looking at you!)

Actually by looking at the processor [3] logging.cc is already blacklisted, but base/debug/debugger.h is not. 
I think that should get blacklisted regardless of all this (amineer@ can you please follow up on this? We need something similar to b/36845478)


Part 2: Do we really need CHECK(false) !== LOG(FATAL) ?
-------------------------------------------------------
Meanwhile a part of me wonders: do we really intend to go through all the LogMessage logic for LOG(FATAL) in the first place?

If I wear my general chromium developer hat, to me CHECK(false) and LOG(FATAL) are interchangeable. I am not sure how many people expect a difference in the behavior of the two.
After looking at this code, though, I see that LOG(FATAL) attempts to do smarter things, such as copying the message onto the stack so that it can be recovered from the minidump*. So there seems to be an intention in the current code to do something different with LOG(FATAL).

Thinking about it more, this could be related to the clang switch.
rsesek@: do you know if crash/ tries to magically recover the |str_stack| in logging.cc and use that to build the magic signature?
If it does, it is possible that the switch to clang is interfering with that part (maybe by outsmarting the base::debug::alias logic)
boliu@: do you have any example of a LOG(FATAL) crash decoding in "a good way" from crash/? That would help to figure out what's going on here.

Part 3: (mostly a tangent): do we need all these logging macros?
----------------------------------------------------------------
On top of all this, the cynical and pragmatic part of me feels like most of this logging stuff is a lot of unnecessary complexity. I honestly wish that we had only non-fatal logging for debugging and CHECK() without any stream for official builds. I am not sure these log statements are useful at all. I think that what could eventually be more useful is a macro that says "if this condition is not met, make sure that these variables make their way into the crash report".


[1] https://cs.chromium.org/chromium/src/base/logging.cc?rcl=5566a0f40fd651ef274f373058a5d37e58a8bd92&l=784
[2] https://cs.chromium.org/chromium/src/base/debug/debugger_posix.cc?rcl=1c51b4bc568c20d4b2cee805181d343bfaca0169&l=226
[3] kBlacklistedModules in http://go/link_for_crbug_744956

Comment 2 by boliu@chromium.org, Jul 18 2017

> boliu@: do you have any example of a LOG(FATAL) crash decoding in "a good way" from crash/? That would help to figure out what's going on here.

Before clang, signatures for crbug.com/680777 would work.

Did not find any after switch to clang and after  issue 735027  is fixed. But before 735027 is fixed, I found the signature for crbug.com/680777 became BreakDebugger, eg crash/097ee38088000000. That one decoded cleanly (modulo  issue 735027 )
Okay interesting, some other finding.
As I was speculating in "part 2" there is some actual code in the crash service "magic_signature.cc" that pretty much reads as: 
if func == "logging::LogMessage::~LogMessage":
  return "[Assert]" + NextFrame.func
In other words the crash server knows that LOG(FATAL) causes a ~LogMessage and knows that that is the signal to look forward and mark it as an assertion.

The thing that is going wrong in cases like crash/dffc00ce78000000 is the fact that there is no next frame.
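To make the failure mode concrete, here is a minimal Python sketch of the rule quoted above. It is an illustration only, not the real magic_signature.cc: the function name `magic_signature`, the list-of-strings frame representation, scanning past the top frame, and the fallback used when no caller frame exists are all assumptions. The fallback was picked to match the generic "[Assert] logging::LogMessage::~LogMessage" signature seen in the reports.

```python
ASSERT_FRAME = "logging::LogMessage::~LogMessage"


def magic_signature(frames):
    """Return a crash signature for a list of symbolized function names.

    frames[0] is the innermost frame. Hypothetical helper that only
    mirrors the rule quoted in this comment, not the server code.
    """
    for index, func in enumerate(frames):
        if func == ASSERT_FRAME:
            if index + 1 < len(frames):
                # Normal case: the caller of ~LogMessage names the assert.
                return "[Assert] " + frames[index + 1]
            # Unwinding stopped here, so only the generic name is left.
            return "[Assert] " + func
    return frames[0] if frames else "<no frames>"
```

With a caller frame present the signature names the function that hit LOG(FATAL). When the unwinder stops at ~LogMessage, every such crash collapses into the same generic bucket.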
The more interesting thing is that CFI unwinding now seems to be failing. If you tick "Show frame trust levels" you can see that, once we reached logging::LogMessage::~LogMessage(), we didn't recover to CFI unwinding and continued stack scanning.
Actually, glancing at 61 canary crashes, it looks like most of the crash reports* are falling back on stack scanning, which is quite bad.

In light of this, my previous write-up on CHECK(false) !== LOG(FATAL) is really of minor importance.
The problem seems to be affecting mostly arm and not arm64.

I tried resymbolizing locally crash/dffc00ce78000000 first using the breakpad symbols and then redownloading the unstripped .so from the archives and generating the breakpad symbols myself.
In the first case I get, unsurprisingly, the same result of crash/dffc00ce78000000, that is: full stack scanning.
While running that I get some interesting errors like:
---
2017-07-18 17:08:34: postfix_evaluator-inl.h:106: ERROR: Could not PopValues to get two values for binary operation +: r7 16 +
2017-07-18 17:08:34: postfix_evaluator-inl.h:106: ERROR: Could not PopValues to get two values for binary operation +: r7 8 +
2017-07-18 17:08:34: postfix_evaluator-inl.h:106: ERROR: Could not PopValues to get two values for binary operation +: r7 24 +
---

Then I tried downloading the symbols from here (https://paste.googleplex.com/5815498914660352). I tried both "arm" and arm-next but cannot get dump_syms to give me the same module GUID of breakpad (E7E3135D3AA39BD2BC40DC2AED49873F0).
I guess this is again because I don't understand anymore how many binaries we have out there and where they are archived.
At this point I gave up my debugging as I ran out of time for this bug.


[1] http://go/crash_link_for_744956
yfriedman@, I'm swamped, can you handle the AIs primiano@ linked to me above?  Let me know if not and we can chat.

Comment 5 by thakis@chromium.org, Jul 18 2017

I started a thread with crash-team about this.
Thanks Nico.
Unfortunately I've been swamped with stuff for M61 and about to be on vacation through the rest of the week so haven't had a chance to look :/
Labels: ReleaseBlock-Beta
Owner: thakis@chromium.org
Status: Assigned (was: Untriaged)
Hmm Nico, I think this could be due to https://codereview.chromium.org/2959083002 (which is to fix  Issue 735027 ).
I got to that by wondering "hmm what happened to that -gdwarf-3 flag?" and eventually ended up there. I  remember that the dwarf level was impacting breakpad's behavior when we introduced that.

I looked at some crash data, interestingly:
That CL first appeared in 61.0.3143.0
crash/d9e5e9c268000000 @ 61.0.3142.0 (immediately before) has valid CFI (good) and short namespaces (just "MaybeAppendNavigationThrottles") (not so good)
crash/f2aaaec488000000 @ 61.0.3143.0 (same rev of yours) has no CFI (bad) and long namespaces ("subresource_filter::ContentSubresourceFilterThrottleManager::MaybeAppendNavigationThrottles") (good)

If you look at go/crashes_for_61_0_3142_0 and go/crashes_for_61_0_3143_0 the diff is pretty evident.
Marking as release blocker, as this will mess up crash reports for the other channels (feel free to remove if you think this is not critical)

thakis@: let's chat offline maybe. It looks like the trade-off here is "(CFI and non-descriptive function names) vs (non-CFI and descriptive function names)". Dunno if we can combine -fdebug-info-for-profiling and -gdwarf-3?

amineer@/yfriedman@: I think that stability folks should seriously prioritize having some testing on this stuff. I've lost count of how many times crash reports got accidentally screwed up for one reason or another. We can't keep scraping git and crash logs every single time it happens.

Comment 8 by thakis@chromium.org, Jul 19 2017

Labels: -ReleaseBlock-Beta
I removed the -gdwarf-3 because:
- we use this setup on linux and things seem to be fine there
- -gdwarf-3 in clang is a lot less well-tested than just -g
- it looked like -gdwarf-3 worked around some gcc issue

Let's wait for crash team to take a look at what's going on.

I don't think this needs to block beta. Please add back if you disagree.
Cc: mark@chromium.org
Not sure I decode #8 correctly. Are you sure that you don't think that crrev.com/2959083002 (#482772) caused the issue? Or are you saying that it did but this is sorta wai (i.e. somebody should fix breakpad instead?)?

If the former, I just double-checked that CL and the one immediately before, doing this:
-----
- Get the apks for that CL and the one immediately before from:
  gs://chrome-test-builds/official-by-commit/Android Builder/full-build-linux_482771.zip
  gs://chrome-test-builds/official-by-commit/Android Builder/full-build-linux_482772.zip

- Get the unstripped symbols for those:
  gs://chrome-perf/Android Builder/full-build-linux_71fd419f15a9fb890a6d01262efc88cdba086661.zip
  gs://chrome-perf/Android Builder/full-build-linux_120b37effd521f42b59184bb0abc9d8cebcafc0d.zip

- Install both APKs and go to chrome://crash, that gave:
  crash/8eeacfc268000000 for #482771
  crash/c8d2f3f288000000 for #482772

- Download the minidumps, use dump_syms to transform libchrome.so from the full_build*.zip into a breakpad symbol.
- See how minidump_stackwalk reacts.

For #482771 (the one before your cl) minidump_stackwalk uses CFI
For #482772 (your CL) minidump_stackwalk uses stack scanning
(see attachments)
------


> Let's wait for crash team to take a look at what's going on.

When you say "crash team" you actually want "crash client" here. Conveniently the "wide" crash-client team sits like one desk away from you: +rsesek (I think he's OOO) +mark ;-)


> I don't think this needs to block beta. Please add back if you disagree.

I don't have an opinion, leave it to stability sheriffs.
The only thing I want to make clear is that if nobody takes any action here the quality of crash reports is going to go dramatically down. And the 61 is like... today.

Attachments:
stackwalk_for_482771.txt (223 KB)
stackwalk_for_482772.txt (228 KB)
I was just explaining why I removed the flag; it's possible that my assumption "things are fine on Linux" is wrong. Thanks for the detailed repro steps (do we have an md file explaining this?); I'll check if adding back -gdwarf-3 alone changes anything (on both Android and Linux).

I'm confused by the "crash client" team comment. We symbolize on the server, right?
> I was just explaining why I removed the flag, it's possible that my assumption 
Ah okay. Btw just read now the "it looked like -gdwarf-3 worked around some gcc issue"
I think that was working around breakpad issues. crrev.com/2285723002 says "Breakpad dump_syms cannot currently parse DWARF 4 binaries correctly"

> I'm confused by the "crash client" team comment. We symbolize on the server, right?
Hmm good point. I'll be honest, I don't know; better to check this with mark@. Yes, we symbolize on the server, but that relies on the breakpad symbols being correct in the first place. In essence I think this boils down to "who owns/maintains dump_syms", and I am not sure I know the answer.

> do we have a md file explaining this
https://www.chromium.org/developers/decoding-crash-dumps

The tricky part is finding the right symbols. That is a somewhat stochastic process. I ranted about that a while ago in http://go/groups_link_for_crbug_744956, without too much success. My current workflow is: try a bunch of known locations and give up at the 3rd attempt.

Comment 12 by mark@chromium.org, Jul 20 2017

If you remove -gdwarf-3, you get DWARF 4, right? Breakpad dump_syms should generally be able to handle that since sometime last year.

But I'm on my phone and I can't tell from any of the above (without clicking links that I don't have access to here) whether you're suspecting a stackwalking problem, a symbolization problem, a bucketing problem, or if you're not sure either.

Comment 13 by mark@chromium.org, Jul 20 2017

Labels: OS-Chrome
Also, are these Android builds produced by clang or gcc? And is it a current version? The GCC that ships with the NDK is pretty old, if that's what we're using.
> If you remove -gdwarf-3, you get DWARF 4, right?
Hmm good question. Not sure, Nico? I have both .so(s) right here, is there a objdump/readelf command I can use to find out?

> Breakpad dump_syms should generally be able to handle that since sometime last year.
I tried a dump_syms built from ToT with OS=android.

> whether you're suspecting a stackwalking problem, a symbolization problem, a bucketing problem, or if you're not sure either.
Yeah sorry the evolution of this bug was a bit non-linear and I found problems gradually. Generally skip my comment #1, the relevant part is in #3.
Going back to your question, I think this is either a stackwalking or a symbol-transformation problem. Not bucketing.

> Also, are these Android builds produced by clang or gcc? 
Clang. It became the default in #478517 and the regression point (removing -gdwarf-3) is #482772.
Further data-point, this seems to affect only arm.
On arm64 the only effect of  crrev.com/2959083002 has been positive (getting back the full qualified function names with namespaces). But CFI still works on arm64 after that CL (See crash/c6927e1578000000 vs crash/b445a28a78000000).
So I think this is not just "breakpad cannot handle DWARF-4" rather some subtlety affecting the combination of DWARF-4 + arm (or maybe dwarf-4 here is not related at all, but just passing -gdwarf-3 causes the compiler to emit things that don't trigger some weird breakpad issue)
I didn't find any x64 Android crashes after Nico's CL, but the CrOS ones also seem to correctly use CFI.

Comment 16 by torne@chromium.org, Jul 20 2017

"readelf --debug-dump=info" will list "Version: 4" (or another number) right at the top of the output; that's the dwarf version.

Comment 17 by mark@chromium.org, Jul 20 2017

> > If you remove -gdwarf-3, you get DWARF 4, right?
> Hmm good question. Not sure, Nico? I have both .so(s) right here, is there a objdump/readelf command I can use to find out?

The version number is in every compilation unit header, so “objdump --dwarf=info” or “readelf --debug-dump=info”.

But an easy way that I distinguish DWARF 3 from DWARF 4 is that the former uses DW_AT_MIPS_linkage_name, and the latter should just be using DW_AT_linkage_name.
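Both checks mentioned in these comments can be automated over the text that "readelf --debug-dump=info" prints. The Python sketch below is illustrative: the function names are made up, and it assumes the "Version:" header line and the attribute names appear in the dump in the usual readelf layout.

```python
import re

# Matches the per-compilation-unit header line, e.g. "   Version:       4".
_VERSION_RE = re.compile(r"^\s*Version:\s+(\d+)\s*$")


def dwarf_versions(readelf_info_output):
    """Return the set of DWARF versions found in the CU headers."""
    versions = set()
    for line in readelf_info_output.splitlines():
        match = _VERSION_RE.match(line)
        if match:
            versions.add(int(match.group(1)))
    return versions


def linkage_attribute_hint(readelf_info_output):
    """Guess the DWARF version from the linkage-name attribute in use.

    Returns 3 if DW_AT_MIPS_linkage_name appears, 4 if only
    DW_AT_linkage_name appears, and None if neither is present.
    This is only the heuristic described above, not a definitive test.
    """
    if "DW_AT_MIPS_linkage_name" in readelf_info_output:
        return 3
    if "DW_AT_linkage_name" in readelf_info_output:
        return 4
    return None
```

Feed both functions the captured output of the readelf command for each .so. A library that mixes versions would show up as a set with more than one entry.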
Thanks, confirmed this is V3 in #482771 vs V4 in #482772. 

Comment 19 by mark@chromium.org, Jul 20 2017

If dffc00ce78000000 is representative, then it looks to me like we’ve already gone off the rails because we don’t have system symbols for libc, and LOG(FATAL) calls into libc’s abort() to do the dirty work. Crawling back out from there, we just do stack scanning, which is basically guessing. This is nothing new, it seems well-established above. By the time we “recover” a frame for our own code (base::debug::BreakDebugger()), we’re really just stabbing around in the dark. We don’t have enough valid context to make proper sense of the CFI that we’ve recorded for our own module.

When Primiano said he saw

2017-07-18 17:08:34: postfix_evaluator-inl.h:106: ERROR: Could not PopValues to get two values for binary operation +: r7 16 +

Breakpad’s probably conveying “I have no idea what’s in r7” (because stack scanning wouldn’t have done anything to try to recover it). It probably would have also logged “postfix_evaluator-inl.h:334: INFO: Identifier r7 not in dictionary” just before that, but if you weren’t seeing messages at INFO, you wouldn’t have known.
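The error is easier to read once you remember that a rule like "r7 16 +" is a postfix expression evaluated against the registers recovered for the previous frame. The toy evaluator below is a sketch in Python, not Breakpad's postfix_evaluator: the names `evaluate_postfix` and `EvaluationError` are hypothetical, and only "+" and "-" are handled. It reproduces the same kind of failure when r7 is unknown.

```python
class EvaluationError(Exception):
    """Raised when an expression cannot be evaluated."""


def _resolve(token, registers):
    # Operands stay as raw tokens until an operator needs their value.
    if isinstance(token, int):
        return token
    try:
        return int(token)
    except ValueError:
        pass
    if token not in registers:
        raise EvaluationError("identifier %s not in dictionary" % token)
    return registers[token]


def evaluate_postfix(expression, registers):
    """Evaluate a postfix expression such as "r7 16 +".

    registers maps register names to the values known for the frame.
    """
    stack = []
    for token in expression.split():
        if token in ("+", "-"):
            if len(stack) < 2:
                raise EvaluationError("need two values for " + token)
            right = _resolve(stack.pop(), registers)
            left = _resolve(stack.pop(), registers)
            stack.append(left + right if token == "+" else left - right)
        else:
            stack.append(token)
    if len(stack) != 1:
        raise EvaluationError("expression did not reduce to one value")
    return _resolve(stack[0], registers)
```

With r7 present in `registers` the rule yields an address. With an empty dictionary, which is roughly the situation after stack scanning, the lookup of r7 fails and the whole rule is unusable.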

Comment 20 by mark@chromium.org, Jul 20 2017

OK, then I’m interested in the CFI for BreakDebugger() and ~LogMessage() between those two builds, and confirmation that Breakpad does a better job walking through the same LOG(FATAL) with DWARF v3 than v4 absent any other changes.
Now that I read comment 9 in more detail trying to follow it, I realize that it requires having an official build with the change I want to test.

Is there some way to test the whole crash pipeline locally?

Thanks for figuring out that this is arm-only; that explains why the linux builds are happy.

We could blindly add -gdwarf-3 if target_cpu == "arm" for m61, and then fix dump_syms to handle dwarf4+arm async. I'll prepare a CL for that.
Re #20. Mark, as you say, BreakDebugger is a bit too complicated a case for debugging this, as it has libc on top. I find it easier to see the problem when comparing crash/f2aaaec488000000 vs crash/c8629afe40000000, which both start in libchrome.so. One keeps going with CFI, the other goes directly into stack scanning.

> Is there some way to test the whole crash pipeline locally?
- Build chrome_public_apk with is_official_build=true, it takes just an open-source checkout.
- Start chrome with --enable-crash-reporter-for-testing (use build/android/adb_chrome_public_command_line)
- Navigate to chrome://crash
- At that point that should save a .dmp file somewhere in /data/data/...chrome...
- Use dump_syms to transform the lib.unstripped/libchrome.so into a breakpad sym (see https://www.chromium.org/developers/decoding-crash-dumps). Make sure you build dump_syms from a checkout with target_os="android"
- Use minidump_stackwalk minidump.dmp path_to_symbols_from_previous_step to see what the unwinder does. 
Thanks for the detailed descriptions. I did the first few steps, but I don't see a dmp file in /data/data/org.chromium.chrome (or its subdirectories). I verified on about:version that --enable-crash-reporter-for-testing is present.
While I can't see a dmp file, chrome://crashes offered "upload now", so I did that:
https://crash.corp.google.com/browse?stbtiq=76cabdca88000000

Comment 25 by bugdroid1@chromium.org, Jul 20 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/80ad313b09c1de7954088628bc2436a16e8eab00

commit 80ad313b09c1de7954088628bc2436a16e8eab00
Author: Nico Weber <thakis@chromium.org>
Date: Thu Jul 20 19:07:47 2017

android: Add back -gdwarf-3 when targeting 32-bit arm.

I had removed this in https://codereview.chromium.org/2959083002,
but it looks like dump_syms still needs it on arm for now.

Bug:  744956 , 735027 
Change-Id: Ia438c53abc6d6f7f39aada563bdcb68d328b1115
Reviewed-on: https://chromium-review.googlesource.com/579930
Reviewed-by: Mark Mentovai <mark@chromium.org>
Commit-Queue: Nico Weber <thakis@chromium.org>
Cr-Commit-Position: refs/heads/master@{#488342}
[modify] https://crrev.com/80ad313b09c1de7954088628bc2436a16e8eab00/build/config/compiler/BUILD.gn

I downloaded that dump and ran:

out/gnand/dump_syms out/gnand/lib.unstripped/libchrome.so > libchrome.syms
out/gnand/minidump_stackwalk ~/Downloads/upload_file_minidump-76cabdca88000000.dmp libchrome.syms 

This doesn't give me any symbols, so I'm likely holding something wrong :-/

I speculatively landed the -gdwarf-3 CL, so that in case it helps, we have it on the branch.

Comment 27 by mark@chromium.org, Jul 20 2017

The second argument to minidump_stackwalk isn’t a single dump_syms output file, it’s a directory with a silly structure.

https://chromium.googlesource.com/breakpad/breakpad/src/+/21724d4964680f64c8061800235c08dd8aac2f5b/processor/simple_symbol_supplier.h#48

So you want symbols/libchrome.so/(its_debug_id)/libchrome.so.sym

Its debug ID comes from the MODULE line at the top of libchrome.so.sym.

https://www.chromium.org/developers/decoding-crash-dumps suggests that you run minidump_stackwalk once, look at its logged messages to figure out where it wants to get the symbol file from, and then set up your hierarchy that way.
I suspect that the BP just happened *, so this might need a merge-request when it gets declared.


* An educated guess from https://bugs.chromium.org/p/chromium/issues/detail?id=742265#c12
Thanks!
https://www.chromium.org/developers/decoding-crash-dumps says

This will print out lines like:
[time stamp] simple_symbol_supplier.cc:150: INFO: No symbol file at /tmp/my_symbols/libfoo/hash/libfoo.sym.

but I instead get:

$ out/gnand/minidump_stackwalk ~/Downloads/upload_file_minidump-76cabdca88000000.dmp /tmp/my_symbols/ > /dev/null
2017-07-20 15:22:21: simple_symbol_supplier.cc:160: ERROR: Can't construct symbol file path without debug_file (code_file = dalvik-main space (deleted))
2017-07-20 15:22:21: simple_symbol_supplier.cc:160: ERROR: Can't construct symbol file path without debug_file (code_file = dalvik-LinearAlloc (deleted))
2017-07-20 15:22:21: simple_symbol_supplier.cc:160: ERROR: Can't construct symbol file path without debug_file (code_file = dalvik-zygote space (deleted))
2017-07-20 15:22:21: simple_symbol_supplier.cc:160: ERROR: Can't construct symbol file path without debug_file (code_file = ashmem (deleted))
2017-07-20 15:22:21: simple_symbol_supplier.cc:160: ERROR: Can't construct symbol file path without debug_file (code_file = dalvik-Jit thread pool worker thread 0 (deleted))
2017-07-20 15:22:21: simple_symbol_supplier.cc:160: ERROR: Can't construct symbol file path without debug_file (code_file = dalvik-main space 1 (deleted))
2017-07-20 15:22:21: simple_symbol_supplier.cc:160: ERROR: Can't construct symbol file path without debug_file (code_file = RELRO:libchrome.so (deleted))

Trying to do this manually:

$ head -2 libchrome.syms 
MODULE Linux arm 165B09E370282AEF53195DCE37F341FB0 libchrome.so
INFO CODE_ID E3095B162870EF2A53195DCE37F341FBC8BEE129

$ ls -R /tmp/my_symbols/
/tmp/my_symbols/:
libchrome  libchrome.so

/tmp/my_symbols/libchrome:
165B09E370282AEF53195DCE37F341FB0

/tmp/my_symbols/libchrome/165B09E370282AEF53195DCE37F341FB0:
libchrome.so.syms  libchrome.sym  libchrome.syms

/tmp/my_symbols/libchrome.so:
165B09E370282AEF53195DCE37F341FB0

/tmp/my_symbols/libchrome.so/165B09E370282AEF53195DCE37F341FB0:
libchrome.sym  libchrome.syms


$ out/gnand/minidump_stackwalk ~/Downloads/upload_file_minidump-76cabdca88000000.dmp /tmp/my_symbols/

But, still, no symbols.

Comment 30 by mark@chromium.org, Jul 20 2017

Go lowercase on that hashy thing. And go for libchrome.so/hashy/libchrome.so.sym.
Torne showed me a helper script for setting up the symbols:

components/crash/content/tools/generate_breakpad_symbols.py --build-dir=out/gnand --symbols-dir=/tmp/my_symbols/ --binary=out/gnand/lib.unstripped/libchrome.so --clear --verbose

With that:

out/gnand/minidump_stackwalk ~/Downloads/upload_file_minidump-76cabdca88000000.dmp /tmp/my_symbols/  | less

Thread 10 (crashed)
 0  libchrome.so!content::CrashIntentionally() [render_frame_impl.cc : 742 + 0x0]
     r0 = 0x00000000    r1 = 0xa70dd78e    r2 = 0x00000000    r3 = 0x82764fdf
     r4 = 0xa71b9af8    r5 = 0xa71b9878    r6 = 0xa71b987c    r7 = 0xa71b9a60
     r8 = 0xa71b9cd8    r9 = 0x9c274a00   r10 = 0xa71b9af8   r12 = 0x826e94db
     fp = 0xa71ba1f8    sp = 0xa71b9870    lr = 0x8492b207    pc = 0x8492b208
    Found by: given as instruction pointer in context
 1  libchrome.so!content::MaybeHandleDebugURL(GURL const&) [render_frame_impl.cc : 814 + 0x3]
     r4 = 0xa71b9af8    r5 = 0xa71b9878    r6 = 0xa71b987c    r7 = 0xa71b9a60
     r8 = 0xa71b9cd8    r9 = 0x9c274a00   r10 = 0xa71b9af8    fp = 0xa71ba1f8
     sp = 0xa71b9878    pc = 0x8492b2f7
    Found by: call frame info
 2  libchrome.so!content::RenderFrameImpl::PrepareRenderViewForNavigation(GURL const&, content::RequestNavigationParams const&) [render_frame_impl.cc : 6313 + 0x7]
     r4 = 0x9c274a00    r5 = 0xa71b9cc0    r6 = 0xa71b9af8    r7 = 0xa71b9a60
     r8 = 0xa71b9cd8    r9 = 0x9c274a00   r10 = 0xa71b9af8    fp = 0xa71ba1f8
     sp = 0xa71b9920    pc = 0x8493b11d
    Found by: call frame info
 3  libchrome.so!content::RenderFrameImpl::NavigateInternal(content::CommonNavigationParams const&, content::StartNavigationParams const&, content::RequestNavigationParams const&, std::__ndk1::unique_ptr<content::StreamOverrideParameters, std::__ndk1::default_delete<content::StreamOverrideParameters> >) [render_frame_impl.cc : 5977 + 0x9]
     r4 = 0x00000008    r5 = 0xa71b9cc0    r6 = 0xa71b9cd8    r7 = 0xa71b9a60
     r8 = 0xa71b9cd8    r9 = 0x9c274a00   r10 = 0xa71b9af8    fp = 0xa71ba1f8
     sp = 0xa71b9958    pc = 0x849339b3
    Found by: call frame info
 4  libchrome.so!content::RenderFrameImpl::OnNavigate(content::CommonNavigationParams const&, content::StartNavigationParams const&, content::RequestNavigationParams const&) [render_frame_impl.cc : 1721 + 0x3]
     r4 = 0x853f8438    r5 = 0x00000000    r6 = 0xa71b9af8    r7 = 0x9c274a00
     r8 = 0xa71b9cd8    r9 = 0x8dc9e404   r10 = 0xa71b9cc0    fp = 0xa71ba1f8
     sp = 0xa71b9a90    pc = 0x8492e9f1
    Found by: call frame info
 5  libchrome.so!bool IPC::MessageT<FrameMsg_Navigate_Meta, std::__ndk1::tuple<content::CommonNavigationParams, content::StartNavigationParams, content::RequestNavigationParams>, void>::Dispatch<content::RenderFrameImpl, content::RenderFrameImpl, void, void (content::RenderFrameImpl::*)(content::CommonNavigationParams const&, content::StartNavigationParams const&, content::RequestNavigationParams const&)>(IPC::Message const*, content::RenderFrameImpl*, content::RenderFrameImpl*, void*, void (content::RenderFrameImpl::*)(content::CommonNavigationParams const&, content::StartNavigationParams const&, content::RequestNavigationParams const&)) [tuple.h : 84 + 0x3]
     r4 = 0x9c274a00    r5 = 0x8dc9e404    r6 = 0xa71b9af8    r7 = 0xa71b9e58
     r8 = 0x9c274a00    r9 = 0x8dc9e404   r10 = 0x839dd921    fp = 0xa71ba1f8
     sp = 0xa71b9ad8    pc = 0x8492e92b
    Found by: call frame info
 6  libchrome.so!content::RenderFrameImpl::OnMessageReceived(IPC::Message const&) [render_frame_impl.cc : 1585 + 0x17]
     r4 = 0xa71b9df8    r5 = 0x00000001    r6 = 0xa71b9df8    r7 = 0xa71b9e58
     r8 = 0x9c274a00    r9 = 0x8dc9e404   r10 = 0x839dd921    fp = 0xa71ba1f8
     sp = 0xa71b9de8    pc = 0x8492ceaf


I think the "Found by: call frame info" lines mean that -gdwarf-3 did help?
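A quick way to compare two stackwalks is to tally the "Found by:" lines instead of reading every frame. The Python sketch below is a hypothetical helper, not part of Breakpad. It counts whatever text follows "Found by:" in minidump_stackwalk's human-readable output.

```python
from collections import Counter


def frame_trust_counts(stackwalk_output):
    """Count how each frame was recovered, keyed by the "Found by:" text."""
    prefix = "Found by:"
    counts = Counter()
    for line in stackwalk_output.splitlines():
        stripped = line.strip()
        if stripped.startswith(prefix):
            counts[stripped[len(prefix):].strip()] += 1
    return counts
```

Run it over the output for both builds: a good walk is dominated by "call frame info", while a regressed one shows mostly scanning-based entries.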
thakis@thakis:~/src/chrome/src$ ls -R /tmp/my_symbols/ 
/tmp/my_symbols/:
libchrome.so

/tmp/my_symbols/libchrome.so:
165B09E370282AEF53195DCE37F341FB0

/tmp/my_symbols/libchrome.so/165B09E370282AEF53195DCE37F341FB0:
libchrome.so.sym


That was one combination that I didn't try manually in comment 29.  I'll update https://www.chromium.org/developers/decoding-crash-dumps .
ignore the dalvik, ashmem and RELRO warnings, they are WAI.
the hashy things should be uppercase, or at least for me it works uppercase.
You got quite close there. The file should be
/tmp/my_symbols/libchrome.so/165B09E370282AEF53195DCE37F341FB0/libchrome.so.sym

you had all possible combinations in #29, except the right one :)
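Since the layout is easy to get wrong by hand, here is a small Python sketch that derives the expected path from the MODULE line at the top of the dump_syms output. The function name `symbol_path` is made up for illustration, and the "debug_file/debug_id/debug_file.sym" layout is taken from the comments above rather than from the Breakpad sources.

```python
import posixpath


def symbol_path(symbols_root, module_line):
    """Build the path minidump_stackwalk is expected to look for.

    module_line is the first line of a .sym file, in the form
    "MODULE <os> <arch> <debug_id> <debug_file>".
    """
    fields = module_line.split(None, 4)
    if len(fields) != 5 or fields[0] != "MODULE":
        raise ValueError("not a MODULE line: %r" % module_line)
    debug_id, debug_file = fields[3], fields[4].strip()
    # The ID is kept exactly as dump_syms printed it (uppercase here).
    return posixpath.join(symbols_root, debug_file, debug_id,
                          debug_file + ".sym")
```

For the MODULE line shown in comment 29 and a root of /tmp/my_symbols, this yields /tmp/my_symbols/libchrome.so/165B09E370282AEF53195DCE37F341FB0/libchrome.so.sym, the path given above.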

Comment 34 by mark@chromium.org, Jul 20 2017

Yes, that actually looks great.

Looks like minidump_stackwalk didn’t tell you the paths it wanted because those are LOG(INFO), and someone put BPLOG_MINIMUM_SEVERITY=SEVERITY_ERROR in its BUILD.gn.
Owner: ----
Status: Available (was: Assigned)
Summary: dump_syms or minidump_stackwalk confused by clang's DWARF4 debug data when targeting arm (was: bad symbolization release check crash (since switching to clang?))
Cool. Retitling for the actual cause. I'll merge the change to m-61 if we missed that like primiano predicts.

Other than that, I'm marking this bug as available so someone can fix this for reals. (Who'd do this? crash team? mark? nobody?)

Comment 36 by mark@chromium.org, Jul 20 2017

Me or nobody, because I’m hoping to just retire all of that Breakpad processor stuff in favor of something that understands DWARF a bit better.
m61 branched at #488528, so it has the change from comment 25 already.

Comment 38 by sheriffbot@chromium.org, Jul 23

Labels: Hotlist-Recharge-Cold
Status: Untriaged (was: Available)
This issue has been Available for over a year. If it's no longer important or seems unlikely to be fixed, please consider closing it out. If it is important, please re-triage the issue.

Sorry for the inconvenience if the bug really should have been left as Available.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Components: Internals>CrashReporting
Cc: -amineer@chromium.org
No longer on the Chrome team, e-mail me @google.com if any attention still required from me here, otherwise good luck!
Status: WontFix (was: Untriaged)
This came back *again* in  issue 889937  because we switched to building both 32 and 64 bit on the arm64 builders for monochrome instead of merging separate APKs, and the condition for whether to use DWARF3 is incorrectly based on target_cpu instead of current_cpu, so the 32-bit library was built with DWARF4 and had no breakpad-parseable unwind info. :(

I'll fix the workaround, but I guess we just aren't going to fix breakpad at this stage due to the switch to crashpad being so close - I'm going to wontfix this unless anyone has objections.
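The target_cpu vs current_cpu mix-up can be modelled in a few lines. This is a Python model of the condition, not the actual BUILD.gn: the function names and the two-toolchain setup are assumptions made for illustration. The only behaviour taken from this bug is "add -gdwarf-3 when the CPU is arm".

```python
def dwarf_flags(cpu):
    """Workaround from this bug: 32-bit arm needs DWARF 3 for dump_syms."""
    return ["-gdwarf-3"] if cpu == "arm" else []


def flags_per_toolchain(target_cpu, toolchain_cpus, use_current_cpu):
    """Return the debug flags each toolchain would get.

    target_cpu is fixed for the whole build. Each toolchain has its own
    current_cpu, so only a check on current_cpu varies per toolchain.
    """
    result = {}
    for current_cpu in toolchain_cpus:
        checked_cpu = current_cpu if use_current_cpu else target_cpu
        result[current_cpu] = dwarf_flags(checked_cpu)
    return result
```

On an arm64 build that also compiles a 32-bit library, checking target_cpu gives the arm toolchain no flag at all, while checking current_cpu applies -gdwarf-3 to the 32-bit code only.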

Comment 42 by bugdroid1@chromium.org, Sep 27

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/92013e54ab1d7a2e1bde349f36ad1f80f32f3112

commit 92013e54ab1d7a2e1bde349f36ad1f80f32f3112
Author: Torne (Richard Coles) <torne@google.com>
Date: Thu Sep 27 18:22:32 2018

Fix compiler flags incorrectly checking target_cpu.

Various GN conditions check target_cpu when deciding which compiler
flags to apply; they should be checking current_cpu instead so that
builds with more than one toolchain use the correct flags for each
toolchain. This was causing the DWARF version workaround for
 crbug.com/744956  to not be applied on the 64-bit build when using the
32-bit toolchain.

Bug:  744956 ,  889937 
Change-Id: Id2f4ecafd762e36ed3593fb45f6a6062bf8f6326
Reviewed-on: https://chromium-review.googlesource.com/1249393
Reviewed-by: Nico Weber <thakis@chromium.org>
Commit-Queue: Richard Coles <torne@chromium.org>
Cr-Commit-Position: refs/heads/master@{#594791}
[modify] https://crrev.com/92013e54ab1d7a2e1bde349f36ad1f80f32f3112/build/config/compiler/BUILD.gn
