
Issue 864653

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Oct 25
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Fuchsia
Pri: 3
Type: Bug




ValuesStructTraitsTest.SerializeInvalidDictionaryValue flakes now and again on Fuchsia/x64/Debug/FYI due to base::StackTrace()

Project Member Reported by w...@chromium.org, Jul 17

Issue description

In run https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/fuchsia-fyi-x64-dbg/298 this test flaked with:

[ RUN      ] ValuesStructTraitsTest.SerializeInvalidDictionaryValue
../../mojo/public/cpp/base/values_unittest.cc:117: Failure
Death test: mojo::test::SerializeAndDeserialize<mojom::DictionaryValue>(&in, &out)
    Result: died but not with expected error.
  Expected: Check failed
Actual msg:
[  DEATH   ] [65572:297500457:0717/090746.398514:30041585:WARNING:test_suite.cc(240)] Test launcher output path /tmp/.org.chromium.Chromium.Llfjgj/test_results.xml exists. Not adding test launcher result printer.
[  DEATH   ]
Stack trace:
bt#00: pc 0x4720c27b9b19 (libmessage_support.so,0x144f55afb19)
bt#01: pc 0x4720c27d5097 (libmessage_support.so,0x144f55cb097)
bt#02: pc 0x4720c27d4437 (libmessage_support.so,0x144f55ca437)
bt#03: pc 0x4720c22c0a5d (libmessage_support.so,0x144f50b6a5d)
bt#04: pc 0x4720c280d18e (libmessage_support.so,0x144f560318e)
bt#05: end
[  FAILED  ] ValuesStructTraitsTest.SerializeInvalidDictionaryValue (441 ms)

However, a crash was also logged, with:

#01: pc 0x5d1dfe321434 sp 0x49ffd84c3c08 (libc.so,0x8b434)
#02: void std::__2::__insertion_sort_3<base::debug::(anonymous namespace)::SymbolMap::Populate()::$_0&, base::debug::(anonymous namespace)::SymbolMap::Entry*>(base::debug::(anonymous namespace)::SymbolMap::Entry*, base::debug::(anonymous namespace)::SymbolMap::Entry*, base::debug::(anonymous namespace)::SymbolMap::Populate()::$_0&) at stack_trace_fuchsia.cc:?
#03: void std::__2::__sort<base::debug::(anonymous namespace)::SymbolMap::Populate()::$_0&, base::debug::(anonymous namespace)::SymbolMap::Entry*>(base::debug::(anonymous namespace)::SymbolMap::Entry*, base::debug::(anonymous namespace)::SymbolMap::Entry*, base::debug::(anonymous namespace)::SymbolMap::Populate()::$_0&) at stack_trace_fuchsia.cc:?
#04: void std::__2::__sort<base::debug::(anonymous namespace)::SymbolMap::Populate()::$_0&, base::debug::(anonymous namespace)::SymbolMap::Entry*>(base::debug::(anonymous namespace)::SymbolMap::Entry*, base::debug::(anonymous namespace)::SymbolMap::Entry*, base::debug::(anonymous namespace)::SymbolMap::Populate()::$_0&) at stack_trace_fuchsia.cc:?
#05: base::debug::(anonymous namespace)::SymbolMap::Populate() at stack_trace_fuchsia.cc:?
#06: base::debug::(anonymous namespace)::SymbolMap::SymbolMap() at stack_trace_fuchsia.cc:?
#07: base::debug::StackTrace::OutputToStream(std::__2::basic_ostream<char, std::__2::char_traits<char> >*) const at ??:?
#08: pc 0 sp 0x49ffd84cab00

suggesting that the death-test sub-process (which should have hit a DCHECK) crashed while trying to generate a backtrace.
 
Cc: jam...@chromium.org
Fault address is the program counter, which is pointing into libc, at memcpy().

jamesr: Any ideas why trying to jump into libc would trigger page faults? OOM, perhaps?

That likely means either the input or output parameter pointers point to bad memory. That could be OOM, or more likely an uninitialized parameter or heap corruption. Do you have the register values in the crash report, and/or the faulting address? It could be useful to see whether the pointer being dereferenced looks like "0xa" or similar.

According to the report (accessible via the link), RIP and PC are the same. Doesn't that mean it's the code page that is faulting?

Where do you see a symbolized report from the crashing process?

Re #4: If you click through to the full output from the run, there is output with all the register states, the unsymbolized stack, etc. I grabbed the address and ran it through addr2line on libc.so to get to memcpy.S.

Another failure, with my semi-manual SDK roll (so it is easier to find the right symbols): https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/fuchsia-fyi-x64-dbg/319

How is the child process attempting to unwind its stack? Are we building with frame pointers enabled? It's possible the unwind fails and jumps off into the weeds if it's relying entirely on the unwind tables; we haven't found those to be completely reliable.
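
The frame-pointer question above can be illustrated with a minimal sketch of a guarded frame-pointer walk. This is not the Chromium or Fuchsia unwinder. FrameRecord and WalkFramePointers are hypothetical names. The sketch assumes the common x86-64 layout with frame pointers enabled, where each frame record holds the saved caller frame pointer followed by the return address. It shows how a walker can refuse to follow a link instead of jumping off into the weeds. It stops when a link leaves the stack bounds, is misaligned, or does not move toward older frames.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical frame record layout (saved caller FP, then return address).
struct FrameRecord {
  const FrameRecord* caller_fp;
  uintptr_t return_address;
};

// Collects return addresses by following the frame-pointer chain from `fp`,
// stopping at the first link that looks invalid rather than dereferencing it.
std::vector<uintptr_t> WalkFramePointers(const FrameRecord* fp,
                                         uintptr_t stack_low,
                                         uintptr_t stack_high,
                                         size_t max_frames) {
  assert(stack_low <= stack_high);
  std::vector<uintptr_t> pcs;
  while (fp != nullptr && pcs.size() < max_frames) {
    const uintptr_t addr = reinterpret_cast<uintptr_t>(fp);
    // The whole record must lie inside [stack_low, stack_high) and be aligned.
    if (addr < stack_low || addr > stack_high ||
        stack_high - addr < sizeof(FrameRecord) ||
        addr % alignof(FrameRecord) != 0) {
      break;
    }
    // Treat a zero return address as the end of the chain.
    if (fp->return_address == 0) break;
    pcs.push_back(fp->return_address);
    const FrameRecord* next = fp->caller_fp;
    // Callers sit at higher addresses on a downward-growing stack; a link that
    // does not move upward indicates corruption (and could otherwise loop).
    if (next != nullptr && reinterpret_cast<uintptr_t>(next) <= addr) break;
    fp = next;
  }
  return pcs;
}
```

The stack grows downward, so each caller's record should sit at a higher address than its callee's. Requiring strictly increasing addresses also guarantees that the walk terminates on a corrupted, cyclic chain.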

Prior to the "unexpected output" log, the crash output includes:

#01: pc 0x5c3545bf6434 sp 0x51a640e94c18 (libc.so,0x8b434)
#02: void std::__2::__insertion_sort_3<base::debug::(anonymous namespace)::SymbolMap::Populate()::$_0&, base::debug::(anonymous namespace)::SymbolMap::Entry*>(base::debug::(anonymous namespace)::SymbolMap::Entry*, base::debug::(anonymous namespace)::SymbolMap::Entry*, base::debug::(anonymous namespace)::SymbolMap::Populate()::$_0&) at stack_trace_fuchsia.cc:?
#03: void std::__2::__sort<base::debug::(anonymous namespace)::SymbolMap::Populate()::$_0&, base::debug::(anonymous namespace)::SymbolMap::Entry*>(base::debug::(anonymous namespace)::SymbolMap::Entry*, base::debug::(anonymous namespace)::SymbolMap::Entry*, base::debug::(anonymous namespace)::SymbolMap::Populate()::$_0&) at stack_trace_fuchsia.cc:?
#04: void std::__2::__sort<base::debug::(anonymous namespace)::SymbolMap::Populate()::$_0&, base::debug::(anonymous namespace)::SymbolMap::Entry*>(base::debug::(anonymous namespace)::SymbolMap::Entry*, base::debug::(anonymous namespace)::SymbolMap::Entry*, base::debug::(anonymous namespace)::SymbolMap::Populate()::$_0&) at stack_trace_fuchsia.cc:?
#05: base::debug::(anonymous namespace)::SymbolMap::Populate() at stack_trace_fuchsia.cc:?
#06: base::debug::(anonymous namespace)::SymbolMap::SymbolMap() at stack_trace_fuchsia.cc:?
#07: base::debug::StackTrace::OutputToStream(std::__2::basic_ostream<char, std::__2::char_traits<char> >*) const at ??:?

which suggests that something is going wrong while we're populating the SymbolMap. The only memcpy-ish things in there are some strcpy-ish calls for the process name, so perhaps we're overrunning a buffer somewhere? That wouldn't explain, though, why the fault address is the PC, which I assume indicates a code-page fault.

We have an early exit that just truncates the lookup table if we run out of space, though. This looks more like a rogue access while actually populating an Entry.
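
The bounded table described above can be sketched minimally under three assumptions: a fixed capacity, an early exit that truncates once the table is full, and a bounded copy for each name. BoundedSymbolTable, kMaxEntries and kMaxNameLength are hypothetical, and the sizes are arbitrary. This is not the actual SymbolMap in stack_trace_fuchsia.cc. The sketch sorts only the populated prefix, so the comparator never sees unused entries.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical fixed-capacity symbol table; not Chromium's SymbolMap.
class BoundedSymbolTable {
 public:
  static constexpr size_t kMaxEntries = 4;      // Arbitrary, for illustration.
  static constexpr size_t kMaxNameLength = 16;  // Includes the trailing NUL.

  struct Entry {
    uintptr_t addr;
    char name[kMaxNameLength];
  };

  // Returns false once the table is full, so the caller can stop early and
  // the lookup table is simply truncated.
  bool Add(uintptr_t addr, const char* name) {
    if (count_ >= kMaxEntries) return false;
    Entry& entry = entries_[count_];
    entry.addr = addr;
    // Bounded copy: at most kMaxNameLength - 1 bytes, always NUL-terminated.
    size_t len = 0;
    while (len + 1 < kMaxNameLength && name[len] != '\0') ++len;
    std::memcpy(entry.name, name, len);
    entry.name[len] = '\0';
    ++count_;
    return true;
  }

  // Sorts only the populated prefix [0, count_) by address.
  void SortByAddress() {
    std::sort(entries_, entries_ + count_,
              [](const Entry& a, const Entry& b) { return a.addr < b.addr; });
  }

  size_t count() const { return count_; }

  const Entry& at(size_t i) const {
    assert(i < count_);
    return entries_[i];
  }

 private:
  Entry entries_[kMaxEntries] = {};
  size_t count_ = 0;
};
```

Returning false from Add gives callers the truncating early exit described above. The bounded copy keeps an over-long process or module name from overrunning Entry::name.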
Owner: fdegans@chromium.org
Status: Started (was: Untriaged)
