Issue metadata
Sign in to add a comment
|
MemoryDumpManagerTest.PostTaskForSequencedTaskRunner hang flake on Fuchsia/x64 |
||||||||||||||||||||||||
Issue descriptionIn FYI build https://build.chromium.org/p/chromium.fyi/builders/Fuchsia/builds/13225, the MemoryDumpManagerTest.PostTaskForSequencedTaskRunner hung, leading to the process timing-out. We captured a threads dump: [00537.357] 03902.04210> [3:1208766299:0124/020019.812458:537357663:ERROR:test_launcher.cc(376)] Dumping threads in process 45056 [00537.394] 62552.62578> Backtrace of threads of process 45056: base_unittests__exec [00537.394] 62552.62578> arch: x86_64 [00537.394] 62552.62578> 2 thread(s) ... [00537.401] 62552.62578> <== Thread initial-thread[45093] ==> ... #01: pc 0x7e29a2f2c84b sp 0x1085d3ffb6a8 (<vDSO>,0x684b) #02: pc 0x7e29a2f2c569 sp 0x1085d3ffb6b0 (<vDSO>,0x6569) #03: pc 0x410ed427e571 sp 0x1085d3ffb6d0 (libc.so,0x44571) #04: pc 0x410ed4251f88 sp 0x1085d3ffb710 (libc.so,0x17f88) #05: base::ConditionVariable::Wait() at ??:? #06: base::SequencedWorkerPool::Inner::Shutdown(int) at ??:? #07: base::SequencedWorkerPoolOwner::~SequencedWorkerPoolOwner() at ??:? #08: base::trace_event::(anonymous namespace)::TestSequencedTaskRunner::~TestSequencedTaskRunner() at ??:? #09: std::__1::__tree<scoped_refptr<base::trace_event::MemoryDumpProviderInfo>, base::trace_event::MemoryDumpProviderInfo::Comparator, std::__1::allocator<scoped_refptr<base::trace_event::MemoryDumpProviderInfo> > >::destroy(std::__1::__tree_node<scoped_refptr<base::trace_event::MemoryDumpProviderInfo>, void*>*) at ??:? #10: base::trace_event::MemoryDumpManager::~MemoryDumpManager() at ??:? #11: base::trace_event::MemoryDumpManagerTest::TearDown() at ??:? #12: testing::TestInfo::Run() at ??:? #13: testing::TestCase::Run() at ??:? #14: testing::internal::UnitTestImpl::RunAllTests() at ??:? #15: testing::UnitTest::Run() at ??:? #16: base::TestSuite::Run() at ??:? #17: base::(anonymous namespace)::LaunchUnitTestsInternal(base::RepeatingCallback<int ()> const&, unsigned long, int, bool, base::RepeatingCallback<void ()> const&) at ??:? #18: base::LaunchUnitTests(int, char**, base::RepeatingCallback<int ()> const&) at ??:? #19: main at ??:? #20: pc 0x410ed425514e sp 0x1085d3ffbfe0 (libc.so,0x1b14e) #21: pc 0 sp 0x1085d3ffc000 [00537.426] 62552.62578> <== Thread Test Task RunnerWorker1/1859451[45433] ==> ... #01: pc 0x410ed4252e50 sp 0x63c4ee42a7c8 (libc.so,0x18e50) #02: base::internal::MessageLoopTaskRunner::RunsTasksInCurrentSequence() const at ??:? #03: base::RunLoop::Quit() at ??:? #04: base::debug::TaskAnnotator::RunTask(char const*, base::PendingTask*) at ??:? #05: base::MessageLoop::RunTask(base::PendingTask*) at ??:? #06: base::MessageLoop::DoWork() at ??:? #07: base::MessagePumpDefault::Run(base::MessagePump::Delegate*) at ??:? #08: base::RunLoop::Run() at ??:? #09: base::SequencedWorkerPoolOwner::~SequencedWorkerPoolOwner() at ??:? #10: base::trace_event::(anonymous namespace)::TestSequencedTaskRunner::~TestSequencedTaskRunner() at ??:? #11: base::trace_event::MemoryDumpManager::ContinueAsyncProcessDump(base::trace_event::MemoryDumpManager::ProcessMemoryDumpAsyncState*) at ??:? #12: base::SequencedWorkerPool::Inner::ThreadLoop(base::SequencedWorkerPool::Worker*) at ??:? #13: base::SequencedWorkerPool::Worker::Run() at ??:? #14: base::SimpleThread::ThreadMain() at ??:? #15: base::(anonymous namespace)::ThreadFunc(void*) at ??:? #16: pc 0x410ed42528b6 sp 0x63c4ee42afe0 (libc.so,0x188b6) #17: pc 0x410ed42c871a sp 0x63c4ee42aff0 (libc.so,0x8e71a) #18: pc 0 sp 0x63c4ee42b000 [00537.966] 03902.04210> [3:1208766299:0124/020020.420678:537965900:ERROR:test_launcher.cc(398)] going to zx_task_kill(job) for 4a4c56fb [00537.969] 03902.04210> [3:1208766299:0124/020020.423825:537969027:ERROR:test_launcher.cc(415)] removing 4a4c56fb from live process list ... [00538.000] 03902.03951> 1 test timed out: [00538.000] 03902.03951> MemoryDumpManagerTest.PostTaskForSequencedTaskRunner (../../base/trace_event/memory_dump_manager_unittest.cc:507) It looks (based on the stacks) like there's a MemoryDumpProvider in the manager, with the TestSequencedTaskRunner set on it, and we're managing to simultaneously touch the MDP on both the main thread and the SequencedWorkerPool thread, leading to a double-free, which in this case happens to deadlock things. O_o
,
Jan 24 2018
I want to be realistic on this. I am not going to look to an OS_FUCSIA flakiness. As per "[chromium-dev] PSA: New platform: OS_FUCHSIA, "you don't have to worry more than you are worried about OS_FREEBSD or OS_SOLARIS today" Patches from the Fuchsia team are more than welcome.
,
Jan 24 2018
Re #2: Thanks for pointing out the out-of-date PSA; we'll send out a refresh with the current state of things. This looks to be a deadlock between the two TestSequencedTaskRunner instances, arising from the underlying SequencedWorkerOwner being destroyed on a WorkerPool/TaskScheduler thread, rather than on the main test thread. I'll diagnose further to find the most appropriate owner.
,
Jan 25 2018
Based on the stack trace from the hang (see bug description), this is a bug in the TestSequencedTaskRunner, which is ref-counted and may end up being deleted on any thread, but contains the SequencedWorkerPoolOwner, which owns the SequencedWorkerPool, which must be Shutdown() on the same sequenced on which it was created. Based on the code itself, TestSequencedTaskRunner seems a peculiar helper; we create the underlying SequencedWorkerPool configured for two worker threads, and then pull out a single sequence token and use that to PostTask to it - we effectively get a single thread, but implemented using two underlying threads.
,
Jan 25 2018
ssid: Looks like you changed the thread-hopping logic in MDM in https://chromium-review.googlesource.com/c/chromium/src/+/847911 - any idea how ContinueAsyncProcessDump() could still be running _after_ we have return from RequestProcessDumpAndWait()?
,
Jan 25 2018
FYI SequencedWorkerPoolOwner is about to be removed https://chromium-review.googlesource.com/c/chromium/src/+/881532
,
Jan 25 2018
gab: Great; that CL should fix this test. Basically the current implementation runs a tiny risk of the TestSequencedTaskRunner, and hence the SequencedWorkerPoolOwner, being deleted off the main thread, but during teardown the Owner spins a RunLoop which was bound to the main thread at construction time, so it is actually processing tasks posted to the main thread, on that Worker thread -> live-lock.
,
Jan 25 2018
,
Feb 21 2018
Is this complete, now that https://chromium-review.googlesource.com/c/chromium/src/+/881532 has landed?
,
May 10 2018
Should be fixed. The issue was caused by the fact that SequencedWorkerPool was ref-counted. ScopedTaskEnvironment, now used in this test, doesn't have this issue. |
|||||||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||||||
Comment 1 by erikc...@chromium.org
, Jan 24 2018