Issue metadata
Sign in to add a comment
|
MemoryDumpManager::OnTraceLogDisabled crashe on ChromeOS |
||||||||||||||||||||||||||
Issue descriptionChrome Version: 57.0.2970.0 (Official Build) dev (64-bit) OS: ChromeOS What steps will reproduce the problem? I was using ChromeOS and it started kswapd0'ing frantically. The system hung shortly thereafter and then crashed out. The Chrome User Log contained: [1:2:0117/161643.863573:ERROR:memory_dump_manager.cc(561)] Disabling MemoryDumpProvider "V8Isolate". Failed to post task on the task runner provided. and [1:1:0117/161643.913547:ERROR:persistent_histogram_allocator.cc(411)] NOTREACHED() hit. at the end of it.
,
Jan 18 2017
The stack trace is: 0x00007f2a317f196b (chrome -memory_dump_manager.cc:807 ) base::trace_event::MemoryDumpManager::OnTraceLogDisabled() 0x00007f2a31800d7c (chrome -trace_log.cc:751 ) base::trace_event::TraceLog::SetDisabledWhileLocked(unsigned char) 0x00007f2a318013c2 (chrome -trace_log.cc:697 ) base::trace_event::TraceLog::SetDisabled() 0x00007f2a30d9acda (chrome -tracing_controller_impl.cc:248 ) content::TracingControllerImpl::SetDisabledOnFileThread(base::Callback<void (), (base::internal::CopyMode)1, (base::internal::RepeatMode)1> const&) 0x00007f2a2fe614ec (chrome -callback.h:68 ) base::debug::TaskAnnotator::RunTask(char const*, base::PendingTask*) 0x00007f2a2fe4b50d (chrome -message_loop.cc:421 ) base::MessageLoop::DoWork() 0x00007f2a2fe4bad2 (chrome -message_pump_libevent.cc:218 ) base::MessagePumpLibevent::Run(base::MessagePump::Delegate*) 0x00007f2a317d3a17 (chrome -run_loop.cc:37 ) base::RunLoop::Run() 0x00007f2a30b07a44 (chrome -browser_thread_impl.cc:250 ) content::BrowserThreadImpl::FileThreadRun(base::RunLoop*) 0x00007f2a30b07b82 (chrome -browser_thread_impl.cc:303 ) content::BrowserThreadImpl::Run(base::RunLoop*) 0x00007f2a317ea728 (chrome -thread.cc:328 ) base::Thread::ThreadMain() 0x00007f2a317e7bd5 (chrome -platform_thread_posix.cc:71 ) ThreadFunc 0x00007f2a2ea80306 (libpthread-2.19.so -pthread_create.c:309 ) start_thread 0x00007f2a2d7d15cc (libc-2.19.so + 0x000f95cc ) clone
,
Jan 18 2017
I don't suppose you have the "error=X" value that should have been printed after the NOTREACHED? Most likely, it was 6 indicating that the persistent histogram memory was full. While that's not good, the system will continue to work by placing histograms in non-persistent memory... which is why a very recent change (i.e. yesterday) has disabled the NOTREACHED for the "full" condition. There was a recent CL to both stable and beta (Jan 5 and 10, respectively) that increases the size of the persistent histogram memory but I think 2970 is too old for that. See: https://bugs.chromium.org/p/chromium/issues/detail?id=675600
,
Jan 18 2017
There is no "error=X" value after the NOTREACHED - just "NOTREACHED() hit." is loggged. Note that this is a retail build, so presumably we don't have the verbose logging built-in?
,
Jan 18 2017
Why does a retail build have DCHECK enabled at all? Anyway, it's probably the "full" condition so will not occur with a build from HEAD or beta-HEAD.
,
Jan 18 2017
Re #5: ChromeOS has a special-case to treat NOTREACHED() as LOG(ERROR) in non-DCheck builds, for some reason.
,
Jan 18 2017
Re #5: ChromeOS has a special-case to treat NOTREACHED() as LOG(ERROR) in non-DCheck builds, for some reason.
,
Jan 18 2017
+primiano: I am surprised by this stack trace. The crash happened at this line: mdpinfo->dump_provider->SuspendFastMemoryPolling(); The crash was seg fault at address 0x0000002d. Which means either mdpinfo or dump_provider is NULL or corrupt? |dump_providers_for_polling_| is always accessed with lock and mdpinfo is scoped_refptr. I can't see why that would be null. The only way I can think of, why this crash would happen is: The RegisterDumpProvider was called with NULL dump provider. The only dump provider that sets |is_fast_polling_supported| to true is ProcessMetricsMemoryDumpProvider. This dump provider should never be unregistered on ChromeOS. The only code that unregisters process metrics provider should be happening only on OS_LINUX: https://chromium.googlesource.com/chromium/src/+/master/content/browser/tracing/tracing_controller_impl.cc#507 There is no way dump_provider can be null since MemoryDumpProviderInfo::dump_provider is constant, unless the Register was called with nullptr. I can't say why ProcessMetricsMemoryDumpProvider::RegisterForProcess would register a null mdp.
,
Jan 19 2017
wez: thanks for reporting this. RE #8: > The crash was seg fault at address 0x0000002d. Which means either mdpinfo or dump_provider is NULL or corrupt? Right. By look at the assembly in the minidump, it seems that crash happens at the virtual call site. 7f2a317f1960 00000035: 49 8b 77 20 mov rsi,QWORD PTR [r15+0x20] // this should be the beginning of for (const auto& mdpinfo) loop 7f2a317f1964 00000039: 48 8b 7e 08 mov rdi,QWORD PTR [rsi+0x8] 7f2a317f1968 0000003d: 4c 8b 07 mov r8,QWORD PTR [rdi] =>7f2a317f196b 00000040: 41 ff 50 28 call QWORD PTR [r8+0x28] // this looks like the vtable call 7f2a317f196f 00000044: 4c 89 ff mov rdi,r15 7f2a317f1972 00000047: e8 f9 36 40 fe call 0xfe403745 7f2a317f1977 0000004c: 49 89 c7 mov r15,rax 7f2a317f197a 0000004f: 49 39 c4 cmp r12,rax 7f2a317f197d 00000052: 75 e1 jne 0x35 // This should be the block-end of the for (const auto& mdpinfo) loop SuspendFastMemoryPolling is the 5th virtual method of MemoryDumpProvider, 5 * sizeof(PTR) = 40 -> 0x28, yes that crashed because the vptr on which we called SuspendFastMemoryPolling was invalid. So mdpinfo itself was either valid (likely), or was corrupt but still had a valid pointer (to garbage) at the position expected by |dump_provider| (which is very hard, though still possible) if mdpinfo was null or evidently invalid we would have crashed on the previous mov r8, [rdi]. Now the interesting thing is that it looks like (but here I might be overlooking some compiler trick) that mdpinfo->dump_provider wasn't really nullptr, but 5 (that also explains why the crash address is 0x2d -> 0x28 + 0x5). So, take this with a grain of salt, but there is a chance here that dump_provider = 0x05, which is even weirder. > The only code that unregisters process metrics provider should be happening only on OS_LINUX: > https://chromium.googlesource.com/chromium/src/+/master/content/browser/tracing/tracing_controller_impl.cc#507 OS_LINUX applies also to CrOS :) so, yup, here the fact that we unregister the MDP is not surprising. Very likely we are unregistering the MDP for a child process that is gone. Having said all this I don't have still a valid theory that explains why either mdpinfo or dump_provider would have been null or invalid. :-/ assigning this to ssid as the NOTREACHED thing seems settled, so this crash on memory-infra is the only thing left here.
,
Jan 19 2017
wez, just to clarify: where you using tracing (or trying to) when this crash happened?
,
Jan 19 2017
IIRC then yes, when the jank started I deliberately kicked off a Tracing-on-Tap run.
,
Mar 1 2017
Okay this is same as issue 695731 . The dump provider is invalid. |
|||||||||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||||||||
Comment 1 by w...@chromium.org
, Jan 18 2017