Stack sampler crashes on Mac Intel GPU bots
Issue description

In this tryjob:
https://ci.chromium.org/buildbot/tryserver.chromium.mac/mac_chromium_rel_ng/565544
WebglConformance_conformance_glsl_bugs_compound_assignment_type_combination failed because of a crash inside the NativeStackSamplerMac:

Operating system: Mac OS X 10.12.6 16G29
CPU: amd64 family 6 model 69 stepping 1 4 CPUs
GPU: UNKNOWN
Crash reason: EXC_BAD_ACCESS / KERN_INVALID_ADDRESS
Crash address: 0xffffffffffffffda
Process uptime: 5 seconds

Thread 5 (crashed)
 0 libunwind.dylib + 0x1195
    rax = 0x00007fffdb97e195 rdx = 0x00000000000058d1 rcx = 0x000070000ffb3340 rbx = 0x000070000ffb32f0
    rsi = 0xffffffffffffffda rdi = 0x0000000000000000 rbp = 0x000070000ffb32b0 rsp = 0x000070000ffb32a8
    r8 = 0x0000000000000002 r9 = 0x00007fffdb97e214 r10 = 0x000000011973afb0 r11 = 0x000000011973ad70
    r12 = 0x00007fc66c50ee20 r13 = 0x000070000ffb32f0 r14 = 0x0000000000000000 r15 = 0x000070000ffb3e50
    rip = 0x00007fffdb97e195
    Found by: given as instruction pointer in context
 1 libunwind.dylib + 0x10d1
    rbp = 0x000070000ffb32e0 rsp = 0x000070000ffb32c0 rip = 0x00007fffdb97e0d1
    Found by: previous frame's frame pointer
 2 Chromium Framework!__ZN4base12_GLOBAL__N_120WalkStackFromContextIZNS0_21NativeStackSamplerMac27SuspendThreadAndRecordStackEPNS_18NativeStackSampler11StackBufferEPNS_21StackSamplingProfiler6SampleEE3$_1EEbP13unw_context_tmPmPNSt3__16vectorINS6_6ModuleENSD_9allocatorISF_EEEEPNSE_INS0_11ModuleIndexENSG_ISK_EEEERKT_ + 0x124
    rbp = 0x000070000ffb37f0 rsp = 0x000070000ffb32f0 rip = 0x00000001196a3da4
    Found by: previous frame's frame pointer
 3 Chromium Framework!__ZN4base12_GLOBAL__N_121NativeStackSamplerMac17RecordStackSampleEPNS_18NativeStackSampler11StackBufferEPNS_21StackSamplingProfiler6SampleE + 0x570
    rbp = 0x000070000ffb3e90 rsp = 0x000070000ffb3800 rip = 0x00000001196a32e0
    Found by: previous frame's frame pointer
 4 Chromium Framework!__ZN4base21StackSamplingProfiler14SamplingThread12RecordSampleEPNS1_17CollectionContextE + 0x43f
    rbp = 0x000070000ffb4000 rsp = 0x000070000ffb3ea0 rip = 0x00000001196a6c4f
    Found by: previous frame's frame pointer
 5 Chromium Framework!__ZN4base21StackSamplingProfiler14SamplingThread21PerformCollectionTaskEi + 0xec
    rbp = 0x000070000ffb41a0 rsp = 0x000070000ffb4010 rip = 0x00000001196a6fac
    Found by: previous frame's frame pointer
 6 Chromium Framework!__ZN4base5debug13TaskAnnotator7RunTaskEPKcPNS_11PendingTaskE + 0x105
    rbp = 0x000070000ffb4380 rsp = 0x000070000ffb41b0 rip = 0x000000011962e2d5
    Found by: previous frame's frame pointer
 7 Chromium Framework!__ZN4base8internal17IncomingTaskQueue7RunTaskEPNS_11PendingTaskE + 0x79
    rbp = 0x000070000ffb44d0 rsp = 0x000070000ffb4390 rip = 0x00000001196696c9
    Found by: previous frame's frame pointer
 8 Chromium Framework!__ZN4base11MessageLoop7RunTaskEPNS_11PendingTaskE + 0x207
    rbp = 0x000070000ffb4690 rsp = 0x000070000ffb44e0 rip = 0x000000011966dcd7
    Found by: previous frame's frame pointer
 9 Chromium Framework!__ZN4base11MessageLoop21DeferOrRunPendingTaskENS_11PendingTaskE + 0xca
    rbp = 0x000070000ffb4850 rsp = 0x000070000ffb46a0 rip = 0x000000011966e01a
    Found by: previous frame's frame pointer
10 Chromium Framework!__ZN4base11MessageLoop13DoDelayedWorkEPNS_9TimeTicksE + 0x2b8
    rbp = 0x000070000ffb4a70 rsp = 0x000070000ffb4860 rip = 0x000000011966e5a8
    Found by: previous frame's frame pointer
11 Chromium Framework!__ZN4base24MessagePumpCFRunLoopBase7RunWorkEv + 0x43
    rbp = 0x000070000ffb4aa0 rsp = 0x000070000ffb4a80 rip = 0x00000001196717c3
    Found by: previous frame's frame pointer
12 Chromium Framework!__ZN4base3mac15CallWithEHFrameEU13block_pointerFvvE + 0xa
    rbp = 0x000070000ffb4ab0 rsp = 0x000070000ffb4ab0 rip = 0x000000011965468a
    Found by: previous frame's frame pointer
13 Chromium Framework!__ZN4base24MessagePumpCFRunLoopBase13RunWorkSourceEPv + 0x3f
    rbp = 0x000070000ffb4af0 rsp = 0x000070000ffb4ac0 rip = 0x000000011967108f
    Found by: previous frame's frame pointer
14 CoreFoundation + 0xa7321
    rbp = 0x000070000ffb4b00 rsp = 0x000070000ffb4b00 rip = 0x00007fffc5fab321
    Found by: previous frame's frame pointer

lgrey@ or avi@, could you please investigate this failure mode of the stack walker? It's affected at least this job on the CQ and impacts browser stability. Thanks.
,
Oct 16 2017
,
Oct 16 2017
kbr@ have you seen this happen in other runs? I'm not getting very far with the minidump and trying to find a way to get a live repro
,
Oct 16 2017
It looks like this hasn't happened recently on either of these waterfall bots: https://ci.chromium.org/buildbot/chromium.gpu/Mac%20Release%20%28Intel%29/?limit=200 https://ci.chromium.org/buildbot/chromium.gpu/Mac%20Retina%20Release%20%28AMD%29/?limit=200 or tryservers: https://ci.chromium.org/buildbot/tryserver.chromium.mac/mac_chromium_rel_ng/?limit=200 https://ci.chromium.org/buildbot/tryserver.chromium.mac/mac_optional_gpu_tests_rel/?limit=200 Downgrading to P2. Is there something we should change about the minidump uploading to make these more debuggable after the fact?
,
Oct 17 2017
No, the minidump itself is fine, it's just been harder to extract the root cause than the other sampler crashes. Thanks for looking!
,
Oct 24 2017
There was a cluster of these in early October; the last one was on the 14th. I'm curious if this has some relation to the periodic sampling lifetime issue. I don't exactly see how, though. https://crash.corp.google.com/browse?q=product.name%3D%27Chrome_Mac%27%20OMIT%20RECORD%20IF%20SUM(CrashedStackTrace.StackFrame.FunctionName%3D%27libunwind%3A%3ACompactUnwinder_x86_64%3Clibunwind%3A%3ALocalAddressSpace%3E%3A%3AstepWithCompactEncodingRBPFrame(unsigned%20int%2C%20unsigned%20long%20long%2C%20libunwind%3A%3ALocalAddressSpace%26%2C%20libunwind%3A%3ARegisters_x86_64%26)%27)%20%3D%200&sql_dialect=dremelsql&ignore_case=false&enable_rewrite=false&omit_field_name=&omit_field_value=&omit_field_opt=#samplereports,+day
,
Oct 24 2017
This doesn't look related to the lifetime issue, but the periodic sampling is likely contributing to more crashes simply because it's taking a lot more samples.
,
Oct 24 2017
Isn't it back on now, though? We haven't seen a crash for 10 days, so if it's the raised volume bringing the crashes out, we'd still be seeing them. Might be total coincidence. Or maybe, some function with tricky compact unwind info was being called a lot, and some non-sampler related change made it less so?
,
Oct 24 2017
You're right, I just happened to look at the crashes that occurred during periodic sampling. :) (Which is back on since 63.0.3232.0.) The number and rate of these crashes are quite low given the number of samples taken, so I'd be inclined to chalk it up to coincidence.
,
Jan 10 2018
While triaging recent crashes on the GPU trybots, this has been seen again, at least once:
https://ci.chromium.org/buildbot/tryserver.chromium.mac/mac_optional_gpu_tests_rel/19580
https://chromium-swarm.appspot.com/task?id=3add2463f443f510&refresh=10&show_raw=1

Two tests failed because of this crash in the stack sampler:
WebglConformance_conformance2_context_constants_and_properties_2
WebglConformance_conformance2_glsl3_array_as_return_value

Received signal 11 SEGV_MAPERR ffffffffffffffea
0 Chromium Framework 0x000000011cfa169c base::debug::StackTrace::StackTrace(unsigned long) + 28
1 Chromium Framework 0x000000011cfa14d1 base::debug::(anonymous namespace)::StackDumpSignalHandler(int, __siginfo*, void*) + 2401
2 libsystem_platform.dylib 0x00007fff920fbb3a _sigtramp + 26
3 ??? 0x0000000100000001 0x0 + 4294967297
4 libunwind.dylib 0x00007fff921300d1 libunwind::UnwindCursor<libunwind::LocalAddressSpace, libunwind::Registers_x86_64>::step() + 113
5 Chromium Framework 0x000000011d017ec4 _ZN4base12_GLOBAL__N_120WalkStackFromContextIZNS0_21NativeStackSamplerMac27SuspendThreadAndRecordStackEPNS_18NativeStackSampler11StackBufferEPNS_21StackSamplingProfiler6SampleEE3$_1EEbP13unw_context_tmPmPNSt3__16vectorINS6_6ModuleENSD_9allocatorISF_EEEEPNSE_INS0_11ModuleIndexENSG_ISK_EEEERKT_ + 292
6 Chromium Framework 0x000000011d0171cc base::(anonymous namespace)::NativeStackSamplerMac::RecordStackSample(base::NativeStackSampler::StackBuffer*, base::StackSamplingProfiler::Sample*) + 924
7 Chromium Framework 0x000000011d01ad5f base::StackSamplingProfiler::SamplingThread::RecordSample(base::StackSamplingProfiler::SamplingThread::CollectionContext*) + 1087
8 Chromium Framework 0x000000011d01b0bc base::StackSamplingProfiler::SamplingThread::PerformCollectionTask(int) + 236
9 Chromium Framework 0x000000011cfa2015 base::debug::TaskAnnotator::RunTask(char const*, base::PendingTask*) + 261
10 Chromium Framework 0x000000011cfdcc89 base::internal::IncomingTaskQueue::RunTask(base::PendingTask*) + 121
11 Chromium Framework 0x000000011cfe141a base::MessageLoop::RunTask(base::PendingTask*) + 618
12 Chromium Framework 0x000000011cfe1803 base::MessageLoop::DeferOrRunPendingTask(base::PendingTask) + 195
13 Chromium Framework 0x000000011cfe1d97 base::MessageLoop::DoDelayedWork(base::TimeTicks*) + 679
14 Chromium Framework 0x000000011cfe4fb3 base::MessagePumpCFRunLoopBase::RunWork() + 67
15 Chromium Framework 0x000000011cfc8e5a base::mac::CallWithEHFrame(void () block_pointer) + 10
16 Chromium Framework 0x000000011cfe487f base::MessagePumpCFRunLoopBase::RunWorkSource(void*) + 63
17 CoreFoundation 0x00007fff7c75d321 __CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__ + 17
18 CoreFoundation 0x00007fff7c73e21d __CFRunLoopDoSources0 + 557
19 CoreFoundation 0x00007fff7c73d716 __CFRunLoopRun + 934
20 CoreFoundation 0x00007fff7c73d114 CFRunLoopRunSpecific + 420
21 Chromium Framework 0x000000011cfe543f base::MessagePumpCFRunLoop::DoRun(base::MessagePump::Delegate*) + 79
22 Chromium Framework 0x000000011cfe42ce base::MessagePumpCFRunLoopBase::Run(base::MessagePump::Delegate*) + 110
23 Chromium Framework 0x000000011cfe0d09 base::MessageLoop::Run(bool) + 169
24 Chromium Framework 0x000000011d01e2c9 base::RunLoop::Run() + 249
25 Chromium Framework 0x000000011d06bfbe base::Thread::Run(base::RunLoop*) + 206
26 Chromium Framework 0x000000011d06c57c base::Thread::ThreadMain() + 908
27 Chromium Framework 0x000000011d0619af base::(anonymous namespace)::ThreadFunc(void*) + 95
28 libsystem_pthread.dylib 0x00007fff9210593b _pthread_body + 180
29 libsystem_pthread.dylib 0x00007fff92105887 _pthread_body + 0
30 libsystem_pthread.dylib 0x00007fff9210508d thread_start + 13
[end of stack trace]

Can investigation on this bug be resumed? We are trying to stamp out the remaining sources of flakiness in these tests. Thanks.
,
Jan 11 2018
How can I get the minidump? I obviously found the original one somehow when this bug was filed but I'm coming up empty on the logs from comment #10.
,
Jan 11 2018
Unfortunately it looks like however that crash was detected, it didn't generate a minidump, so the harness didn't find one to upload to cloud storage.
,
Jan 18 2018
Here's another one where a minidump was uploaded to cloud storage:
https://ci.chromium.org/buildbot/tryserver.chromium.mac/mac_chromium_rel_ng/630418
https://chromium-swarm.appspot.com/task?id=3b1fe2a45643cf10&refresh=10&show_raw=1

The minidump should be here:
gs://chrome-telemetry-output/minidump-2018-01-17_18-24-48-929460.dmp

Here's the symbolized stack trace from the minidump:

Thread 5 (crashed)
 0 libunwind.dylib + 0x1195
    rax = 0x00007fffb6bd2195 rdx = 0x0000000000000161 rcx = 0x000070000a601300 rbx = 0x000070000a6012b0
    rsi = 0xffffffffffffffe8 rdi = 0x0000000000000000 rbp = 0x000070000a601270 rsp = 0x000070000a601268
    r8 = 0x0000000000000000 r9 = 0x00007fffb6bd2214 r10 = 0x0000000110e69e20 r11 = 0x0000000110e69dc0
    r12 = 0x00007fd5eac12420 r13 = 0x000070000a6012b0 r14 = 0x0000000000000000 r15 = 0x000070000a601e10
    rip = 0x00007fffb6bd2195
    Found by: given as instruction pointer in context
 1 libunwind.dylib + 0x10d1
    rbp = 0x000070000a6012a0 rsp = 0x000070000a601280 rip = 0x00007fffb6bd20d1
    Found by: previous frame's frame pointer
 2 Chromium Framework!__ZN4base12_GLOBAL__N_120WalkStackFromContextIZNS0_21NativeStackSamplerMac27SuspendThreadAndRecordStackEPNS_18NativeStackSampler11StackBufferEPNS_21StackSamplingProfiler6SampleEE3$_1EEbP13unw_context_tmPmPNSt3__16vectorINS6_6ModuleENSD_9allocatorISF_EEEEPNSE_INS0_11ModuleIndexENSG_ISK_EEEERKT_ + 0x124
    rbp = 0x000070000a6017b0 rsp = 0x000070000a6012b0 rip = 0x0000000110dcfda4
    Found by: previous frame's frame pointer
 3 Chromium Framework!__ZN4base12_GLOBAL__N_121NativeStackSamplerMac17RecordStackSampleEPNS_18NativeStackSampler11StackBufferEPNS_21StackSamplingProfiler6SampleE + 0x39c
    rbp = 0x000070000a601e50 rsp = 0x000070000a6017c0 rip = 0x0000000110dcf0ac
    Found by: previous frame's frame pointer
 4 Chromium Framework!__ZN4base21StackSamplingProfiler14SamplingThread12RecordSampleEPNS1_17CollectionContextE + 0x43f
    rbp = 0x000070000a601fc0 rsp = 0x000070000a601e60 rip = 0x0000000110dd2c1f
    Found by: previous frame's frame pointer
 5 Chromium Framework!__ZN4base21StackSamplingProfiler14SamplingThread21PerformCollectionTaskEi + 0xec
    rbp = 0x000070000a602160 rsp = 0x000070000a601fd0 rip = 0x0000000110dd2f7c
    Found by: previous frame's frame pointer
 6 Chromium Framework!__ZN4base5debug13TaskAnnotator7RunTaskEPKcPNS_11PendingTaskE + 0x105
    rbp = 0x000070000a602340 rsp = 0x000070000a602170 rip = 0x0000000110d594e5
    Found by: previous frame's frame pointer
 7 Chromium Framework!__ZN4base8internal17IncomingTaskQueue7RunTaskEPNS_11PendingTaskE + 0x79
    rbp = 0x000070000a602490 rsp = 0x000070000a602350 rip = 0x0000000110d943c9
    Found by: previous frame's frame pointer
 8 Chromium Framework!__ZN4base11MessageLoop7RunTaskEPNS_11PendingTaskE + 0x26a
    rbp = 0x000070000a602680 rsp = 0x000070000a6024a0 rip = 0x0000000110d98d2a
    Found by: previous frame's frame pointer
 9 Chromium Framework!__ZN4base11MessageLoop21DeferOrRunPendingTaskENS_11PendingTaskE + 0xc3
    rbp = 0x000070000a602840 rsp = 0x000070000a602690 rip = 0x0000000110d99113
    Found by: previous frame's frame pointer
10 Chromium Framework!__ZN4base11MessageLoop13DoDelayedWorkEPNS_9TimeTicksE + 0x2a7
    rbp = 0x000070000a602a60 rsp = 0x000070000a602850 rip = 0x0000000110d996a7
    Found by: previous frame's frame pointer
11 Chromium Framework!__ZN4base24MessagePumpCFRunLoopBase7RunWorkEv + 0x43
    rbp = 0x000070000a602a90 rsp = 0x000070000a602a70 rip = 0x0000000110d9c8c3
    Found by: previous frame's frame pointer
12 Chromium Framework!__ZN4base3mac15CallWithEHFrameEU13block_pointerFvvE + 0xa
    rbp = 0x000070000a602aa0 rsp = 0x000070000a602aa0 rip = 0x0000000110d8048a
    Found by: previous frame's frame pointer
13 Chromium Framework!__ZN4base24MessagePumpCFRunLoopBase13RunWorkSourceEPv + 0x3f
    rbp = 0x000070000a602ae0 rsp = 0x000070000a602ab0 rip = 0x0000000110d9c18f
    Found by: previous frame's frame pointer
,
Jan 18 2018
Thanks! It's unwinding _tiny_free_list_add_ptr in libsystem_malloc which sounds pretty familiar from the original bug. We have trouble unwinding in there in general but typically, we recover well (though with a truncated stack). I'm assuming the problem here is that we were in the epilogue and our mitigations didn't trigger. Might be able to come up with a new mitigation for this one if I can get it live in the debugger, but not so sure from the minidump. I'm very curious why the GPU tests seem to tickle this one so often, since I've never seen it in the wild. wittman@ what do you think about bailing if we're in malloc until we have some way of doing prologue analysis? kbr@ would a lever to turn off stack sampler for the GPU tests be an OK solution for you in the meantime?
,
Jan 18 2018
Re bailing in libsystem_malloc: It looks like 4.4% of non-idle stacks end in a frame in libsystem_malloc.dylib, so that's a pretty significant hit to take on recovery rate. Plus memory allocations and deallocations are more likely than other code to be a performance sink and therefore something we'd want to have insight on (see issue 786597 as an example). A recent release filtered to the reported libsystem_malloc functions shows we'd lose ~200ms of runtime: https://uma.googleplex.com/p/chrome/callstacks?sid=c8d7ccc562032a6901b7fab824300e25. This is a bit of an overestimate since we'd only lose stacks with the last frame in libsystem_malloc, but probably not by that much. If this flakiness is sufficiently painful to the graphics team that we need a solution now, then deploying the libsystem_malloc workaround specifically for test execution seems like it would be OK. I don't think turning off profiling entirely on the GPU tests is a good idea because it leaves us blind to issues in that process before they land in canary.
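To make the trade-off above concrete, here is a minimal sketch of the "bail if the leaf frame is in libsystem_malloc" policy and of how one might measure its cost in lost samples. Everything in it is an assumption for illustration: `Frame`, `ShouldDiscardSample`, and `DiscardedFraction` are hypothetical names, the leaf frame is assumed to be stored first, and none of this is the profiler's real code.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record for one sampled frame; not the profiler's real type.
struct Frame {
  std::string module_name;
};

// Returns true if the sample's leaf frame (index 0, by assumption) lies in
// libsystem_malloc.dylib, i.e. the sample would be dropped by the policy.
bool ShouldDiscardSample(const std::vector<Frame>& stack) {
  return !stack.empty() &&
         stack.front().module_name == "libsystem_malloc.dylib";
}

// Returns the fraction of samples the policy would discard (0.0 if there
// are no samples at all).
double DiscardedFraction(const std::vector<std::vector<Frame>>& samples) {
  if (samples.empty()) return 0.0;
  std::size_t discarded = 0;
  for (const auto& stack : samples) {
    if (ShouldDiscardSample(stack)) ++discarded;
  }
  return static_cast<double>(discarded) /
         static_cast<double>(samples.size());
}
```

Only the leaf frame is checked, which matches the point made above that stacks merely passing through libsystem_malloc would not be lost.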
,
Jan 18 2018
I'm not in favor of turning off stack sampling in the GPU tests. These tests are some of the few on the waterfalls and commit queue which run the full browser, and we should be testing the configuration we ship. For this reason I'm also not in favor of deploying a libsystem_malloc filter just for test execution. The GPU tests run a lot of small web pages which presumably do a lot of memory allocations and deallocations in the same renderer process. Maybe that's why they stress this code path more than others on the waterfall. We should consider this as a real-world scenario. Any mitigation which can be considered for shipping in Chrome's default configuration is welcome.
,
Jan 18 2018
Do we know why we're not seeing these from the wild? Could it be some configuration or build settings difference between what we ship as Canary vs. what the GPU bots run?
,
Feb 24 2018
Seen again here:
http://crbug.com/815319
https://ci.chromium.org/buildbot/chromium.gpu.fyi/Mac%20FYI%20Experimental%20Release%20%28Intel%29/274
https://chromium-swarm.appspot.com/task?id=3bdd5cb290edf910&refresh=10&show_raw=1

Minidump is here:
Minidump found: /b/s/w/itULvpTL/tmpKbB6It/completed/bf846622-8967-4ccd-a5c2-b13b5d96cf7d.dmp
Uploading /b/s/w/itULvpTL/tmpKbB6It/completed/bf846622-8967-4ccd-a5c2-b13b5d96cf7d.dmp to gs://chrome-telemetry-output/minidump-2018-02-23_13-24-31-123721.dmp

Stack trace:

Operating system: Mac OS X 10.13.4 17E139j
CPU: amd64 family 6 model 69 stepping 1 4 CPUs
GPU: UNKNOWN
Crash reason: EXC_BAD_ACCESS / KERN_INVALID_ADDRESS
Crash address: 0x7fff52c97000
Process uptime: 2 seconds

Thread 4 (crashed)
 0 libunwind.dylib + 0x494c
    rax = 0x0000000000000001 rdx = 0x00007fff52c65a3c rcx = 0x00000000ffffffff rbx = 0x00007fff52c97000
    rsi = 0x000070000597dfd0 rdi = 0x00007fff8acdca99 rbp = 0x000070000597e000 rsp = 0x000070000597dfb0
    r8 = 0x00007fff52c96ff0 r9 = 0x0000000000000000 r10 = 0x0000000000000008 r11 = 0x00007fff52c96bf9
    r12 = 0x00007fff52c8b969 r13 = 0x00007fff52c97000 r14 = 0x00007fff8acdca99 r15 = 0x00007fff52c980a0
    rip = 0x00007fff52c6894c
    Found by: given as instruction pointer in context
 1 libunwind.dylib + 0x4768
    rbp = 0x000070000597e860 rsp = 0x000070000597e010 rip = 0x00007fff52c68768
    Found by: previous frame's frame pointer
 2 libunwind.dylib + 0xbe2
    rbp = 0x000070000597f0b0 rsp = 0x000070000597e870 rip = 0x00007fff52c64be2
    Found by: previous frame's frame pointer
 3 libunwind.dylib + 0xb2b
    rbp = 0x000070000597f0d0 rsp = 0x000070000597f0c0 rip = 0x00007fff52c64b2b
    Found by: previous frame's frame pointer
 4 Chromium Framework!__ZN4base12_GLOBAL__N_120WalkStackFromContextIZNS0_21NativeStackSamplerMac27SuspendThreadAndRecordStackEPNS_18NativeStackSampler11StackBufferEPNS_21StackSamplingProfiler6SampleEE3$_1EEbP13unw_context_tmPmPNSt3__16vectorINS6_6ModuleENSD_9allocatorISF_EEEEPNSE_INS0_11ModuleIndexENSG_ISK_EEEERKT_ + 0x3b
    rbp = 0x000070000597f5e0 rsp = 0x000070000597f0e0 rip = 0x0000000118150d9b
    Found by: previous frame's frame pointer
 5 Chromium Framework!__ZN4base12_GLOBAL__N_121NativeStackSamplerMac17RecordStackSampleEPNS_18NativeStackSampler11StackBufferEPNS_21StackSamplingProfiler6SampleE + 0x39c
    rbp = 0x000070000597fc80 rsp = 0x000070000597f5f0 rip = 0x000000011815019c
    Found by: previous frame's frame pointer
 6 Chromium Framework!__ZN4base21StackSamplingProfiler14SamplingThread12RecordSampleEPNS1_17CollectionContextE + 0x43f
    rbp = 0x000070000597fdf0 rsp = 0x000070000597fc90 rip = 0x0000000118153d1f
    Found by: previous frame's frame pointer
 7 Chromium Framework!__ZN4base21StackSamplingProfiler14SamplingThread21PerformCollectionTaskEi + 0xea
    rbp = 0x000070000597ff90 rsp = 0x000070000597fe00 rip = 0x000000011815406a
    Found by: previous frame's frame pointer
 8 Chromium Framework!__ZN4base5debug13TaskAnnotator7RunTaskEPKcPNS_11PendingTaskE + 0x105
    rbp = 0x0000700005980170 rsp = 0x000070000597ffa0 rip = 0x00000001180d9a75
    Found by: previous frame's frame pointer
 9 Chromium Framework!__ZN4base8internal17IncomingTaskQueue7RunTaskEPNS_11PendingTaskE + 0x79
    rbp = 0x00007000059802c0 rsp = 0x0000700005980180 rip = 0x00000001181146e9
    Found by: previous frame's frame pointer
10 Chromium Framework!__ZN4base11MessageLoop7RunTaskEPNS_11PendingTaskE + 0x257
    rbp = 0x00007000059804b0 rsp = 0x00007000059802d0 rip = 0x0000000118118f87
    Found by: previous frame's frame pointer
11 Chromium Framework!__ZN4base11MessageLoop21DeferOrRunPendingTaskENS_11PendingTaskE + 0xba
    rbp = 0x0000700005980660 rsp = 0x00007000059804c0 rip = 0x000000011811934a
    Found by: previous frame's frame pointer
12 Chromium Framework!__ZN4base11MessageLoop6DoWorkEv + 0x23c
    rbp = 0x00007000059808f0 rsp = 0x0000700005980670 rip = 0x00000001181195bc
    Found by: previous frame's frame pointer
13 Chromium Framework!__ZN4base18MessagePumpDefault3RunEPNS_11MessagePump8DelegateE + 0xdc
    rbp = 0x0000700005980940 rsp = 0x0000700005980900 rip = 0x000000011811ab3c
    Found by: previous frame's frame pointer
14 Chromium Framework!__ZN4base11MessageLoop3RunEb + 0xa9
    rbp = 0x0000700005980aa0 rsp = 0x0000700005980950 rip = 0x00000001181188a9
    Found by: previous frame's frame pointer
15 Chromium Framework!__ZN4base7RunLoop3RunEv + 0xf9
    rbp = 0x0000700005980bf0 rsp = 0x0000700005980ab0 rip = 0x00000001181570a9
    Found by: previous frame's frame pointer
16 Chromium Framework!__ZN4base6Thread3RunEPNS_7RunLoopE + 0xce
    rbp = 0x0000700005980d40 rsp = 0x0000700005980c00 rip = 0x00000001181a2ece
    Found by: previous frame's frame pointer
17 Chromium Framework!__ZN4base6Thread10ThreadMainEv + 0x38c
    rbp = 0x0000700005980ec0 rsp = 0x0000700005980d50 rip = 0x00000001181a348c
    Found by: previous frame's frame pointer
18 Chromium Framework!__ZN4base12_GLOBAL__N_110ThreadFuncEPv + 0x5f
    rbp = 0x0000700005980ef0 rsp = 0x0000700005980ed0 rip = 0x000000011819e16f
    Found by: previous frame's frame pointer
19 libsystem_pthread.dylib + 0x36c1
    rbp = 0x0000700005980f20 rsp = 0x0000700005980f00 rip = 0x00007fff52c386c1
    Found by: previous frame's frame pointer
20 libsystem_pthread.dylib + 0x356d
    rbp = 0x0000700005980f50 rsp = 0x0000700005980f30 rip = 0x00007fff52c3856d
    Found by: previous frame's frame pointer
21 libsystem_pthread.dylib + 0x2c5d
    rbp = 0x0000700005980f78 rsp = 0x0000700005980f60 rip = 0x00007fff52c37c5d
    Found by: previous frame's frame pointer
22 Chromium Framework!__ZN4base14PlatformThread6DetachENS_20PlatformThreadHandleE + 0x70
    rsp = 0x0000700005981028 rip = 0x000000011819e110
    Found by: stack scanning
,
Feb 24 2018
,
Mar 6 2018
More occurrences: https://ci.chromium.org/buildbot/chromium.gpu.fyi/Mac%20FYI%20Retina%20Release%20%28AMD%29/753 minidump: gs://chrome-telemetry-output/minidump-2018-03-06_14-25-17-19742.dmp https://ci.chromium.org/buildbot/chromium.gpu.fyi/Mac%20FYI%20Experimental%20Retina%20Release%20%28AMD%29/522 minidump: gs://chrome-telemetry-output/minidump-2018-03-06_06-50-15-795793.dmp
,
Mar 7 2018
Stack with more symbols here:
https://ci.chromium.org/buildbot/chromium.gpu.fyi/Mac%20FYI%20Experimental%20Release%20%28Intel%29/570

0 Chromium Framework 0x000000010a4d066c base::debug::StackTrace::StackTrace(unsigned long) + 28
1 Chromium Framework 0x000000010a4d04a1 base::debug::(anonymous namespace)::StackDumpSignalHandler(int, __siginfo*, void*) + 2401
2 libsystem_platform.dylib 0x00007fff71dc0f5a _sigtramp + 26
3 ??? 0x0000000000000000 0x0 + 0
4 libunwind.dylib 0x00007fff71dfa768 libunwind::UnwindCursor<libunwind::LocalAddressSpace, libunwind::Registers_x86_64>::getInfoFromDwarfSection(unsigned long long, unsigned long long, unsigned long long, unsigned int, unsigned int) + 86
5 libunwind.dylib 0x00007fff71df6be2 libunwind::UnwindCursor<libunwind::LocalAddressSpace, libunwind::Registers_x86_64>::setInfoBasedOnIPRegister(bool) + 172
6 libunwind.dylib 0x00007fff71df6b2b unw_init_local + 104
7 Chromium Framework 0x000000010a54834b _ZN4base12_GLOBAL__N_120WalkStackFromContextIZNS0_21NativeStackSamplerMac27SuspendThreadAndRecordStackEPNS_18NativeStackSampler11StackBufferEPNS_21StackSamplingProfiler6SampleEE3$_1EEbP13unw_context_tmPmPNSt3__16vectorINS6_6ModuleENSD_9allocatorISF_EEEEPNSE_INS0_11ModuleIndexENSG_ISK_EEEERKT_ + 59
8 Chromium Framework 0x000000010a54774c base::(anonymous namespace)::NativeStackSamplerMac::RecordStackSample(base::NativeStackSampler::StackBuffer*, base::StackSamplingProfiler::Sample*) + 924
9 Chromium Framework 0x000000010a54b2cf base::StackSamplingProfiler::SamplingThread::RecordSample(base::StackSamplingProfiler::SamplingThread::CollectionContext*) + 1087
10 Chromium Framework 0x000000010a54b61a base::StackSamplingProfiler::SamplingThread::PerformCollectionTask(int) + 234
11 Chromium Framework 0x000000010a4d0fe4 base::debug::TaskAnnotator::RunTask(char const*, base::PendingTask*) + 260
12 Chromium Framework 0x000000010a50bdf9 base::internal::IncomingTaskQueue::RunTask(base::PendingTask*) + 121
13 Chromium Framework 0x000000010a510697 base::MessageLoop::RunTask(base::PendingTask*) + 599
14 Chromium Framework 0x000000010a510a5a base::MessageLoop::DeferOrRunPendingTask(base::PendingTask) + 186
15 Chromium Framework 0x000000010a510fb7 base::MessageLoop::DoDelayedWork(base::TimeTicks*) + 679
16 Chromium Framework 0x000000010a5121dc base::MessagePumpDefault::Run(base::MessagePump::Delegate*) + 108
17 Chromium Framework 0x000000010a50ffb9 base::MessageLoop::Run(bool) + 169
18 Chromium Framework 0x000000010a54e659 base::RunLoop::Run() + 249
19 Chromium Framework 0x000000010a59b3ce base::Thread::Run(base::RunLoop*) + 206
20 Chromium Framework 0x000000010a59b98c base::Thread::ThreadMain() + 908
21 Chromium Framework 0x000000010a59666f base::(anonymous namespace)::ThreadFunc(void*) + 95
22 libsystem_pthread.dylib 0x00007fff71dca6c1 _pthread_body + 340
23 libsystem_pthread.dylib 0x00007fff71dca56d _pthread_body + 0
24 libsystem_pthread.dylib 0x00007fff71dc9c5d thread_start + 13
[end of stack trace]

Found crashpad_database_util
Minidump found: /b/s/w/ityUiYAk/tmpygLy3b/completed/4b5dec2b-3c67-4b52-aff4-ce26380450a5.dmp
Uploading /b/s/w/ityUiYAk/tmpygLy3b/completed/4b5dec2b-3c67-4b52-aff4-ce26380450a5.dmp to gs://chrome-telemetry-output/minidump-2018-03-07_14-41-21-769356.dmp
,
Mar 10 2018
,
Mar 10 2018
Note: more NativeStackSamplerMac crashes seen in Issue 820677. A couple of minidumps are on that bug report. Not sure whether they're the cause of the test failure but they should be investigated.
,
Mar 13 2018
From offline discussion, Leonard currently doesn't have bandwidth to investigate this so I'll take it. I don't anticipate having a solution for this soon so I'll disable the profiler on the GPU main thread until it can be addressed. Hopefully we should have enough to go on now from the collected minidumps.
,
Mar 13 2018
Issue 820677 has been merged into this issue.
,
Mar 15 2018
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/43a4f5d018220e3310001433f0e76eff146ef328

commit 43a4f5d018220e3310001433f0e76eff146ef328
Author: Mike Wittman <wittman@chromium.org>
Date: Thu Mar 15 22:34:38 2018

Sampling profiler: disable GPU main thread profiling on OS X

Disabling pending a resolution to crashes observed in the associated bug.

This change also removes the unused GetSamplingParamsForCurrentProcess function and adapts IsProfilerEnabledForCurrentProcess to operate on the current process and specified thread.

The call to IsProfilerEnabledForCurrentProcess is also removed from SetServiceManagerConnectorForChildProcess since that function is only invoked in processes supporting the profiler.

Bug: 774682
Change-Id: Ibbc6f1bd9348ba09a3ee4db2e1595411617f1ccd
Reviewed-on: https://chromium-review.googlesource.com/962937
Commit-Queue: Mike Wittman <wittman@chromium.org>
Reviewed-by: Scott Violet <sky@chromium.org>
Reviewed-by: Leonard Grey <lgrey@chromium.org>
Cr-Commit-Position: refs/heads/master@{#543520}

[modify] https://crrev.com/43a4f5d018220e3310001433f0e76eff146ef328/chrome/common/stack_sampling_configuration.cc
[modify] https://crrev.com/43a4f5d018220e3310001433f0e76eff146ef328/chrome/common/stack_sampling_configuration.h
[modify] https://crrev.com/43a4f5d018220e3310001433f0e76eff146ef328/chrome/common/thread_profiler.cc
,
Mar 16 2018
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/150adef2ab46eca8c1cd37567fb4ad5686e66506

commit 150adef2ab46eca8c1cd37567fb4ad5686e66506
Author: David Trainor <dtrainor@chromium.org>
Date: Fri Mar 16 04:07:25 2018

Revert "Sampling profiler: disable GPU main thread profiling on OS X"

This reverts commit 43a4f5d018220e3310001433f0e76eff146ef328.

Reason for revert: Looks like the changes to thread_profiler broke some Android tests? (ToastHWATest):
https://ci.chromium.org/buildbot/chromium.android/Lollipop%20Phone%20Tester/19475
[FATAL:thread_profiler.cc(154)] Check failed: metrics::CallStackProfileParams::BROWSER_PROCESS != GetProcess() (1 vs. 1)
Reverting. Sorry!

Original change's description:
> Sampling profiler: disable GPU main thread profiling on OS X
>
> Disabling pending a resolution to crashes observed in the associated
> bug.
>
> This change also removes the unused
> GetSamplingParamsForCurrentProcess function and adapts
> IsProfilerEnabledForCurrentProcess to operate on the current process
> and specified thread.
>
> The call to IsProfilerEnabledForCurrentProcess is also removed
> from SetServiceManagerConnectorForChildProcess since that function is
> only invoked in processes supporting the profiler.
>
> Bug: 774682
> Change-Id: Ibbc6f1bd9348ba09a3ee4db2e1595411617f1ccd
> Reviewed-on: https://chromium-review.googlesource.com/962937
> Commit-Queue: Mike Wittman <wittman@chromium.org>
> Reviewed-by: Scott Violet <sky@chromium.org>
> Reviewed-by: Leonard Grey <lgrey@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#543520}

TBR=sky@chromium.org,wittman@chromium.org,lgrey@chromium.org

Change-Id: I2ecac49a6e14de7f620dc0096ce229f25057c810
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Bug: 774682
Reviewed-on: https://chromium-review.googlesource.com/965821
Reviewed-by: David Trainor <dtrainor@chromium.org>
Commit-Queue: David Trainor <dtrainor@chromium.org>
Cr-Commit-Position: refs/heads/master@{#543613}

[modify] https://crrev.com/150adef2ab46eca8c1cd37567fb4ad5686e66506/chrome/common/stack_sampling_configuration.cc
[modify] https://crrev.com/150adef2ab46eca8c1cd37567fb4ad5686e66506/chrome/common/stack_sampling_configuration.h
[modify] https://crrev.com/150adef2ab46eca8c1cd37567fb4ad5686e66506/chrome/common/thread_profiler.cc
,
Mar 19 2018
I think we have a solution for the crashes now, so rather than trying to fix the reverted disable CL I will implement that.
,
Mar 29 2018
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/556b70ba7c6c0ca2a87e6e68196e4c13acd11af7

commit 556b70ba7c6c0ca2a87e6e68196e4c13acd11af7
Author: Mike Wittman <wittman@chromium.org>
Date: Thu Mar 29 23:19:28 2018

Work around libunwind crash accessing memory past mapped libraries

In some unwinds seen on recent OS X beta versions unw_init_local attempts to access memory past the end of the mapped libraries. This region appears to be protected or unmapped, resulting in crashes.

Getting a fix for libunwind into an OS X release may take 18 months (if accepted at all). This workaround will avoid the crashes in the mean time.

57% of the Mac profiler crashes seen over the last five months were due to this bug.

Bug: 774682
Change-Id: I260b03adea433f93871c7628eb38886aa877e549
Reviewed-on: https://chromium-review.googlesource.com/969830
Commit-Queue: Mike Wittman <wittman@chromium.org>
Reviewed-by: Mark Mentovai <mark@chromium.org>
Reviewed-by: Robert Sesek <rsesek@chromium.org>
Reviewed-by: Leonard Grey <lgrey@chromium.org>
Cr-Commit-Position: refs/heads/master@{#547012}

[modify] https://crrev.com/556b70ba7c6c0ca2a87e6e68196e4c13acd11af7/base/profiler/native_stack_sampler_mac.cc
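The general shape of such a workaround can be sketched as a pre-check run before handing the instruction pointer to the unwinder. This is an illustration under stated assumptions, not the code from the CL above: `Region`, `RangeIsMapped`, `ShouldSkipUnwind`, and the `lookahead` parameter are hypothetical, and the plain region lists stand in for whatever mechanism a real sampler would use to learn which memory is readable. The idea is to skip the unwind when the bytes just past the end of the module containing the instruction pointer are not fully mapped.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical half-open address range [start, end).
struct Region {
  std::uint64_t start;
  std::uint64_t end;
};

// Returns true if every byte in [begin, end) is covered by some region in
// `mapped`. Regions may be listed in any order and may touch or overlap.
bool RangeIsMapped(const std::vector<Region>& mapped,
                   std::uint64_t begin, std::uint64_t end) {
  std::uint64_t cursor = begin;
  while (cursor < end) {
    bool advanced = false;
    for (const Region& r : mapped) {
      if (cursor >= r.start && cursor < r.end) {
        cursor = r.end;  // Jump to the end of the covering region.
        advanced = true;
        break;
      }
    }
    if (!advanced) return false;  // Found a hole at `cursor`.
  }
  return true;
}

// Returns true if unwinding from `ip` looks risky under this sketch's
// assumption: the `lookahead` bytes past the end of the module containing
// `ip` are not all mapped. Addresses outside every known module return
// false here and are left to other checks. Overflow of end + lookahead is
// not handled in this sketch.
bool ShouldSkipUnwind(const std::vector<Region>& modules,
                      const std::vector<Region>& mapped,
                      std::uint64_t ip, std::uint64_t lookahead) {
  for (const Region& m : modules) {
    if (ip >= m.start && ip < m.end)
      return !RangeIsMapped(mapped, m.end, m.end + lookahead);
  }
  return false;
}
```

A caller would drop or truncate the sample when `ShouldSkipUnwind` returns true, trading a few lost stacks for not faulting while the target thread is suspended.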
,
Apr 2 2018
The previous change should address the main crasher that was causing test flakes. If we see additional profiler issues please open a new bug and assign to me.
,
Apr 11 2018
,
Nov 28
,
Nov 28
Comment 1 by lgrey@chromium.org, Oct 13 2017
Owner: lgrey@chromium.org