
Issue 774682

Starred by 4 users

Issue metadata

Status: Fixed
Owner:
Closed: Apr 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Mac
Pri: 2
Type: Bug

Blocked on:
issue 820677

Blocking:
issue 531673
issue 831448




Stack sampler crashes on Mac Intel GPU bots

Project Member Reported by kbr@chromium.org, Oct 13 2017

Issue description

In this tryjob:
https://ci.chromium.org/buildbot/tryserver.chromium.mac/mac_chromium_rel_ng/565544

WebglConformance_conformance_glsl_bugs_compound_assignment_type_combination failed because of a crash inside the NativeStackSamplerMac:

  	Operating system: Mac OS X
  	                  10.12.6 16G29
  	CPU: amd64
  	     family 6 model 69 stepping 1
  	     4 CPUs
  	
  	GPU: UNKNOWN
  	
  	Crash reason:  EXC_BAD_ACCESS / KERN_INVALID_ADDRESS
  	Crash address: 0xffffffffffffffda
  	Process uptime: 5 seconds
  	
  	Thread 5 (crashed)
  	 0  libunwind.dylib + 0x1195
  	    rax = 0x00007fffdb97e195   rdx = 0x00000000000058d1
  	    rcx = 0x000070000ffb3340   rbx = 0x000070000ffb32f0
  	    rsi = 0xffffffffffffffda   rdi = 0x0000000000000000
  	    rbp = 0x000070000ffb32b0   rsp = 0x000070000ffb32a8
  	     r8 = 0x0000000000000002    r9 = 0x00007fffdb97e214
  	    r10 = 0x000000011973afb0   r11 = 0x000000011973ad70
  	    r12 = 0x00007fc66c50ee20   r13 = 0x000070000ffb32f0
  	    r14 = 0x0000000000000000   r15 = 0x000070000ffb3e50
  	    rip = 0x00007fffdb97e195
  	    Found by: given as instruction pointer in context
  	 1  libunwind.dylib + 0x10d1
  	    rbp = 0x000070000ffb32e0   rsp = 0x000070000ffb32c0
  	    rip = 0x00007fffdb97e0d1
  	    Found by: previous frame's frame pointer
  	 2  Chromium Framework!__ZN4base12_GLOBAL__N_120WalkStackFromContextIZNS0_21NativeStackSamplerMac27SuspendThreadAndRecordStackEPNS_18NativeStackSampler11StackBufferEPNS_21StackSamplingProfiler6SampleEE3$_1EEbP13unw_context_tmPmPNSt3__16vectorINS6_6ModuleENSD_9allocatorISF_EEEEPNSE_INS0_11ModuleIndexENSG_ISK_EEEERKT_ + 0x124
  	    rbp = 0x000070000ffb37f0   rsp = 0x000070000ffb32f0
  	    rip = 0x00000001196a3da4
  	    Found by: previous frame's frame pointer
  	 3  Chromium Framework!__ZN4base12_GLOBAL__N_121NativeStackSamplerMac17RecordStackSampleEPNS_18NativeStackSampler11StackBufferEPNS_21StackSamplingProfiler6SampleE + 0x570
  	    rbp = 0x000070000ffb3e90   rsp = 0x000070000ffb3800
  	    rip = 0x00000001196a32e0
  	    Found by: previous frame's frame pointer
  	 4  Chromium Framework!__ZN4base21StackSamplingProfiler14SamplingThread12RecordSampleEPNS1_17CollectionContextE + 0x43f
  	    rbp = 0x000070000ffb4000   rsp = 0x000070000ffb3ea0
  	    rip = 0x00000001196a6c4f
  	    Found by: previous frame's frame pointer
  	 5  Chromium Framework!__ZN4base21StackSamplingProfiler14SamplingThread21PerformCollectionTaskEi + 0xec
  	    rbp = 0x000070000ffb41a0   rsp = 0x000070000ffb4010
  	    rip = 0x00000001196a6fac
  	    Found by: previous frame's frame pointer
  	 6  Chromium Framework!__ZN4base5debug13TaskAnnotator7RunTaskEPKcPNS_11PendingTaskE + 0x105
  	    rbp = 0x000070000ffb4380   rsp = 0x000070000ffb41b0
  	    rip = 0x000000011962e2d5
  	    Found by: previous frame's frame pointer
  	 7  Chromium Framework!__ZN4base8internal17IncomingTaskQueue7RunTaskEPNS_11PendingTaskE + 0x79
  	    rbp = 0x000070000ffb44d0   rsp = 0x000070000ffb4390
  	    rip = 0x00000001196696c9
  	    Found by: previous frame's frame pointer
  	 8  Chromium Framework!__ZN4base11MessageLoop7RunTaskEPNS_11PendingTaskE + 0x207
  	    rbp = 0x000070000ffb4690   rsp = 0x000070000ffb44e0
  	    rip = 0x000000011966dcd7
  	    Found by: previous frame's frame pointer
  	 9  Chromium Framework!__ZN4base11MessageLoop21DeferOrRunPendingTaskENS_11PendingTaskE + 0xca
  	    rbp = 0x000070000ffb4850   rsp = 0x000070000ffb46a0
  	    rip = 0x000000011966e01a
  	    Found by: previous frame's frame pointer
  	10  Chromium Framework!__ZN4base11MessageLoop13DoDelayedWorkEPNS_9TimeTicksE + 0x2b8
  	    rbp = 0x000070000ffb4a70   rsp = 0x000070000ffb4860
  	    rip = 0x000000011966e5a8
  	    Found by: previous frame's frame pointer
  	11  Chromium Framework!__ZN4base24MessagePumpCFRunLoopBase7RunWorkEv + 0x43
  	    rbp = 0x000070000ffb4aa0   rsp = 0x000070000ffb4a80
  	    rip = 0x00000001196717c3
  	    Found by: previous frame's frame pointer
  	12  Chromium Framework!__ZN4base3mac15CallWithEHFrameEU13block_pointerFvvE + 0xa
  	    rbp = 0x000070000ffb4ab0   rsp = 0x000070000ffb4ab0
  	    rip = 0x000000011965468a
  	    Found by: previous frame's frame pointer
  	13  Chromium Framework!__ZN4base24MessagePumpCFRunLoopBase13RunWorkSourceEPv + 0x3f
  	    rbp = 0x000070000ffb4af0   rsp = 0x000070000ffb4ac0
  	    rip = 0x000000011967108f
  	    Found by: previous frame's frame pointer
  	14  CoreFoundation + 0xa7321
  	    rbp = 0x000070000ffb4b00   rsp = 0x000070000ffb4b00
  	    rip = 0x00007fffc5fab321
  	    Found by: previous frame's frame pointer

lgrey@ or avi@, could you please investigate this failure mode of the stack walker? It's affected at least this job on the CQ and impacts browser stability. Thanks.
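As a reading aid for the dump above: the crash address 0xffffffffffffffda is a small negative number when read as a signed 64-bit value, and the same value appears in rsi while rdi is 0. The sketch below only performs that conversion. The helper name AsSignedOffset is hypothetical and not part of Chromium or libunwind, and reading the address as a small negative offset from a null base is an assumption, not a confirmed root cause.

```cpp
#include <cstdint>

// Reinterpret a 64-bit fault address as a signed two's-complement offset.
// Hypothetical helper for reading crash dumps; not a Chromium or libunwind API.
constexpr std::int64_t AsSignedOffset(std::uint64_t address) {
  // Values with the top bit set map to negative numbers: -(~address) - 1.
  return (address >> 63) != 0
             ? -static_cast<std::int64_t>(~address) - 1
             : static_cast<std::int64_t>(address);
}

// The crash address from the report above reads as -38.
static_assert(AsSignedOffset(0xffffffffffffffdaULL) == -38, "expected -38");
```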

 

Comment 1 by lgrey@chromium.org, Oct 13 2017

Cc: -lgrey@chromium.org
Owner: lgrey@chromium.org

Comment 2 by lgrey@chromium.org, Oct 16 2017

Cc: wittman@chromium.org
Status: Assigned (was: Untriaged)

Comment 3 by lgrey@chromium.org, Oct 16 2017

kbr@, have you seen this happen in other runs? I'm not getting very far with the minidump, and I'm trying to find a way to get a live repro.

Comment 4 by kbr@chromium.org, Oct 16 2017

Labels: -Pri-1 Pri-2
It looks like this hasn't happened recently on either of these waterfall bots:

https://ci.chromium.org/buildbot/chromium.gpu/Mac%20Release%20%28Intel%29/?limit=200
https://ci.chromium.org/buildbot/chromium.gpu/Mac%20Retina%20Release%20%28AMD%29/?limit=200

or tryservers:

https://ci.chromium.org/buildbot/tryserver.chromium.mac/mac_chromium_rel_ng/?limit=200
https://ci.chromium.org/buildbot/tryserver.chromium.mac/mac_optional_gpu_tests_rel/?limit=200

Downgrading to P2. Is there something we should change about the minidump uploading to make these more debuggable after the fact?

Comment 5 by lgrey@chromium.org, Oct 17 2017

No, the minidump itself is fine, it's just been harder to extract the root cause than the other sampler crashes. Thanks for looking!
This doesn't look related to the lifetime issue, but the periodic sampling is likely contributing to more crashes simply because it's taking a lot more samples.

Comment 8 by lgrey@chromium.org, Oct 24 2017

Isn't it back on now, though? We haven't seen a crash for 10 days, so if it's the raised volume bringing the crashes out, we'd still be seeing them.

Might be total coincidence. Or maybe, some function with tricky compact unwind info was being called a lot, and some non-sampler related change made it less so?
You're right, I just happened to look at the crashes that occurred during periodic sampling. :) (Which is back on since 63.0.3232.0.)

The number and rate of these crashes is quite low given the number of samples taken, so I'd be inclined to chalk it up to coincidence. 

Comment 10 by kbr@chromium.org, Jan 10 2018

While triaging recent crashes on the GPU trybots, this has been seen again, at least once:

https://ci.chromium.org/buildbot/tryserver.chromium.mac/mac_optional_gpu_tests_rel/19580

https://chromium-swarm.appspot.com/task?id=3add2463f443f510&refresh=10&show_raw=1

Two tests failed because of this crash in the stack sampler:
WebglConformance_conformance2_context_constants_and_properties_2
WebglConformance_conformance2_glsl3_array_as_return_value

Received signal 11 SEGV_MAPERR ffffffffffffffea
0   Chromium Framework                  0x000000011cfa169c base::debug::StackTrace::StackTrace(unsigned long) + 28
1   Chromium Framework                  0x000000011cfa14d1 base::debug::(anonymous namespace)::StackDumpSignalHandler(int, __siginfo*, void*) + 2401
2   libsystem_platform.dylib            0x00007fff920fbb3a _sigtramp + 26
3   ???                                 0x0000000100000001 0x0 + 4294967297
4   libunwind.dylib                     0x00007fff921300d1 libunwind::UnwindCursor<libunwind::LocalAddressSpace, libunwind::Registers_x86_64>::step() + 113
5   Chromium Framework                  0x000000011d017ec4 _ZN4base12_GLOBAL__N_120WalkStackFromContextIZNS0_21NativeStackSamplerMac27SuspendThreadAndRecordStackEPNS_18NativeStackSampler11StackBufferEPNS_21StackSamplingProfiler6SampleEE3$_1EEbP13unw_context_tmPmPNSt3__16vectorINS6_6ModuleENSD_9allocatorISF_EEEEPNSE_INS0_11ModuleIndexENSG_ISK_EEEERKT_ + 292
6   Chromium Framework                  0x000000011d0171cc base::(anonymous namespace)::NativeStackSamplerMac::RecordStackSample(base::NativeStackSampler::StackBuffer*, base::StackSamplingProfiler::Sample*) + 924
7   Chromium Framework                  0x000000011d01ad5f base::StackSamplingProfiler::SamplingThread::RecordSample(base::StackSamplingProfiler::SamplingThread::CollectionContext*) + 1087
8   Chromium Framework                  0x000000011d01b0bc base::StackSamplingProfiler::SamplingThread::PerformCollectionTask(int) + 236
9   Chromium Framework                  0x000000011cfa2015 base::debug::TaskAnnotator::RunTask(char const*, base::PendingTask*) + 261
10  Chromium Framework                  0x000000011cfdcc89 base::internal::IncomingTaskQueue::RunTask(base::PendingTask*) + 121
11  Chromium Framework                  0x000000011cfe141a base::MessageLoop::RunTask(base::PendingTask*) + 618
12  Chromium Framework                  0x000000011cfe1803 base::MessageLoop::DeferOrRunPendingTask(base::PendingTask) + 195
13  Chromium Framework                  0x000000011cfe1d97 base::MessageLoop::DoDelayedWork(base::TimeTicks*) + 679
14  Chromium Framework                  0x000000011cfe4fb3 base::MessagePumpCFRunLoopBase::RunWork() + 67
15  Chromium Framework                  0x000000011cfc8e5a base::mac::CallWithEHFrame(void () block_pointer) + 10
16  Chromium Framework                  0x000000011cfe487f base::MessagePumpCFRunLoopBase::RunWorkSource(void*) + 63
17  CoreFoundation                      0x00007fff7c75d321 __CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__ + 17
18  CoreFoundation                      0x00007fff7c73e21d __CFRunLoopDoSources0 + 557
19  CoreFoundation                      0x00007fff7c73d716 __CFRunLoopRun + 934
20  CoreFoundation                      0x00007fff7c73d114 CFRunLoopRunSpecific + 420
21  Chromium Framework                  0x000000011cfe543f base::MessagePumpCFRunLoop::DoRun(base::MessagePump::Delegate*) + 79
22  Chromium Framework                  0x000000011cfe42ce base::MessagePumpCFRunLoopBase::Run(base::MessagePump::Delegate*) + 110
23  Chromium Framework                  0x000000011cfe0d09 base::MessageLoop::Run(bool) + 169
24  Chromium Framework                  0x000000011d01e2c9 base::RunLoop::Run() + 249
25  Chromium Framework                  0x000000011d06bfbe base::Thread::Run(base::RunLoop*) + 206
26  Chromium Framework                  0x000000011d06c57c base::Thread::ThreadMain() + 908
27  Chromium Framework                  0x000000011d0619af base::(anonymous namespace)::ThreadFunc(void*) + 95
28  libsystem_pthread.dylib             0x00007fff9210593b _pthread_body + 180
29  libsystem_pthread.dylib             0x00007fff92105887 _pthread_body + 0
30  libsystem_pthread.dylib             0x00007fff9210508d thread_start + 13
[end of stack trace]

Can investigation on this bug be resumed? We are trying to stamp out the remaining sources of flakiness in these tests. Thanks.

Comment 11 by lgrey@chromium.org, Jan 11 2018

How can I get the minidump? I obviously found the original one somehow when this bug was filed, but I'm coming up empty on the logs from comment #10.

Comment 12 by kbr@chromium.org, Jan 11 2018

Unfortunately, it looks like that crash, however it was detected, didn't generate a minidump, so the harness didn't find one to upload to cloud storage.

Comment 13 by kbr@chromium.org, Jan 18 2018

Here's another one where a minidump was uploaded to cloud storage:

https://ci.chromium.org/buildbot/tryserver.chromium.mac/mac_chromium_rel_ng/630418
https://chromium-swarm.appspot.com/task?id=3b1fe2a45643cf10&refresh=10&show_raw=1

The minidump should be here:

gs://chrome-telemetry-output/minidump-2018-01-17_18-24-48-929460.dmp

Here's the symbolized stack trace from the minidump:

  	Thread 5 (crashed)
  	 0  libunwind.dylib + 0x1195
  	    rax = 0x00007fffb6bd2195   rdx = 0x0000000000000161
  	    rcx = 0x000070000a601300   rbx = 0x000070000a6012b0
  	    rsi = 0xffffffffffffffe8   rdi = 0x0000000000000000
  	    rbp = 0x000070000a601270   rsp = 0x000070000a601268
  	     r8 = 0x0000000000000000    r9 = 0x00007fffb6bd2214
  	    r10 = 0x0000000110e69e20   r11 = 0x0000000110e69dc0
  	    r12 = 0x00007fd5eac12420   r13 = 0x000070000a6012b0
  	    r14 = 0x0000000000000000   r15 = 0x000070000a601e10
  	    rip = 0x00007fffb6bd2195
  	    Found by: given as instruction pointer in context
  	 1  libunwind.dylib + 0x10d1
  	    rbp = 0x000070000a6012a0   rsp = 0x000070000a601280
  	    rip = 0x00007fffb6bd20d1
  	    Found by: previous frame's frame pointer
  	 2  Chromium Framework!__ZN4base12_GLOBAL__N_120WalkStackFromContextIZNS0_21NativeStackSamplerMac27SuspendThreadAndRecordStackEPNS_18NativeStackSampler11StackBufferEPNS_21StackSamplingProfiler6SampleEE3$_1EEbP13unw_context_tmPmPNSt3__16vectorINS6_6ModuleENSD_9allocatorISF_EEEEPNSE_INS0_11ModuleIndexENSG_ISK_EEEERKT_ + 0x124
  	    rbp = 0x000070000a6017b0   rsp = 0x000070000a6012b0
  	    rip = 0x0000000110dcfda4
  	    Found by: previous frame's frame pointer
  	 3  Chromium Framework!__ZN4base12_GLOBAL__N_121NativeStackSamplerMac17RecordStackSampleEPNS_18NativeStackSampler11StackBufferEPNS_21StackSamplingProfiler6SampleE + 0x39c
  	    rbp = 0x000070000a601e50   rsp = 0x000070000a6017c0
  	    rip = 0x0000000110dcf0ac
  	    Found by: previous frame's frame pointer
  	 4  Chromium Framework!__ZN4base21StackSamplingProfiler14SamplingThread12RecordSampleEPNS1_17CollectionContextE + 0x43f
  	    rbp = 0x000070000a601fc0   rsp = 0x000070000a601e60
  	    rip = 0x0000000110dd2c1f
  	    Found by: previous frame's frame pointer
  	 5  Chromium Framework!__ZN4base21StackSamplingProfiler14SamplingThread21PerformCollectionTaskEi + 0xec
  	    rbp = 0x000070000a602160   rsp = 0x000070000a601fd0
  	    rip = 0x0000000110dd2f7c
  	    Found by: previous frame's frame pointer
  	 6  Chromium Framework!__ZN4base5debug13TaskAnnotator7RunTaskEPKcPNS_11PendingTaskE + 0x105
  	    rbp = 0x000070000a602340   rsp = 0x000070000a602170
  	    rip = 0x0000000110d594e5
  	    Found by: previous frame's frame pointer
  	 7  Chromium Framework!__ZN4base8internal17IncomingTaskQueue7RunTaskEPNS_11PendingTaskE + 0x79
  	    rbp = 0x000070000a602490   rsp = 0x000070000a602350
  	    rip = 0x0000000110d943c9
  	    Found by: previous frame's frame pointer
  	 8  Chromium Framework!__ZN4base11MessageLoop7RunTaskEPNS_11PendingTaskE + 0x26a
  	    rbp = 0x000070000a602680   rsp = 0x000070000a6024a0
  	    rip = 0x0000000110d98d2a
  	    Found by: previous frame's frame pointer
  	 9  Chromium Framework!__ZN4base11MessageLoop21DeferOrRunPendingTaskENS_11PendingTaskE + 0xc3
  	    rbp = 0x000070000a602840   rsp = 0x000070000a602690
  	    rip = 0x0000000110d99113
  	    Found by: previous frame's frame pointer
  	10  Chromium Framework!__ZN4base11MessageLoop13DoDelayedWorkEPNS_9TimeTicksE + 0x2a7
  	    rbp = 0x000070000a602a60   rsp = 0x000070000a602850
  	    rip = 0x0000000110d996a7
  	    Found by: previous frame's frame pointer
  	11  Chromium Framework!__ZN4base24MessagePumpCFRunLoopBase7RunWorkEv + 0x43
  	    rbp = 0x000070000a602a90   rsp = 0x000070000a602a70
  	    rip = 0x0000000110d9c8c3
  	    Found by: previous frame's frame pointer
  	12  Chromium Framework!__ZN4base3mac15CallWithEHFrameEU13block_pointerFvvE + 0xa
  	    rbp = 0x000070000a602aa0   rsp = 0x000070000a602aa0
  	    rip = 0x0000000110d8048a
  	    Found by: previous frame's frame pointer
  	13  Chromium Framework!__ZN4base24MessagePumpCFRunLoopBase13RunWorkSourceEPv + 0x3f
  	    rbp = 0x000070000a602ae0   rsp = 0x000070000a602ab0
  	    rip = 0x0000000110d9c18f
  	    Found by: previous frame's frame pointer

Comment 14 by lgrey@chromium.org, Jan 18 2018

Thanks!

It's unwinding _tiny_free_list_add_ptr in libsystem_malloc, which sounds pretty familiar from the original bug.

We have trouble unwinding in there in general, but typically we recover well (though with a truncated stack). I'm assuming the problem here is that we were in the epilogue and our mitigations didn't trigger. I might be able to come up with a new mitigation for this one if I can get it live in the debugger, but I'm not so sure from the minidump.

I'm very curious why the GPU tests seem to tickle this one so often, since I've never seen it in the wild.

wittman@ what do you think about bailing if we're in malloc until we have some way of doing prologue analysis?

kbr@ would a lever to turn off stack sampler for the GPU tests be an OK solution for you in the meantime?
Re bailing in libsystem_malloc:

It looks like 4.4% of non-idle stacks end in a frame in libsystem_malloc.dylib, so that's a pretty significant hit to take on recovery rate. Plus memory allocations and deallocations are more likely than other code to be a performance sink and therefore something we'd want to have insight on (see issue 786597 as an example).

A recent release filtered to the reported libsystem_malloc functions shows we'd lose ~200ms of runtime: https://uma.googleplex.com/p/chrome/callstacks?sid=c8d7ccc562032a6901b7fab824300e25. This is a bit of an overestimate, since we'd only lose stacks with the last frame in libsystem_malloc, but probably not by that much.

If this flakiness is sufficiently painful to the graphics team that we need a solution now, then deploying the libsystem_malloc workaround specifically for test execution seems like it would be OK. I don't think turning off profiling entirely on the GPU tests is a good idea because it leaves us blind to issues in that process before they land in canary.
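To make the trade-off discussed here concrete, a minimal sketch of the "bail if we're in malloc" idea could look like the following. Everything in it is hypothetical: ModuleRange, UnwindDecision and DecideUnwind are illustration names, not the sampler's real types, and the sketch assumes the sampler already knows the load address and size of libsystem_malloc.dylib.

```cpp
#include <cstdint>

// Hypothetical description of where a loaded module lives in memory.
struct ModuleRange {
  std::uint64_t base;  // Load address of the module.
  std::uint64_t size;  // Size of the mapped image in bytes.
};

// Returns true if the instruction pointer |ip| falls inside |module|.
constexpr bool Contains(const ModuleRange& module, std::uint64_t ip) {
  return ip >= module.base && ip - module.base < module.size;
}

enum class UnwindDecision {
  kWalkStack,       // Run the normal unwinder.
  kRecordLeafOnly,  // Keep only the sampled frame and skip unwinding.
};

// Skip unwinding when the suspended thread was executing inside the
// (assumed) libsystem_malloc range; otherwise unwind as usual.
constexpr UnwindDecision DecideUnwind(const ModuleRange& malloc_module,
                                      std::uint64_t ip) {
  return Contains(malloc_module, ip) ? UnwindDecision::kRecordLeafOnly
                                     : UnwindDecision::kWalkStack;
}
```

Recording only the leaf frame keeps the sample but gives up the rest of the stack, which is the recovery-rate cost estimated in the comment above.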

Comment 16 by kbr@chromium.org, Jan 18 2018

I'm not in favor of turning off stack sampling in the GPU tests. These tests are some of the few on the waterfalls and commit queue which run the full browser, and we should be testing the configuration we ship. For this reason I'm also not in favor of deploying a libsystem_malloc filter just for test execution.

The GPU tests run a lot of small web pages which presumably do a lot of memory allocations and deallocations in the same renderer process. Maybe that's why they stress this code path more than others on the waterfall. We should consider this as a real-world scenario.

Any mitigation which can be considered for shipping in Chrome's default configuration is welcome.

Comment 17 Deleted

Do we know why we're not seeing these from the wild? Could it be some configuration or build settings difference between what we ship as Canary vs. what the GPU bots run?

Comment 19 by kbr@chromium.org, Feb 24 2018

Seen again here:
http://crbug.com/815319
https://ci.chromium.org/buildbot/chromium.gpu.fyi/Mac%20FYI%20Experimental%20Release%20%28Intel%29/274
https://chromium-swarm.appspot.com/task?id=3bdd5cb290edf910&refresh=10&show_raw=1

Minidump is here:

Minidump found: /b/s/w/itULvpTL/tmpKbB6It/completed/bf846622-8967-4ccd-a5c2-b13b5d96cf7d.dmp
Uploading /b/s/w/itULvpTL/tmpKbB6It/completed/bf846622-8967-4ccd-a5c2-b13b5d96cf7d.dmp to gs://chrome-telemetry-output/minidump-2018-02-23_13-24-31-123721.dmp

Stack trace:

  	Operating system: Mac OS X
  	                  10.13.4 17E139j
  	CPU: amd64
  	     family 6 model 69 stepping 1
  	     4 CPUs
  	
  	GPU: UNKNOWN
  	
  	Crash reason:  EXC_BAD_ACCESS / KERN_INVALID_ADDRESS
  	Crash address: 0x7fff52c97000
  	Process uptime: 2 seconds
  	
  	Thread 4 (crashed)
  	 0  libunwind.dylib + 0x494c
  	    rax = 0x0000000000000001   rdx = 0x00007fff52c65a3c
  	    rcx = 0x00000000ffffffff   rbx = 0x00007fff52c97000
  	    rsi = 0x000070000597dfd0   rdi = 0x00007fff8acdca99
  	    rbp = 0x000070000597e000   rsp = 0x000070000597dfb0
  	     r8 = 0x00007fff52c96ff0    r9 = 0x0000000000000000
  	    r10 = 0x0000000000000008   r11 = 0x00007fff52c96bf9
  	    r12 = 0x00007fff52c8b969   r13 = 0x00007fff52c97000
  	    r14 = 0x00007fff8acdca99   r15 = 0x00007fff52c980a0
  	    rip = 0x00007fff52c6894c
  	    Found by: given as instruction pointer in context
  	 1  libunwind.dylib + 0x4768
  	    rbp = 0x000070000597e860   rsp = 0x000070000597e010
  	    rip = 0x00007fff52c68768
  	    Found by: previous frame's frame pointer
  	 2  libunwind.dylib + 0xbe2
  	    rbp = 0x000070000597f0b0   rsp = 0x000070000597e870
  	    rip = 0x00007fff52c64be2
  	    Found by: previous frame's frame pointer
  	 3  libunwind.dylib + 0xb2b
  	    rbp = 0x000070000597f0d0   rsp = 0x000070000597f0c0
  	    rip = 0x00007fff52c64b2b
  	    Found by: previous frame's frame pointer
  	 4  Chromium Framework!__ZN4base12_GLOBAL__N_120WalkStackFromContextIZNS0_21NativeStackSamplerMac27SuspendThreadAndRecordStackEPNS_18NativeStackSampler11StackBufferEPNS_21StackSamplingProfiler6SampleEE3$_1EEbP13unw_context_tmPmPNSt3__16vectorINS6_6ModuleENSD_9allocatorISF_EEEEPNSE_INS0_11ModuleIndexENSG_ISK_EEEERKT_ + 0x3b
  	    rbp = 0x000070000597f5e0   rsp = 0x000070000597f0e0
  	    rip = 0x0000000118150d9b
  	    Found by: previous frame's frame pointer
  	 5  Chromium Framework!__ZN4base12_GLOBAL__N_121NativeStackSamplerMac17RecordStackSampleEPNS_18NativeStackSampler11StackBufferEPNS_21StackSamplingProfiler6SampleE + 0x39c
  	    rbp = 0x000070000597fc80   rsp = 0x000070000597f5f0
  	    rip = 0x000000011815019c
  	    Found by: previous frame's frame pointer
  	 6  Chromium Framework!__ZN4base21StackSamplingProfiler14SamplingThread12RecordSampleEPNS1_17CollectionContextE + 0x43f
  	    rbp = 0x000070000597fdf0   rsp = 0x000070000597fc90
  	    rip = 0x0000000118153d1f
  	    Found by: previous frame's frame pointer
  	 7  Chromium Framework!__ZN4base21StackSamplingProfiler14SamplingThread21PerformCollectionTaskEi + 0xea
  	    rbp = 0x000070000597ff90   rsp = 0x000070000597fe00
  	    rip = 0x000000011815406a
  	    Found by: previous frame's frame pointer
  	 8  Chromium Framework!__ZN4base5debug13TaskAnnotator7RunTaskEPKcPNS_11PendingTaskE + 0x105
  	    rbp = 0x0000700005980170   rsp = 0x000070000597ffa0
  	    rip = 0x00000001180d9a75
  	    Found by: previous frame's frame pointer
  	 9  Chromium Framework!__ZN4base8internal17IncomingTaskQueue7RunTaskEPNS_11PendingTaskE + 0x79
  	    rbp = 0x00007000059802c0   rsp = 0x0000700005980180
  	    rip = 0x00000001181146e9
  	    Found by: previous frame's frame pointer
  	10  Chromium Framework!__ZN4base11MessageLoop7RunTaskEPNS_11PendingTaskE + 0x257
  	    rbp = 0x00007000059804b0   rsp = 0x00007000059802d0
  	    rip = 0x0000000118118f87
  	    Found by: previous frame's frame pointer
  	11  Chromium Framework!__ZN4base11MessageLoop21DeferOrRunPendingTaskENS_11PendingTaskE + 0xba
  	    rbp = 0x0000700005980660   rsp = 0x00007000059804c0
  	    rip = 0x000000011811934a
  	    Found by: previous frame's frame pointer
  	12  Chromium Framework!__ZN4base11MessageLoop6DoWorkEv + 0x23c
  	    rbp = 0x00007000059808f0   rsp = 0x0000700005980670
  	    rip = 0x00000001181195bc
  	    Found by: previous frame's frame pointer
  	13  Chromium Framework!__ZN4base18MessagePumpDefault3RunEPNS_11MessagePump8DelegateE + 0xdc
  	    rbp = 0x0000700005980940   rsp = 0x0000700005980900
  	    rip = 0x000000011811ab3c
  	    Found by: previous frame's frame pointer
  	14  Chromium Framework!__ZN4base11MessageLoop3RunEb + 0xa9
  	    rbp = 0x0000700005980aa0   rsp = 0x0000700005980950
  	    rip = 0x00000001181188a9
  	    Found by: previous frame's frame pointer
  	15  Chromium Framework!__ZN4base7RunLoop3RunEv + 0xf9
  	    rbp = 0x0000700005980bf0   rsp = 0x0000700005980ab0
  	    rip = 0x00000001181570a9
  	    Found by: previous frame's frame pointer
  	16  Chromium Framework!__ZN4base6Thread3RunEPNS_7RunLoopE + 0xce
  	    rbp = 0x0000700005980d40   rsp = 0x0000700005980c00
  	    rip = 0x00000001181a2ece
  	    Found by: previous frame's frame pointer
  	17  Chromium Framework!__ZN4base6Thread10ThreadMainEv + 0x38c
  	    rbp = 0x0000700005980ec0   rsp = 0x0000700005980d50
  	    rip = 0x00000001181a348c
  	    Found by: previous frame's frame pointer
  	18  Chromium Framework!__ZN4base12_GLOBAL__N_110ThreadFuncEPv + 0x5f
  	    rbp = 0x0000700005980ef0   rsp = 0x0000700005980ed0
  	    rip = 0x000000011819e16f
  	    Found by: previous frame's frame pointer
  	19  libsystem_pthread.dylib + 0x36c1
  	    rbp = 0x0000700005980f20   rsp = 0x0000700005980f00
  	    rip = 0x00007fff52c386c1
  	    Found by: previous frame's frame pointer
  	20  libsystem_pthread.dylib + 0x356d
  	    rbp = 0x0000700005980f50   rsp = 0x0000700005980f30
  	    rip = 0x00007fff52c3856d
  	    Found by: previous frame's frame pointer
  	21  libsystem_pthread.dylib + 0x2c5d
  	    rbp = 0x0000700005980f78   rsp = 0x0000700005980f60
  	    rip = 0x00007fff52c37c5d
  	    Found by: previous frame's frame pointer
  	22  Chromium Framework!__ZN4base14PlatformThread6DetachENS_20PlatformThreadHandleE + 0x70
  	    rsp = 0x0000700005981028   rip = 0x000000011819e110
  	    Found by: stack scanning



Comment 20 by kbr@chromium.org, Feb 24 2018

Cc: kbr@chromium.org cblume@google.com
Issue 815319 has been merged into this issue.
More occurrences:
https://ci.chromium.org/buildbot/chromium.gpu.fyi/Mac%20FYI%20Retina%20Release%20%28AMD%29/753
minidump: gs://chrome-telemetry-output/minidump-2018-03-06_14-25-17-19742.dmp

https://ci.chromium.org/buildbot/chromium.gpu.fyi/Mac%20FYI%20Experimental%20Retina%20Release%20%28AMD%29/522
minidump: gs://chrome-telemetry-output/minidump-2018-03-06_06-50-15-795793.dmp
Stack with more symbols here:
https://ci.chromium.org/buildbot/chromium.gpu.fyi/Mac%20FYI%20Experimental%20Release%20%28Intel%29/570

0   Chromium Framework                  0x000000010a4d066c base::debug::StackTrace::StackTrace(unsigned long) + 28
1   Chromium Framework                  0x000000010a4d04a1 base::debug::(anonymous namespace)::StackDumpSignalHandler(int, __siginfo*, void*) + 2401
2   libsystem_platform.dylib            0x00007fff71dc0f5a _sigtramp + 26
3   ???                                 0x0000000000000000 0x0 + 0
4   libunwind.dylib                     0x00007fff71dfa768 libunwind::UnwindCursor<libunwind::LocalAddressSpace, libunwind::Registers_x86_64>::getInfoFromDwarfSection(unsigned long long, unsigned long long, unsigned long long, unsigned int, unsigned int) + 86
5   libunwind.dylib                     0x00007fff71df6be2 libunwind::UnwindCursor<libunwind::LocalAddressSpace, libunwind::Registers_x86_64>::setInfoBasedOnIPRegister(bool) + 172
6   libunwind.dylib                     0x00007fff71df6b2b unw_init_local + 104
7   Chromium Framework                  0x000000010a54834b _ZN4base12_GLOBAL__N_120WalkStackFromContextIZNS0_21NativeStackSamplerMac27SuspendThreadAndRecordStackEPNS_18NativeStackSampler11StackBufferEPNS_21StackSamplingProfiler6SampleEE3$_1EEbP13unw_context_tmPmPNSt3__16vectorINS6_6ModuleENSD_9allocatorISF_EEEEPNSE_INS0_11ModuleIndexENSG_ISK_EEEERKT_ + 59
8   Chromium Framework                  0x000000010a54774c base::(anonymous namespace)::NativeStackSamplerMac::RecordStackSample(base::NativeStackSampler::StackBuffer*, base::StackSamplingProfiler::Sample*) + 924
9   Chromium Framework                  0x000000010a54b2cf base::StackSamplingProfiler::SamplingThread::RecordSample(base::StackSamplingProfiler::SamplingThread::CollectionContext*) + 1087
10  Chromium Framework                  0x000000010a54b61a base::StackSamplingProfiler::SamplingThread::PerformCollectionTask(int) + 234
11  Chromium Framework                  0x000000010a4d0fe4 base::debug::TaskAnnotator::RunTask(char const*, base::PendingTask*) + 260
12  Chromium Framework                  0x000000010a50bdf9 base::internal::IncomingTaskQueue::RunTask(base::PendingTask*) + 121
13  Chromium Framework                  0x000000010a510697 base::MessageLoop::RunTask(base::PendingTask*) + 599
14  Chromium Framework                  0x000000010a510a5a base::MessageLoop::DeferOrRunPendingTask(base::PendingTask) + 186
15  Chromium Framework                  0x000000010a510fb7 base::MessageLoop::DoDelayedWork(base::TimeTicks*) + 679
16  Chromium Framework                  0x000000010a5121dc base::MessagePumpDefault::Run(base::MessagePump::Delegate*) + 108
17  Chromium Framework                  0x000000010a50ffb9 base::MessageLoop::Run(bool) + 169
18  Chromium Framework                  0x000000010a54e659 base::RunLoop::Run() + 249
19  Chromium Framework                  0x000000010a59b3ce base::Thread::Run(base::RunLoop*) + 206
20  Chromium Framework                  0x000000010a59b98c base::Thread::ThreadMain() + 908
21  Chromium Framework                  0x000000010a59666f base::(anonymous namespace)::ThreadFunc(void*) + 95
22  libsystem_pthread.dylib             0x00007fff71dca6c1 _pthread_body + 340
23  libsystem_pthread.dylib             0x00007fff71dca56d _pthread_body + 0
24  libsystem_pthread.dylib             0x00007fff71dc9c5d thread_start + 13
[end of stack trace]
Found crashpad_database_util
Minidump found: /b/s/w/ityUiYAk/tmpygLy3b/completed/4b5dec2b-3c67-4b52-aff4-ce26380450a5.dmp
Uploading /b/s/w/ityUiYAk/tmpygLy3b/completed/4b5dec2b-3c67-4b52-aff4-ce26380450a5.dmp to gs://chrome-telemetry-output/minidump-2018-03-07_14-41-21-769356.dmp

Comment 23 by kbr@chromium.org, Mar 10 2018

Blockedon: 820677

Comment 24 by kbr@chromium.org, Mar 10 2018

Note: more NativeStackSamplerMac crashes seen in Issue 820677. A couple of minidumps are on that bug report. Not sure whether they're the cause of the test failure but they should be investigated.

Cc: -wittman@chromium.org lgrey@chromium.org
Owner: wittman@chromium.org
Summary: Stack sampler crashes on Mac Intel GPU bots (was: Stack sampler crash on macOS)
From offline discussion, Leonard currently doesn't have bandwidth to investigate this so I'll take it.

I don't anticipate having a solution for this soon, so I'll disable the profiler on the GPU main thread until it can be addressed. Hopefully we have enough to go on now from the collected minidumps.
Issue 820677 has been merged into this issue.

Comment 27 by bugdroid1@chromium.org, Mar 15 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/43a4f5d018220e3310001433f0e76eff146ef328

commit 43a4f5d018220e3310001433f0e76eff146ef328
Author: Mike Wittman <wittman@chromium.org>
Date: Thu Mar 15 22:34:38 2018

Sampling profiler: disable GPU main thread profiling on OS X

Disabling pending a resolution to crashes observed in the associated
bug.

This change also removes the unused
GetSamplingParamsForCurrentProcess function and adapts
IsProfilerEnabledForCurrentProcess to operate on the current process
and specified thread.

The call to IsProfilerEnabledForCurrentProcess is also removed
from SetServiceManagerConnectorForChildProcess since that function is
only invoked in processes supporting the profiler.

Bug: 774682
Change-Id: Ibbc6f1bd9348ba09a3ee4db2e1595411617f1ccd
Reviewed-on: https://chromium-review.googlesource.com/962937
Commit-Queue: Mike Wittman <wittman@chromium.org>
Reviewed-by: Scott Violet <sky@chromium.org>
Reviewed-by: Leonard Grey <lgrey@chromium.org>
Cr-Commit-Position: refs/heads/master@{#543520}
[modify] https://crrev.com/43a4f5d018220e3310001433f0e76eff146ef328/chrome/common/stack_sampling_configuration.cc
[modify] https://crrev.com/43a4f5d018220e3310001433f0e76eff146ef328/chrome/common/stack_sampling_configuration.h
[modify] https://crrev.com/43a4f5d018220e3310001433f0e76eff146ef328/chrome/common/thread_profiler.cc
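
The commit message above describes making the profiler-enabled check depend on both the process and the thread, so that the GPU main thread can be excluded on OS X. The change itself is not shown here, so the following is only a minimal sketch of that idea. The `Process` and `Thread` enums, the `IsProfilerEnabledFor` name, and the explicit `is_mac` parameter are all hypothetical stand-ins, not the actual Chromium definitions; real code would presumably use a build-time platform check rather than a runtime flag.

```cpp
// Hypothetical stand-ins for the process and thread identifiers.
// These are NOT the real Chromium enums; they exist only for illustration.
enum class Process { kBrowser, kRenderer, kGpu };
enum class Thread { kMain, kIo, kCompositor };

// Returns whether the sampling profiler should run for the given
// process/thread pair. The platform is passed in explicitly so the
// sketch stays portable and easy to exercise.
bool IsProfilerEnabledFor(Process process, Thread thread, bool is_mac) {
  // Skip only the GPU main thread on Mac, which is where the crashes
  // discussed in this bug were observed.
  if (is_mac && process == Process::kGpu && thread == Thread::kMain)
    return false;
  return true;
}
```

With this shape, other threads in the GPU process and all threads on other platforms stay profiled; only the one problematic combination is switched off.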

Project Member

Comment 28 by bugdroid1@chromium.org, Mar 16 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/150adef2ab46eca8c1cd37567fb4ad5686e66506

commit 150adef2ab46eca8c1cd37567fb4ad5686e66506
Author: David Trainor <dtrainor@chromium.org>
Date: Fri Mar 16 04:07:25 2018

Revert "Sampling profiler: disable GPU main thread profiling on OS X"

This reverts commit 43a4f5d018220e3310001433f0e76eff146ef328.

Reason for revert: Looks like the changes to thread_profiler broke some Android tests? (ToastHWATest): https://ci.chromium.org/buildbot/chromium.android/Lollipop%20Phone%20Tester/19475

[FATAL:thread_profiler.cc(154)] Check failed: metrics::CallStackProfileParams::BROWSER_PROCESS != GetProcess() (1 vs. 1)

Reverting.  Sorry!


Original change's description:
> Sampling profiler: disable GPU main thread profiling on OS X
> 
> Disabling pending a resolution to crashes observed in the associated
> bug.
> 
> This change also removes the unused
> GetSamplingParamsForCurrentProcess function and adapts
> IsProfilerEnabledForCurrentProcess to operate on the current process
> and specified thread.
> 
> The call to IsProfilerEnabledForCurrentProcess is also removed
> from SetServiceManagerConnectorForChildProcess since that function is
> only invoked in processes supporting the profiler.
> 
> Bug:  774682 
> Change-Id: Ibbc6f1bd9348ba09a3ee4db2e1595411617f1ccd
> Reviewed-on: https://chromium-review.googlesource.com/962937
> Commit-Queue: Mike Wittman <wittman@chromium.org>
> Reviewed-by: Scott Violet <sky@chromium.org>
> Reviewed-by: Leonard Grey <lgrey@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#543520}

TBR=sky@chromium.org,wittman@chromium.org,lgrey@chromium.org

Change-Id: I2ecac49a6e14de7f620dc0096ce229f25057c810
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Bug:  774682 
Reviewed-on: https://chromium-review.googlesource.com/965821
Reviewed-by: David Trainor <dtrainor@chromium.org>
Commit-Queue: David Trainor <dtrainor@chromium.org>
Cr-Commit-Position: refs/heads/master@{#543613}
[modify] https://crrev.com/150adef2ab46eca8c1cd37567fb4ad5686e66506/chrome/common/stack_sampling_configuration.cc
[modify] https://crrev.com/150adef2ab46eca8c1cd37567fb4ad5686e66506/chrome/common/stack_sampling_configuration.h
[modify] https://crrev.com/150adef2ab46eca8c1cd37567fb4ad5686e66506/chrome/common/thread_profiler.cc

I think we have a solution for the crashes now, so rather than trying to fix and reland the reverted CL that disabled profiling, I will implement that solution instead.
Project Member

Comment 30 by bugdroid1@chromium.org, Mar 29 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/556b70ba7c6c0ca2a87e6e68196e4c13acd11af7

commit 556b70ba7c6c0ca2a87e6e68196e4c13acd11af7
Author: Mike Wittman <wittman@chromium.org>
Date: Thu Mar 29 23:19:28 2018

Work around libunwind crash accessing memory past mapped libraries

In some unwinds seen on recent OS X beta versions, unw_init_local
attempts to access memory past the end of the mapped libraries. This
region appears to be protected or unmapped, resulting in crashes.
Getting a fix for libunwind into an OS X release may take 18 months
(if accepted at all). This workaround will avoid the crashes in the
meantime.

57% of the Mac profiler crashes seen over the last five months were
due to this bug.

Bug:  774682 
Change-Id: I260b03adea433f93871c7628eb38886aa877e549
Reviewed-on: https://chromium-review.googlesource.com/969830
Commit-Queue: Mike Wittman <wittman@chromium.org>
Reviewed-by: Mark Mentovai <mark@chromium.org>
Reviewed-by: Robert Sesek <rsesek@chromium.org>
Reviewed-by: Leonard Grey <lgrey@chromium.org>
Cr-Commit-Position: refs/heads/master@{#547012}
[modify] https://crrev.com/556b70ba7c6c0ca2a87e6e68196e4c13acd11af7/base/profiler/native_stack_sampler_mac.cc
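
The commit message describes the failure mode (an access past the end of the mapped libraries) but the thread does not show the workaround itself. As a rough illustration only, and assuming the sampler knows the address range of each mapped library, a guard of the following kind could decide whether an access would stay inside mapped memory before an unwind is attempted. `MappedRange` and `IsReadWithinMappedRange` are hypothetical names and are not taken from native_stack_sampler_mac.cc.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical description of one mapped library as the half-open
// address interval [start, end).
struct MappedRange {
  uint64_t start;
  uint64_t end;
};

// Returns true if the read [address, address + size) lies entirely
// inside a single mapped range. The comparison is written so that it
// cannot overflow even for addresses near the top of the address space.
bool IsReadWithinMappedRange(const std::vector<MappedRange>& ranges,
                             uint64_t address,
                             uint64_t size) {
  for (const MappedRange& range : ranges) {
    if (address >= range.start && address < range.end &&
        size <= range.end - address) {
      return true;
    }
  }
  return false;
}
```

A caller could skip the unwind for a sample whenever such a check fails, giving up that one sample instead of crashing the process.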

Status: Fixed (was: Assigned)
The previous change should address the main crasher that was causing test flakes. If we see additional profiler issues, please open a new bug and assign it to me.

Comment 32 by kbr@chromium.org, Apr 11 2018

Blocking: 831448
Blocking: 907258
Blocking: -907258
