Chrome leaks Mach ports, resulting in a crash after exhausting the port space. |
|||||
Issue descriptionA user is reporting a daily Chrome crash on OSX. It may be related to leaking port. The crash seems to happen in com.apple.NSEventThread This Twitter convo includes details: https://twitter.com/IncendiaryMedia/status/855286283844820992 CCing Erik to route
,
Apr 21 2017
Do you want it here or to file a fresh ticket via the crashes page? Crash ID: crash/4f33e42630000000
,
Apr 21 2017
+ rsesek I don't se anything out of the ordinary in the crash. 10k Mach ports, while high, isn't particularly outrageous - on 10.12 I believe the max is somewhere around ~250k.
,
Apr 21 2017
I'm not entirely sure what the "ports" column of activity monitor is actually tracking, but it was up to 140k before the SF Power Outage at 9am today.
,
Apr 21 2017
Yes, that is an inappropriately high number, and it seems likely we're hitting the Mach port limit for the process.
,
Apr 21 2017
As a first step, perhaps we should add an UMA metric that tracks total # of mach ports, so we can get an idea for the number of affected users. The fact that the browser is crashed when you come back suggests that there's something in the background constantly leaking mach ports. Looking at other crashes with this signature, ti seems like the proble mfirst showed up ~M54/M55, and affects all macOS versions, suggesting a Chrome bug introduced sometime in that time period. The easiest way to debug this problem is probably to throw a dtrace script that tracks all mach port allocations/deallocations in the browser process. I don't know off the top of my head what that would look like, since I expect the kernel can just insert mach ports into the browser process's name space. rsesek, ideas?
,
Apr 22 2017
Occurred again this morning while I was not at my device: Crash ID 8b26e8ab-fa19-4f37-9091-b366440865cf (Server ID: f4c1b74990000000) I've also attached the Apple crash report as well, though I'm not sure how useful it is.
,
Apr 24 2017
Thanks for the crash report. Note that this is using the x86_64-Haswell variant of CoreFoundation. The crash is a software breakpoint:
loc_885ca:
00000000000885ca int3 ; CODE XREF=sub_88470+274
Which is caused when the result of mach_msg is not what is expected:
00000000000884df call imp___stubs__mach_msg
00000000000884e4 mov r14d, eax
[snip]
loc_8852b:
000000000008852b movsxd rbx, r14d ; CODE XREF=sub_88470+147, sub_88470+167
000000000008852e mov rdi, rbx
0000000000088531 nop
0000000000088532 nop dword [rax]
0000000000088536 cmp ebx, 0x10004004
000000000008853c jne loc_88576
loc_88576:
0000000000088576 cmp r14d, 0x10004003 ; CODE XREF=sub_88470+204
000000000008857d je loc_88597
000000000008857f test r14d, r14d
0000000000088582 jne loc_885ca
The two kern_return_t's that it compares against are 0x10004004 ((ipc/rcv) msg too large) and 0x10004003 ((ipc/rcv) timed out).
Looking at the value of %r14 in the crash dump though, we have 0x000000001000600b. 1000600b is in error system 4, subsystem 1, but that return code isn't defined in <mach/message.h>.
,
Apr 24 2017
Looking at 10.12/xnu-3789.1.32/osfmk/ipc/ipc_kmsg.c, the comment at ipc_kmsg_copyout_header() indicates that two codes get OR'd together: * MACH_RCV_HEADER_ERROR|MACH_MSG_IPC_SPACE * The space is dead. * MACH_RCV_HEADER_ERROR|MACH_MSG_IPC_SPACE * No room in space for another name. So that would produce the 1000600b code (0x1000400b|0x00002000). That does indicate that we've hit port exhaustion.
,
Apr 24 2017
,
Apr 24 2017
Re: #7: One other thing I notice in your crash report is this:
External Modification Summary:
Calls made by other processes targeting this process:
task_for_pid: 35265
thread_create: 0
thread_set_state: 0
Calls made by this process:
task_for_pid: 0
thread_create: 0
thread_set_state: 0
Calls made by all processes on this machine:
task_for_pid: 12668899
thread_create: 0
thread_set_state: 0
Those numbers for task_for_pid are *extremely* high. When we've seen things like this before, it's actually other software on the system leaking references to Chrome's Mach task port. I have a suspicion that may be the case here. Do you run any kind of monitoring software on your Mac?
,
Apr 24 2017
Unfortunately no, the built in Activity Monitor is the only thing close to monitoring software I've used. Is there a way to trace this?
,
Apr 24 2017
You can try running this script; it will print every task_for_pid() call on the system. To end it, press Ctrl-C and it'll print a summary of the total number of calls by process. sudo dtrace -s task_for_pid.d Once that script is running, just make a note of Chrome's PID. The output will look like: 20 105 task_for_pid:entry by ps(70447) -> 608 The number after the "->" is the PID that is being accessed. If you see Chrome's there a lot, then the "by ps" will indicate the process doing it. I'm not sure if this will work completely if you have SIP turned on, but it'll be a start.
,
Apr 24 2017
Running the dtrace script returns no output(1) by default: CPU ID FUNCTION:NAME 2 1 :BEGIN Monitoring calls to task_for_pid() 0 2 :END 1: However when activity monitor is running it hits everything ( I think this is expected? ), see attachment
,
Apr 24 2017
You may need to let it run until Chrome crashes to show anything useful. sysmond is just the helper process that Activity Monitor uses to gather data.
,
Oct 31
|
|||||
►
Sign in to add a comment |
|||||
Comment 1 by erikc...@chromium.org
, Apr 21 2017