New issue
Advanced search Search tips

Issue 714057 link

Starred by 4 users

Issue metadata

Status: Archived
Owner:
Closed: Oct 31
Cc:
EstimatedDays: ----
NextAction: ----
OS: Mac
Pri: 3
Type: Bug



Sign in to add a comment

Chrome leaks Mach ports, resulting in a crash after exhausting the port space.

Project Member Reported by cblume@chromium.org, Apr 21 2017

Issue description

A user is reporting a daily Chrome crash on OSX.

It may be related to leaking port. The crash seems to happen in com.apple.NSEventThread

This Twitter convo includes details:
https://twitter.com/IncendiaryMedia/status/855286283844820992

CCing Erik to route
 
Please ask the user to navigate to chrome://crashes and report the server ID of a recent crash.

Comment 2 by dadkis...@justin.tv, Apr 21 2017

Do you want it here or to file a fresh ticket via the crashes page?

Crash ID: crash/4f33e42630000000
Cc: rsesek@chromium.org
+ rsesek

I don't se anything out of the ordinary in the crash. 10k Mach ports, while high, isn't particularly outrageous - on 10.12 I believe the max is somewhere around ~250k. 

Comment 4 by dadkis...@justin.tv, Apr 21 2017

I'm not entirely sure what the "ports" column of activity monitor is actually tracking, but it was up to 140k before the SF Power Outage at 9am today.
Summary: Chrome leaks Mach ports, resulting in a crash after exhausting the port space. (was: Crash in com.apple.NSEventTrhead -- leaking ports?)
Yes, that is an inappropriately high number, and it seems likely we're hitting the Mach port limit for the process.
As a first step, perhaps we should add an UMA metric that tracks total # of mach ports, so we can get an idea for the number of affected users.

The fact that the browser is crashed when you come back suggests that there's something in the background constantly leaking mach ports.

Looking at other crashes with this signature, ti seems like the proble mfirst showed up ~M54/M55, and affects all macOS versions, suggesting a Chrome bug introduced sometime in that time period.

The easiest way to debug this problem is probably to throw a dtrace script that tracks all mach port allocations/deallocations in the browser process. 

I don't know off the top of my head what that would look like, since I expect the kernel can just insert mach ports into the browser process's name space. rsesek, ideas?

Comment 7 by dadkis...@justin.tv, Apr 22 2017

Occurred again this morning while I was not at my device:

Crash ID 8b26e8ab-fa19-4f37-9091-b366440865cf (Server ID: f4c1b74990000000)

I've also attached the Apple crash report as well, though I'm not sure how useful it is.
crash:f4c1b74990000000.log
109 KB View Download

Comment 8 by rsesek@chromium.org, Apr 24 2017

Thanks for the crash report. Note that this is using the x86_64-Haswell variant of CoreFoundation. The crash is a software breakpoint:

                     loc_885ca:
00000000000885ca         int3                 ; CODE XREF=sub_88470+274

Which is caused when the result of mach_msg is not what is expected:

00000000000884df         call       imp___stubs__mach_msg
00000000000884e4         mov        r14d, eax

[snip]

                     loc_8852b:
000000000008852b         movsxd     rbx, r14d             ; CODE XREF=sub_88470+147, sub_88470+167
000000000008852e         mov        rdi, rbx
0000000000088531         nop
0000000000088532         nop        dword [rax]
0000000000088536         cmp        ebx, 0x10004004
000000000008853c         jne        loc_88576

                     loc_88576:
0000000000088576         cmp        r14d, 0x10004003                ; CODE XREF=sub_88470+204
000000000008857d         je         loc_88597

000000000008857f         test       r14d, r14d
0000000000088582         jne        loc_885ca

The two kern_return_t's that it compares against are 0x10004004 ((ipc/rcv) msg too large) and 0x10004003 ((ipc/rcv) timed out).

Looking at the value of %r14 in the crash dump though, we have 0x000000001000600b. 1000600b is in error system 4, subsystem 1, but that return code isn't defined in <mach/message.h>.

Comment 9 by rsesek@chromium.org, Apr 24 2017

Looking at 10.12/xnu-3789.1.32/osfmk/ipc/ipc_kmsg.c, the comment at ipc_kmsg_copyout_header() indicates that two codes get OR'd together:

 *              MACH_RCV_HEADER_ERROR|MACH_MSG_IPC_SPACE
 *                      The space is dead.
 *              MACH_RCV_HEADER_ERROR|MACH_MSG_IPC_SPACE
 *                      No room in space for another name.


So that would produce the 1000600b code (0x1000400b|0x00002000). That does indicate that we've hit port exhaustion.
Cc: -rsesek@chromium.org
Owner: rsesek@chromium.org
Status: Assigned (was: Untriaged)
Re: #7: One other thing I notice in your crash report is this:

External Modification Summary:
  Calls made by other processes targeting this process:
    task_for_pid: 35265
    thread_create: 0
    thread_set_state: 0
  Calls made by this process:
    task_for_pid: 0
    thread_create: 0
    thread_set_state: 0
  Calls made by all processes on this machine:
    task_for_pid: 12668899
    thread_create: 0
    thread_set_state: 0

Those numbers for task_for_pid are *extremely* high. When we've seen things like this before, it's actually other software on the system leaking references to Chrome's Mach task port. I have a suspicion that may be the case here. Do you run any kind of monitoring software on your Mac?
Unfortunately no, the built in Activity Monitor is the only thing close to monitoring software I've used. Is there a way to trace this?
You can try running this script; it will print every task_for_pid() call on the system. To end it, press Ctrl-C and it'll print a summary of the total number of calls by process.

sudo dtrace -s task_for_pid.d

Once that script is running, just make a note of Chrome's PID. The output will look like:
   20    105               task_for_pid:entry by ps(70447) -> 608

The number after the "->" is the PID that is being accessed. If you see Chrome's there a lot, then the "by ps" will indicate the process doing it.

I'm not sure if this will work completely if you have SIP turned on, but it'll be a start.
task_for_pid.d
203 bytes View Download
Running the dtrace script returns no output(1) by default:

CPU     ID                    FUNCTION:NAME
  2      1                           :BEGIN Monitoring calls to task_for_pid()

  0      2                             :END 

1: However when activity monitor is running it hits everything ( I think this is expected? ), see attachment
dtrace_output.log
292 KB View Download
You may need to let it run until Chrome crashes to show anything useful. sysmond is just the helper process that Activity Monitor uses to gather data.
Status: Archived (was: Assigned)

Sign in to add a comment