
Issue 681167

Starred by 7 users

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug




WaitableEvent is much slower on POSIX than Windows

Reported by jam@chromium.org (Project Member), Jan 13 2017

Issue description

Two surprising issues:
- Windows times are very stable across runs, while Linux runs sometimes show a 50% difference between subsequent runs.
- Even the fastest Linux runs are twice as slow as Windows.

I'm more concerned about the second issue. I even tried setting is_official_build, but it didn't make a difference. Everything is slower across all the tests, i.e., IPC across processes, in-process, and even plain callback calls. Since the calls are amortized over 50K runs (for most tests), I don't think process setup should factor in (and regardless, Linux is faster at that).

Thoughts?
 

Comment 1 by roc...@chromium.org, Jan 14 2017

Persisting what we just discussed offline: the discrepancy is not seen for the in-process case which does no real I/O. That implicates something in the POSIX I/O stack, and based on a recent observation regarding BattOr, I've been wondering if FileDescriptorWatcher usage might impose a higher read latency than we'd like, even when there's zero IO thread contention.

We should start by running a simple latency benchmark over ChannelPosix in a single-process environment and go from there.

Comment 2 by yzshen@chromium.org, Jan 14 2017

Out of curiosity: could you please talk more about the observation regarding BattOr? Thanks!

Comment 3 by roc...@chromium.org, Jan 14 2017

They are observing unexpectedly high latency between writing to the serial
port and having the FileDescriptorWatcher wake up to read a corresponding
response from the device. If the device is instead polled synchronously on
a background thread, the latency is much lower (~10x difference).

Comment 4 by jam@chromium.org, Jan 14 2017

ipc_perftests has a bunch of different tests that can help isolate this (sync to head).

Looking at the 144-byte roundtrip case (same ballpark as 12 or 1728 bytes), here's a comparison between Windows and Linux. My Windows machine is a Z840 while my Linux machine is a Z620, so while I would expect a small difference (maybe 10-20% at most), the results are more like 2-3x.


                                         Windows   Linux  (times in µs)
IPC                                        35.7     62.5
Mojo cross-process                         33       53.8
Mojo in-process multi-thread               10.2     28.7
Mojo in-process single-thread               8.9     14.5
Callback multi-thread                       6.9     21.4
Callback single-thread with PostTask        2        2.84
Callback single-thread with no PostTask     0.13     0.156

We can even take the Mojo part out of it by looking at the callback variants, just to reduce factors. Two callbacks cost 21 µs on Linux vs. 7 µs on Windows, which is really hard to believe. It seems like if we could track down a way to improve this, it would yield big improvements beyond just Mojo :)

Comment 5 by roc...@chromium.org, Jan 14 2017

What kind of MessagePump is used in the multi-thread case?

Comment 6 by jam@chromium.org, Jan 14 2017

Looks like our messages crossed :)

I was tracing this code, and the callback tests just use the default message pump, so libevent isn't used. Since MessagePumpDefault is trivial, it looks like the POSIX WaitableEvent implementation is the culprit.

Comment 7 by roc...@chromium.org, Jan 14 2017

I don't expect us to be able to improve much on the default pump, but I guess it's still worth investigating what exactly the issue is there.

I'd be interested to know what happens if the target thread is switched to an IO pump instead, i.e., do we see a much larger latency gap between Windows and Linux in that case? That would pretty effectively test the speculation about libevent.

Comment 8 by jam@chromium.org, Jan 14 2017

Owner: jam@chromium.org
Status: Started (was: Unconfirmed)
I'm going to try some experiments on WaitableEvent's POSIX implementation.

Comment 9 by jam@chromium.org, Apr 5 2017

Cc: jam@chromium.org
Owner: ----
Status: Available (was: Started)
I couldn't find time to work on this unfortunately.
Cc: -roc...@chromium.org
Owner: roc...@chromium.org
Status: Assigned (was: Available)
Adding it to my queue since low-level performance in general is something I plan to take a closer look at this quarter.
Cc: pasko@chromium.org

Comment 12 by jam@chromium.org, Jun 8 2017

Summary: WaitableEvent is much slower on POSIX than Windows (was: Investigate why ipc_perftests is much slower on Linux than on Windows)
Cc: roc...@chromium.org
Owner: rsesek@chromium.org
I was looking at WaitableEvent on Mac and noticed that it's implemented using pthread_cond_t and pthread_mutex_t, which are pretty heavyweight for what this does. Using an async waiter adds an additional pthread_mutex_t.

I've written a new implementation on Mac using kqueue() and zero locks. (Though more improvements are possible if we could switch MessagePumpDefault on Mac to not be a CFRunLoop, which is also quite heavyweight.)

On Linux I've also started working on an implementation that uses eventfd(), which works very similarly to the Mac kqueue() implementation.

Comment 14 by bugdroid1@chromium.org, Jun 30 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/3b33337785b9c039c15ffb136d5ac2a4ad93ba80

commit 3b33337785b9c039c15ffb136d5ac2a4ad93ba80
Author: Robert Sesek <rsesek@chromium.org>
Date: Fri Jun 30 18:21:56 2017

Convert WaitableEventWatcherTest to use parameterized tests.

Bug: 681167
Change-Id: Id647559bb83db517315c3b0a28649bf4e0d1450f
Reviewed-on: https://chromium-review.googlesource.com/556519
Reviewed-by: danakj <danakj@chromium.org>
Commit-Queue: Robert Sesek <rsesek@chromium.org>
Cr-Commit-Position: refs/heads/master@{#483764}
[modify] https://crrev.com/3b33337785b9c039c15ffb136d5ac2a4ad93ba80/base/synchronization/waitable_event_watcher_unittest.cc

Comment 15 by jam@chromium.org, Jun 30 2017

Thanks for taking this on!

It would be great if you could post before/after numbers from ipc_perftests to see how this impacts Mojo.

For Mac, you'll have to patch https://codereview.chromium.org/2932923002/. I haven't had time to make the fix work with ipc_tests.

Re: #15: Yes, I will post perftest numbers. I did notice that ipc_perftests was failing on Mac, so I needed to look into that -- I will try your CL next week. I'm also going to add a dedicated base_perftest for WaitableEvent, so as not to exercise any other machinery. As I noted in #13, MessagePumpDefault on Mac is pretty slow, so I think ipc_perftests may be dominated by that at the moment. (I'll work on that too next :)

Comment 17 by bugdroid1@chromium.org, Jun 30 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/b44f896ea1f81f87f51be2d95e962bf21859ab0d

commit b44f896ea1f81f87f51be2d95e962bf21859ab0d
Author: Robert Sesek <rsesek@chromium.org>
Date: Fri Jun 30 21:21:13 2017

Add additional tests for WaitableEvent and WaitableEventWatcher.

These cover behaviors that clients of these classes were relying upon, but which
were not tested by base_unittests.

Bug: 681167
Change-Id: Ib07ef1387bfe653560aeda6affc5b56ab8e54d85
Reviewed-on: https://chromium-review.googlesource.com/558505
Reviewed-by: danakj <danakj@chromium.org>
Commit-Queue: Robert Sesek <rsesek@chromium.org>
Cr-Commit-Position: refs/heads/master@{#483821}
[modify] https://crrev.com/b44f896ea1f81f87f51be2d95e962bf21859ab0d/base/synchronization/waitable_event_unittest.cc
[modify] https://crrev.com/b44f896ea1f81f87f51be2d95e962bf21859ab0d/base/synchronization/waitable_event_watcher_unittest.cc


Comment 18 by bugdroid1@chromium.org, Jul 5 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/da4be990616dee7746ab65aab2026c10a61d1586

commit da4be990616dee7746ab65aab2026c10a61d1586
Author: Robert Sesek <rsesek@chromium.org>
Date: Wed Jul 05 17:38:14 2017

Do not Reset the waitable events in TraceEventTestFixture.ThreadOnceBlocking.

The events are created as automatic-reset, so it is not necessary to reset them
manually. Doing so for the |task_start_event| and |task_stop_event| is racy,
since the test main thread could signal and then immediately reset the event.
If the second thread does not make progress between those two calls, then it
will miss observing the signal and the test will hang forever.

Bug: 681167
Change-Id: Ia65eccd5fbbb83af53f4501ce4abb99ed3c9131c
Reviewed-on: https://chromium-review.googlesource.com/559510
Reviewed-by: Primiano Tucci <primiano@chromium.org>
Commit-Queue: Robert Sesek <rsesek@chromium.org>
Cr-Commit-Position: refs/heads/master@{#484301}
[modify] https://crrev.com/da4be990616dee7746ab65aab2026c10a61d1586/base/trace_event/trace_event_unittest.cc

As I suspected, ipc_perftests doesn't really show improvement because it's dominated by CFRunLoop internals. But I ran base_perftests, and there is a significant improvement:

Existing implementation using existing test:
[ RUN      ] WaitableEventPerfTest.EventPingPong
*RESULT task: 4_WaitableEvent_Threads_time = 22.55947 us/hop
*RESULT task: 4_WaitableEvent_Threads_cpu = 11.68782 us/hop
[       OK ] WaitableEventPerfTest.EventPingPong (2259 ms)

Kqueue-based implementation:
[ RUN      ] WaitableEventPerfTest.EventPingPong
*RESULT task: 4_WaitableEvent_Threads_time = 4.03602 us/hop
*RESULT task: 4_WaitableEvent_Threads_cpu = 4.12879 us/hop
[       OK ] WaitableEventPerfTest.EventPingPong (407 ms)



Separately, my new perf test shows kqueue is slightly slower than the existing implementation for a contrived Signal/Wait in a tight loop on a single thread. (Shot-in-the-dark guess: the pthread internals don't even park the lock because it's hot and uncontended on a single thread, whereas kqueue always makes a system call.) All other metrics show good improvement. The first two tests time how long it takes to do 1000 Signal/Waits; the third test sees how many Signal/Waits it can do in one second. The kqueue implementation can do 3.3x more in the same amount of time.

Before:
[==========] Running 4 tests from 2 test cases.
[----------] Global test environment set-up.
[----------] 3 tests from WaitableEventPerfTest
[ RUN      ] WaitableEventPerfTest.SingleThread
*RESULT signal_time: singlethread-1000-samples= 80 ns/sample
*RESULT wait_time: singlethread-1000-samples= 82 ns/sample
[       OK ] WaitableEventPerfTest.SingleThread (0 ms)
[ RUN      ] WaitableEventPerfTest.MultipleThreads
*RESULT signal_time_waiter: multithread-1000-samples= 3091 ns/sample
*RESULT wait_time_waiter: multithread-1000-samples= 43894 ns/sample
*RESULT signal_time_signaler: multithread-1000-samples= 3103 ns/sample
*RESULT wait_time_signaler: multithread-1000-samples= 43757 ns/sample
[       OK ] WaitableEventPerfTest.MultipleThreads (48 ms)
[ RUN      ] WaitableEventPerfTest.Throughput
*RESULT counts: throughput= 91153 signals
*RESULT signal_time: throughput= 97929 ns/sample
*RESULT wait_time: throughput= 98955 ns/sample
[       OK ] WaitableEventPerfTest.Throughput (1001 ms)
[----------] 3 tests from WaitableEventPerfTest (1049 ms total)

After:
[==========] Running 4 tests from 2 test cases.
[----------] Global test environment set-up.
[----------] 3 tests from WaitableEventPerfTest
[ RUN      ] WaitableEventPerfTest.SingleThread
*RESULT signal_time: singlethread-1000-samples= 528 ns/sample
*RESULT wait_time: singlethread-1000-samples= 644 ns/sample
[       OK ] WaitableEventPerfTest.SingleThread (1 ms)
[ RUN      ] WaitableEventPerfTest.MultipleThreads
*RESULT signal_time_waiter: multithread-1000-samples= 2164 ns/sample
*RESULT wait_time_waiter: multithread-1000-samples= 11330 ns/sample
*RESULT signal_time_signaler: multithread-1000-samples= 2876 ns/sample
*RESULT wait_time_signaler: multithread-1000-samples= 10920 ns/sample
[       OK ] WaitableEventPerfTest.MultipleThreads (15 ms)
[ RUN      ] WaitableEventPerfTest.Throughput
*RESULT counts: throughput= 308677 signals
*RESULT signal_time: throughput= 66393 ns/sample
*RESULT wait_time: throughput= 96630 ns/sample
[       OK ] WaitableEventPerfTest.Throughput (1004 ms)
[----------] 3 tests from WaitableEventPerfTest (1020 ms total)

Comment 20 by bugdroid1@chromium.org, Jul 7 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/431258198d910ec58ffa149817c680a8ab5ddc61

commit 431258198d910ec58ffa149817c680a8ab5ddc61
Author: Robert Sesek <rsesek@chromium.org>
Date: Fri Jul 07 17:15:39 2017

Add a dedicated WaitableEventPerfTest.

This adds tests for three cases:
- Timing how long it takes to Signal/Wait 1000x in a tight loop.
- Timing how long it takes to Signal/Wait 1000x between threads.
- Counting how many Signal/Waits can be done in 1s between threads.

This also renames the existing WaitableEventPerfTest to
WaitableEventThreadPerfTest, since it's part of thread_perftest.cc.

Bug: 681167
Change-Id: I5180f69b5901da3d1ee808eac90f1b455b1e970d
Reviewed-on: https://chromium-review.googlesource.com/560152
Commit-Queue: Robert Sesek <rsesek@chromium.org>
Reviewed-by: danakj <danakj@chromium.org>
Cr-Commit-Position: refs/heads/master@{#484963}
[modify] https://crrev.com/431258198d910ec58ffa149817c680a8ab5ddc61/base/BUILD.gn
[add] https://crrev.com/431258198d910ec58ffa149817c680a8ab5ddc61/base/synchronization/waitable_event_perftest.cc
[modify] https://crrev.com/431258198d910ec58ffa149817c680a8ab5ddc61/base/threading/thread_perftest.cc


Comment 21 by bugdroid1@chromium.org, Jul 12 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/2096391cdcc3bdf963e68977e9015aa8cdbe85c6

commit 2096391cdcc3bdf963e68977e9015aa8cdbe85c6
Author: Robert Sesek <rsesek@chromium.org>
Date: Wed Jul 12 01:54:21 2017

Reimplement base::WaitableEvent with a kqueue on Mac.

For a single WaitableEvent, a custom EVFILT_USER kevent is used to wait and
signal. This replaces the default POSIX implementation that uses a
pthread_cond_t and a boolean flag.

To implement WaitMany, a new kqueue is created to wait on all the individual
WaitableEvent's kqueue descriptor. This replaces a complex locking algorithm
used in the default POSIX implementation.

For the asynchronous WaitableEventWatcher, a TYPE_READ dispatch_source_t is
used to watch the WaitableEvent's kqueue. This replaces the POSIX
implementation of a reference-counted list of async watchers guarded by a lock.

Microbenchmarks show that the kqueue implementation is significantly faster in
most cases. The one potential drawback is hitting the low RLIMIT_NOFILE on
macOS, since each WaitableEvent and WaitableEventWatcher requires a new
descriptor.

Bug: 681167
Change-Id: I135012fdd25e547ffb911fc7adc97c203df38241
Reviewed-on: https://chromium-review.googlesource.com/553497
Reviewed-by: Robert Liao <robliao@chromium.org>
Reviewed-by: Mark Mentovai <mark@chromium.org>
Commit-Queue: Robert Sesek <rsesek@chromium.org>
Cr-Commit-Position: refs/heads/master@{#485788}
[modify] https://crrev.com/2096391cdcc3bdf963e68977e9015aa8cdbe85c6/base/BUILD.gn
[modify] https://crrev.com/2096391cdcc3bdf963e68977e9015aa8cdbe85c6/base/synchronization/waitable_event.h
[add] https://crrev.com/2096391cdcc3bdf963e68977e9015aa8cdbe85c6/base/synchronization/waitable_event_mac.cc
[modify] https://crrev.com/2096391cdcc3bdf963e68977e9015aa8cdbe85c6/base/synchronization/waitable_event_watcher.h
[add] https://crrev.com/2096391cdcc3bdf963e68977e9015aa8cdbe85c6/base/synchronization/waitable_event_watcher_mac.cc
[modify] https://crrev.com/2096391cdcc3bdf963e68977e9015aa8cdbe85c6/base/task_scheduler/task_tracker_unittest.cc


Comment 22 by bugdroid1@chromium.org, Jul 12 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/3b99851e887099660ee0f602582501f2c5827d41

commit 3b99851e887099660ee0f602582501f2c5827d41
Author: Keishi Hattori <keishi@chromium.org>
Date: Wed Jul 12 04:19:47 2017

Revert "Reimplement base::WaitableEvent with a kqueue on Mac."

This reverts commit 2096391cdcc3bdf963e68977e9015aa8cdbe85c6.

Reason for revert: Maybe caused jingle_unittests to fail  crbug.com/741256 

Original change's description:
> Reimplement base::WaitableEvent with a kqueue on Mac.
> 
> For a single WaitableEvent, a custom EVFILT_USER kevent is used to wait and
> signal. This replaces the default POSIX implementation that uses a
> pthread_cond_t and a boolean flag.
> 
> To implement WaitMany, a new kqueue is created to wait on all the individual
> WaitableEvent's kqueue descriptor. This replaces a complex locking algorithm
> used in the default POSIX implementation.
> 
> For the asynchronous WaitableEventWatcher, a TYPE_READ dispatch_source_t is
> used to watch the WaitableEvent's kqueue. This replaces the POSIX
> implementation of a reference-counted list of async watchers guarded by a lock.
> 
> Microbenchmarks show that the kqueue implementation is significantly faster in
> most cases. The one potential drawback is hitting the low RLIMIT_NOFILE on
> macOS, since each WaitableEvent and WaitableEventWatcher requires a new
> descriptor.
> 
> Bug: 681167
> Change-Id: I135012fdd25e547ffb911fc7adc97c203df38241
> Reviewed-on: https://chromium-review.googlesource.com/553497
> Reviewed-by: Robert Liao <robliao@chromium.org>
> Reviewed-by: Mark Mentovai <mark@chromium.org>
> Commit-Queue: Robert Sesek <rsesek@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#485788}

TBR=robliao@chromium.org,rsesek@chromium.org,mark@chromium.org

Change-Id: I5b8740bf353f86b0ac5b600793826cd7724cd784
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Bug: 681167
Reviewed-on: https://chromium-review.googlesource.com/566700
Reviewed-by: Keishi Hattori <keishi@chromium.org>
Commit-Queue: Keishi Hattori <keishi@chromium.org>
Cr-Commit-Position: refs/heads/master@{#485853}
[modify] https://crrev.com/3b99851e887099660ee0f602582501f2c5827d41/base/BUILD.gn
[modify] https://crrev.com/3b99851e887099660ee0f602582501f2c5827d41/base/synchronization/waitable_event.h
[delete] https://crrev.com/623921bf99007b51cd9652a9adff459da54f9519/base/synchronization/waitable_event_mac.cc
[modify] https://crrev.com/3b99851e887099660ee0f602582501f2c5827d41/base/synchronization/waitable_event_watcher.h
[delete] https://crrev.com/623921bf99007b51cd9652a9adff459da54f9519/base/synchronization/waitable_event_watcher_mac.cc
[modify] https://crrev.com/3b99851e887099660ee0f602582501f2c5827d41/base/task_scheduler/task_tracker_unittest.cc


Comment 23 by bugdroid1@chromium.org, Jul 13 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/6d38e78f585fc84c2757e9f09ac09f50615c8438

commit 6d38e78f585fc84c2757e9f09ac09f50615c8438
Author: Robert Sesek <rsesek@chromium.org>
Date: Thu Jul 13 00:46:50 2017

Add new test WaitableEventWatcherDeletionTest.DeleteWatcherBeforeCallback.

This verifies the potential for use-after-free that can occur if the
WaitableEventWatcher is deleted after the event is signaled but before the
callback is run.

Bug: 681167
Bug:  741256 
Change-Id: Ic5ca2bc48cb51d5f50be7e67773f8213fbbc80d3
Reviewed-on: https://chromium-review.googlesource.com/568371
Commit-Queue: Robert Sesek <rsesek@chromium.org>
Reviewed-by: Daniel Cheng <dcheng@chromium.org>
Cr-Commit-Position: refs/heads/master@{#486191}
[modify] https://crrev.com/6d38e78f585fc84c2757e9f09ac09f50615c8438/base/synchronization/waitable_event_watcher_unittest.cc


Comment 24 by bugdroid1@chromium.org, Jul 13 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/b6bd20af95a73a79ba0d9ae0af95c2882abf0802

commit b6bd20af95a73a79ba0d9ae0af95c2882abf0802
Author: Robert Sesek <rsesek@chromium.org>
Date: Thu Jul 13 18:35:18 2017

Use the threadsafe GTest death test style in two //jingle tests.

Using MessageLoop across the death test fork does not work well on Mac. The
threadsafe death test style solves this issue.

Bug:  741256 
Bug: 681167
Change-Id: I301981a704c284d4264642d4a553d3cc2570e0fa
Reviewed-on: https://chromium-review.googlesource.com/568800
Reviewed-by: Sergey Ulanov <sergeyu@chromium.org>
Commit-Queue: Robert Sesek <rsesek@chromium.org>
Cr-Commit-Position: refs/heads/master@{#486445}
[modify] https://crrev.com/b6bd20af95a73a79ba0d9ae0af95c2882abf0802/jingle/glue/chrome_async_socket_unittest.cc
[modify] https://crrev.com/b6bd20af95a73a79ba0d9ae0af95c2882abf0802/jingle/notifier/base/xmpp_connection_unittest.cc


Comment 25 by bugdroid1@chromium.org, Jul 13 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/bb3210d3952c668037f92d7336be6cbe04fa396d

commit bb3210d3952c668037f92d7336be6cbe04fa396d
Author: Robert Sesek <rsesek@chromium.org>
Date: Thu Jul 13 18:48:15 2017

Reland "Reimplement base::WaitableEvent with a kqueue on Mac."

This is a reland of 2096391cdcc3bdf963e68977e9015aa8cdbe85c6
Original change's description:
> Reimplement base::WaitableEvent with a kqueue on Mac.
> 
> For a single WaitableEvent, a custom EVFILT_USER kevent is used to wait and
> signal. This replaces the default POSIX implementation that uses a
> pthread_cond_t and a boolean flag.
> 
> To implement WaitMany, a new kqueue is created to wait on all the individual
> WaitableEvent's kqueue descriptor. This replaces a complex locking algorithm
> used in the default POSIX implementation.
> 
> For the asynchronous WaitableEventWatcher, a TYPE_READ dispatch_source_t is
> used to watch the WaitableEvent's kqueue. This replaces the POSIX
> implementation of a reference-counted list of async watchers guarded by a lock.
> 
> Microbenchmarks show that the kqueue implementation is significantly faster in
> most cases. The one potential drawback is hitting the low RLIMIT_NOFILE on
> macOS, since each WaitableEvent and WaitableEventWatcher requires a new
> descriptor.
> 
> Bug: 681167
> Change-Id: I135012fdd25e547ffb911fc7adc97c203df38241
> Reviewed-on: https://chromium-review.googlesource.com/553497
> Reviewed-by: Robert Liao <robliao@chromium.org>
> Reviewed-by: Mark Mentovai <mark@chromium.org>
> Commit-Queue: Robert Sesek <rsesek@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#485788}

Bug: 681167
Change-Id: I1b0296022f7bd1a31b074fbbb551bccc3b28d05b
Reviewed-on: https://chromium-review.googlesource.com/568642
Reviewed-by: Mark Mentovai <mark@chromium.org>
Commit-Queue: Robert Sesek <rsesek@chromium.org>
Cr-Commit-Position: refs/heads/master@{#486449}
[modify] https://crrev.com/bb3210d3952c668037f92d7336be6cbe04fa396d/base/BUILD.gn
[modify] https://crrev.com/bb3210d3952c668037f92d7336be6cbe04fa396d/base/synchronization/waitable_event.h
[add] https://crrev.com/bb3210d3952c668037f92d7336be6cbe04fa396d/base/synchronization/waitable_event_mac.cc
[modify] https://crrev.com/bb3210d3952c668037f92d7336be6cbe04fa396d/base/synchronization/waitable_event_watcher.h
[add] https://crrev.com/bb3210d3952c668037f92d7336be6cbe04fa396d/base/synchronization/waitable_event_watcher_mac.cc
[modify] https://crrev.com/bb3210d3952c668037f92d7336be6cbe04fa396d/base/task_scheduler/task_tracker_unittest.cc

From a week ago:

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/20b379ba1a5ce5ae093386c7ffe764d63246b66f

commit 20b379ba1a5ce5ae093386c7ffe764d63246b66f
Author: Michael Nordman <michaeln@google.com>
Date: Fri Jul 21 18:08:09 2017

Revert "Reland "Reimplement base::WaitableEvent with a kqueue on Mac.""

This reverts commit bb3210d3952c668037f92d7336be6cbe04fa396d.

Speculative revert for crbug/744550

Bug:  744550 
Change-Id: I0b8a2464c2932d00349d3fd9a80ba7595c929ff7

TBR=mark,rsesek

Change-Id: I0b8a2464c2932d00349d3fd9a80ba7595c929ff7
Reviewed-on: https://chromium-review.googlesource.com/580639
Reviewed-by: Michael Nordman <michaeln@chromium.org>
Commit-Queue: Michael Nordman <michaeln@chromium.org>
Cr-Commit-Position: refs/heads/master@{#488706}
[modify] https://crrev.com/20b379ba1a5ce5ae093386c7ffe764d63246b66f/base/BUILD.gn
[modify] https://crrev.com/20b379ba1a5ce5ae093386c7ffe764d63246b66f/base/synchronization/waitable_event.h
[delete] https://crrev.com/d702c3905dbe9ef36bfda544228b53bf073702f4/base/synchronization/waitable_event_mac.cc
[modify] https://crrev.com/20b379ba1a5ce5ae093386c7ffe764d63246b66f/base/synchronization/waitable_event_watcher.h
[delete] https://crrev.com/d702c3905dbe9ef36bfda544228b53bf073702f4/base/synchronization/waitable_event_watcher_mac.cc
[modify] https://crrev.com/20b379ba1a5ce5ae093386c7ffe764d63246b66f/base/task_scheduler/task_tracker_unittest.cc
I've been digging into the crash data and have some results to report. There are a handful of issues. Unfortunately, the PCHECKs in the code didn't preserve errno, so it's not easy to get aggregated data about why the assertions are tripping. I just fixed issue 708245, which will resolve that going forward.

The crashes bucket into these broad categories:

=== base::WaitableEvent::WaitableEvent (issue 744556)
=== base::WaitableEvent::WaitMany
These are failures to create a kqueue. I pulled a handful of reports, manually extracted the errno value from the TEB in the minidump, and found these errnos:

23 ENFILE - Too many open files in system
24 EMFILE - Too many open files

Due to the way errno is stored on Mac, I could only recover errno for crashes not on the main thread, which significantly limits the sample size. I don't think that changes the ultimate conclusion, though. Further corroborating evidence is that many of the crash reports are from macOS "version 0.0.0.0", which is what would be reported if crashpad_handler were unable to open the SystemVersion.plist file (i.e., ENFILE).

In dev channel 61.0.3159.5 on my Mac, with 79 tabs open, the browser process is consuming 533 kqueues, the GPU process has 187, and renderers have 45 each, for a grand total of 4275 kqueues for Chrome. That's a *lot* of FDs (not including all the other FDs used for files and sockets), so it's pretty clear why we're hitting ENFILE/EMFILE. I also noticed that with this instance of Chrome running, during my normal `ninja -j250` build, Goma exits/crashes with ENFILE.

=== base::WaitableEvent::Signal
This is a failing kevent64() call, which must operate on a valid kqueue object (the crashes above account for failures to create one). The crash I looked at was in the disk_cache code, so it may just be a case of issue 638122.

9 EBADF - Bad file descriptor

=== ___ZN4base20WaitableEventWatcher13StartWatchingEPNS_13WaitableEventENS_8CallbackIFvS2_ELNS_8internal8CopyModeE0ELNS5_10RepeatModeE0EEE_block_invoke
This is a subtle bug in the WaitableEventWatcher implementation that is easy to fix. The |source_| in the dispatch event handler was being captured through the |this| pointer; copying the |source_| pointer to a local and capturing that fixes it.

=== base::Process::Terminate (issue 744550)
I think this is actually a bug in the hung renderer dialog, where it's trying to kill an invalid process. But it points to a child process hang. I was able to reproduce this locally when using the Secure Shell extension, which uses NaCl. I captured part of the trace:

  11 base::debug::TaskAnnotator::RunTask(char const*, base::PendingTask*)  (in Google Chrome Framework)  load address 0x113921000 + 0x1aae722  [callback_forward.h:29]
    11 IPC::ChannelProxy::Context::OnDispatchMessage(IPC::Message const&)  (in Google Chrome Framework)  load address 0x113921000 + 0x1d16e5b  [ipc_message.h:126]
      11 nacl::ManifestServiceChannel::OnMessageReceived(IPC::Message const&)  (in Google Chrome Framework)  load address 0x113921000 + 0x5bdeeb7  [manifest_service_channel.cc:53]
        11 bool IPC::MessageT<PpapiHostMsg_StartupInitializationComplete_Meta::Dispatch<>()  (in Google Chrome Framework)  load address 0x113921000 + 0x5bdf016  [tuple.h:77]
          11 nacl::(anonymous namespace)::ManifestServiceProxy::StartupInitializationComplete()  (in Google Chrome Framework)  load address 0x113921000 + 0x5be7498  [ppb_nacl_private_impl.cc:594]
            11 content::PepperPluginInstanceImpl::SwitchToOutOfProcessProxy(base::FilePath const&, ppapi::PpapiPermissions, IPC::ChannelHandle const&, int, int)  (in Google Chrome Framework)  load address 0x113921000 + 0x5b5a1ba  [pepper_plugin_instance_impl.cc:3159]
              11 content::PluginModule::InitAsProxiedExternalPlugin(content::PepperPluginInstanceImpl*)  (in Google Chrome Framework)  load address 0x113921000 + 0x5b69626  [plugin_module.cc:612]
                11 content::PepperPluginInstanceImpl::ResetAsProxied(scoped_refptr<content::PluginModule>)  (in Google Chrome Framework)  load address 0x113921000 + 0x5b59dd0  [pepper_plugin_instance_impl.cc:3051]
                  11 ppapi::PPP_Instance_Combined::DidCreate(int, unsigned int, char const**, char const**)  (in Google Chrome Framework)  load address 0x113921000 + 0x3019def  [proxy_lock.h:154]
                    11 ppapi::proxy::(anonymous namespace)::DidCreate(int, unsigned int, char const**, char const**)  (in Google Chrome Framework)  load address 0x113921000 + 0x58c69d8  [ppp_instance_proxy.cc:61]
                      11 ppapi::proxy::HostDispatcher::Send(IPC::Message*)  (in Google Chrome Framework)  load address 0x113921000 + 0x58fd1b0  [host_dispatcher.cc:161]
                        11 IPC::SyncChannel::Send(IPC::Message*)  (in Google Chrome Framework)  load address 0x113921000 + 0x1d22387  [atomic:922]
                          11 IPC::SyncChannel::WaitForReply(mojo::SyncHandleRegistry*, IPC::SyncChannel::SyncContext*, bool)  (in Google Chrome Framework)  load address 0x113921000 + 0x1d228e3  [ref_counted.h:515]
                            11 mojo::SyncHandleRegistry::Wait(bool const**, unsigned long)  (in Google Chrome Framework)  load address 0x113921000 + 0x1b770c1  [sync_handle_registry.cc:103]
                              11 mojo::WaitSet::State::Wait(base::WaitableEvent**, unsigned long*, mojo::Handle*, unsigned int*, MojoHandleSignalsState*)  (in Google Chrome Framework)  load address 0x113921000 + 0x1b7a3fc  [wait_set.cc:205]
                                11 base::WaitableEvent::WaitMany(base::WaitableEvent**, unsigned long)  (in Google Chrome Framework)  load address 0x113921000 + 0x1b0d55e  [scoped_generic.h:146]
                                  11 base::WaitableEvent::TimedWaitUntil(base::TimeTicks const&)  (in Google Chrome Framework)  load address 0x113921000 + 0x1b0d283  [waitable_event_mac.cc:86]
                                    11 kevent64  (in libsystem_kernel.dylib) + 10  [0x7fffa4c62dae]

I think there's probably a subtle bug in waitable_event_mac with the way SyncChannel uses it that needs to be investigated.



My conclusion is that the waitable_event_mac.cc I landed needs some significant changes. We either need to coalesce WaitableEvents onto a single kqueue to reduce descriptor usage (and then dispatch appropriately), or look for an alternative implementation and hope it provides the same performance boost. I have some ideas for the latter that I'll try first.

Comment 28 by bugdroid1@chromium.org, Aug 1 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/cfab3ada781b1dcd673fb256bc02354d24124f1e

commit cfab3ada781b1dcd673fb256bc02354d24124f1e
Author: Robert Sesek <rsesek@chromium.org>
Date: Tue Aug 01 16:25:43 2017

Add new WaitableEventWatcherTests for WaitableEvent::ResetPolicy::AUTOMATIC.

These verify that WaitableEventWatcher consumes an auto-reset event signal.

Bug: 681167
Change-Id: I610715ceb882da19bd42b7097675d051fb3fa57a
Reviewed-on: https://chromium-review.googlesource.com/592518
Reviewed-by: danakj <danakj@chromium.org>
Commit-Queue: Robert Sesek <rsesek@chromium.org>
Cr-Commit-Position: refs/heads/master@{#491007}
[modify] https://crrev.com/cfab3ada781b1dcd673fb256bc02354d24124f1e/base/synchronization/waitable_event_watcher_unittest.cc


Comment 29 by bugdroid1@chromium.org, Aug 3 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/d54c0c1eccbddf90ea69bd14ca1290d721eee9b9

commit d54c0c1eccbddf90ea69bd14ca1290d721eee9b9
Author: Robert Sesek <rsesek@chromium.org>
Date: Thu Aug 03 22:18:28 2017

Reimplement base::WaitableEvent with Mach messaging on Mac.

A WaitableEvent is now a Mach port with a message queue length of one.
Signaling the event enqueues an empty message on the port, or times out
if one is already queued. Waiting on the event receives a message on the
port, dequeuing it if the event is auto-reset, or just peeking it if the
event is manual-reset.

WaitMany is implemented by adding all the events' ports to a port set and
receiving on it.

WaitableEventWatcher is implemented using a TYPE_MACH_RECV dispatch source
for auto-reset events and manual-reset events on 10.12. For manual-reset
events on macOS prior to 10.12, a lock-protected list of watcher callbacks is
used instead of dispatch, because dispatch does not provide timely and
reliable invocation of callbacks if the event is Reset() immediately after
Signal().

Bug: 681167
Change-Id: I22a9294ad0ae8900d16716d8033285fe91510eda
Reviewed-on: https://chromium-review.googlesource.com/592516
Commit-Queue: Robert Sesek <rsesek@chromium.org>
Reviewed-by: Mark Mentovai <mark@chromium.org>
Cr-Commit-Position: refs/heads/master@{#491860}
[modify] https://crrev.com/d54c0c1eccbddf90ea69bd14ca1290d721eee9b9/base/BUILD.gn
[modify] https://crrev.com/d54c0c1eccbddf90ea69bd14ca1290d721eee9b9/base/synchronization/waitable_event.h
[add] https://crrev.com/d54c0c1eccbddf90ea69bd14ca1290d721eee9b9/base/synchronization/waitable_event_mac.cc
[modify] https://crrev.com/d54c0c1eccbddf90ea69bd14ca1290d721eee9b9/base/synchronization/waitable_event_watcher.h
[add] https://crrev.com/d54c0c1eccbddf90ea69bd14ca1290d721eee9b9/base/synchronization/waitable_event_watcher_mac.cc
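
The commit message above describes the Mach design: each event is a port with a message-queue depth of one, Signal() enqueues (or times out if full), and Wait() either dequeues (auto-reset) or only peeks (manual-reset). Since mach_msg is macOS-only, here is a hypothetical, portable C++ sketch of those same semantics using a condition variable; `SketchEvent` and `IsSignaled` are illustrative names, not Chromium's actual API:

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Models the Mach design: a "queue" of capacity one, where signaled_ == true
// means one empty message is enqueued on the port.
class SketchEvent {
 public:
  explicit SketchEvent(bool manual_reset) : manual_reset_(manual_reset) {}

  // Analogous to enqueuing an empty Mach message; effectively a no-op
  // (a send timeout in the Mach version) if the queue already holds one.
  void Signal() {
    std::lock_guard<std::mutex> lock(mu_);
    signaled_ = true;
    cv_.notify_all();
  }

  void Reset() {
    std::lock_guard<std::mutex> lock(mu_);
    signaled_ = false;
  }

  // Auto-reset: dequeue the message, consuming the signal.
  // Manual-reset: peek only, leaving the event signaled.
  void Wait() {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [this] { return signaled_; });
    if (!manual_reset_)
      signaled_ = false;
  }

  // Test helper, not part of the sketch's public contract.
  bool IsSignaled() {
    std::lock_guard<std::mutex> lock(mu_);
    return signaled_;
  }

 private:
  const bool manual_reset_;
  std::mutex mu_;
  std::condition_variable cv_;
  bool signaled_ = false;
};
```

The dequeue-versus-peek split is the key point: it is what lets an auto-reset event wake exactly one waiter per Signal() while a manual-reset event stays open until Reset().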

Performance data for the Mach messaging implementation.

[==========] Running 4 tests from 2 test cases.
[----------] Global test environment set-up.
[----------] 3 tests from WaitableEventPerfTest
[ RUN      ] WaitableEventPerfTest.SingleThread
*RESULT signal_time: singlethread-1000-samples= 630 ns/sample
*RESULT wait_time: singlethread-1000-samples= 594 ns/sample
[       OK ] WaitableEventPerfTest.SingleThread (1 ms)
[ RUN      ] WaitableEventPerfTest.MultipleThreads
*RESULT signal_time_waiter: multithread-1000-samples= 1799 ns/sample
*RESULT wait_time_waiter: multithread-1000-samples= 6659 ns/sample
*RESULT signal_time_signaler: multithread-1000-samples= 1658 ns/sample
*RESULT wait_time_signaler: multithread-1000-samples= 4327 ns/sample
[       OK ] WaitableEventPerfTest.MultipleThreads (9 ms)
[ RUN      ] WaitableEventPerfTest.Throughput
*RESULT counts: throughput= 292588 signals
*RESULT signal_time: throughput= 776 ns/sample
*RESULT wait_time: throughput= 1944 ns/sample
[       OK ] WaitableEventPerfTest.Throughput (1003 ms)
[----------] 3 tests from WaitableEventPerfTest (1013 ms total)

[----------] 1 test from WaitableEventThreadPerfTest
[ RUN      ] WaitableEventThreadPerfTest.EventPingPong
*RESULT task: 4_WaitableEvent_Threads_time = 4.24148 us/hop
*RESULT task: 4_WaitableEvent_Threads_cpu = 4.54588 us/hop
[       OK ] WaitableEventThreadPerfTest.EventPingPong (426 ms)
[----------] 1 test from WaitableEventThreadPerfTest (426 ms total)

[----------] Global test environment tear-down
[==========] 4 tests from 2 test cases ran. (1439 ms total)
[  PASSED  ] 4 tests.

Also have real-world impact data from https://chromeperf.appspot.com/group_report?rev=491860:

- 17-245% increase on various media_perftests
- 10-30% increase on blink_perf.canvas
- 16-20% decrease on power.trivial_pages idle_wakeups_total!
- 6-13% increase on blob_storage perf tests
- 20% increase on media.tough_video_cases

Comment 31 by bugdroid1@chromium.org, Aug 24 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/61f09f37c6d4a16378cf7de73a94bf2562a73232

commit 61f09f37c6d4a16378cf7de73a94bf2562a73232
Author: Robert Sesek <rsesek@chromium.org>
Date: Thu Aug 24 22:32:56 2017

[Mac] Implement WaitableEvent::WaitMany with a kqueue on 10.12+.

The current implementation using Mach port sets may cause system instability
on macOS 10.11+. The kqueue implementation does not work until 10.12, so this
leaves 10.11 potentially still affected. 10.9, 10.10, 10.12, and 10.13 are
okay after this change.

Bug:  756102 
Bug: 681167
Change-Id: I031437cf414c71863c59d0726f8fdf7e4810a365
Reviewed-on: https://chromium-review.googlesource.com/624188
Commit-Queue: Robert Sesek <rsesek@chromium.org>
Reviewed-by: Mark Mentovai <mark@chromium.org>
Cr-Commit-Position: refs/heads/master@{#497219}
[modify] https://crrev.com/61f09f37c6d4a16378cf7de73a94bf2562a73232/base/synchronization/waitable_event_mac.cc
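
Both the port-set and kqueue variants of WaitMany boil down to the same idea: register N events with one kernel multiplexing object and block on it once. As those APIs are macOS-only, here is a hypothetical, portable sketch of that shape using a shared condition variable in place of the port set/kqueue; `EventGroup`, `ManyEvent`, and `WaitMany` are illustrative names, not Chromium's actual API:

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <vector>

// A trivially simple event that only has a signaled bit.
struct ManyEvent {
  bool signaled = false;
};

// Plays the role of the port set / kqueue: a single object that one waiter
// can block on while watching many events.
class EventGroup {
 public:
  void Signal(ManyEvent* e) {
    std::lock_guard<std::mutex> lock(mu_);
    e->signaled = true;
    cv_.notify_all();
  }

  // Blocks until any event in |events| is signaled; consumes that signal
  // (auto-reset behavior) and returns its index, mirroring the shape of
  // WaitableEvent::WaitMany.
  size_t WaitMany(const std::vector<ManyEvent*>& events) {
    std::unique_lock<std::mutex> lock(mu_);
    for (;;) {
      for (size_t i = 0; i < events.size(); ++i) {
        if (events[i]->signaled) {
          events[i]->signaled = false;
          return i;
        }
      }
      cv_.wait(lock);  // no event ready; block until some Signal()
    }
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
};
```

The design point is that the waiter blocks exactly once no matter how many events it watches; the per-OS-version churn in the real code is only about which kernel primitive (port set, kqueue, or dispatch sources) safely provides that single blocking point.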


Comment 32 by bugdroid1@chromium.org, Aug 28 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/5cd8eef498e28e1aacb9109645c001bbbfe1bc26

commit 5cd8eef498e28e1aacb9109645c001bbbfe1bc26
Author: Robert Sesek <rsesek@chromium.org>
Date: Mon Aug 28 19:51:25 2017

[Mac] A potential fix for WaitableEvent::WaitMany on macOS 10.11.

This avoids the creation of port sets, which could cause system instability.
Instead, this uses libdispatch sources to watch the event ports.

Bug:  756102 
Bug: 681167
Change-Id: Ib6fe015802dbe988d740b8319dd4a92601f17b26
Reviewed-on: https://chromium-review.googlesource.com/634924
Commit-Queue: Robert Sesek <rsesek@chromium.org>
Reviewed-by: Mark Mentovai <mark@chromium.org>
Cr-Commit-Position: refs/heads/master@{#497850}
[modify] https://crrev.com/5cd8eef498e28e1aacb9109645c001bbbfe1bc26/base/synchronization/waitable_event_mac.cc

Cc: -roc...@chromium.org rockot@google.com
