WaitableEvent is much slower on POSIX than Windows |
||||||||
Issue descriptionTwo surprising issues: -windows times are very stable across runs, while linux runs sometime have a 50% difference in subsequent runs -even the fastest linux runs are twice as slow as Windows I'm more concerned about the second issue. I even tried adding is_official_build but it didn't make a difference. Everything is slower across all the tests, i.e. ipc across processes, in-process, and even just callback calls. Since the calls are amortized over 50K runs (for most tests), I don't think proces setup should factor in (and regardless, Linux is faster for that). Thoughts?
,
Jan 14 2017
Out of curiosity: could you please talk more about the observation regarding BattOr? Thanks!
,
Jan 14 2017
They are observing unexpectedly high latency between writing to the serial port and having the FileDescriptorWatcher wake up to read a corresponding response from the device. If the device is instead polled synchronously on a background thread, the latency is much lower (~10x difference).
,
Jan 14 2017
ipc_perftests has a bunch of different tests that can help isolate this (sync to head). Looking at the sending 144 bytes roundtrip case (same ballpark as 12 or 1728 bytes), here's a comparison between windows and linux. My Windows machine si z840 while my Linux is Z620, so while I would expect a small (maybe 10-20% at most) difference, the results are more like 2-3x. Windows Linux IPC 35.7 62.5 Mojo cross-process 33 53.8 Mojo in-process multi-thread 10.2 28.7 Mojo in-process single-thread 8.9 14.5 Callback multi-thread 6.9 21.4 Callback single-thread with PostTask 2 2.84 Callback single-thread with no PostTask 0.13 0.156 We can even take out the Mojo part of it by looking at the callback variants, just to reduce factors. Two callbacks cost 21us on Linux vs 7 on Windows which is really hard to believe. Seems like if we could track down a way to improve this it would have big improvements on more than just Mojo :)
,
Jan 14 2017
What kind of MessagePump is used in the multi-thread case?
,
Jan 14 2017
Looks like our messages crossed :) I was tracing this code and the callback tests just use default message pump. so libevent isn't used. Since MessagePumpDefault is trivial, looks like it's Waitable event posix implementation that is the culprit.
,
Jan 14 2017
I don't expect us to be able to improve much on the default pump, but I guess it's still worth investigating what exactly is the issue there. I'd be interested to know what happens if the target thread is switched to an IO pump instead. i.e. do we see a much higher latency gap between Win and Linux in that case? That would pretty effectively test the speculation about libevent.
,
Jan 14 2017
I'm going to try some experiments on waitable event's posix implementation.
,
Apr 5 2017
I couldn't find time to work on this unfortunately.
,
Apr 10 2017
Adding it to my queue since low-level performance in general is something I plan to take a closer look at this quarter.
,
May 3 2017
,
Jun 8 2017
,
Jun 29 2017
I was looking at WaitableEvent on Mac and noticed that it's implemented using pthread_cond_t and pthread_mutex_t, which are pretty heavyweight for what this does. Using an async waiter adds an additional pthread_mutex_t. I've wrote a new implementation on Mac using kqueue() and zero locks. (Though more improvements are possible if we could switch MessagePumpDefault on Mac to not be a CFRunLoop, which is also quite heavyweight.) On Linux I've also started working on an implementation that uses eventfd(), which works very similar to the Mac kqueue() implementation.
,
Jun 30 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/3b33337785b9c039c15ffb136d5ac2a4ad93ba80 commit 3b33337785b9c039c15ffb136d5ac2a4ad93ba80 Author: Robert Sesek <rsesek@chromium.org> Date: Fri Jun 30 18:21:56 2017 Convert WaitableEventWatcherTest to use parameterized tests. Bug: 681167 Change-Id: Id647559bb83db517315c3b0a28649bf4e0d1450f Reviewed-on: https://chromium-review.googlesource.com/556519 Reviewed-by: danakj <danakj@chromium.org> Commit-Queue: Robert Sesek <rsesek@chromium.org> Cr-Commit-Position: refs/heads/master@{#483764} [modify] https://crrev.com/3b33337785b9c039c15ffb136d5ac2a4ad93ba80/base/synchronization/waitable_event_watcher_unittest.cc
,
Jun 30 2017
Thanks for taking this on! It would be great if you can post before/after numbers from ipc_perftests to see how this impact mojo. For mac, you'll have to patch https://codereview.chromium.org/2932923002/. I haven't had time to make the fix work with ipc_tests
,
Jun 30 2017
Re: #15: Yes I will post perftest numbers. I did notice that ipc_perftests was failing on Mac so I needed to look into that -- I will try your CL next week. I'm also going to add a dedicated base_perftest for WaitableEvent, so as to not exercise any other machinery. As I noted in #13, MessagePumpDefault on Mac is pretty slow, so I think ipc_perftests may be dominated by that at the moment. (I'll work on that too next :)
,
Jun 30 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/b44f896ea1f81f87f51be2d95e962bf21859ab0d commit b44f896ea1f81f87f51be2d95e962bf21859ab0d Author: Robert Sesek <rsesek@chromium.org> Date: Fri Jun 30 21:21:13 2017 Add additional tests for WaitableEvent and WaitableEventWatcher. These cover behaviors that clients of these classes were relying upon, but which were not tested by base_unittests. Bug: 681167 Change-Id: Ib07ef1387bfe653560aeda6affc5b56ab8e54d85 Reviewed-on: https://chromium-review.googlesource.com/558505 Reviewed-by: danakj <danakj@chromium.org> Commit-Queue: Robert Sesek <rsesek@chromium.org> Cr-Commit-Position: refs/heads/master@{#483821} [modify] https://crrev.com/b44f896ea1f81f87f51be2d95e962bf21859ab0d/base/synchronization/waitable_event_unittest.cc [modify] https://crrev.com/b44f896ea1f81f87f51be2d95e962bf21859ab0d/base/synchronization/waitable_event_watcher_unittest.cc
,
Jul 5 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/da4be990616dee7746ab65aab2026c10a61d1586 commit da4be990616dee7746ab65aab2026c10a61d1586 Author: Robert Sesek <rsesek@chromium.org> Date: Wed Jul 05 17:38:14 2017 Do not Reset the waitable events in TraceEventTestFixture.ThreadOnceBlocking. The events are created as automatic-reset, so it is not necessary to reset them manually. Doing so for the |task_start_event| and |task_stop_event| is racy, since the test main thread could signal and then immediately reset the event. If the second thread does not make progress between those two calls, then it will miss observing the signal and the test will hang forever. Bug: 681167 Change-Id: Ia65eccd5fbbb83af53f4501ce4abb99ed3c9131c Reviewed-on: https://chromium-review.googlesource.com/559510 Reviewed-by: Primiano Tucci <primiano@chromium.org> Commit-Queue: Robert Sesek <rsesek@chromium.org> Cr-Commit-Position: refs/heads/master@{#484301} [modify] https://crrev.com/da4be990616dee7746ab65aab2026c10a61d1586/base/trace_event/trace_event_unittest.cc
,
Jul 5 2017
As I suspected, ipc_perftests doesn't really show improvement because it's dominated by CFRunLoop internals. But I ran base_perftests and there is significant improvement: Existing implementation using existing test: [ RUN ] WaitableEventPerfTest.EventPingPong *RESULT task: 4_WaitableEvent_Threads_time = 22.55947 us/hop *RESULT task: 4_WaitableEvent_Threads_cpu = 11.68782 us/hop [ OK ] WaitableEventPerfTest.EventPingPong (2259 ms) Kqueue-based implementation: [ RUN ] WaitableEventPerfTest.EventPingPong *RESULT task: 4_WaitableEvent_Threads_time = 4.03602 us/hop *RESULT task: 4_WaitableEvent_Threads_cpu = 4.12879 us/hop [ OK ] WaitableEventPerfTest.EventPingPong (407 ms) Separately, my new perf test shows kqueue is slightly slower than the existing implementation for a contrived Signal/Wait in a tight loop on a single thread. (Shot-in-the-dark guess: the pthread internals aren't even parking the lock because it's hot and un-contended on a single thread, but kqueue is always going into a system call). All other metrics show good improvement. The first two tests time how long it takes to do 1000 Signals/Waits; the third test sees how many Signals/Waits it can do in one second. The kqueue implementation can do 3.3x more in the same amount of time. Before: [==========] Running 4 tests from 2 test cases. [----------] Global test environment set-up. [----------] 3 tests from WaitableEventPerfTest [ RUN ] WaitableEventPerfTest.SingleThread *RESULT signal_time: singlethread-1000-samples= 80 ns/sample *RESULT wait_time: singlethread-1000-samples= 82 ns/sample [ OK ] WaitableEventPerfTest.SingleThread (0 ms) [ RUN ] WaitableEventPerfTest.MultipleThreads *RESULT signal_time_waiter: multithread-1000-samples= 3091 ns/sample *RESULT wait_time_waiter: multithread-1000-samples= 43894 ns/sample *RESULT signal_time_signaler: multithread-1000-samples= 3103 ns/sample *RESULT wait_time_signaler: multithread-1000-samples= 43757 ns/sample [ OK ] WaitableEventPerfTest.MultipleThreads (48 ms) [ RUN ] WaitableEventPerfTest.Throughput *RESULT counts: throughput= 91153 signals *RESULT signal_time: throughput= 97929 ns/sample *RESULT wait_time: throughput= 98955 ns/sample [ OK ] WaitableEventPerfTest.Throughput (1001 ms) [----------] 3 tests from WaitableEventPerfTest (1049 ms total) After: [==========] Running 4 tests from 2 test cases. [----------] Global test environment set-up. [----------] 3 tests from WaitableEventPerfTest [ RUN ] WaitableEventPerfTest.SingleThread *RESULT signal_time: singlethread-1000-samples= 528 ns/sample *RESULT wait_time: singlethread-1000-samples= 644 ns/sample [ OK ] WaitableEventPerfTest.SingleThread (1 ms) [ RUN ] WaitableEventPerfTest.MultipleThreads *RESULT signal_time_waiter: multithread-1000-samples= 2164 ns/sample *RESULT wait_time_waiter: multithread-1000-samples= 11330 ns/sample *RESULT signal_time_signaler: multithread-1000-samples= 2876 ns/sample *RESULT wait_time_signaler: multithread-1000-samples= 10920 ns/sample [ OK ] WaitableEventPerfTest.MultipleThreads (15 ms) [ RUN ] WaitableEventPerfTest.Throughput *RESULT counts: throughput= 308677 signals *RESULT signal_time: throughput= 66393 ns/sample *RESULT wait_time: throughput= 96630 ns/sample [ OK ] WaitableEventPerfTest.Throughput (1004 ms) [----------] 3 tests from WaitableEventPerfTest (1020 ms total)
,
Jul 7 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/431258198d910ec58ffa149817c680a8ab5ddc61 commit 431258198d910ec58ffa149817c680a8ab5ddc61 Author: Robert Sesek <rsesek@chromium.org> Date: Fri Jul 07 17:15:39 2017 Add a dedicated WaitableEventPerfTest. This adds tests for three cases: - Timing how long it takes to Signal/Wait 1000x in a tight loop. - Timing how long it takes to Signal/Wait 1000x between threads. - Counting how many Signal/Waits can be done in 1s between threads. This also renames the existing WaitableEventPerfTest to WaitableEventThreadPerfTest, since it's part of thread_perftest.cc. Bug: 681167 Change-Id: I5180f69b5901da3d1ee808eac90f1b455b1e970d Reviewed-on: https://chromium-review.googlesource.com/560152 Commit-Queue: Robert Sesek <rsesek@chromium.org> Reviewed-by: danakj <danakj@chromium.org> Cr-Commit-Position: refs/heads/master@{#484963} [modify] https://crrev.com/431258198d910ec58ffa149817c680a8ab5ddc61/base/BUILD.gn [add] https://crrev.com/431258198d910ec58ffa149817c680a8ab5ddc61/base/synchronization/waitable_event_perftest.cc [modify] https://crrev.com/431258198d910ec58ffa149817c680a8ab5ddc61/base/threading/thread_perftest.cc
,
Jul 12 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/2096391cdcc3bdf963e68977e9015aa8cdbe85c6 commit 2096391cdcc3bdf963e68977e9015aa8cdbe85c6 Author: Robert Sesek <rsesek@chromium.org> Date: Wed Jul 12 01:54:21 2017 Reimplement base::WaitableEvent with a kqueue on Mac. For a single WaitableEvent, a custom EVFILT_USER kevent is used to wait and signal. This replaces the default POSIX implementation that uses a pthread_cond_t and a boolean flag. To implement WaitMany, a new kqueue is created to wait on all the individual WaitableEvent's kqueue descriptor. This replaces a complex locking algorithm used in the default POSIX implementation. For the asynchronous WaitableEventWatcher, a TYPE_READ dispatch_source_t is used to watch the WaitableEvent's kqueue. This replaces the POSIX implementation of a reference-counted list of async watchers guarded by a lock. Microbenchmarks show that the kqueue implementation is significantly faster in most cases. The one potential drawback is hitting the low RLIMIT_NOFILE on macOS, since each WaitableEvent and WaitableEventWatcher requires a new descriptor. Bug: 681167 Change-Id: I135012fdd25e547ffb911fc7adc97c203df38241 Reviewed-on: https://chromium-review.googlesource.com/553497 Reviewed-by: Robert Liao <robliao@chromium.org> Reviewed-by: Mark Mentovai <mark@chromium.org> Commit-Queue: Robert Sesek <rsesek@chromium.org> Cr-Commit-Position: refs/heads/master@{#485788} [modify] https://crrev.com/2096391cdcc3bdf963e68977e9015aa8cdbe85c6/base/BUILD.gn [modify] https://crrev.com/2096391cdcc3bdf963e68977e9015aa8cdbe85c6/base/synchronization/waitable_event.h [add] https://crrev.com/2096391cdcc3bdf963e68977e9015aa8cdbe85c6/base/synchronization/waitable_event_mac.cc [modify] https://crrev.com/2096391cdcc3bdf963e68977e9015aa8cdbe85c6/base/synchronization/waitable_event_watcher.h [add] https://crrev.com/2096391cdcc3bdf963e68977e9015aa8cdbe85c6/base/synchronization/waitable_event_watcher_mac.cc [modify] https://crrev.com/2096391cdcc3bdf963e68977e9015aa8cdbe85c6/base/task_scheduler/task_tracker_unittest.cc
,
Jul 12 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/3b99851e887099660ee0f602582501f2c5827d41 commit 3b99851e887099660ee0f602582501f2c5827d41 Author: Keishi Hattori <keishi@chromium.org> Date: Wed Jul 12 04:19:47 2017 Revert "Reimplement base::WaitableEvent with a kqueue on Mac." This reverts commit 2096391cdcc3bdf963e68977e9015aa8cdbe85c6. Reason for revert: Maybe caused jingle_unittests to fail crbug.com/741256 Original change's description: > Reimplement base::WaitableEvent with a kqueue on Mac. > > For a single WaitableEvent, a custom EVFILT_USER kevent is used to wait and > signal. This replaces the default POSIX implementation that uses a > pthread_cond_t and a boolean flag. > > To implement WaitMany, a new kqueue is created to wait on all the individual > WaitableEvent's kqueue descriptor. This replaces a complex locking algorithm > used in the default POSIX implementation. > > For the asynchronous WaitableEventWatcher, a TYPE_READ dispatch_source_t is > used to watch the WaitableEvent's kqueue. This replaces the POSIX > implementation of a reference-counted list of async watchers guarded by a lock. > > Microbenchmarks show that the kqueue implementation is significantly faster in > most cases. The one potential drawback is hitting the low RLIMIT_NOFILE on > macOS, since each WaitableEvent and WaitableEventWatcher requires a new > descriptor. > > Bug: 681167 > Change-Id: I135012fdd25e547ffb911fc7adc97c203df38241 > Reviewed-on: https://chromium-review.googlesource.com/553497 > Reviewed-by: Robert Liao <robliao@chromium.org> > Reviewed-by: Mark Mentovai <mark@chromium.org> > Commit-Queue: Robert Sesek <rsesek@chromium.org> > Cr-Commit-Position: refs/heads/master@{#485788} TBR=robliao@chromium.org,rsesek@chromium.org,mark@chromium.org Change-Id: I5b8740bf353f86b0ac5b600793826cd7724cd784 No-Presubmit: true No-Tree-Checks: true No-Try: true Bug: 681167 Reviewed-on: https://chromium-review.googlesource.com/566700 Reviewed-by: Keishi Hattori <keishi@chromium.org> Commit-Queue: Keishi Hattori <keishi@chromium.org> Cr-Commit-Position: refs/heads/master@{#485853} [modify] https://crrev.com/3b99851e887099660ee0f602582501f2c5827d41/base/BUILD.gn [modify] https://crrev.com/3b99851e887099660ee0f602582501f2c5827d41/base/synchronization/waitable_event.h [delete] https://crrev.com/623921bf99007b51cd9652a9adff459da54f9519/base/synchronization/waitable_event_mac.cc [modify] https://crrev.com/3b99851e887099660ee0f602582501f2c5827d41/base/synchronization/waitable_event_watcher.h [delete] https://crrev.com/623921bf99007b51cd9652a9adff459da54f9519/base/synchronization/waitable_event_watcher_mac.cc [modify] https://crrev.com/3b99851e887099660ee0f602582501f2c5827d41/base/task_scheduler/task_tracker_unittest.cc
,
Jul 13 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/6d38e78f585fc84c2757e9f09ac09f50615c8438 commit 6d38e78f585fc84c2757e9f09ac09f50615c8438 Author: Robert Sesek <rsesek@chromium.org> Date: Thu Jul 13 00:46:50 2017 Add new test WaitableEventWatcherDeletionTest.DeleteWatcherBeforeCallback. This verifies the potential for use-after-free that can occur if the WaitableEventWatcher is deleted after the event is signaled but before the callback is run. Bug: 681167 Bug: 741256 Change-Id: Ic5ca2bc48cb51d5f50be7e67773f8213fbbc80d3 Reviewed-on: https://chromium-review.googlesource.com/568371 Commit-Queue: Robert Sesek <rsesek@chromium.org> Reviewed-by: Daniel Cheng <dcheng@chromium.org> Cr-Commit-Position: refs/heads/master@{#486191} [modify] https://crrev.com/6d38e78f585fc84c2757e9f09ac09f50615c8438/base/synchronization/waitable_event_watcher_unittest.cc
,
Jul 13 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/b6bd20af95a73a79ba0d9ae0af95c2882abf0802 commit b6bd20af95a73a79ba0d9ae0af95c2882abf0802 Author: Robert Sesek <rsesek@chromium.org> Date: Thu Jul 13 18:35:18 2017 Use the threadsafe GTest death test style in two //jingle tests. Using MessageLoop across the death test fork does not work well on Mac. The threadsafe death test style solves this issue. Bug: 741256 Bug: 681167 Change-Id: I301981a704c284d4264642d4a553d3cc2570e0fa Reviewed-on: https://chromium-review.googlesource.com/568800 Reviewed-by: Sergey Ulanov <sergeyu@chromium.org> Commit-Queue: Robert Sesek <rsesek@chromium.org> Cr-Commit-Position: refs/heads/master@{#486445} [modify] https://crrev.com/b6bd20af95a73a79ba0d9ae0af95c2882abf0802/jingle/glue/chrome_async_socket_unittest.cc [modify] https://crrev.com/b6bd20af95a73a79ba0d9ae0af95c2882abf0802/jingle/notifier/base/xmpp_connection_unittest.cc
,
Jul 13 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/bb3210d3952c668037f92d7336be6cbe04fa396d commit bb3210d3952c668037f92d7336be6cbe04fa396d Author: Robert Sesek <rsesek@chromium.org> Date: Thu Jul 13 18:48:15 2017 Reland "Reimplement base::WaitableEvent with a kqueue on Mac." This is a reland of 2096391cdcc3bdf963e68977e9015aa8cdbe85c6 Original change's description: > Reimplement base::WaitableEvent with a kqueue on Mac. > > For a single WaitableEvent, a custom EVFILT_USER kevent is used to wait and > signal. This replaces the default POSIX implementation that uses a > pthread_cond_t and a boolean flag. > > To implement WaitMany, a new kqueue is created to wait on all the individual > WaitableEvent's kqueue descriptor. This replaces a complex locking algorithm > used in the default POSIX implementation. > > For the asynchronous WaitableEventWatcher, a TYPE_READ dispatch_source_t is > used to watch the WaitableEvent's kqueue. This replaces the POSIX > implementation of a reference-counted list of async watchers guarded by a lock. > > Microbenchmarks show that the kqueue implementation is significantly faster in > most cases. The one potential drawback is hitting the low RLIMIT_NOFILE on > macOS, since each WaitableEvent and WaitableEventWatcher requires a new > descriptor. > > Bug: 681167 > Change-Id: I135012fdd25e547ffb911fc7adc97c203df38241 > Reviewed-on: https://chromium-review.googlesource.com/553497 > Reviewed-by: Robert Liao <robliao@chromium.org> > Reviewed-by: Mark Mentovai <mark@chromium.org> > Commit-Queue: Robert Sesek <rsesek@chromium.org> > Cr-Commit-Position: refs/heads/master@{#485788} Bug: 681167 Change-Id: I1b0296022f7bd1a31b074fbbb551bccc3b28d05b Reviewed-on: https://chromium-review.googlesource.com/568642 Reviewed-by: Mark Mentovai <mark@chromium.org> Commit-Queue: Robert Sesek <rsesek@chromium.org> Cr-Commit-Position: refs/heads/master@{#486449} [modify] https://crrev.com/bb3210d3952c668037f92d7336be6cbe04fa396d/base/BUILD.gn [modify] https://crrev.com/bb3210d3952c668037f92d7336be6cbe04fa396d/base/synchronization/waitable_event.h [add] https://crrev.com/bb3210d3952c668037f92d7336be6cbe04fa396d/base/synchronization/waitable_event_mac.cc [modify] https://crrev.com/bb3210d3952c668037f92d7336be6cbe04fa396d/base/synchronization/waitable_event_watcher.h [add] https://crrev.com/bb3210d3952c668037f92d7336be6cbe04fa396d/base/synchronization/waitable_event_watcher_mac.cc [modify] https://crrev.com/bb3210d3952c668037f92d7336be6cbe04fa396d/base/task_scheduler/task_tracker_unittest.cc
,
Jul 26 2017
From a week ago: The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/20b379ba1a5ce5ae093386c7ffe764d63246b66f commit 20b379ba1a5ce5ae093386c7ffe764d63246b66f Author: Michael Nordman <michaeln@google.com> Date: Fri Jul 21 18:08:09 2017 Revert "Reland "Reimplement base::WaitableEvent with a kqueue on Mac."" This reverts commit bb3210d3952c668037f92d7336be6cbe04fa396d. Speculative revert for crbug/744550 Bug: 744550 Change-Id: I0b8a2464c2932d00349d3fd9a80ba7595c929ff7 TBR=mark,rsesek Change-Id: I0b8a2464c2932d00349d3fd9a80ba7595c929ff7 Reviewed-on: https://chromium-review.googlesource.com/580639 Reviewed-by: Michael Nordman <michaeln@chromium.org> Commit-Queue: Michael Nordman <michaeln@chromium.org> Cr-Commit-Position: refs/heads/master@{#488706} [modify] https://crrev.com/20b379ba1a5ce5ae093386c7ffe764d63246b66f/base/BUILD.gn [modify] https://crrev.com/20b379ba1a5ce5ae093386c7ffe764d63246b66f/base/synchronization/waitable_event.h [delete] https://crrev.com/d702c3905dbe9ef36bfda544228b53bf073702f4/base/synchronization/waitable_event_mac.cc [modify] https://crrev.com/20b379ba1a5ce5ae093386c7ffe764d63246b66f/base/synchronization/waitable_event_watcher.h [delete] https://crrev.com/d702c3905dbe9ef36bfda544228b53bf073702f4/base/synchronization/waitable_event_watcher_mac.cc [modify] https://crrev.com/20b379ba1a5ce5ae093386c7ffe764d63246b66f/base/task_scheduler/task_tracker_unittest.cc
,
Jul 27 2017
I've been digging into the crash data and have some results to report. There are a handful of issues. Unfortunately, the PCHECKs in the code didn't preserve errno, so it's not easy to get aggregated data about why the assertions are tripping. I just fixed issue 708245 , which will resolve that going forward. The crashes bucket into these broad categories: === base::WaitableEvent::WaitableEvent (issue 744556) === base::WaitableEvent::WaitMany These is failure to create a kqueue. I pulled a handful of reports and manually extracted the errno value from the TEB in the minidump and found these errnos: 23 ENFILE - Too many open files in system 24 EMFILE - Too many open files Due to the way errno is stored on Mac, I could only examine crashes not on the main thread and recover errno, which significantly limits the sample size. I don't think it'll change the ultimate conclusion, though. Further corroborating evidence is that many of the crash reports are from macOS "version 0.0.0.0", which is what would be set if crashpad_handler were not able to open the SystemVersion.plist file (i.e., ENFILE). In dev channel 61.0.3159.5 on my Mac, with 79 tabs open, the browser process is consuming 533 kqueues, the GPU process has 187, and renderers have 45 each, which is a grand total of 4275 kqueues for Chrome. That's a *lot* of FDs (not including all the other FDs used for files and sockets), so it's pretty clear why we're hitting ENFILE/EMFILE. I also noticed that with this instance of Chrome running, my normal `ninja -j250`, Goma exits/crashes with ENFILE. === base::WaitableEvent::Signal This is failing kevent64() which has to be done on a valid kqueue object (since the above crashes account for failure to create one). The crash I looked at was in the disk_cache code, so it may just be a case of issue 638122. 9 EBADF - Bad file descriptor === ___ZN4base20WaitableEventWatcher13StartWatchingEPNS_13WaitableEventENS_8CallbackIFvS2_ELNS_8internal8CopyModeE0ELNS5_10RepeatModeE0EEE_block_invoke This is a subtle bug in the WaitableEventWatcher implementation that is easy to fix. The |source_| in the dispatch event handler was being captured through the |this| pointer; copying the |source_| pointer to a local and capturing that fixes this. === base::Process::Terminate (issue 744550) I think this is actually a bug in the hung renderer dialog, where it's trying to kill an invalid process. But it points to a child process hang. I was able to reproduce this locally when using the Secure Shell extension, which uses NaCl. I captured part of the trace: 11 base::debug::TaskAnnotator::RunTask(char const*, base::PendingTask*) (in Google Chrome Framework) load address 0x113921000 + 0x1aae722 [callback_forward.h:29] 11 IPC::ChannelProxy::Context::OnDispatchMessage(IPC::Message const&) (in Google Chrome Framework) load address 0x113921000 + 0x1d16e5b [ipc_message.h:126] 11 nacl::ManifestServiceChannel::OnMessageReceived(IPC::Message const&) (in Google Chrome Framework) load address 0x113921000 + 0x5bdeeb7 [manifest_service_channel.cc:53] 11 bool IPC::MessageT<PpapiHostMsg_StartupInitializationComplete_Meta::Dispatch<>() (in Google Chrome Framework) load address 0x113921000 + 0x5bdf016 [tuple.h:77] 11 nacl::(anonymous namespace)::ManifestServiceProxy::StartupInitializationComplete() (in Google Chrome Framework) load address 0x113921000 + 0x5be7498 [ppb_nacl_private_impl.cc:594] 11 content::PepperPluginInstanceImpl::SwitchToOutOfProcessProxy(base::FilePath const&, ppapi::PpapiPermissions, IPC::ChannelHandle const&, int, int) (in Google Chrome Framework) load address 0x113921000 + 0x5b5a1ba [pepper_plugin_instance_impl.cc:3159] 11 content::PluginModule::InitAsProxiedExternalPlugin(content::PepperPluginInstanceImpl*) (in Google Chrome Framework) load address 0x113921000 + 0x5b69626 [plugin_module.cc:612] 11 content::PepperPluginInstanceImpl::ResetAsProxied(scoped_refptr<content::PluginModule>) (in Google Chrome Framework) load address 0x113921000 + 0x5b59dd0 [pepper_plugin_instance_impl.cc:3051] 11 ppapi::PPP_Instance_Combined::DidCreate(int, unsigned int, char const**, char const**) (in Google Chrome Framework) load address 0x113921000 + 0x3019def [proxy_lock.h:154] 11 ppapi::proxy::(anonymous namespace)::DidCreate(int, unsigned int, char const**, char const**) (in Google Chrome Framework) load address 0x113921000 + 0x58c69d8 [ppp_instance_proxy.cc:61] 11 ppapi::proxy::HostDispatcher::Send(IPC::Message*) (in Google Chrome Framework) load address 0x113921000 + 0x58fd1b0 [host_dispatcher.cc:161] 11 IPC::SyncChannel::Send(IPC::Message*) (in Google Chrome Framework) load address 0x113921000 + 0x1d22387 [atomic:922] 11 IPC::SyncChannel::WaitForReply(mojo::SyncHandleRegistry*, IPC::SyncChannel::SyncContext*, bool) (in Google Chrome Framework) load address 0x113921000 + 0x1d228e3 [ref_counted.h:515] 11 mojo::SyncHandleRegistry::Wait(bool const**, unsigned long) (in Google Chrome Framework) load address 0x113921000 + 0x1b770c1 [sync_handle_registry.cc:103] 11 mojo::WaitSet::State::Wait(base::WaitableEvent**, unsigned long*, mojo::Handle*, unsigned int*, MojoHandleSignalsState*) (in Google Chrome Framework) load address 0x113921000 + 0x1b7a3fc [wait_set.cc:205] 11 base::WaitableEvent::WaitMany(base::WaitableEvent**, unsigned long) (in Google Chrome Framework) load address 0x113921000 + 0x1b0d55e [scoped_generic.h:146] 11 base::WaitableEvent::TimedWaitUntil(base::TimeTicks const&) (in Google Chrome Framework) load address 0x113921000 + 0x1b0d283 [waitable_event_mac.cc:86] 11 kevent64 (in libsystem_kernel.dylib) + 10 [0x7fffa4c62dae] I think there's probably a subtle bug in waitable_event_mac with the way SyncChannel uses it that needs to be investigated. My conclusion is that the existing waitable_event_mac.cc I landed needs some significant changes. We either need to coalesce WaitableEvents onto a single kqueue to reduce usage (and then dispatch appropriately), or look for an alternative implementation and hope it provides the same performance boost. I do have some ideas for the latter that I'll try first.
,
Aug 1 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/cfab3ada781b1dcd673fb256bc02354d24124f1e commit cfab3ada781b1dcd673fb256bc02354d24124f1e Author: Robert Sesek <rsesek@chromium.org> Date: Tue Aug 01 16:25:43 2017 Add new WaitableEventWatcherTests for WaitableEvent::ResetPolicy::AUTOMATIC. These verify that WaitableEventWatcher consumes an auto-reset event signal. Bug: 681167 Change-Id: I610715ceb882da19bd42b7097675d051fb3fa57a Reviewed-on: https://chromium-review.googlesource.com/592518 Reviewed-by: danakj <danakj@chromium.org> Commit-Queue: Robert Sesek <rsesek@chromium.org> Cr-Commit-Position: refs/heads/master@{#491007} [modify] https://crrev.com/cfab3ada781b1dcd673fb256bc02354d24124f1e/base/synchronization/waitable_event_watcher_unittest.cc
,
Aug 3 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/d54c0c1eccbddf90ea69bd14ca1290d721eee9b9 commit d54c0c1eccbddf90ea69bd14ca1290d721eee9b9 Author: Robert Sesek <rsesek@chromium.org> Date: Thu Aug 03 22:18:28 2017 Reimplement base::WaitableEvent with Mach messaging on Mac. A WaitableEvent is now a Mach port with a message queue length of one. Signaling the event enqueues an empty message on the port, or times out if one is already queued. Waiting on the event receives a message on the port, dequeuing it if the event is auto-reset, or just peeking it if the event is manual-reset. WaitMany is implemented by adding all the events' ports to a port set and receiving on it. WaitableEventWatcher is implemented using a TYPE_MACH_RECV dispatch source for auto-reset events and manual-reset events on 10.12. For manual-reset events on macOS prior to 10.12, a lock-protected list of watcher callbacks is used instead of dispatch, because dispatch does not provide timely and reliable invocation callbacks if the event were to be Reset() immediately after Signal(). Bug: 681167 Change-Id: I22a9294ad0ae8900d16716d8033285fe91510eda Reviewed-on: https://chromium-review.googlesource.com/592516 Commit-Queue: Robert Sesek <rsesek@chromium.org> Reviewed-by: Mark Mentovai <mark@chromium.org> Cr-Commit-Position: refs/heads/master@{#491860} [modify] https://crrev.com/d54c0c1eccbddf90ea69bd14ca1290d721eee9b9/base/BUILD.gn [modify] https://crrev.com/d54c0c1eccbddf90ea69bd14ca1290d721eee9b9/base/synchronization/waitable_event.h [add] https://crrev.com/d54c0c1eccbddf90ea69bd14ca1290d721eee9b9/base/synchronization/waitable_event_mac.cc [modify] https://crrev.com/d54c0c1eccbddf90ea69bd14ca1290d721eee9b9/base/synchronization/waitable_event_watcher.h [add] https://crrev.com/d54c0c1eccbddf90ea69bd14ca1290d721eee9b9/base/synchronization/waitable_event_watcher_mac.cc
,
Aug 7 2017
Performance data for the Mach messaging implementation. [==========] Running 4 tests from 2 test cases. [----------] Global test environment set-up. [----------] 3 tests from WaitableEventPerfTest [ RUN ] WaitableEventPerfTest.SingleThread *RESULT signal_time: singlethread-1000-samples= 630 ns/sample *RESULT wait_time: singlethread-1000-samples= 594 ns/sample [ OK ] WaitableEventPerfTest.SingleThread (1 ms) [ RUN ] WaitableEventPerfTest.MultipleThreads *RESULT signal_time_waiter: multithread-1000-samples= 1799 ns/sample *RESULT wait_time_waiter: multithread-1000-samples= 6659 ns/sample *RESULT signal_time_signaler: multithread-1000-samples= 1658 ns/sample *RESULT wait_time_signaler: multithread-1000-samples= 4327 ns/sample [ OK ] WaitableEventPerfTest.MultipleThreads (9 ms) [ RUN ] WaitableEventPerfTest.Throughput *RESULT counts: throughput= 292588 signals *RESULT signal_time: throughput= 776 ns/sample *RESULT wait_time: throughput= 1944 ns/sample [ OK ] WaitableEventPerfTest.Throughput (1003 ms) [----------] 3 tests from WaitableEventPerfTest (1013 ms total) [----------] 1 test from WaitableEventThreadPerfTest [ RUN ] WaitableEventThreadPerfTest.EventPingPong *RESULT task: 4_WaitableEvent_Threads_time = 4.24148 us/hop *RESULT task: 4_WaitableEvent_Threads_cpu = 4.54588 us/hop [ OK ] WaitableEventThreadPerfTest.EventPingPong (426 ms) [----------] 1 test from WaitableEventThreadPerfTest (426 ms total) [----------] Global test environment tear-down [==========] 4 tests from 2 test cases ran. (1439 ms total) [ PASSED ] 4 tests. Also have real-world impact data from https://chromeperf.appspot.com/group_report?rev=491860: - 17-245% increase on various media_perftests - 10-30% increase on blink_perf.canvas - 16-20% decrease on power.trivial_pages idle_wakeups_total! - 6-13% increase on blob_storage perf tests - 20% increase on media.tough_video_cases
,
Aug 24 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/61f09f37c6d4a16378cf7de73a94bf2562a73232 commit 61f09f37c6d4a16378cf7de73a94bf2562a73232 Author: Robert Sesek <rsesek@chromium.org> Date: Thu Aug 24 22:32:56 2017 [Mac] Implement WaitableEvent::WaitMany with a kqueue on 10.12+. The current implementation using Mach port sets may cause system instability on macOS 10.11+. The kqueue implementation does not work until 10.12, so this leaves 10.11 potentially still affected. 10.9, 10.10, 10.12, and 10.13 are okay after this change. Bug: 756102 Bug: 681167 Change-Id: I031437cf414c71863c59d0726f8fdf7e4810a365 Reviewed-on: https://chromium-review.googlesource.com/624188 Commit-Queue: Robert Sesek <rsesek@chromium.org> Reviewed-by: Mark Mentovai <mark@chromium.org> Cr-Commit-Position: refs/heads/master@{#497219} [modify] https://crrev.com/61f09f37c6d4a16378cf7de73a94bf2562a73232/base/synchronization/waitable_event_mac.cc
,
Aug 28 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/5cd8eef498e28e1aacb9109645c001bbbfe1bc26 commit 5cd8eef498e28e1aacb9109645c001bbbfe1bc26 Author: Robert Sesek <rsesek@chromium.org> Date: Mon Aug 28 19:51:25 2017 [Mac] A potential fix for WaitableEvent::WaitMany on macOS 10.11. This avoids the creation of port sets, which could cause system instability. Instead, this uses libdispatch sources to watch the event ports. Bug: 756102 Bug: 681167 Change-Id: Ib6fe015802dbe988d740b8319dd4a92601f17b26 Reviewed-on: https://chromium-review.googlesource.com/634924 Commit-Queue: Robert Sesek <rsesek@chromium.org> Reviewed-by: Mark Mentovai <mark@chromium.org> Cr-Commit-Position: refs/heads/master@{#497850} [modify] https://crrev.com/5cd8eef498e28e1aacb9109645c001bbbfe1bc26/base/synchronization/waitable_event_mac.cc
,
Oct 17
|
||||||||
►
Sign in to add a comment |
||||||||
Comment 1 by roc...@chromium.org
, Jan 14 2017