Use MessagePumpDefault more places on Mac |
||||
Issue descriptionIn issue 681167, I found base::WaitableEvent to be slow on Mac, and so I rewrote it for some considerable performance improvements. One of the performance metrics that was examined was //ipc:ipc_perftests. While the improvements to WaitableEvent were noticeable on //base:base_perftests micro-benchmarks, improvement did not show up in ipc_perftests. When I investigated further in Instruments, I noticed that the bulk of CPU time was dominated by the internals of CFRunLoop. As part of issue 356804 , specifically commit 3669d7d8f64445139731e73da773dbd2d4a81afb, the factory function MessageLoop::CreateMessagePumpForType() for TYPE_DEFAULT was changed from using MessagePumpDefault (a small wrapper around WaitableEvent) to MessagePumpCFRunLoop. The goal of this was to enable timer coalescing on the timers used for scheduling the delayed work source, to reduce the number of idle wakeups. It is not clear from the bug how significant this change actually is to power consumption (it was and still is hard to measure, but issue 388360 indicates there is some performance impact). The use of timers in MessagePumpMac seems to be ultimately unnecessary. The MessageLoop itself has no mechanism by which it schedules or arranges timers on behalf of something like base::Timer -- those are merely delayed posted tasks. The timer used by MessagePumpMac is set to fire at the next_delayed_work_time specified by MessagePump::Delegate::DoDelayedWork() (which is the MessageLoop) or MessagePump::ScheduleDelayedWork(). An alternative to this would be to call CFRunLoopRunInMode() or -[NSRunLoop runMode:beforeDate:] with the next computed work time, rather than always invoking with distantFuture. It has already been observed and somewhat mitigated in issue 511406 that scheduling CFTimers in a run loop is expensive, and this could completely eliminate that. This would remove the opportunity to use timer coalescing, though. In addition, doing this would change the relative priority of work sources in the CFRunLoop. That alternative would still leave MessageLoop::TYPE_DEFAULT using a CFRunLoop implementation, which as noted above is rather heavyweight. Internally there are multiple pthread_mutex_t's that must each be acquired: every run loop object, mode object/array, and source/observer/timer object has its own lock that must be taken before scheduling or invoking the work source callout. This lock acquisition and the re-shuffling of the internal sources arrays are what dominate the ipc_perftests benchmarks. It would be ideal if we could avoid CFRunLoop wherever it is not necessary because it is so heavyweight. Reverting 3669d7d8f644 and using MessagePumpDefault for MessageLoop::TYPE_DEFAULT would certainly do this, and it would now have the benefit of the performance improvements made to WaitableEvent. However if we assume that timer coalescing is beneficial, then that could lead to a power or idle wakeup regression (assuming that switching to a lighter-weight MessagePump wouldn't cause it to be a wash or net-beneficial). Yet there is still a way to both avoid using CFRunLoop and to take advantage of timer coalescing: Internally, CFRunLoopTimerSetTolerance() sets the leeway on a DISPATCH_SOURCE_TYPE_TIMER using _dispatch_source_set_runloop_timer_4CF(). Libdispatch eventually sets this tolerance value on a kevent EV_TIMER using the NOTE_LEEWAY filter flag. It would be relatively trivial to implement a new MessagePumpKqueue that had overhead equivalent to MessagePumpDefault, but with the potential to do timer coalescing. The main difference would be rather than using an explicit timeout to kevent64() when waiting for events, an EV_TIMER event would be placed on the kqueue() to trigger delayed work with leeway. But it may not be actually necessary to write MessagePumpKqueue. Instead, it may be enough to simply lower the scheduler priority of the thread and/or task (in the case of renderer processes). Ultimately timers and other event sinks get represented by a wait queue in the kernel, and the wait queue priority classes are shared amongst both those types of waitable objects (see 10.12/xnu-3789.1.32/osfmk/kern/kern_types.h wait_timeout_urgency_t). The effective priority class is hierarchical from task > thread > wait queue object, meaning that if an entire task or thread has priority of background, then all of its timers by default should too. The main difference would be that MessagePumpMac specifies an explicit timer leeway -- how much time the deadline can be shifted -- rather than setting a priority and letting the system schedule it. I propose that we switch back to using MessagePumpDefault for MessageLoop::TYPE_DEFAULT in an A/B experiment. We can then examine metrics (probably locally using Activity Monitor's "Idle Wakeups" metric) to determine if doing so has caused a performance regression. If not, then we're done -- the TYPE_UI loop obviously has to be NSApplication/CFRunLoop-backed, and TYPE_IO will continue to be backed by MessagePumpLibevent. But if a regression is observed, we can implement MessagePumpKqueue to recreate the timer leeway coalescing done by MessagePumpMac.
,
Sep 22 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/d5ebd4bfd14f9797c64251bf4ec7aa4147ef7839 commit d5ebd4bfd14f9797c64251bf4ec7aa4147ef7839 Author: Robert Sesek <rsesek@chromium.org> Date: Fri Sep 22 23:09:07 2017 Implement MessagePumpDefault::SetTimerSlack on Mac. Bug: 753544 Change-Id: If64adc488f637b9970f11ef5c0b32aab906a84e0 Reviewed-on: https://chromium-review.googlesource.com/615445 Reviewed-by: Mark Mentovai <mark@chromium.org> Commit-Queue: Robert Sesek <rsesek@chromium.org> Cr-Commit-Position: refs/heads/master@{#503899} [modify] https://crrev.com/d5ebd4bfd14f9797c64251bf4ec7aa4147ef7839/base/message_loop/message_pump_default.cc [modify] https://crrev.com/d5ebd4bfd14f9797c64251bf4ec7aa4147ef7839/base/message_loop/message_pump_default.h
,
Sep 26 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/9a461b08c8ee79386bb1ee2541cdcd017d47702c commit 9a461b08c8ee79386bb1ee2541cdcd017d47702c Author: Robert Sesek <rsesek@chromium.org> Date: Tue Sep 26 19:09:25 2017 [Mac] Ensure the MIDI task runners are backed by CFRunLoop. Currently the code assumes that a TYPE_DEFAULT MesssageLoop will be backed by CFRunLoop, which is required to receive the CoreMIDI callbacks. In the future, that assumption may not be true, so explicitly request a TYPE_UI (CFRunLoop) pump. Bug: 753544 Change-Id: I5464e1299d8cb99d6fbf37d8b342417487117744 Reviewed-on: https://chromium-review.googlesource.com/682834 Commit-Queue: Robert Sesek <rsesek@chromium.org> Reviewed-by: Takashi Toyoshima <toyoshim@chromium.org> Cr-Commit-Position: refs/heads/master@{#504448} [modify] https://crrev.com/9a461b08c8ee79386bb1ee2541cdcd017d47702c/media/midi/task_service.cc
,
Jan 5 2018
,
Jan 5 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/6a0c375d891af03c4963dfaeb7597774b8f658ae commit 6a0c375d891af03c4963dfaeb7597774b8f658ae Author: Robert Sesek <rsesek@chromium.org> Date: Fri Jan 05 23:47:13 2018 [Mac] Use LATENCY_QOS_TIER_3 instead of 5 for implementing timer slack. From running experiments on the CQ, using LATENCY_QOS_TIER_5 results in test timeouts for threads that enable MessagePump::SetTimerSlack(). At TIER_3, all tests pass. Bug: 753544 Change-Id: Idc3154f803a6d3c333976adda09139b474c7cbbb Reviewed-on: https://chromium-review.googlesource.com/853037 Reviewed-by: Mark Mentovai <mark@chromium.org> Commit-Queue: Robert Sesek <rsesek@chromium.org> Cr-Commit-Position: refs/heads/master@{#527430} [modify] https://crrev.com/6a0c375d891af03c4963dfaeb7597774b8f658ae/base/message_loop/message_pump_default.cc
,
Feb 7 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/8e6fc5fef9b85c3fb7888e57eab9bf343ec83cf6 commit 8e6fc5fef9b85c3fb7888e57eab9bf343ec83cf6 Author: Robert Sesek <rsesek@chromium.org> Date: Wed Feb 07 23:32:39 2018 [Mac] Use MessagePumpDefault for MessageLoop::TYPE_DEFAULT. Previously a MessagePumpCFRunLoop was created for TYPE_DEFAULT MessageLoops, in order to support timer coalescing. However CFRunLoop can be inefficient and MessagePumpDefault (a thin wrapper around base::WaitableEvent) is faster. Timer coalescing can be replicated by reducing the thread or task priority. See the bug for more details. Bug: 753544 Change-Id: I5f6ae083a5f7f02c16d34283b2946fbedfbd7fc2 Reviewed-on: https://chromium-review.googlesource.com/681850 Reviewed-by: Mark Mentovai <mark@chromium.org> Commit-Queue: Robert Sesek <rsesek@chromium.org> Cr-Commit-Position: refs/heads/master@{#535188} [modify] https://crrev.com/8e6fc5fef9b85c3fb7888e57eab9bf343ec83cf6/base/message_loop/message_loop.cc
,
Feb 17 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/ff4fe2c4f799b5c0272eb65daaa1f35877460f17 commit ff4fe2c4f799b5c0272eb65daaa1f35877460f17 Author: Robert Sesek <rsesek@chromium.org> Date: Sat Feb 17 01:11:30 2018 [Mac] Force a MessageLoop::TYPE_UI for the GPU thread in --single-process. A TYPE_DEFAULT MessageLoop was previously backed by a native MessagePump, but switched to a MessagePumpDefault in 8e6fc5fef9b85c3fb7888e57eab9bf343ec83cf6. Bug: 812572 , 753544 Change-Id: I49c53766067fb69642d5835668c066861d21803f Reviewed-on: https://chromium-review.googlesource.com/923584 Reviewed-by: Antoine Labour <piman@chromium.org> Commit-Queue: Robert Sesek <rsesek@chromium.org> Cr-Commit-Position: refs/heads/master@{#537490} [modify] https://crrev.com/ff4fe2c4f799b5c0272eb65daaa1f35877460f17/content/browser/gpu/gpu_process_host.cc
,
Mar 20 2018
We now have a decent amount of dev channel data with this baking: https://uma.googleplex.com/p/chrome/timeline_v2/?sid=ec060964341f52952a0b222b9a196667 There's a definite downward trend on: PerformanceMonitor.AverageCPU.RendererProcess PerformanceMonitor.IdleWakeups.BrowserProcess PerformanceMonitor.IdleWakeups.RendererProcess ... which correspond to the first dev channel release with the change in it. The perf bots did pick up two regressions: issue 811335 and issue 811445 . Though when looking at the group report for the revision https://chromeperf.appspot.com/group_report?rev=535188 there's improvement on the power benchmarks, which is the desired effect of this change along with the idle wakeups. I believe the performance regressions are caused by one of the several "timer_slack = TIMER_SLACK_MAXIMUM" options specified on various threads. I'll keep this open until M66 goes to stable, but this is looking pretty good.
,
Apr 9 2018
Still look good on M66 beta: significant reduction at the 50th %ile for the metrics listed in #8.
,
May 8 2018
Metrics still hold with M66 stable. |
||||
►
Sign in to add a comment |
||||
Comment 1 by rsesek@chromium.org
, Aug 21 2017