New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 753544 link

Starred by 3 users

Issue metadata

Status: Fixed
Owner:
Closed: May 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Mac
Pri: 2
Type: Bug

Blocking:
issue 799602



Sign in to add a comment

Use MessagePumpDefault more places on Mac

Project Member Reported by rsesek@chromium.org, Aug 8 2017

Issue description

In issue 681167, I found base::WaitableEvent to be slow on Mac, and so I rewrote it for some considerable performance improvements. One of the performance metrics that was examined was //ipc:ipc_perftests. While the improvements to WaitableEvent were noticeable on //base:base_perftests micro-benchmarks, improvement did not show up in ipc_perftests. When I investigated further in Instruments, I noticed that the bulk of CPU time was dominated by the internals of CFRunLoop.

As part of  issue 356804 , specifically commit 3669d7d8f64445139731e73da773dbd2d4a81afb, the factory function MessageLoop::CreateMessagePumpForType() for TYPE_DEFAULT was changed from using MessagePumpDefault (a small wrapper around WaitableEvent) to MessagePumpCFRunLoop. The goal of this was to enable timer coalescing on the timers used for scheduling the delayed work source, to reduce the number of idle wakeups. It is not clear from the bug how significant this change actually is to power consumption (it was and still is hard to measure, but  issue 388360  indicates there is some performance impact).

The use of timers in MessagePumpMac seems to be ultimately unnecessary. The MessageLoop itself has no mechanism by which it schedules or arranges timers on behalf of something like base::Timer -- those are merely delayed posted tasks. The timer used by MessagePumpMac is set to fire at the next_delayed_work_time specified by MessagePump::Delegate::DoDelayedWork() (which is the MessageLoop) or MessagePump::ScheduleDelayedWork().

An alternative to this would be to call CFRunLoopRunInMode() or -[NSRunLoop runMode:beforeDate:] with the next computed work time, rather than always invoking with distantFuture. It has already been observed and somewhat mitigated in  issue 511406  that scheduling CFTimers in a run loop is expensive, and this could completely eliminate that. This would remove the opportunity to use timer coalescing, though. In addition, doing this would change the relative priority of work sources in the CFRunLoop.

That alternative would still leave MessageLoop::TYPE_DEFAULT using a CFRunLoop implementation, which as noted above is rather heavyweight. Internally there are multiple pthread_mutex_t's that must each be acquired: every run loop object, mode object/array, and source/observer/timer object has its own lock that must be taken before scheduling or invoking the work source callout. This lock acquisition and the re-shuffling of the internal sources arrays are what dominate the ipc_perftests benchmarks. It would be ideal if we could avoid CFRunLoop wherever it is not necessary because it is so heavyweight. Reverting 3669d7d8f644 and using MessagePumpDefault for MessageLoop::TYPE_DEFAULT would certainly do this, and it would now have the benefit of the performance improvements made to WaitableEvent.

However if we assume that timer coalescing is beneficial, then that could lead to a power or idle wakeup regression (assuming that switching to a lighter-weight MessagePump wouldn't cause it to be a wash or net-beneficial). Yet there is still a way to both avoid using CFRunLoop and to take advantage of timer coalescing: Internally, CFRunLoopTimerSetTolerance() sets the leeway on a DISPATCH_SOURCE_TYPE_TIMER using _dispatch_source_set_runloop_timer_4CF(). Libdispatch eventually sets this tolerance value on a kevent EV_TIMER using the NOTE_LEEWAY filter flag. It would be relatively trivial to implement a new MessagePumpKqueue that had overhead equivalent to MessagePumpDefault, but with the potential to do timer coalescing. The main difference would be rather than using an explicit timeout to kevent64() when waiting for events, an EV_TIMER event would be placed on the kqueue() to trigger delayed work with leeway.

But it may not be actually necessary to write MessagePumpKqueue. Instead, it may be enough to simply lower the scheduler priority of the thread and/or task (in the case of renderer processes). Ultimately timers and other event sinks get represented by a wait queue in the kernel, and the wait queue priority classes are shared amongst both those types of waitable objects (see 10.12/xnu-3789.1.32/osfmk/kern/kern_types.h wait_timeout_urgency_t). The effective priority class is hierarchical from task > thread > wait queue object, meaning that if an entire task or thread has priority of background, then all of its timers by default should too. The main difference would be that MessagePumpMac specifies an explicit timer leeway -- how much time the deadline can be shifted -- rather than setting a priority and letting the system schedule it.

I propose that we switch back to using MessagePumpDefault for MessageLoop::TYPE_DEFAULT in an A/B experiment. We can then examine metrics (probably locally using Activity Monitor's "Idle Wakeups" metric) to determine if doing so has caused a performance regression. If not, then we're done -- the TYPE_UI loop obviously has to be NSApplication/CFRunLoop-backed, and TYPE_IO will continue to be backed by MessagePumpLibevent. But if a regression is observed, we can implement MessagePumpKqueue to recreate the timer leeway coalescing done by MessagePumpMac.

 

Comment 1 by rsesek@chromium.org, Aug 21 2017

Cc: robliao@chromium.org
Project Member

Comment 2 by bugdroid1@chromium.org, Sep 22 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/d5ebd4bfd14f9797c64251bf4ec7aa4147ef7839

commit d5ebd4bfd14f9797c64251bf4ec7aa4147ef7839
Author: Robert Sesek <rsesek@chromium.org>
Date: Fri Sep 22 23:09:07 2017

Implement MessagePumpDefault::SetTimerSlack on Mac.

Bug:  753544 
Change-Id: If64adc488f637b9970f11ef5c0b32aab906a84e0
Reviewed-on: https://chromium-review.googlesource.com/615445
Reviewed-by: Mark Mentovai <mark@chromium.org>
Commit-Queue: Robert Sesek <rsesek@chromium.org>
Cr-Commit-Position: refs/heads/master@{#503899}
[modify] https://crrev.com/d5ebd4bfd14f9797c64251bf4ec7aa4147ef7839/base/message_loop/message_pump_default.cc
[modify] https://crrev.com/d5ebd4bfd14f9797c64251bf4ec7aa4147ef7839/base/message_loop/message_pump_default.h

Project Member

Comment 3 by bugdroid1@chromium.org, Sep 26 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/9a461b08c8ee79386bb1ee2541cdcd017d47702c

commit 9a461b08c8ee79386bb1ee2541cdcd017d47702c
Author: Robert Sesek <rsesek@chromium.org>
Date: Tue Sep 26 19:09:25 2017

[Mac] Ensure the MIDI task runners are backed by CFRunLoop.

Currently the code assumes that a TYPE_DEFAULT MesssageLoop will be backed by
CFRunLoop, which is required to receive the CoreMIDI callbacks. In the future,
that assumption may not be true, so explicitly request a TYPE_UI (CFRunLoop)
pump.

Bug:  753544 
Change-Id: I5464e1299d8cb99d6fbf37d8b342417487117744
Reviewed-on: https://chromium-review.googlesource.com/682834
Commit-Queue: Robert Sesek <rsesek@chromium.org>
Reviewed-by: Takashi Toyoshima <toyoshim@chromium.org>
Cr-Commit-Position: refs/heads/master@{#504448}
[modify] https://crrev.com/9a461b08c8ee79386bb1ee2541cdcd017d47702c/media/midi/task_service.cc

Blocking: 799602
Project Member

Comment 5 by bugdroid1@chromium.org, Jan 5 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/6a0c375d891af03c4963dfaeb7597774b8f658ae

commit 6a0c375d891af03c4963dfaeb7597774b8f658ae
Author: Robert Sesek <rsesek@chromium.org>
Date: Fri Jan 05 23:47:13 2018

[Mac] Use LATENCY_QOS_TIER_3 instead of 5 for implementing timer slack.

From running experiments on the CQ, using LATENCY_QOS_TIER_5 results in test
timeouts for threads that enable MessagePump::SetTimerSlack(). At TIER_3, all
tests pass.

Bug:  753544 
Change-Id: Idc3154f803a6d3c333976adda09139b474c7cbbb
Reviewed-on: https://chromium-review.googlesource.com/853037
Reviewed-by: Mark Mentovai <mark@chromium.org>
Commit-Queue: Robert Sesek <rsesek@chromium.org>
Cr-Commit-Position: refs/heads/master@{#527430}
[modify] https://crrev.com/6a0c375d891af03c4963dfaeb7597774b8f658ae/base/message_loop/message_pump_default.cc

Project Member

Comment 6 by bugdroid1@chromium.org, Feb 7 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/8e6fc5fef9b85c3fb7888e57eab9bf343ec83cf6

commit 8e6fc5fef9b85c3fb7888e57eab9bf343ec83cf6
Author: Robert Sesek <rsesek@chromium.org>
Date: Wed Feb 07 23:32:39 2018

[Mac] Use MessagePumpDefault for MessageLoop::TYPE_DEFAULT.

Previously a MessagePumpCFRunLoop was created for TYPE_DEFAULT MessageLoops, in
order to support timer coalescing. However CFRunLoop can be inefficient and
MessagePumpDefault (a thin wrapper around base::WaitableEvent) is faster. Timer
coalescing can be replicated by reducing the thread or task priority. See the
bug for more details.

Bug:  753544 
Change-Id: I5f6ae083a5f7f02c16d34283b2946fbedfbd7fc2
Reviewed-on: https://chromium-review.googlesource.com/681850
Reviewed-by: Mark Mentovai <mark@chromium.org>
Commit-Queue: Robert Sesek <rsesek@chromium.org>
Cr-Commit-Position: refs/heads/master@{#535188}
[modify] https://crrev.com/8e6fc5fef9b85c3fb7888e57eab9bf343ec83cf6/base/message_loop/message_loop.cc

Project Member

Comment 7 by bugdroid1@chromium.org, Feb 17 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/ff4fe2c4f799b5c0272eb65daaa1f35877460f17

commit ff4fe2c4f799b5c0272eb65daaa1f35877460f17
Author: Robert Sesek <rsesek@chromium.org>
Date: Sat Feb 17 01:11:30 2018

[Mac] Force a MessageLoop::TYPE_UI for the GPU thread in --single-process.

A TYPE_DEFAULT MessageLoop was previously backed by a native MessagePump, but
switched to a MessagePumpDefault in 8e6fc5fef9b85c3fb7888e57eab9bf343ec83cf6.

Bug:  812572 ,  753544 
Change-Id: I49c53766067fb69642d5835668c066861d21803f
Reviewed-on: https://chromium-review.googlesource.com/923584
Reviewed-by: Antoine Labour <piman@chromium.org>
Commit-Queue: Robert Sesek <rsesek@chromium.org>
Cr-Commit-Position: refs/heads/master@{#537490}
[modify] https://crrev.com/ff4fe2c4f799b5c0272eb65daaa1f35877460f17/content/browser/gpu/gpu_process_host.cc

Comment 8 by rsesek@chromium.org, Mar 20 2018

Labels: M-66
We now have a decent amount of dev channel data with this baking: https://uma.googleplex.com/p/chrome/timeline_v2/?sid=ec060964341f52952a0b222b9a196667

There's a definite downward trend on:

PerformanceMonitor.AverageCPU.RendererProcess
PerformanceMonitor.IdleWakeups.BrowserProcess
PerformanceMonitor.IdleWakeups.RendererProcess

... which correspond to the first dev channel release with the change in it.

The perf bots did pick up two regressions:  issue 811335  and  issue 811445 . Though when looking at the group report for the revision https://chromeperf.appspot.com/group_report?rev=535188 there's improvement on the power benchmarks, which is the desired effect of this change along with the idle wakeups.

I believe the performance regressions are caused by one of the several "timer_slack = TIMER_SLACK_MAXIMUM" options specified on various threads.

I'll keep this open until M66 goes to stable, but this is looking pretty good.
Still look good on M66 beta: significant reduction at the 50th %ile for the metrics listed in #8.
Status: Fixed (was: Assigned)
Metrics still hold with M66 stable.

Sign in to add a comment