Issue 874237

Starred by 3 users

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Windows, Mac
Pri: 2
Type: Bug

Blocked on:
issue 885371
issue 890978

Blocking:
issue 786597




5-18% of non-idle time on IO thread is spent in MessageLoop::ScheduleWork

Reported by gab@chromium.org (Project Member), Aug 14

Issue description

Chrome Version: 70.0.3520.0 (and all prior)
OS: All

The sampling profiler shows that we spend a *lot* of time in MessageLoop::ScheduleWork (which inlines into MessagePump::ScheduleWork and differs per platform), in particular on the IO thread:

Mac OS X:
 * Startup : 8.5% (351ms) : https://uma.googleplex.com/p/chrome/callstacks/?sid=ac3086581d87e7e341618d95e669dfb4
 * Periodic : 18% : https://uma.googleplex.com/p/chrome/callstacks/?sid=c73b9e98ff098320e5035a8d81443ca7

Windows:
 * Startup : 6.3% (211ms) : https://uma.googleplex.com/p/chrome/callstacks/?sid=98a95f9298deda07b304d0cdcdf0bb55
 * Periodic : 5% : https://uma.googleplex.com/p/chrome/callstacks/?sid=ab2637dbd934ef05b67db120f323dfd9

It looks like Mac (CFRunLoop / MessagePumpLibEvent) is slower than Windows.

It also looks like some of the contention (but not the majority) is on MessageLoop::Controller::message_loop_lock_.

This is surprising because there's already logic in place to only grab the lock and ScheduleWork() if the poster is the one that made the queue non-empty (so there shouldn't be contention here unless the destination somehow wakes up and processes the task before the ScheduleWork returns...).
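
For readers less familiar with that logic, here is a minimal sketch of the intended rule. The names here (IncomingQueue, the elided ScheduleWork body) are illustrative; this is not the actual Chromium code:

```cpp
// Minimal sketch of the "only the poster that makes the queue non-empty
// schedules work" rule. Illustrative, not the actual Chromium code.
#include <mutex>
#include <queue>
#include <utility>

struct Task {};

class IncomingQueue {
 public:
  void PostTask(Task task) {
    bool was_empty;
    {
      std::lock_guard<std::mutex> lock(mutex_);
      was_empty = queue_.empty();
      queue_.push(std::move(task));
    }
    // Only the poster that performed the empty -> non-empty transition wakes
    // the pump. Concurrent posters see a non-empty queue and skip the
    // expensive native wake-up, so contention should be rare -- unless the
    // destination drains the queue before ScheduleWork() returns.
    if (was_empty)
      ScheduleWork();
  }

 private:
  // Platform-specific wake-up (PostMessage / NtSetIoCompletion /
  // CFRunLoopWakeUp, ...) elided in this sketch.
  void ScheduleWork() {}

  std::mutex mutex_;
  std::queue<Task> queue_;
};
```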

Maybe posting to self when the queue is empty is problematic? I.e., ScheduleWork() on self while already running, because the post happened to make the queue non-empty (but we hadn't yet told our pump we were out of work?). That would explain the IO->IO slowness.

If so, then perhaps posting to another thread which is already running (but happens to have an empty queue while processing its last task) is problematic as well. That would explain the IO->UI half of the slowness.

Still not clear why Mac is hurt by this more than Windows... OS specifics I guess.

This is a big chunk of issue 786597 (this issue downgrades that one from the full 19% issue it was to a 1-2% issue).
 
Cc: wittman@chromium.org
Cc: kraynov@chromium.org
Cc: brucedaw...@chromium.org
OK, so I've dug into the majority of the use cases where this is happening.

While I think there are a few things we could improve, I'm not convinced that's the problem.

I think there might be a bigger sampling problem here: if signaling the OS results in yielding to the signaled thread (we suspect some kernels do this), it also makes it more likely that the sampling profiler thread will wake up while the signaling thread is still frozen on its "send signal" stack.

Case in point: in all of the Mac cases, the top of the stack is the kernel symbol "mach_msg_trap". Similarly, on Windows it's often NtUserPostMessage or NtSetIoCompletion.

The sampling profiler doesn't really capture "time spent on a stack" but rather "frequency with which a given stack is seen". So if signaling another thread results in the current thread being preempted in favor of the signaled thread, the signaling thread spends more unscheduled time in the "just signaled" state and is more likely to be seen in that state by the sampling profiler. Even more so because, by relinquishing its quantum, it allows the sampling profiler thread itself to be scheduled and observe it in that state.

Not sure how to diagnose this... I tried grabbing an ETW system trace of my browser at run time, and it indeed has a non-negligible number of samples in the ScheduleWork() stacks (out of a 10,000-sample (1 ms per sample) trace), see below. If I understand the kernel correctly, it seems the signaling thread is indeed descheduled after the APC is delivered (KiDeliverApc), and that causes ETW to capture stack walks as well? (Is ETW also susceptible to the aforementioned sampling issue?)

+brucedawson for Windows/ETW expertise. The ETW trace is here, but this can easily be reproduced locally (normal browsing): https://drive.google.com/open?id=1MiQEDQ-QFhOsdtYMZvkb-9Xh5hUI92YM

BrowserThread::IO caught in MessagePumpForUI::ScheduleWork in 293 samples (2.9%)

Line #, Process, Thread ID, Stack, Count, Weight (in view) (ms)
3, , ,   ntdll.dll!RtlUserThreadStart, 350, 350.543886
4, , ,   kernel32.dll!BaseThreadInitThunk, 350, 350.543886
5, , ,   chrome.dll!base::`anonymous namespace'::ThreadFunc, 350, 350.543886
6, , ,   chrome.dll!base::Thread::ThreadMain, 350, 350.543886
7, , ,   chrome.dll!content::BrowserProcessSubThread::IOThreadRun, 350, 350.543886
8, , ,   chrome.dll!base::RunLoop::Run, 350, 350.543886
9, , ,   chrome.dll!base::MessagePumpWin::Run, 350, 350.543886
10, , ,   chrome.dll!base::MessagePumpForIO::DoRunLoop, 350, 350.543886
11, , ,   |- chrome.dll!base::MessagePumpForIO::WaitForIOCompletion, 310, 310.533005
12, , ,   |    chrome.dll!mojo::core::`anonymous namespace'::ChannelWin::OnIOCompleted, 310, 310.533005
13, , ,   |    chrome.dll!mojo::core::Channel::OnReadComplete, 310, 310.533005
14, , ,   |    chrome.dll!mojo::core::NodeChannel::OnChannelMessage, 310, 310.533005
15, , ,   |    chrome.dll!mojo::core::RequestContext::~RequestContext, 310, 310.533005
16, , ,   |    |- chrome.dll!mojo::core::Watch::InvokeCallback, 296, 296.458517
17, , ,   |    |    chrome.dll!mojo::core::WatcherDispatcher::InvokeWatchCallback, 296, 296.458517
18, , ,   |    |    chrome.dll!mojo::SimpleWatcher::Context::CallNotify, 296, 296.458517
19, , ,   |    |    chrome.dll!mojo::SimpleWatcher::Context::Notify, 296, 296.458517
20, , ,   |    |    |- chrome.dll!base::TaskRunner::PostTask, 293, 293.443289
21, , ,   |    |    |    chrome.dll!base::internal::MessageLoopTaskRunner::PostDelayedTask, 293, 293.443289
22, , ,   |    |    |    chrome.dll!base::internal::MessageLoopTaskRunner::AddToIncomingQueue, 293, 293.443289
23, , ,   |    |    |    chrome.dll!base::MessageLoop::Controller::DidQueueTask, 293, 293.443289
24, , ,   |    |    |    chrome.dll!base::MessagePumpForUI::ScheduleWork, 293, 293.443289
25, , ,   |    |    |    |- user32.dll!PostMessageW, 291, 291.418937
26, , ,   |    |    |    |    |- win32u.dll!ZwUserPostMessage, 287, 287.418937
27, , ,   |    |    |    |    |    |- ntoskrnl.exe!KiSystemServiceCopyEnd, 268, 268.327193
28, , ,   |    |    |    |    |    |    win32kfull.sys!NtUserPostMessage, 268, 268.327193
29, , ,   |    |    |    |    |    |    |- ntoskrnl.exe!ExReleaseResourceAndLeaveCriticalRegion, 146, 145.956290
30, , ,   |    |    |    |    |    |    |    |- ntoskrnl.exe!KiCheckForKernelApcDelivery, 132, 131.956290
31, , ,   |    |    |    |    |    |    |    |    ntoskrnl.exe!KiDeliverApc, 132, 131.956290
32, , ,   |    |    |    |    |    |    |    |    |- ntoskrnl.exe!EtwpStackWalkApc, 131, 130.956290
33, , ,   |    |    |    |    |    |    |    |    |    ntoskrnl.exe!EtwpTraceStackWalk, 131, 130.956290
34, , ,   |    |    |    |    |    |    |    |    |    |- ntoskrnl.exe!RtlWalkFrameChain, 128, 127.956290
35, , ,   |    |    |    |    |    |    |    |    |    |    |- ntoskrnl.exe!RtlpWalkFrameChain, 127, 126.956290
36, , ,   |    |    |    |    |    |    |    |    |    |    |    |- ntoskrnl.exe!RtlpLookupFunctionEntryForStackWalks, 83, 82.875410
37, , ,   |    |    |    |    |    |    |    |    |    |    |    |    |- ntoskrnl.exe!RtlpLookupFunctionEntryForStackWalks<itself>, 72, 71.851367
38, , ,   |    |    |    |    |    |    |    |    |    |    |    |    |- ntoskrnl.exe!RtlpLookupUserFunctionTable, 10, 10.024043
39, , ,   |    |    |    |    |    |    |    |    |    |    |    |    |- ntoskrnl.exe!KiDpcInterrupt, 1, 1.000000
40, , ,   |    |    |    |    |    |    |    |    |    |    |    |- ntoskrnl.exe!RtlpWalkFrameChain<itself>, 43, 43.080880
41, , ,   |    |    |    |    |    |    |    |    |    |    |    |- ntoskrnl.exe!RtlpGetStackLimits, 1, 1.000000
42, , ,   |    |    |    |    |    |    |    |    |    |    |- ntoskrnl.exe!MmCanThreadFault, 1, 1.000000
43, , ,   |    |    |    |    |    |    |    |    |    |- ntoskrnl.exe!EtwpLogKernelEvent, 2, 2.000000
44, , ,   |    |    |    |    |    |    |    |    |    |- ntoskrnl.exe!KeQueryCurrentStackInformation, 1, 1.000000
45, , ,   |    |    |    |    |    |    |    |    |- ntoskrnl.exe!KiDeliverApc<itself>, 1, 1.000000
46, , ,   |    |    |    |    |    |    |    |- ntoskrnl.exe!ExpReleaseResourceForThreadLite, 14, 14.000000
47, , ,   |    |    |    |    |    |    |- win32kfull.sys!_PostTransformableMessageExtended, 95, 95.316909
48, , ,   |    |    |    |    |    |    |- win32kbase.sys!EnterCrit, 17, 17.000022
49, , ,   |    |    |    |    |    |    |- win32kbase.sys!ValidateHwndEx, 8, 8.053972
50, , ,   |    |    |    |    |    |    |- win32kfull.sys!NtUserPostMessage<itself>, 1, 1.000000
51, , ,   |    |    |    |    |    |    |- win32kbase.sys!ValidateHwnd, 1, 1.000000

BrowserThread::UI caught in MessagePumpForIO::ScheduleWork() 64 times (0.6%)

Line #, Process, Thread ID, Stack, Count, Weight (in view) (ms)
3, , ,   ntdll.dll!RtlUserThreadStart, 143, 142.992344
4, , ,   kernel32.dll!BaseThreadInitThunk, 143, 142.992344
5, , ,   chrome.exe!__scrt_common_main_seh, 143, 142.992344
6, , ,   chrome.exe!wWinMain, 143, 142.992344
7, , ,   chrome.exe!MainDllLoader::Launch, 143, 142.992344
8, , ,   chrome.dll!ChromeMain, 143, 142.992344
9, , ,   chrome.dll!content::ContentMain, 143, 142.992344
10, , ,   chrome.dll!service_manager::Main, 143, 142.992344
11, , ,   chrome.dll!content::ContentMainRunnerImpl::Run, 143, 142.992344
12, , ,   chrome.dll!content::RunBrowserProcessMain, 143, 142.992344
13, , ,   chrome.dll!content::BrowserMain, 143, 142.992344
14, , ,   chrome.dll!content::BrowserMainRunnerImpl::Run, 143, 142.992344
15, , ,   chrome.dll!content::BrowserMainLoop::RunMainMessageLoopParts, 143, 142.992344
16, , ,   chrome.dll!ChromeBrowserMainParts::MainMessageLoopRun, 143, 142.992344
17, , ,   chrome.dll!base::RunLoop::Run, 143, 142.992344
18, , ,   chrome.dll!base::MessagePumpWin::Run, 143, 142.992344
19, , ,   chrome.dll!base::MessagePumpForUI::DoRunLoop, 143, 142.992344
20, , ,   |- chrome.dll!base::MessageLoop::DoWork, 133, 133.377316
21, , ,   |    chrome.dll!base::MessageLoop::RunTask, 133, 133.377316
22, , ,   |    chrome.dll!base::debug::TaskAnnotator::RunTask, 133, 133.377316
23, , ,   |    |- chrome.dll!cc::TileManager::FlushAndIssueSignals, 66, 66.048364
24, , ,   |    |    chrome.dll!gpu::GpuChannelHost::EnsureFlush, 66, 66.048364
25, , ,   |    |    chrome.dll!gpu::GpuChannelHost::InternalFlush, 66, 66.048364
26, , ,   |    |    chrome.dll!gpu::GpuChannelHost::Send, 66, 66.048364
27, , ,   |    |    chrome.dll!base::TaskRunner::PostTask, 66, 66.048364
28, , ,   |    |    chrome.dll!base::internal::MessageLoopTaskRunner::PostDelayedTask, 66, 66.048364
29, , ,   |    |    chrome.dll!base::internal::MessageLoopTaskRunner::AddToIncomingQueue, 66, 66.048364
30, , ,   |    |    chrome.dll!base::MessageLoop::Controller::DidQueueTask, 66, 66.048364
31, , ,   |    |    chrome.dll!base::MessagePumpForIO::ScheduleWork, 66, 66.048364
32, , ,   |    |    |- KernelBase.dll!PostQueuedCompletionStatus, 64, 64.012728
33, , ,   |    |    |    ntdll.dll!NtSetIoCompletion, 64, 64.012728
34, , ,   |    |    |    |- ntoskrnl.exe!KiSystemServiceCopyEnd, 63, 63.012728
35, , ,   |    |    |    |    ntoskrnl.exe!NtSetIoCompletion, 63, 63.012728
36, , ,   |    |    |    |    |- ntoskrnl.exe!IoSetIoCompletionEx2, 62, 61.969687
37, , ,   |    |    |    |    |    |- ntoskrnl.exe!KiExitDispatcher, 58, 57.893148
38, , ,   |    |    |    |    |    |    |- ntoskrnl.exe!KiApcInterrupt, 46, 45.890263
39, , ,   |    |    |    |    |    |    |    ntoskrnl.exe!KiDeliverApc, 46, 45.890263
40, , ,   |    |    |    |    |    |    |    |- ntoskrnl.exe!EtwpStackWalkApc, 45, 44.815134
41, , ,   |    |    |    |    |    |    |    |    ntoskrnl.exe!EtwpTraceStackWalk, 45, 44.815134
42, , ,   |    |    |    |    |    |    |    |    ntoskrnl.exe!RtlWalkFrameChain, 45, 44.815134
43, , ,   |    |    |    |    |    |    |    |    ntoskrnl.exe!RtlpWalkFrameChain, 45, 44.815134
44, , ,   |    |    |    |    |    |    |    |    |- ntoskrnl.exe!RtlpWalkFrameChain<itself>, 23, 22.868560
45, , ,   |    |    |    |    |    |    |    |    |- ntoskrnl.exe!RtlpLookupFunctionEntryForStackWalks, 22, 21.946574
46, , ,   |    |    |    |    |    |    |    |- ntoskrnl.exe!KiDeliverApc<itself>, 1, 1.075129
47, , ,   |    |    |    |    |    |    |- ntoskrnl.exe!EtwTraceReadyThread, 10, 10.002885
48, , ,   |    |    |    |    |    |    |- ntoskrnl.exe!KiDeferredReadyThread, 2, 2.000000
49, , ,   |    |    |    |    |    |- ntoskrnl.exe!KiWakeQueueWaiter, 2, 2.076539
50, , ,   |    |    |    |    |    |- ntoskrnl.exe!IopAllocateMiniCompletionPacket, 2, 2.000000
51, , ,   |    |    |    |    |- ntoskrnl.exe!ObReferenceObjectByHandle, 1, 1.043041

Posting from BrowserThread::IO -> BrowserThread::UI is definitely the worst MessagePump interaction according to both the sampling profiler (Canary; both Mac and Windows) and ETW (local).
Any sampling profiler is potentially vulnerable to this type of issue. The sampler built into Chrome is more vulnerable because it wakes up on the same timer interrupt that Chrome uses to wake up, so it can easily end up with bias.

ETW's sampling uses a different interrupt and can be run at 8 kHz, which should largely avoid this. The other issue you are seeing in the ETW traces is that when a thread wakes up, a stack walk is done, and this stack walk itself ends up being sampled. If there are a lot of context switches then this distorts the results. Luckily this is easily avoided: in UIforETW, just uncheck context switch call stacks. If you are only looking at CPU Usage (Sampled) then you can disable the context switch call stacks without losing anything (except the distortions they cause).
Here's a fast-sampling ETW trace with "context switch call stacks" disabled: https://drive.google.com/file/d/1UvENzlsS_kf_U6uYB0uLmajBjMDMo0NJ/view?usp=sharing

If I'm reading the trace correctly, it looks like BrowserThread::IO was captured for 8,685 samples (chrome.dll!base::MessagePumpForIO::DoRunLoop), of which 6,586 were non-idle (chrome.dll!mojo::core::`anonymous namespace'::ChannelWin::OnIOCompleted), of which 576 were inside chrome.dll!base::MessagePumpForUI::ScheduleWork. That's still 8.7% of non-idle time on BrowserThread::IO spent in MessagePumpForUI::ScheduleWork.

@bruce could you confirm I'm doing this right? Thanks!
This might be real (i.e. not a sampling bias) and caused by kernel priority boosting; here are my thoughts on that: https://groups.google.com/a/chromium.org/d/msg/scheduler-dev/jYA3SYZ1aDc/7lz1P-GvDQAJ
I like your very explicit trace renaming - thank you.

Ah, Windows 10 14393 - no thread names. I've started to get accustomed to having them and I definitely miss them now.

The dynamic priority boosting of readied threads can cause some problems: it can (especially if there are not enough cores for all threads that are ready to run) cause the thread that does the signaling to lose the CPU. If the signaling/readying thread loses the CPU then it will spend a long time in that function, but most of that time will not show up in CPU Usage (Sampled), because samples are only recorded for threads that are on-CPU.

The context switches will cause some wasted time (storing/restoring registers, cache misses) but I wanted to clarify that the time the thread was not running would not be recorded.

Anyway, on to the trace. Over the 27.234 s of the trace I see 33,431 samples inside of chrome.dll!base::MessagePumpForIO::DoRunLoop on thread 7,736 of process 8996. There were also 19 samples in that function in thread 8,620 but let's ignore those.

There seem to be at least 101 threads in that Chrome process. That seems... excessive, especially for a two-core, two-thread machine. Is that normal for Chrome? It seems higher than I have noticed before.

I don't understand this comment.
> of which 6,586 were non-idle

I then viewed callees of chrome.dll!base::MessagePumpForUI::ScheduleWork to make sure I found all instances of it, and I found 513 samples on thread 7,736 and 783 across all threads. I looked to see where time was spent inside ScheduleWork (that's why I viewed callees instead of callers) and noticed that 112 samples were in win32kbase.sys!EtwTraceQueueMessage, so I guess ETW still has some overhead even with context switch call stacks turned off.


So, 513 of the 33,431 samples on thread 7,736 that were inside chrome.dll!base::MessagePumpForIO::DoRunLoop were also inside chrome.dll!base::MessagePumpForUI::ScheduleWork. That's 1.53%, which is fairly low.

But, this may be the wrong question. 

The CPU Usage (Precise) data says that thread 7,736 used 4,425.225341 ms of CPU time during this trace. It also says that thread 7,736 was context switched in (and therefore also out) 29,897 times during the trace. That's 6.75 times per ms of running time, which seems very high. The browser process as a whole did 70,563 context switches during that time.

The priority boost probably increases context switches to some extent, but maybe the root cause is too much communication between threads. I know that some messages take multiple context switches to send (if the message is larger than about 4,000 bytes, due to our pipe buffer sizes).

The context switch data allows exploring things such as which threads are communicating. In the attached screenshot you can see that thread 8996/7,736 was woken up for 12,337 of its 29,897 context switches by the same process, mostly by threads 8,948 (the main thread) and 6,748 (running chrome.dll!base::win::ObjectWatcher::DoneWaiting).

Thread 6,748, BTW, did an impressive 4,804 context switches and only ran for a total of 71.97 ms, or 66.75 per ms of run-time.

Traces with context switch call stacks allow a better understanding of things like where threads are context switching and where they are readying others.


Tricky stuff. I would agree that if we can reduce context switches we can probably improve performance. Turning off the priority boost is one way (although less relevant on many-core machines; a sketch follows below), but there may be some other options worth exploring.
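
If anyone wants to experiment with that, the relevant calls are SetThreadPriorityBoost / SetProcessPriorityBoost. These are stock Win32 APIs, not Chromium helpers, and where to call them in Chrome is an open question; a minimal sketch:

```cpp
// Disabling the kernel's dynamic priority boost, per thread or per process.
#include <windows.h>

void DisablePriorityBoostForCurrentThread() {
  // TRUE means "disable boosting" for this thread.
  ::SetThreadPriorityBoost(::GetCurrentThread(), TRUE);
}

void DisablePriorityBoostForProcess() {
  // Applies to the threads of the process.
  ::SetProcessPriorityBoost(::GetCurrentProcess(), TRUE);
}
```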

Attachment: Context switches.PNG (47.7 KB)
Components: -Internals
Blockedon: 885371
Blockedon: 890978
Comment 12 by bugdroid1@chromium.org (Project Member), Oct 22

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/f7da13a28c3799819bd39c4d95dc02a805b27e4f

commit f7da13a28c3799819bd39c4d95dc02a805b27e4f
Author: Gabriel Charette <gab@chromium.org>
Date: Mon Oct 22 16:07:49 2018

[MessageLoop] Lock-free ScheduleWork() scheme

The Lock is causing hangs because of priority inversion
mixed with priority boosting (ScheduleWork() tends to
boost the destination thread which may deschedule the
posting thread; if the posting thread is a background
thread this boost-induced-desched-while-holding-lock
can cause a livelock). See https://crbug.com/890978#c10
for example crashes catching this.

The Lock was only necessary for startup/shutdown and is
being replaced by a lock-free atomic scheme in this CL.

MessagePump::ScheduleWork() itself was already thread-safe
(but the Android impl did unnecessarily check a non-atomic bool)

This adds a WaitableEvent in ~MessageLoop(); hence the requirement
for a wait-allowance in net's EmbeddedTestServer.

TBR=zhongyi@chromium.org (embedded_test_server.cc side-effects)

Bug:  890978 , 874237
Change-Id: I0916e5a99035a935b0a23a770af256f334e78c43
Reviewed-on: https://chromium-review.googlesource.com/c/1278631
Commit-Queue: Gabriel Charette <gab@chromium.org>
Reviewed-by: François Doray <fdoray@chromium.org>
Cr-Commit-Position: refs/heads/master@{#601600}
[modify] https://crrev.com/f7da13a28c3799819bd39c4d95dc02a805b27e4f/base/message_loop/message_loop.cc
[modify] https://crrev.com/f7da13a28c3799819bd39c4d95dc02a805b27e4f/base/message_loop/message_loop.h
[modify] https://crrev.com/f7da13a28c3799819bd39c4d95dc02a805b27e4f/base/message_loop/message_pump.h
[modify] https://crrev.com/f7da13a28c3799819bd39c4d95dc02a805b27e4f/base/message_loop/message_pump_android.cc
[modify] https://crrev.com/f7da13a28c3799819bd39c4d95dc02a805b27e4f/net/test/embedded_test_server/embedded_test_server.cc

Comment 13 by bugdroid1@chromium.org (Project Member), Oct 22

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/243f975c09f36ff7e90f43a530e64b31aac641e9

commit 243f975c09f36ff7e90f43a530e64b31aac641e9
Author: Yi Gu <yigu@chromium.org>
Date: Mon Oct 22 21:34:13 2018

Revert "[MessageLoop] Lock-free ScheduleWork() scheme"

This reverts commit f7da13a28c3799819bd39c4d95dc02a805b27e4f.

Reason for revert: As per comments by Findit

Original change's description:
> [MessageLoop] Lock-free ScheduleWork() scheme
> 
> The Lock is causing hangs because of priority inversion
> mixed with priority boosting (ScheduleWork() tends to
> boost the destination thread which may deschedule the
> posting thread; if the posting thread is a background
> thread this boost-induced-desched-while-holding-lock
> can cause a livelock). See https://crbug.com/890978#c10
> for example crashes catching this.
> 
> The Lock was only necessary for startup/shutdown and is
> being replaced by a lock-free atomic scheme in this CL.
> 
> MessagePump::ScheduleWork() itself was already thread-safe
> (but the Android impl did unnecessarily check a non-atomic bool)
> 
> This adds a WaitableEvent in ~MessageLoop(); hence the requirement
> for a wait-allowance in net's EmbeddedTestServer.
> 
> TBR=zhongyi@chromium.org (embedded_test_server.cc side-effects)
> 
> Bug:  890978 , 874237
> Change-Id: I0916e5a99035a935b0a23a770af256f334e78c43
> Reviewed-on: https://chromium-review.googlesource.com/c/1278631
> Commit-Queue: Gabriel Charette <gab@chromium.org>
> Reviewed-by: François Doray <fdoray@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#601600}

TBR=gab@chromium.org,fdoray@chromium.org,zhongyi@chromium.org

Change-Id: I521ef6ba2758f84d13a4f98d65cb41b276cb115e
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Bug:  890978 , 874237
Reviewed-on: https://chromium-review.googlesource.com/c/1294717
Reviewed-by: Yi Gu <yigu@chromium.org>
Commit-Queue: Yi Gu <yigu@chromium.org>
Cr-Commit-Position: refs/heads/master@{#601731}
[modify] https://crrev.com/243f975c09f36ff7e90f43a530e64b31aac641e9/base/message_loop/message_loop.cc
[modify] https://crrev.com/243f975c09f36ff7e90f43a530e64b31aac641e9/base/message_loop/message_loop.h
[modify] https://crrev.com/243f975c09f36ff7e90f43a530e64b31aac641e9/base/message_loop/message_pump.h
[modify] https://crrev.com/243f975c09f36ff7e90f43a530e64b31aac641e9/base/message_loop/message_pump_android.cc
[modify] https://crrev.com/243f975c09f36ff7e90f43a530e64b31aac641e9/net/test/embedded_test_server/embedded_test_server.cc

Comment 14 by bugdroid1@chromium.org (Project Member), Oct 23

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/c8a1dc548deacfd28e32e5f1c54b02ec59dfc911

commit c8a1dc548deacfd28e32e5f1c54b02ec59dfc911
Author: Gabriel Charette <gab@chromium.org>
Date: Tue Oct 23 23:10:55 2018

Reland "[MessageLoop] Lock-free ScheduleWork() scheme"

This is a reland of f7da13a28c3799819bd39c4d95dc02a805b27e4f
(original change in PS1)

The original change was missing acquire ordering in
DisconnectFromParent(). This is necessary in order for
the disconnecting thread to see all memory side-effects
previously made by other threads (or some side-effects
of message_loop_->ScheduleWork() could racily come in
after ~MessageLoop()).

Also removed the DCHECK that
|operations_state_ == kDisconnectedBit| at the end of
DisconnectFromParent() as it was incorrect. A racy
BeforeOperation() call can make it 1->3 (and no-op) after
DisconnectFromParent() made it 0->1.

And lastly, added a scoped allowance to always allow the
very fast wait instead of requiring callers to know about
this implementation detail of MessageLoop (and reverted
changes to //net).

Original change's description:
> [MessageLoop] Lock-free ScheduleWork() scheme
>
> The Lock is causing hangs because of priority inversion
> mixed with priority boosting (ScheduleWork() tends to
> boost the destination thread which may deschedule the
> posting thread; if the posting thread is a background
> thread this boost-induced-desched-while-holding-lock
> can cause a livelock). See https://crbug.com/890978#c10
> for example crashes catching this.
>
> The Lock was only necessary for startup/shutdown and is
> being replaced by a lock-free atomic scheme in this CL.
>
> MessagePump::ScheduleWork() itself was already thread-safe
> (but the Android impl did unnecessarily check a non-atomic bool)
>
> This adds a WaitableEvent in ~MessageLoop(); hence the requirement
> for a wait-allowance in net's EmbeddedTestServer.
>
> TBR=zhongyi@chromium.org (embedded_test_server.cc side-effects)
>
> Bug:  890978 , 874237
> Change-Id: I0916e5a99035a935b0a23a770af256f334e78c43
> Reviewed-on: https://chromium-review.googlesource.com/c/1278631
> Commit-Queue: Gabriel Charette <gab@chromium.org>
> Reviewed-by: François Doray <fdoray@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#601600}

Bug:  890978 , 874237,  897925 
Change-Id: I17c515f9a3169bbdfc303a4b259f34097e31730d
Reviewed-on: https://chromium-review.googlesource.com/c/1297129
Reviewed-by: François Doray <fdoray@chromium.org>
Commit-Queue: Gabriel Charette <gab@chromium.org>
Cr-Commit-Position: refs/heads/master@{#602166}
[modify] https://crrev.com/c8a1dc548deacfd28e32e5f1c54b02ec59dfc911/base/message_loop/message_loop.cc
[modify] https://crrev.com/c8a1dc548deacfd28e32e5f1c54b02ec59dfc911/base/message_loop/message_loop.h
[modify] https://crrev.com/c8a1dc548deacfd28e32e5f1c54b02ec59dfc911/base/message_loop/message_pump.h
[modify] https://crrev.com/c8a1dc548deacfd28e32e5f1c54b02ec59dfc911/base/message_loop/message_pump_android.cc
[modify] https://crrev.com/c8a1dc548deacfd28e32e5f1c54b02ec59dfc911/base/threading/thread_restrictions.h
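
To make the relanded scheme easier to follow, here is a hedged sketch of the state machine the commit message describes. The names BeforeOperation, AfterOperation, DisconnectFromParent, operations_state_ and kDisconnectedBit come from the commit text; everything else (std::condition_variable standing in for base::WaitableEvent, the exact orderings beyond the acquire called out in the reland) is illustrative, not the actual Chromium implementation:

```cpp
#include <atomic>
#include <condition_variable>
#include <cstdint>
#include <mutex>

class Controller {
 public:
  // Poster side: claim the right to use the MessageLoop. The 1 -> 3
  // transition mentioned in the commit message is a racy BeforeOperation()
  // after disconnect: the increment happens, but the caller must no-op.
  bool BeforeOperation() {
    uintptr_t prev = operations_state_.fetch_add(2, std::memory_order_acq_rel);
    if (prev & kDisconnectedBit) {
      AfterOperation();  // Undo; the loop is gone (or going away).
      return false;
    }
    return true;
  }

  // Poster side: release. If we were the last in-flight operation on a
  // disconnected Controller, wake the thread blocked in ~MessageLoop().
  void AfterOperation() {
    uintptr_t prev = operations_state_.fetch_sub(2, std::memory_order_acq_rel);
    if (prev == (2 | kDisconnectedBit)) {
      std::lock_guard<std::mutex> lock(wait_mutex_);  // Avoid a lost wake-up.
      done_.notify_one();
    }
  }

  // ~MessageLoop() side. Acquire ordering (the fix in this reland) so the
  // destructing thread sees all side effects of in-flight ScheduleWork()s.
  void DisconnectFromParent() {
    uintptr_t prev = operations_state_.fetch_or(kDisconnectedBit,
                                                std::memory_order_acquire);
    if (prev != 0) {
      // Operations still in flight: the "very fast wait". The commit uses a
      // base::WaitableEvent; a condition variable plays that role here.
      std::unique_lock<std::mutex> lock(wait_mutex_);
      done_.wait(lock, [this] {
        return operations_state_.load(std::memory_order_acquire) ==
               kDisconnectedBit;
      });
    }
  }

 private:
  // Bit 0: disconnected. Remaining bits: 2 * in-flight operation count.
  static constexpr uintptr_t kDisconnectedBit = 1;
  std::atomic<uintptr_t> operations_state_{0};
  std::mutex wait_mutex_;
  std::condition_variable done_;
};
```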

Getting rid of the lock and the priority inversion (issue 890978) helped, but ScheduleWork() is still expensive.

The lock cleanup primarily helped during startup on Windows, reducing the overhead from 5.2% to 2.6% on the IO thread and from 2.2% to 0.6% on the UI thread:
 - https://uma.googleplex.com/p/chrome/callstacks/?sid=55c7cb696600ca527cad74069af7a270
 - https://uma.googleplex.com/p/chrome/callstacks/?sid=51e79153dd8ac9d82a5288f5ecfcccc6


The remaining overhead is 1-2% on the Windows IO thread (lower on the main thread):
 - https://uma.googleplex.com/p/chrome/callstacks/?sid=0157620b8cebb4be20ab988468968543
 - https://uma.googleplex.com/p/chrome/callstacks/?sid=903dc4f763589bb8ec609535cac12197
but still 5-19% on Mac (CFRunLoopWakeUp dominates):
 - https://uma.googleplex.com/p/chrome/callstacks/?sid=b04ef942e7583beb7ce96e88d572da3b
 - https://uma.googleplex.com/p/chrome/callstacks/?sid=8da74b9f906046707c4a16bf63927682

The next trick in line is to reduce the number of calls to ScheduleWork(): https://docs.google.com/document/d/1no1JMli6F1r8gTF9KDIOvoWkUUZcXDktPf4A1IXYc3M/edit#

This will also fall out of the incoming replacement of base::MessageLoop by base::SequenceManager (the current problem is that posting to self from a delayed task, while having no immediate tasks, results in a redundant ScheduleWork() despite the loop already running). A sketch of that redundancy is below.
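
A hedged sketch of how that redundant wake-up can be avoided, using a hypothetical work_scheduled_ flag (these are not the real SequenceManager names): the native wake-up is only issued when the pump isn't already committed to running.

```cpp
#include <atomic>

// Hypothetical de-duplication gate; illustrative, not SequenceManager's
// actual implementation.
class WakeUpGate {
 public:
  // Poster side: returns true iff the caller must issue the native
  // ScheduleWork(). exchange() lets exactly one poster win until the pump
  // has drained its queue and re-armed the gate.
  bool TryScheduleWork() {
    return !work_scheduled_.exchange(true, std::memory_order_acq_rel);
  }

  // Pump side: called just before going back to sleep, so the next
  // PostTask() produces a wake-up again. While the pump is still running,
  // a task posted to self never flips the flag back, avoiding the
  // redundant ScheduleWork() described above. (A real implementation must
  // re-check the queue after re-arming, to avoid sleeping through a racy
  // post.)
  void OnAboutToSleep() {
    work_scheduled_.store(false, std::memory_order_release);
  }

 private:
  std::atomic<bool> work_scheduled_{false};
};
```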
Cc: etienneb@chromium.org
Adding to the above analysis: attached are plots of the cumulative distribution of profiles containing MessageLoopTaskRunner::PostDelayedTask samples, based on the number of samples in the function. This gives a sense of how the change affected performance over the population. In this case it appears that there was a fairly uniform improvement across all users on the UI thread, and a slightly less uniform improvement on the IO thread.

Note that the percentages in these graphs are higher than what we see in the dashboard because this analysis covers all invocations of the MessageLoopTaskRunner::PostDelayedTask function, whereas the dashboard is restricted to invocations taking at least 100μs.

Internal colab for this analysis: colab/drive/1p_hhVhjWVIYOiRxutqIV6SW0kg5bg97n
Attachments: BrowserIOThreadInversionFix.png (100 KB), BrowserMainThreadInversionFix.png (103 KB)
Labels: UMA-Sampling-Profiler
