5-18% of non-idle time on IO thread is spent in MessageLoop::ScheduleWork
Issue description

Chrome Version: 70.0.3520.0 (and all prior)
OS: All

The sampling profiler shows that we spend a *lot* of time in MessageLoop::ScheduleWork (which inlines into MessagePump::ScheduleWork and differs per platform), in particular on the IO thread:

Mac OSX:
* Startup : 8.5% (351ms) : https://uma.googleplex.com/p/chrome/callstacks/?sid=ac3086581d87e7e341618d95e669dfb4
* Periodic : 18% : https://uma.googleplex.com/p/chrome/callstacks/?sid=c73b9e98ff098320e5035a8d81443ca7

Windows:
* Startup : 6.3% (211ms) : https://uma.googleplex.com/p/chrome/callstacks/?sid=98a95f9298deda07b304d0cdcdf0bb55
* Periodic : 5% : https://uma.googleplex.com/p/chrome/callstacks/?sid=ab2637dbd934ef05b67db120f323dfd9

It looks like Mac (CFRunLoop / MessagePumpLibevent) is slower than Windows. It also looks like some of the contention (but not the majority) is on MessageLoop::Controller::message_loop_lock_. This is surprising because there's already logic in place to only grab the lock and call ScheduleWork() if the poster is the one that made the queue non-empty (sketched below), so there shouldn't be contention here unless the destination somehow wakes up and processes the task before ScheduleWork() returns...

Maybe posting to self when the queue is empty is problematic? i.e. calling ScheduleWork() on self while already running, because the post happened to make the queue non-empty (but we hadn't yet told our pump we were out of work..?). That would explain the IO->IO slowness. If so, then posting to another thread which is already running (but happens to have an empty queue while processing its last task) and scheduling it may be problematic as well (which would explain the IO->UI half of the slowness).

Still not clear why Mac is hurt by this more than Windows... OS specifics, I guess.

This is a big chunk of issue 786597 (this issue downgrades that one to a 1-2% issue instead of the full 19% issue it was).
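For reference, the empty->non-empty gating works roughly like the following minimal sketch (hypothetical names; the real logic lives in MessageLoopTaskRunner::AddToIncomingQueue / MessageLoop::Controller::DidQueueTask). Only the poster that transitions the queue from empty to non-empty pays for the wake-up, which is why two concurrent posters should rarely both reach ScheduleWork():

// Sketch: only the poster that makes the queue go empty -> non-empty
// signals the pump. "IncomingQueue" and "Task" are illustrative names,
// not Chromium's actual classes.
#include <mutex>
#include <queue>
#include <utility>

struct Task {};

class IncomingQueue {
 public:
  // Returns true iff this push made the queue non-empty, i.e. the caller
  // is the one responsible for calling MessagePump::ScheduleWork().
  bool Push(Task task) {
    std::lock_guard<std::mutex> lock(lock_);
    const bool was_empty = tasks_.empty();
    tasks_.push(std::move(task));
    return was_empty;
  }

 private:
  std::mutex lock_;
  std::queue<Task> tasks_;
};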
Aug 18
Ok, so I've dug into the majority of the use cases where this is happening. While I think there are a few things we could improve, I'm not convinced that's the problem. I think there might be a bigger sampling problem here: e.g. if signaling the OS results in yielding to the signaled thread (we suspect some kernels do this), it also makes it more likely that the sampling profiler thread will wake up while the signaler is still on its "send signal" stack. Case in point: in all of the Mac cases, the top of the stack is the kernel symbol "mach_msg_trap", and similarly on Windows it's often NtUserPostMessage or NtSetIoCompletion.

Since the sampling profiler doesn't really capture "time spent on a stack" but rather "frequency with which a given stack is seen", if signaling another thread results in the current thread being preempted in favor of the signaled thread, then the signaler spends more unscheduled time in the "just signaled" state and is more likely to be seen in that state by the sampling profiler (even more so since, by relinquishing its quantum, it allows the sampling profiler thread itself to be scheduled and to see it in that state).

Not sure how to diagnose this... I tried grabbing an ETW system trace of my browser at run time and it indeed has a non-negligible number of samples in the ScheduleWork() stacks (out of a 10,000-sample (ms) trace), see below. If I understand the kernel correctly, it seems the signaling thread is indeed descheduled after delivering the APC (KiDeliverApc), and that causes ETW to capture stack walks as well? (Is ETW also susceptible to the aforementioned sampling issue..?) +brucedawson for Windows/ETW expertise.

ETW trace is here but it can easily be reproduced locally (normal browsing): https://drive.google.com/open?id=1MiQEDQ-QFhOsdtYMZvkb-9Xh5hUI92YM

BrowserThread::IO caught in MessagePumpForUI::ScheduleWork in 293 samples (2.9%). (Process and Thread ID columns, empty in the export, omitted.)

Line #, Stack, Count, Weight (in view) (ms)
3, ntdll.dll!RtlUserThreadStart, 350, 350.543886
4, kernel32.dll!BaseThreadInitThunk, 350, 350.543886
5, chrome.dll!base::`anonymous namespace'::ThreadFunc, 350, 350.543886
6, chrome.dll!base::Thread::ThreadMain, 350, 350.543886
7, chrome.dll!content::BrowserProcessSubThread::IOThreadRun, 350, 350.543886
8, chrome.dll!base::RunLoop::Run, 350, 350.543886
9, chrome.dll!base::MessagePumpWin::Run, 350, 350.543886
10, chrome.dll!base::MessagePumpForIO::DoRunLoop, 350, 350.543886
11, |- chrome.dll!base::MessagePumpForIO::WaitForIOCompletion, 310, 310.533005
12, | chrome.dll!mojo::core::`anonymous namespace'::ChannelWin::OnIOCompleted, 310, 310.533005
13, | chrome.dll!mojo::core::Channel::OnReadComplete, 310, 310.533005
14, | chrome.dll!mojo::core::NodeChannel::OnChannelMessage, 310, 310.533005
15, | chrome.dll!mojo::core::RequestContext::~RequestContext, 310, 310.533005
16, | |- chrome.dll!mojo::core::Watch::InvokeCallback, 296, 296.458517
17, | | chrome.dll!mojo::core::WatcherDispatcher::InvokeWatchCallback, 296, 296.458517
18, | | chrome.dll!mojo::SimpleWatcher::Context::CallNotify, 296, 296.458517
19, | | chrome.dll!mojo::SimpleWatcher::Context::Notify, 296, 296.458517
20, | | |- chrome.dll!base::TaskRunner::PostTask, 293, 293.443289
21, | | | chrome.dll!base::internal::MessageLoopTaskRunner::PostDelayedTask, 293, 293.443289
22, | | | chrome.dll!base::internal::MessageLoopTaskRunner::AddToIncomingQueue, 293, 293.443289
23, | | | chrome.dll!base::MessageLoop::Controller::DidQueueTask, 293, 293.443289
24, | | | chrome.dll!base::MessagePumpForUI::ScheduleWork, 293, 293.443289
25, | | | |- user32.dll!PostMessageW, 291, 291.418937
26, | | | | |- win32u.dll!ZwUserPostMessage, 287, 287.418937
27, | | | | | |- ntoskrnl.exe!KiSystemServiceCopyEnd, 268, 268.327193
28, | | | | | | win32kfull.sys!NtUserPostMessage, 268, 268.327193
29, | | | | | | |- ntoskrnl.exe!ExReleaseResourceAndLeaveCriticalRegion, 146, 145.956290
30, | | | | | | | |- ntoskrnl.exe!KiCheckForKernelApcDelivery, 132, 131.956290
31, | | | | | | | | ntoskrnl.exe!KiDeliverApc, 132, 131.956290
32, | | | | | | | | |- ntoskrnl.exe!EtwpStackWalkApc, 131, 130.956290
33, | | | | | | | | | ntoskrnl.exe!EtwpTraceStackWalk, 131, 130.956290
34, | | | | | | | | | |- ntoskrnl.exe!RtlWalkFrameChain, 128, 127.956290
35, | | | | | | | | | | |- ntoskrnl.exe!RtlpWalkFrameChain, 127, 126.956290
36, | | | | | | | | | | | |- ntoskrnl.exe!RtlpLookupFunctionEntryForStackWalks, 83, 82.875410
37, | | | | | | | | | | | | |- ntoskrnl.exe!RtlpLookupFunctionEntryForStackWalks<itself>, 72, 71.851367
38, | | | | | | | | | | | | |- ntoskrnl.exe!RtlpLookupUserFunctionTable, 10, 10.024043
39, | | | | | | | | | | | | |- ntoskrnl.exe!KiDpcInterrupt, 1, 1.000000
40, | | | | | | | | | | | |- ntoskrnl.exe!RtlpWalkFrameChain<itself>, 43, 43.080880
41, | | | | | | | | | | | |- ntoskrnl.exe!RtlpGetStackLimits, 1, 1.000000
42, | | | | | | | | | | |- ntoskrnl.exe!MmCanThreadFault, 1, 1.000000
43, | | | | | | | | | |- ntoskrnl.exe!EtwpLogKernelEvent, 2, 2.000000
44, | | | | | | | | | |- ntoskrnl.exe!KeQueryCurrentStackInformation, 1, 1.000000
45, | | | | | | | | |- ntoskrnl.exe!KiDeliverApc<itself>, 1, 1.000000
46, | | | | | | | |- ntoskrnl.exe!ExpReleaseResourceForThreadLite, 14, 14.000000
47, | | | | | | |- win32kfull.sys!_PostTransformableMessageExtended, 95, 95.316909
48, | | | | | | |- win32kbase.sys!EnterCrit, 17, 17.000022
49, | | | | | | |- win32kbase.sys!ValidateHwndEx, 8, 8.053972
50, | | | | | | |- win32kfull.sys!NtUserPostMessage<itself>, 1, 1.000000
51, | | | | | | |- win32kbase.sys!ValidateHwnd, 1, 1.000000

BrowserThread::UI caught in MessagePumpForIO::ScheduleWork() 64 times (0.6%):

Line #, Stack, Count, Weight (in view) (ms)
3, ntdll.dll!RtlUserThreadStart, 143, 142.992344
4, kernel32.dll!BaseThreadInitThunk, 143, 142.992344
5, chrome.exe!__scrt_common_main_seh, 143, 142.992344
6, chrome.exe!wWinMain, 143, 142.992344
7, chrome.exe!MainDllLoader::Launch, 143, 142.992344
8, chrome.dll!ChromeMain, 143, 142.992344
9, chrome.dll!content::ContentMain, 143, 142.992344
10, chrome.dll!service_manager::Main, 143, 142.992344
11, chrome.dll!content::ContentMainRunnerImpl::Run, 143, 142.992344
12, chrome.dll!content::RunBrowserProcessMain, 143, 142.992344
13, chrome.dll!content::BrowserMain, 143, 142.992344
14, chrome.dll!content::BrowserMainRunnerImpl::Run, 143, 142.992344
15, chrome.dll!content::BrowserMainLoop::RunMainMessageLoopParts, 143, 142.992344
16, chrome.dll!ChromeBrowserMainParts::MainMessageLoopRun, 143, 142.992344
17, chrome.dll!base::RunLoop::Run, 143, 142.992344
18, chrome.dll!base::MessagePumpWin::Run, 143, 142.992344
19, chrome.dll!base::MessagePumpForUI::DoRunLoop, 143, 142.992344
20, |- chrome.dll!base::MessageLoop::DoWork, 133, 133.377316
21, | chrome.dll!base::MessageLoop::RunTask, 133, 133.377316
22, | chrome.dll!base::debug::TaskAnnotator::RunTask, 133, 133.377316
23, | |- chrome.dll!cc::TileManager::FlushAndIssueSignals, 66, 66.048364
24, | | chrome.dll!gpu::GpuChannelHost::EnsureFlush, 66, 66.048364
25, | | chrome.dll!gpu::GpuChannelHost::InternalFlush, 66, 66.048364
26, | | chrome.dll!gpu::GpuChannelHost::Send, 66, 66.048364
27, | | chrome.dll!base::TaskRunner::PostTask, 66, 66.048364
28, | | chrome.dll!base::internal::MessageLoopTaskRunner::PostDelayedTask, 66, 66.048364
29, | | chrome.dll!base::internal::MessageLoopTaskRunner::AddToIncomingQueue, 66, 66.048364
30, | | chrome.dll!base::MessageLoop::Controller::DidQueueTask, 66, 66.048364
31, | | chrome.dll!base::MessagePumpForIO::ScheduleWork, 66, 66.048364
32, | | |- KernelBase.dll!PostQueuedCompletionStatus, 64, 64.012728
33, | | | ntdll.dll!NtSetIoCompletion, 64, 64.012728
34, | | | |- ntoskrnl.exe!KiSystemServiceCopyEnd, 63, 63.012728
35, | | | | ntoskrnl.exe!NtSetIoCompletion, 63, 63.012728
36, | | | | |- ntoskrnl.exe!IoSetIoCompletionEx2, 62, 61.969687
37, | | | | | |- ntoskrnl.exe!KiExitDispatcher, 58, 57.893148
38, | | | | | | |- ntoskrnl.exe!KiApcInterrupt, 46, 45.890263
39, | | | | | | | ntoskrnl.exe!KiDeliverApc, 46, 45.890263
40, | | | | | | | |- ntoskrnl.exe!EtwpStackWalkApc, 45, 44.815134
41, | | | | | | | | ntoskrnl.exe!EtwpTraceStackWalk, 45, 44.815134
42, | | | | | | | | ntoskrnl.exe!RtlWalkFrameChain, 45, 44.815134
43, | | | | | | | | ntoskrnl.exe!RtlpWalkFrameChain, 45, 44.815134
44, | | | | | | | | |- ntoskrnl.exe!RtlpWalkFrameChain<itself>, 23, 22.868560
45, | | | | | | | | |- ntoskrnl.exe!RtlpLookupFunctionEntryForStackWalks, 22, 21.946574
46, | | | | | | | |- ntoskrnl.exe!KiDeliverApc<itself>, 1, 1.075129
47, | | | | | | |- ntoskrnl.exe!EtwTraceReadyThread, 10, 10.002885
48, | | | | | | |- ntoskrnl.exe!KiDeferredReadyThread, 2, 2.000000
49, | | | | | |- ntoskrnl.exe!KiWakeQueueWaiter, 2, 2.076539
50, | | | | | |- ntoskrnl.exe!IopAllocateMiniCompletionPacket, 2, 2.000000
51, | | | | |- ntoskrnl.exe!ObReferenceObjectByHandle, 1, 1.043041
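For reference, the wake-up syscalls at the top of these stacks are just how each pump's ScheduleWork() signals its target thread. Roughly (a simplified sketch, not Chromium's actual code; hwnd_, port_ and kMsgHaveWork stand in for the pumps' real members):

// Sketch of the two Windows wake-up paths visible in the traces above:
// MessagePumpForUI wakes its thread with a posted window message
// (-> NtUserPostMessage), MessagePumpForIO with an I/O completion
// packet (-> NtSetIoCompletion).
#include <windows.h>

class UIPumpSketch {
 public:
  void ScheduleWork() {
    // Shows up as user32.dll!PostMessageW -> win32u.dll!ZwUserPostMessage.
    PostMessage(hwnd_, kMsgHaveWork, 0, 0);
  }

 private:
  static constexpr UINT kMsgHaveWork = WM_USER + 1;
  HWND hwnd_ = nullptr;  // Message-only window owned by the pump's thread.
};

class IOPumpSketch {
 public:
  void ScheduleWork() {
    // Shows up as KernelBase.dll!PostQueuedCompletionStatus ->
    // ntdll.dll!NtSetIoCompletion.
    PostQueuedCompletionStatus(port_, 0, /*dwCompletionKey=*/0, nullptr);
  }

 private:
  HANDLE port_ = nullptr;  // Completion port the pump's thread waits on.
};

Either call readies the destination thread in the kernel, which is exactly where the preemption/boosting suspicion above comes in.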
Aug 18
Posting from BrowserThread::IO -> BrowserThread::UI is definitely the worst MessagePump interaction according to both the sampling profiler (Canary -- both Mac and Win) and ETW (local).
Aug 19
Any sampling profiler is potentially vulnerable to this type of issue. The sampler built into Chrome is more vulnerable because it wakes up on the same timer interrupt that Chrome uses to wake up, so it can easily end up biased. ETW's sampling uses a different interrupt and can be run at 8 kHz, which should largely avoid this.

The other issue you are seeing in the ETW traces is that when a thread wakes up, a stack walk is done, and this stack walk itself ends up being sampled. If there are a lot of context switches then this distorts the results. Luckily this is easily avoided: in UIforETW just uncheck "context switch call stacks". If you are only looking at CPU Usage (Sampled) then you can disable the context switch call stacks without losing anything (except the distortions they cause).
Aug 20
Here's a fast-sampling ETW trace with "context switch call stacks" disabled: https://drive.google.com/file/d/1UvENzlsS_kf_U6uYB0uLmajBjMDMo0NJ/view?usp=sharing

If I'm reading the trace correctly, it looks like BrowserThread::IO was captured in 8685 samples (chrome.dll!base::MessagePumpForIO::DoRunLoop), of which 6586 were non-idle (chrome.dll!mojo::core::`anonymous namespace'::ChannelWin::OnIOCompleted), of which 576 were inside chrome.dll!base::MessagePumpForUI::ScheduleWork. That's still 8.7% of non-idle time on BrowserThread::IO spent in MessagePumpForUI::ScheduleWork. @bruce could you confirm I'm doing this right? Thanks!
Aug 20
This might be real (i.e. not a sampling bias) and caused by kernel priority boosting; here are my thoughts on that: https://groups.google.com/a/chromium.org/d/msg/scheduler-dev/jYA3SYZ1aDc/7lz1P-GvDQAJ
Aug 22
I like your very explicit trace renaming - thank you. Ah, Windows 10 14393 - no thread names. I've started to get accustomed to having them and I definitely miss them now.

The dynamic priority boosting of readied threads can cause some problems - it can (especially if there are not enough cores for all of the threads that are ready to run) cause the thread that does the signaling to lose the CPU. If the signaling/readying thread loses the CPU then it will spend a long time in that function... but most of that time will not show up in CPU Usage (Sampled), because samples are only recorded for threads that are on the CPU. The context switches will cause some wasted time (storing/restoring registers, cache misses) but I wanted to clarify that the time the thread was not running would not be recorded.

Anyway, on to the trace. Over the 27.234 s of the trace I see 33,431 samples inside of chrome.dll!base::MessagePumpForIO::DoRunLoop on thread 7,736 of process 8996. There were also 19 samples in that function in thread 8,620, but let's ignore those. There seem to be at least 101 threads in that Chrome process. That seems... excessive, especially for a two-core two-thread machine. Is that normal for Chrome? It seems higher than I have noticed before.

I don't understand this comment:

> of which it was non-idle 6586

I then viewed callees of chrome.dll!base::MessagePumpForUI::ScheduleWork to make sure I found all instances of it, and I found 513 samples on thread 7,736, 783 across all threads. I looked to see where time was spent inside ScheduleWork (that's why I used view-callees instead of view-callers) and noticed that 112 samples were in win32kbase.sys!EtwTraceQueueMessage, so I guess ETW has some overhead even with context switch call stacks turned off.

So, I'm seeing 513/33,431 samples on thread 7,736 that were inside of chrome.dll!base::MessagePumpForIO::DoRunLoop and also inside of chrome.dll!base::MessagePumpForUI::ScheduleWork. That's 1.53%, which is fairly low. But this may be the wrong question.

The CPU Usage (Precise) data says that thread 7,736 used 4,425.225341 ms of CPU time during this trace. It also says that thread 7,736 was context switched in (and therefore also out) 29,897 times during the trace. That's 6.75 times per ms that the thread is running, which seems very high. The browser process as a whole did 70,563 context switches during that time. The priority boost probably increases context switches to some extent, but maybe the root cause is too much communication between threads. I know that some messages take multiple context switches to send (if the message is larger than about 4,000 bytes, due to our pipe buffer sizes).

The context switch data allows exploring things such as which threads are communicating. In the attached screenshot you can see that 8996/7,736 was woken up 12,337 of its 29,897 times by the same process, mostly by threads 8,948 (the main thread) and 6,748 (running chrome.dll!base::win::ObjectWatcher::DoneWaiting). Thread 6,748, BTW, did an impressive 4,804 context switches while only running for a total of 71.97 ms, or 66.75 per ms of run-time.

Traces with context switch call stacks allow a better understanding of things like where threads are context switching and where they are readying others. Tricky stuff. I would agree that if we can reduce context switches we can probably improve performance. Turning off the priority boost is one way (although less relevant on many-core machines) but there may be some other options worth exploring.
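For what it's worth, "turning off the priority boost" maps to a real Win32 knob; a minimal sketch follows (whether Chromium should actually do this wasn't decided here):

// Sketch: opting out of dynamic priority boosting on Windows.
// Note the inverted parameter: passing TRUE *disables* the boost.
#include <windows.h>
#include <stdio.h>

int main() {
  // Disable boosting for the current thread only...
  if (!SetThreadPriorityBoost(GetCurrentThread(),
                              /*bDisablePriorityBoost=*/TRUE)) {
    fprintf(stderr, "SetThreadPriorityBoost failed: %lu\n", GetLastError());
  }
  // ...or for all threads of the process.
  if (!SetProcessPriorityBoost(GetCurrentProcess(),
                               /*bDisablePriorityBoost=*/TRUE)) {
    fprintf(stderr, "SetProcessPriorityBoost failed: %lu\n", GetLastError());
  }
  return 0;
}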
Oct 22
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/f7da13a28c3799819bd39c4d95dc02a805b27e4f

commit f7da13a28c3799819bd39c4d95dc02a805b27e4f
Author: Gabriel Charette <gab@chromium.org>
Date: Mon Oct 22 16:07:49 2018

[MessageLoop] Lock-free ScheduleWork() scheme

The Lock is causing hangs because of priority inversion mixed with
priority boosting (ScheduleWork() tends to boost the destination
thread, which may deschedule the posting thread; if the posting thread
is a background thread this boost-induced-desched-while-holding-lock
can cause a livelock). See https://crbug.com/890978#c10 for example
crashes catching this.

The Lock was only necessary for startup/shutdown and is being replaced
by a lock-free atomic scheme in this CL.

MessagePump::ScheduleWork() itself was already thread-safe (but the
Android impl did unnecessarily check a non-atomic bool).

This adds a WaitableEvent in ~MessageLoop(); hence the requirement for
a wait-allowance in net's EmbeddedTestServer.

TBR=zhongyi@chromium.org (embedded_test_server.cc side-effects)

Bug: 890978, 874237
Change-Id: I0916e5a99035a935b0a23a770af256f334e78c43
Reviewed-on: https://chromium-review.googlesource.com/c/1278631
Commit-Queue: Gabriel Charette <gab@chromium.org>
Reviewed-by: François Doray <fdoray@chromium.org>
Cr-Commit-Position: refs/heads/master@{#601600}

[modify] https://crrev.com/f7da13a28c3799819bd39c4d95dc02a805b27e4f/base/message_loop/message_loop.cc
[modify] https://crrev.com/f7da13a28c3799819bd39c4d95dc02a805b27e4f/base/message_loop/message_loop.h
[modify] https://crrev.com/f7da13a28c3799819bd39c4d95dc02a805b27e4f/base/message_loop/message_pump.h
[modify] https://crrev.com/f7da13a28c3799819bd39c4d95dc02a805b27e4f/base/message_loop/message_pump_android.cc
[modify] https://crrev.com/f7da13a28c3799819bd39c4d95dc02a805b27e4f/net/test/embedded_test_server/embedded_test_server.cc
Oct 22
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/243f975c09f36ff7e90f43a530e64b31aac641e9

commit 243f975c09f36ff7e90f43a530e64b31aac641e9
Author: Yi Gu <yigu@chromium.org>
Date: Mon Oct 22 21:34:13 2018

Revert "[MessageLoop] Lock-free ScheduleWork() scheme"

This reverts commit f7da13a28c3799819bd39c4d95dc02a805b27e4f.

Reason for revert: As per comments by Findit

Original change's description:
> [MessageLoop] Lock-free ScheduleWork() scheme
>
> The Lock is causing hangs because of priority inversion
> mixed with priority boosting (ScheduleWork() tends to
> boost the destination thread which may deschedule the
> posting thread; if the posting thread is a background
> thread this boost-induced-desched-while-holding-lock
> can cause a livelock). See https://crbug.com/890978#c10
> for example crashes catching this.
>
> The Lock was only necessary for startup/shutdown and is
> being replaced by a lock-free atomic scheme in this CL.
>
> MessagePump::ScheduleWork() itself was already thread-safe
> (but the Android impl did unnecessarily check a non-atomic bool)
>
> This adds a WaitableEvent in ~MessageLoop(); hence the requirement
> for a wait-allowance in net's EmbeddedTestServer.
>
> TBR=zhongyi@chromium.org (embedded_test_server.cc side-effects)
>
> Bug: 890978, 874237
> Change-Id: I0916e5a99035a935b0a23a770af256f334e78c43
> Reviewed-on: https://chromium-review.googlesource.com/c/1278631
> Commit-Queue: Gabriel Charette <gab@chromium.org>
> Reviewed-by: François Doray <fdoray@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#601600}

TBR=gab@chromium.org,fdoray@chromium.org,zhongyi@chromium.org

Change-Id: I521ef6ba2758f84d13a4f98d65cb41b276cb115e
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Bug: 890978, 874237
Reviewed-on: https://chromium-review.googlesource.com/c/1294717
Reviewed-by: Yi Gu <yigu@chromium.org>
Commit-Queue: Yi Gu <yigu@chromium.org>
Cr-Commit-Position: refs/heads/master@{#601731}

[modify] https://crrev.com/243f975c09f36ff7e90f43a530e64b31aac641e9/base/message_loop/message_loop.cc
[modify] https://crrev.com/243f975c09f36ff7e90f43a530e64b31aac641e9/base/message_loop/message_loop.h
[modify] https://crrev.com/243f975c09f36ff7e90f43a530e64b31aac641e9/base/message_loop/message_pump.h
[modify] https://crrev.com/243f975c09f36ff7e90f43a530e64b31aac641e9/base/message_loop/message_pump_android.cc
[modify] https://crrev.com/243f975c09f36ff7e90f43a530e64b31aac641e9/net/test/embedded_test_server/embedded_test_server.cc
Oct 23
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/c8a1dc548deacfd28e32e5f1c54b02ec59dfc911

commit c8a1dc548deacfd28e32e5f1c54b02ec59dfc911
Author: Gabriel Charette <gab@chromium.org>
Date: Tue Oct 23 23:10:55 2018

Reland "[MessageLoop] Lock-free ScheduleWork() scheme"

This is a reland of f7da13a28c3799819bd39c4d95dc02a805b27e4f
(original change in PS1).

The original change was missing acquire ordering in
DisconnectFromParent(). This is necessary in order for the
disconnecting thread to see all memory side-effects previously made by
other threads (or some side-effects of message_loop_->ScheduleWork()
could racily come in after ~MessageLoop()).

Also removed the DCHECK that |operations_state_ == kDisconnectedBit| at
the end of DisconnectFromParent(), as it was incorrect: a racy
BeforeOperation() call can make it 1->3 (and no-op) after
DisconnectFromParent() made it 0->1.

And lastly, added a scoped allowance to always allow the very fast wait
instead of requiring callers to know about this implementation detail
of MessageLoop (and reverted changes to //net).

Original change's description:
> [MessageLoop] Lock-free ScheduleWork() scheme
>
> The Lock is causing hangs because of priority inversion
> mixed with priority boosting (ScheduleWork() tends to
> boost the destination thread which may deschedule the
> posting thread; if the posting thread is a background
> thread this boost-induced-desched-while-holding-lock
> can cause a livelock). See https://crbug.com/890978#c10
> for example crashes catching this.
>
> The Lock was only necessary for startup/shutdown and is
> being replaced by a lock-free atomic scheme in this CL.
>
> MessagePump::ScheduleWork() itself was already thread-safe
> (but the Android impl did unnecessarily check a non-atomic bool)
>
> This adds a WaitableEvent in ~MessageLoop(); hence the requirement
> for a wait-allowance in net's EmbeddedTestServer.
>
> TBR=zhongyi@chromium.org (embedded_test_server.cc side-effects)
>
> Bug: 890978, 874237
> Change-Id: I0916e5a99035a935b0a23a770af256f334e78c43
> Reviewed-on: https://chromium-review.googlesource.com/c/1278631
> Commit-Queue: Gabriel Charette <gab@chromium.org>
> Reviewed-by: François Doray <fdoray@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#601600}

Bug: 890978, 874237, 897925
Change-Id: I17c515f9a3169bbdfc303a4b259f34097e31730d
Reviewed-on: https://chromium-review.googlesource.com/c/1297129
Reviewed-by: François Doray <fdoray@chromium.org>
Commit-Queue: Gabriel Charette <gab@chromium.org>
Cr-Commit-Position: refs/heads/master@{#602166}

[modify] https://crrev.com/c8a1dc548deacfd28e32e5f1c54b02ec59dfc911/base/message_loop/message_loop.cc
[modify] https://crrev.com/c8a1dc548deacfd28e32e5f1c54b02ec59dfc911/base/message_loop/message_loop.h
[modify] https://crrev.com/c8a1dc548deacfd28e32e5f1c54b02ec59dfc911/base/message_loop/message_pump.h
[modify] https://crrev.com/c8a1dc548deacfd28e32e5f1c54b02ec59dfc911/base/message_loop/message_pump_android.cc
[modify] https://crrev.com/c8a1dc548deacfd28e32e5f1c54b02ec59dfc911/base/threading/thread_restrictions.h
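For readers following along: based on the names in the CL description (operations_state_, kDisconnectedBit, BeforeOperation(), DisconnectFromParent()), the scheme presumably looks something like the sketch below, with std::condition_variable standing in for base::WaitableEvent. This is an illustration of the described technique, not the actual Chromium code.

// Posters call BeforeOperation()/AfterOperation() around
// MessagePump::ScheduleWork(); ~MessageLoop() calls DisconnectFromParent().
// The low bit of operations_state_ encodes "disconnected"; in-flight
// operations increment by 2 so they never touch that bit.
#include <atomic>
#include <condition_variable>
#include <mutex>

class ControllerSketch {
 public:
  // Returns false if the MessageLoop is (being) destroyed, in which case
  // the caller must skip ScheduleWork(). AfterOperation() must be called
  // either way, since the count was incremented (this is the racy 1->3
  // transition mentioned in the CL description above).
  bool BeforeOperation() {
    return !(operations_state_.fetch_add(2, std::memory_order_relaxed) &
             kDisconnectedBit);
  }

  void AfterOperation() {
    // Release pairs with the disconnecting thread's acquire (below) so it
    // sees all of ScheduleWork()'s side-effects. If we were the last
    // in-flight operation after disconnection, wake the waiting destructor.
    if (operations_state_.fetch_sub(2, std::memory_order_release) - 2 ==
        kDisconnectedBit) {
      std::lock_guard<std::mutex> lock(mutex_);
      drained_.notify_one();
    }
  }

  // Called from ~MessageLoop(): mark disconnected, then (the "very fast
  // wait") wait for any in-flight ScheduleWork() calls to drain.
  void DisconnectFromParent() {
    if (operations_state_.fetch_or(kDisconnectedBit,
                                   std::memory_order_acquire) != 0) {
      std::unique_lock<std::mutex> lock(mutex_);
      drained_.wait(lock, [this] {
        return operations_state_.load(std::memory_order_acquire) ==
               kDisconnectedBit;
      });
    }
  }

 private:
  static constexpr int kDisconnectedBit = 1;
  std::atomic<int> operations_state_{0};
  std::mutex mutex_;
  std::condition_variable drained_;  // Stand-in for base::WaitableEvent.
};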
Oct 30
Getting rid of the lock and the priority inversion (issue 890978) helped, but ScheduleWork() is still expensive. The lock cleanup primarily helped during startup on Windows, reducing the overhead from 5.2% to 2.6% on the IO thread and from 2.2% to 0.6% on the UI thread:
- https://uma.googleplex.com/p/chrome/callstacks/?sid=55c7cb696600ca527cad74069af7a270
- https://uma.googleplex.com/p/chrome/callstacks/?sid=51e79153dd8ac9d82a5288f5ecfcccc6

The remaining overhead is 1-2% on the Windows IO thread (lower on main):
- https://uma.googleplex.com/p/chrome/callstacks/?sid=0157620b8cebb4be20ab988468968543
- https://uma.googleplex.com/p/chrome/callstacks/?sid=903dc4f763589bb8ec609535cac12197

but still 5-19% on Mac (CFRunLoopWakeUp dominates):
- https://uma.googleplex.com/p/chrome/callstacks/?sid=b04ef942e7583beb7ce96e88d572da3b
- https://uma.googleplex.com/p/chrome/callstacks/?sid=8da74b9f906046707c4a16bf63927682

The next trick in line is to reduce the number of calls to ScheduleWork(): https://docs.google.com/document/d/1no1JMli6F1r8gTF9KDIOvoWkUUZcXDktPf4A1IXYc3M/edit# -- which will also fall out of the incoming replacement of base::MessageLoop by base::SequenceManager. The current problem is that posting to self from a delayed task while having no immediate tasks results in a redundant ScheduleWork() despite already running (sketched below).
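To illustrate the redundant wake-up, reusing the Task/IncomingQueue sketch from the issue description above (hypothetical logic, not SequenceManager's actual design): if the loop knew it was currently running a task on its own thread, the empty->non-empty post-to-self could skip the signal, because the pump re-checks the queue before sleeping anyway.

// Sketch: a post-to-self from a running task makes the (empty) immediate
// queue non-empty and thus triggers ScheduleWork(), even though the
// target thread is already awake executing this very task.
#include <thread>
#include <utility>

struct MessagePump {
  virtual ~MessagePump() = default;
  virtual void ScheduleWork() = 0;
};

class MessageLoopSketch {
 public:
  explicit MessageLoopSketch(MessagePump* pump) : pump_(pump) {}

  void BindToCurrentThread() { thread_id_ = std::this_thread::get_id(); }

  void PostTask(Task task) {
    const bool was_empty = incoming_queue_.Push(std::move(task));
    // Redundant-signal fix: skip the wake-up when the loop's own thread
    // posts while already inside a task; the pump will call back into
    // DoWork() before going back to sleep. is_running_task_ is only read
    // after the thread check, so it is only touched on the loop's thread.
    const bool posting_to_self_while_running =
        thread_id_ == std::this_thread::get_id() && is_running_task_;
    if (was_empty && !posting_to_self_while_running)
      pump_->ScheduleWork();
  }

 private:
  std::thread::id thread_id_;
  IncomingQueue incoming_queue_;  // From the earlier sketch.
  MessagePump* pump_;
  bool is_running_task_ = false;  // Toggled around each task on the loop thread.
};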
Nov 6
Adding to the above analysis, attached are plots of the cumulative distribution of profiles containing MessageLoopTaskRunner::PostDelayedTask samples, as a function of the number of samples in that function. This gives a sense of how the change affected performance across the population. In this case it appears that there was a fairly uniform improvement across all users on the UI thread, and a slightly less uniform improvement on the IO thread.

Note that the percentages in these graphs are higher than what we see in the dashboard because this analysis covers all invocations of the MessageLoopTaskRunner::PostDelayedTask function, whereas the dashboard is restricted to invocations taking at least 100μs.

Internal colab for this analysis: colab/drive/1p_hhVhjWVIYOiRxutqIV6SW0kg5bg97n