Hangouts is doing ~50 context switches per second doing IPC with the browser process |
||||||||
Issue descriptionThis appears to be a new source of context switches when hangouts should be idle. This was observed in Version 53.0.2768.0 canary (64-bit) while verifying that crbug.com/614192 was fixed. This mostly looks different from bug 614192 , but the mention of APIActivityLogger::LogInternal on one of the call stacks suggests that it might not be. It's not clear whether this is multiple bugs/root-causes or just one - filing just a single bug for now. Using procexp.exe (sysinternal's process explorer) on Windows and the --force-fieldtrials=SiteIsolationExtensions/Control command line option to keep all of hangouts in one process I noticed that CSwitch Delta (equivalent to idle wakeups) was about 50 per second in the hangouts process. Additionally, the browser process was also showing an CSwitch Delta about 50 higher when hangouts was running. This is with no hangouts windows visible - as idle as it can be. The combination of high CSwitch Delta in both processes suggests ongoing IPC. I grabbed an ETW trace (tinyurl.com/etwcentral) to see where the context switches were coming from. There appear to be multiple sources and it isn't clear if they are related. About 3-4 times a second the hangout process wakes up (thread 14,432 on the trace I'm examining) on this call stack after WaitForSingleObjectEx times out: |- ntdll.dll!RtlUserThreadStart | kernel32.dll!BaseThreadInitThunk | chrome.exe!__scrt_common_main_seh | chrome.exe!wWinMain | chrome.exe!MainDllLoader::Launch | chrome_child.dll!ChromeMain | chrome_child.dll!content::ContentMainRunnerImpl::Run | chrome_child.dll!content::RendererMain | chrome_child.dll!base::MessageLoop::Run | chrome_child.dll!base::RunLoop::Run | chrome_child.dll!base::MessagePumpDefault::Run | chrome_child.dll!base::WaitableEvent::TimedWait | KernelBase.dll!WaitForSingleObjectEx On the same thread, once a second the hangout process wakes up eight times in quick succession on this call stack: | |- chrome_child.dll!v8::internal::Builtin_HandleApiCall | | chrome_child.dll!v8::internal::Builtin_Impl_HandleApiCall | | chrome_child.dll!v8::internal::`anonymous namespace'::HandleApiCallHelper | | chrome_child.dll!v8::internal::FunctionCallbackArguments::Call | | chrome_child.dll!extensions::ObjectBackedNativeHandler::Router | | chrome_child.dll!extensions::RuntimeCustomBindings::OpenChannelToExtension | | chrome_child.dll!content::RenderThreadImpl::Send | | chrome_child.dll!IPC::SyncChannel::Send | | chrome_child.dll!IPC::SyncChannel::WaitForReply | | chrome_child.dll!base::WaitableEvent::WaitMany | | kernel32.dll!WaitForMultipleObjects It is readied (woken up) every time by this call stack on another thread in the same process: ntdll.dll!RtlUserThreadStart kernel32.dll!BaseThreadInitThunk chrome_child.dll!base::`anonymous namespace'::ThreadFunc chrome_child.dll!base::Thread::ThreadMain chrome_child.dll!base::Thread::Run chrome_child.dll!base::RunLoop::Run chrome_child.dll!base::MessagePumpWin::Run chrome_child.dll!base::MessagePumpForIO::DoRunLoop chrome_child.dll!base::MessagePumpForIO::WaitForIOCompletion chrome_child.dll!mojo::edk::`anonymous namespace'::ChannelWin::OnIOCompleted chrome_child.dll!mojo::edk::Channel::OnReadComplete chrome_child.dll!mojo::edk::NodeChannel::OnChannelMessage chrome_child.dll!mojo::edk::RequestContext::~RequestContext chrome_child.dll!base::internal::Invoker<...>::Run chrome_child.dll!mojo::Watcher::CallOnHandleReady chrome_child.dll!mojo::Watcher::OnHandleReady chrome_child.dll!mojo::internal::Connector::OnHandleReadyInternal chrome_child.dll!mojo::internal::Connector::ReadSingleMessage chrome_child.dll!mojo::internal::MessageHeaderValidator::Accept chrome_child.dll!mojo::internal::MultiplexRouter::Accept chrome_child.dll!mojo::internal::MultiplexRouter::ProcessIncomingMessage chrome_child.dll!IPC::mojom::ChannelRequestValidator::Accept chrome_child.dll!mojo::internal::InterfaceEndpointClient::HandleValidatedMessage chrome_child.dll!IPC::mojom::ChannelStub::Accept chrome_child.dll!IPC::internal::MessagePipeReader::Receive chrome_child.dll!IPC::ChannelMojo::OnMessageReceived chrome_child.dll!IPC::SyncChannel::SyncContext::OnMessageReceived chrome_child.dll!IPC::SyncChannel::SyncContext::TryToUnblockListener KernelBase.dll!SetEvent Meanwhile the other active thread in the hangouts process (18,004 in the trace I'm examining) wakes up about 33 times a second on this call stack: ntdll.dll!RtlUserThreadStart kernel32.dll!BaseThreadInitThunk chrome_child.dll!base::`anonymous namespace'::ThreadFunc chrome_child.dll!base::Thread::ThreadMain chrome_child.dll!base::Thread::Run chrome_child.dll!base::RunLoop::Run chrome_child.dll!base::MessagePumpWin::Run chrome_child.dll!base::MessagePumpForIO::DoRunLoop chrome_child.dll!base::MessagePumpForIO::WaitForIOCompletion KernelBase.dll!GetQueuedCompletionStatus ntdll.dll!NtRemoveIoCompletion It is woken up three quarters of the time by thread 14,432 on the call stacks below (the three stacks share the top five entries and then split into five different call directions) | chrome_child.dll!v8::internal::Builtin_HandleApiCall | chrome_child.dll!v8::internal::Builtin_Impl_HandleApiCall | chrome_child.dll!v8::internal::`anonymous namespace'::HandleApiCallHelper | chrome_child.dll!v8::internal::FunctionCallbackArguments::Call | chrome_child.dll!extensions::ObjectBackedNativeHandler::Router | |- chrome_child.dll!extensions::RuntimeCustomBindings::OpenChannelToExtension | |- chrome_child.dll!extensions::APIActivityLogger::LogInternal | |- chrome_child.dll!extensions::`anonymous namespace'::ExtensionImpl::PostMessageW The other quarter of the time it is woken up by the browser process on the call stack below: ntdll.dll!RtlUserThreadStart kernel32.dll!BaseThreadInitThunk chrome.dll!base::`anonymous namespace'::ThreadFunc chrome.dll!base::Thread::ThreadMain chrome.dll!content::BrowserThreadImpl::Run chrome.dll!content::BrowserThreadImpl::IOThreadRun chrome.dll!base::Thread::Run chrome.dll!base::RunLoop::Run chrome.dll!base::MessagePumpWin::Run chrome.dll!base::MessagePumpForIO::DoRunLoop |- chrome.dll!base::MessagePumpForIO::WaitForIOCompletion
,
Jun 16 2016
I'm not sure about the case here. What does the "Hangouts process" mean? In bug 614192 , there is WebRTC code, but that runs in the render process so I'm confused. Between the render and browser processes, there is a task run every 200 ms that sends WebRTC log messages from the render process to the browser process (if any messages). It doesn't match the above though (ca 8 and 24 times per second). I'm not familiar with APIActivityLogger, adding Jochen. Jochen: could you take a quick look at this. How does APIActivityLogger work?
,
Jun 16 2016
By the "Hangouts process" I mean the process that is listed in Chrome's Task Manager as "Extension: Google Hangouts". > there is a task run every 200 ms that sends WebRTC log messages from > the render process to the browser process One of the events was waking up every second, but doing ~8 context switches each time. That could be the logging, if something is producing messages every second. The question then would be why something is producing messages when hangouts should be idle. It should be easy to repro. Or, at least, let me know if you can't repro it. As it stands now I can identify the "Extension: Google Hangouts" process either by using Chrome's Task Manager or by looking at CSwitch Delta in procexp - see the attached screen shot to see what I mean.
,
Jun 16 2016
OK, so you're saying the Hangouts extension process is doing IPC with the browser process? And it's not the render process doing IPC with the browser process? I'm still confused how the webrtc code in the other bug could be run in the Hangouts extension process, if that actually was the case there.
,
Jun 16 2016
It's true that I can't tell who is initiating the IPC. However none of the other extension processes or tab processes are engaging in any IPC with the browser process, so Occam's razor says that the first place to look is in the hangouts extension. I haven't tried to figure out why any of this code is running in the hangouts extension process, but the ETW traces show that it definitely is.
,
Jun 21 2016
I grabbed a chrome://tracing trace. That plus some C++ debugging shows that this is being called repeatedly the hangouts extension process: > chrome_child.dll!extensions::MessagingBindings::DispatchOnDisconnect() Line 400 C++ This was on the latest canary. Any ideas?
,
Jun 21 2016
no idea what the api activity log is doing here. Adding felt@ for historical insights, and rdevlin.cronin as current owner
,
Jun 21 2016
Bruce - is there a chance that you have an "actual" hangouts extension installed? E.g. https://chrome.google.com/webstore/detail/google-hangouts/nckgahadagoaajjgafhacjanaoiihapd?hl=en I think we may be talking about two separate things with the same name.
,
Jun 21 2016
re activity log, right now we're really silly with it. We dispatch all events to be logged regardless of whether or not the user has enabled logging. I'm actively working on making this better. That said, it shouldn't be anything new - activity log code hasn't changed in months/years. I seem to recall that the hangouts extension does a *lot* of messaging, which has in the past caused issues with the process being continually woken up/kept awake and various other lifetime management issues. The stack trace for DispatchOnDisconnect makes me think this could be related to the same type of thing - disconnecting numerous message ports. Does this only repro on canary? Or also on dev/beta/stable? And +Rob as FYI since he's been doing some messaging related stuff (though I wouldn't think it would have caused this).
,
Jun 21 2016
tommi - I do have that hangouts extension installed. In chrome://extensions it shows up as 2015.1204.418.1. The last two failure-to-go-idle bugs were in this extension and they both got fixed so I assume I must be on the right code? I suspect that the API activity log is a side effect rather than a cause. I've found that ExtensionFrameHelper::OnExtensionDispatchOnDisconnect is being called about 10 times a second in the process labeled as "Extension: Google Hangouts" and it seems like that is probably the problem. There is Javascript being run but I've had no luck debugging it. I grabbed a JS Profile but it appears to be minified, and I'm not even sure I'm looking at the right code. Can you repro? I think this is the extension that is installed by default on gWindows, and process explorer makes it quite obvious - the "Extension: Google Hangouts" process can be identified by its context switch patterns.
,
Jun 21 2016
ExtensionFrameHelper::OnExtensionDispatchOnDisconnect is called whenever the extension message port (chrome.runtime.connect) is closed. Hangouts from the Chrome Web Store seems to use chrome.runtime.sendMessage. This API is implemented internally using chrome.runtime.connect, and causes the following IPC when frame A sends a message to a receiver frame(s) B in the extension. - A: JS calls chrome.runtime.connect() 1 A: sync IPC to browser to get an unique port ID - ExtensionHostMsg_OpenChannelToExtension 2 B: Receives IPC from browser: ExtensionMsg_DispatchOnConnect 3 B: Sends IPC to browser - ExtensionHostMsg_OpenMessagePort - A: JS calls port.postMessage 4 A: IPC to browser - ExtensionHostMsg_PostMessage 5 B: IPC from browser - ExtensionMsg_DeliverMessage - B: JS calls port.disconnect() (or the port is GC'd and the port is implicitly closed). 6 B: IPC to browser - ExtensionHostMsg_CloseMessagePort 7 A: IPC from browser - ExtensionMsg_DispatchOnDisconnect This involves 7 IPCs. When a response is sent, 2 there are two more IPCs. There are three ways to improve this: - Specialize the implementation of chrome.runtime.sendMessage. Without callbacks, 2 IPCs are sufficient. With a callback, 4 IPCs are sufficient. - Recognize when chrome.runtime.sendMessage is used from an extension process, and don't do IPC with the browser (stay within the renderer). (extension messages can also be sent from a different process; in that case this optimization is not applicable). - Re-use the infrastructure of the MessageChannel API from the web platform (since this should work cross-process thanks to the OOPIF efforts). Is this "bug" causing any issue, besides showing up in the metrics?
,
Jun 21 2016
Ah - okay. So ExtensionMsg_DispatchOnDisconnect is just a symptom, not the cause. Thank you for the explanation. The reason I filed this bug is that Hangouts not staying asleep when idle has consequences for power consumption. A similar issue was investigated with Battor a few weeks ago and was shown to increase power draw on the test laptop by 7-13% (IIRC). Any code that wakes up periodically is a concern for power consumption. The CPU time used (about 11 ms per s according to my traces) burns some power, but waking up at all is the main concern, since the CPUs take a while to get into a low power state after running. All of the other renderer processes that I have open in my (admittedly very lightweight) test scenario have zero context switches during most one-second periods. Since hangouts is running constantly in the background it is crucial that it be able to go for long periods of time without waking up at all.
,
Jun 21 2016
A few thoughts to tack on to what Rob mentioned: > Re-use the infrastructure of the MessageChannel API from the web platform (since this should work cross-process thanks to the OOPIF efforts). I am incredibly keen on this given how much grief our custom-rolled messaging has caused and how far web messaging has come (yay service workers!). Actually implementing it is another matter. It definitely should be done at some point, once we find the resources. Solutions #1 and #2 are good, reasonable optimizations, but I think are only band-aiding the problem, and that our time would be better spent in #3. Way #4 of potentially improving this is to analyze whether or not hangouts should be using chrome.runtime.sendMessage (instead of runtime.connect). sendMessage will create/destroy a single-use port, whereas connect keeps it around. So sendMessage is great for one-off things, and is pretty easy to code with, but isn't performant. It strikes me that any extension calling sendMessage 10 times a second (stat from #10) should probably be using something else. I wonder if, as chrome, we should recognize this and impose throttles to encourage developers to do the right thing - unfortunately, this will likely break some extensions (e.g. hangouts).
,
Jun 21 2016
It sounds like the solutions being suggested would make the sending of messages more efficient, or would reduce the number of context switches per message. That might be nice, but I think that is orthogonal to this bug. This bug was intended to say that hangouts should stay completely asleep for many seconds at a time. Hangouts needs to completely stop sending messages when there is nothing going on. Waking up periodically is incompatible with being power efficient, and being power efficient is necessary for any always-on components. I'm not clear on what messages are being sent, and why.
,
Jun 21 2016
,
Jun 21 2016
To me that sounds like the general idea behind solution #4 - make hangouts better. :)
,
Jun 22 2016
ivaylo - can you take a look at this?
,
Jun 22 2016
Re #13 (Way #4): Hangouts does not use chrome.runtime.sendMessage See here: cs/search/?q=f:chat/extensions/messaging.js Re #14: I agree with this is principle, but as someone else said here: "once we find the resources" I can look into it. Right now, *all* resource are spend on preparing for http://crbug.com/571511 and THE CHROME APPS news. After cl/95846902 , Hangouts will try to setup a connection with another Hangouts client *only* for the first 100 seconds. If it doesn't connect during the first 100 seconds, it will stop trying until it is restarted. I hope that helps you isolate the connection issue and verify if it is the only issue here, or of the many. brucedawson@, I can help set up a deobfuscated Hangouts client for JS profiling if you think it will be useful.
,
Jun 22 2016
Bruce - how are things if you try uninstalling that extension? Btw, it looks like the latest version is "2015.1204.418.1" so I don't know how much development is being done on it.
,
Jun 22 2016
Disabling the extension solves the problem, so I assume that uninstalling it would also. I'm confused about the version number and the lack of maintenance. The build number makes it look like it hasn't been updated since December 2015, and yet previous bugs in this area ( crbug.com/614192 and crbug.com/612366 ) got fixed recently. It was while verifying that those bugs were fixed that I hit this power regression. And, it sounds like cl/95846902 is a possible change to this extension. Plus, the context switches in this bug seem like they are different/new from what I was seeing before, which would require active development to introduce that power regression. I'm clearly missing something.
,
Jun 22 2016
brucedawson@ we can meet and discuss the extension development now. tommi@ we can meet in two week, if you are still interested.
,
Jun 22 2016
Definitely still interested - I think I'm just as confused as Bruce :)
,
Dec 9 2016
|
||||||||
►
Sign in to add a comment |
||||||||
Comment 1 by tommi@chromium.org
, Jun 16 2016