Potential memory leak in UiThreadObserver::OnRTTOrThroughputEstimatesComputed
Issue description
I've attached a symbolized trace from a real user showing 29064 live mallocs attributable to OnRTTOrThroughputEstimatesComputed; together they occupy 118MB of memory.
It looks like Mojo messages are being created and never destroyed. Based on the code in OnRTTOrThroughputEstimatesComputed:
"""
void OnRTTOrThroughputEstimatesComputed(
const net::nqe::internal::NetworkQuality& network_quality) {
DCHECK_CURRENTLY_ON(BrowserThread::UI);
last_notified_network_quality_ = network_quality;
// Notify all the existing renderers of the change in the network quality.
for (RenderProcessHost::iterator it(RenderProcessHost::AllHostsIterator());
!it.IsAtEnd(); it.Advance()) {
it.GetCurrentValue()->GetRendererInterface()->OnNetworkQualityChanged(
last_notified_type_, last_notified_network_quality_.http_rtt(),
last_notified_network_quality_.transport_rtt(),
last_notified_network_quality_.downstream_throughput_kbps());
}
}
"""
My guess is that some renderers never dequeue these messages. I don't think the relationship between RPHs and renderers is exactly a 1:1 correspondence.
rockot, you're the expert here. Thoughts?
Jan 22 2018
Upping the priority so this is actually looked at. 118MB is quite a bit.
Jan 22 2018
From what I understand, there are two problems here:
(1) The UiThreadObserver::OnRTTOrThroughputEstimatesComputed() method seems to have been called 116k times. That should not have happened even though the browser process's uptime is ~26 hours.
(2) Messages sent by UiThreadObserver::OnRTTOrThroughputEstimatesComputed() somehow get stuck, possibly because renderers are not dequeuing them?
Jan 22 2018
From https://uma.googleplex.com/p/chrome/timeline_v2/?sid=1008c78c4c5d56714c2af413bae4428e, the UiThreadObserver::OnRTTOrThroughputEstimatesComputed() method is called an average of 40-75 times per hour on desktop. (For comparison, the ~116k calls over ~26 hours of uptime mentioned above work out to roughly 4,500 calls per hour.)
Jan 22 2018
Please add affected OSs.
Jan 22 2018
Some observations. The code in question calls GetRendererInterface(), which has this scary comment:
"""
// Acquires the |mojom::Renderer| interface to the render process. This is for
// internal use only, and is only exposed here to support
// MockRenderProcessHost usage in tests.
virtual mojom::Renderer* GetRendererInterface() = 0;
"""
And the implementation has this comment:
"""
// Note that Channel send is effectively paused and unpaused at various points
// during startup, and existing code relies on a fragile relative message
// ordering resulting from some early messages being queued until process
// launch while others are sent immediately. See https://goo.gl/REW75h for
// details.
//
// We acquire a few associated interface proxies here -- before the channel is
// paused -- to ensure that subsequent initialization messages on those
// interfaces behave properly. Specifically, this avoids the risk of an
// interface being requested while the Channel is paused, which could
// effectively and undesirably block the transmission of a subsequent message
// on that interface while the Channel is unpaused.
//
// See OnProcessLaunched() for some additional details of this somewhat
// surprising behavior.
"""
It's possible this comment is out of date. It's also possible that it's not out of date and all the external callers of GetRendererInterface() are playing with fire. rockot, thoughts? What's the appropriate way to send a message to the mojom::Renderer?
Jan 22 2018
I wonder if the code in question is missing a guard like "if (GetProcess()->HasConnection())". If the process doesn't have a connection, will the messages be created and then languish in the aether?
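A minimal sketch (hypothetical, not a landed change) of what such a guard might look like on the loop from the issue description, assuming RenderProcessHost::HasConnection() is the check meant above:
"""
for (RenderProcessHost::iterator it(RenderProcessHost::AllHostsIterator());
     !it.IsAtEnd(); it.Advance()) {
  RenderProcessHost* host = it.GetCurrentValue();
  // Skip hosts that don't have a live channel, so we don't queue messages
  // for a process that may never drain them.
  if (!host->HasConnection())
    continue;
  host->GetRendererInterface()->OnNetworkQualityChanged(
      last_notified_type_, last_notified_network_quality_.http_rtt(),
      last_notified_network_quality_.transport_rtt(),
      last_notified_network_quality_.downstream_throughput_kbps());
}
"""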
Jan 22 2018
I can think of two ways in which sender-side queuing is even possible. One is if the OS pipe is full (presumably the renderer's IO thread is hung) and the other is the case you suggest, i.e. where an IPC::Channel remains paused indefinitely. In both cases we should see a diverse array of messages leaking, not just one type of message.

Regarding the "renderer not dequeuing" speculation: receivers do not pull messages from the sender and therefore the receiver (i.e. the renderer here) has no control over whether or not the message is shipped out from the browser process (modulo the IO thread being hung and not reading internal control messages at all, as stated above).

Without more information I'm afraid I'm probably not much help. Can we confirm either that (a) there are other message types being leaked, or (b) there are other objects being leaked, like an IPC::Channel or a mojo::edk::MessagePipeDispatcher or mojo::edk::ports::Port or... something? anything?
Jan 23 2018
I had a cursory look at the dump. I also see FieldTrialSynchronizer::NotifyAllRenderers() with 13 MB of live mallocs. That method is functionally similar to UiThreadObserver::OnRTTOrThroughputEstimatesComputed(), so it seems like FieldTrialSynchronizer::NotifyAllRenderers() is also leaking?
Jan 23 2018
I am removing the release blocking label. UiThreadObserver::OnRTTOrThroughputEstimatesComputed() has been running at 100% on stable since M-61. It is still important to fix this, but I do not think it should block releases.
Jan 31 2018
Based on comment #12 above, it seems that this bug is not specific to UiThreadObserver::OnRTTOrThroughputEstimatesComputed. As such, I am unassigning myself. Please feel free to assign it back to me if my understanding is incorrect.
Jan 31 2018
rockot@, you're an IPC owner - what can we do to figure this out?
Feb 7 2018
BTW, landed https://chromium-review.googlesource.com/c/chromium/src/+/902813 as a speculative fix for message leaks. If we can confirm that these leaks are now gone, I can attempt to track down the root cause - presumably a circular ref causing ChannelAssociatedGroupController[1] itself to leak.

[1] https://cs.chromium.org/chromium/src/ipc/ipc_mojo_bootstrap.cc?rcl=e87e08cc58764a7cf9324f8b0c04f2f27b6f4048&l=40
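As a generic illustration of the suspected failure mode (hypothetical code using std::shared_ptr, not the actual Chromium/ipc classes): a reference cycle between two refcounted objects keeps both alive even after all external references are gone.
"""
#include <memory>

struct B;  // Forward declaration so A can hold a shared_ptr<B>.

struct A {
  std::shared_ptr<B> b;
};

struct B {
  std::shared_ptr<A> a;  // Back-reference; making this weak would break the cycle.
};

int main() {
  auto a = std::make_shared<A>();
  auto b = std::make_shared<B>();
  a->b = b;
  b->a = a;  // Refcounts can now never drop to zero: both objects leak.
  return 0;
}
"""
If ChannelAssociatedGroupController is being kept alive in a similar way, the fix would be to break the cycle, e.g. by turning one edge into a weak or raw back-pointer.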
Feb 7 2018
Serendipitously, we now have bug 809311, which also seems to indicate a leak here.
Feb 16 2018
I'm going to speculatively mark this as fixed, and reopen if we see more problems.
Mar 2 2018
Haven't seen any further reports. :)
Mar 2 2018
Excellent. I'll remove my evil CHECKs.
Mar 6 2018
Unfortunately, this bug just showed up again this morning on our dashboard. Here are some crash reports with the unsymbolized trace:
06a5d9156613efea
4a522d12dc2326bc
561fe381e60f66f7
6b2e59518979d220
0f69ce3443dc1da4
All traces are on Windows with Chrome 66.0.3355.0. I'm attaching the output of the diff_heap_dump JSON file.
Mar 19 2018
Ken, is there anything else you need from Memory folks to make this more actionable?
Mar 19 2018
Is there any reason to expect that this is different from bug 813045?