Potential memory leak in Mojo, mojo::edk::Core::Close
Issue description

I'm running an experiment to detect potential memory leaks in Chrome. Example: https://bugs.chromium.org/p/chromium/issues/detail?id=700617

The way I reproduced the potential leak is:
1) Launch Chrome: % chrome --user-data-dir=<empty profile dir> --enable-heap-profiling
2) Grab a memory dump by navigating to chrome://tracing: disable all categories and turn on only the "MemoryInfra" category. Keep the tracing tab open until the end of the process, then save the trace to disk.
3) Open lots of tabs using a .bat file: % chrome www.perdu.com [run this command about 100 times]
4) Close all tabs except chrome://tracing (using the "Close tabs to the right" sub-menu).
5) Go to 2) and repeat until you feel tired of doing it.
6) Using a custom Python script to diff the objects allocated between two memory snapshots, the top potential leaks can be found.

Configuration:
OS: Win7 and Win10 / 64-bit
Chrome version: https://chromium.googlesource.com/chromium/src/+/34e20090704e62dbb27e402b116e5a0259c385ee

Top leaked memory blocks by stack frames: https://drive.google.com/open?id=0B2eVPm4kQyHoblRicmpvQmk5VTg

Example stack trace:
malloc
operator new
mojo::edk::`...'::CreateMessage<mojo::edk::`...'::IntroductionData>
mojo::edk::NodeChannel::CreatePortsMessage
mojo::edk::PortsMessage::PortsMessage
mojo::edk::NodeController::AllocMessage
mojo::edk::ports::Node::NewInternalMessage_Helper
mojo::edk::ports::Node::ClosePort
mojo::edk::NodeController::ClosePort
mojo::edk::MessagePipeDispatcher::CloseNoLock
mojo::edk::MessagePipeDispatcher::Close
mojo::edk::Core::Close
mojo::Connector::CloseMessagePipe
mojo::internal::MultiplexRouter::CloseMessagePipe
mojo::internal::BindingStateBase::Close
mojo::BindingSetBase<..., void>::OnConnectionError
mojo::InterfaceEndpointClient::NotifyError
mojo::internal::MultiplexRouter::ProcessNotifyErrorTask
mojo::internal::MultiplexRouter::ProcessTasks
mojo::internal::MultiplexRouter::OnPipeConnectionError
mojo::Connector::HandleError
mojo::SimpleWatcher::OnHandleReady
mojo::SimpleWatcher::Context::Notify
mojo::SimpleWatcher::Context::CallNotify
mojo::edk::WatcherDispatcher::InvokeWatchCallback
mojo::edk::Watch::InvokeCallback
mojo::edk::RequestContext::~RequestContext
mojo::edk::NodeChannel::OnChannelError
mojo::edk::Channel::OnError
mojo::edk::`anonymous namespace'::ChannelWin::OnIOCompleted
base::MessagePumpForIO::DoRunLoop
base::MessagePumpWin::Run
base::RunLoop::Run
content::BrowserThreadImpl::IOThreadRun
content::BrowserThreadImpl::Run
base::Thread::ThreadMain
base::`anonymous namespace'::ThreadFunc
BaseThreadInitThunk
RtlUserThreadStart
[Thread: Chrome_IOThread]

The number of allocated objects is increasing with time, which points to a potential leak when closing a tab. The potential leaks found all have these two stack frames present, which may be a good starting point to figure out where the memory leak is:
mojo::edk::ports::Node::ClosePort
mojo::edk::Core::Close
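To make step 6 concrete: the reporter's actual diff script was Python and is not shown here, but the idea can be sketched as follows. This is a minimal, hypothetical C++ sketch (the names Snapshot and DiffSnapshots are invented), assuming each heap snapshot has already been reduced to a map from stack trace to live-allocation count:

#include <algorithm>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical: a snapshot maps a stack-trace string to the number of live
// allocations attributed to that stack.
using Snapshot = std::map<std::string, int64_t>;

// Returns stack traces whose live-allocation count grew between the two
// snapshots, largest growth first. Stacks that keep growing across many
// open-tabs/close-tabs cycles are the leak candidates.
std::vector<std::pair<std::string, int64_t>> DiffSnapshots(
    const Snapshot& before, const Snapshot& after) {
  std::vector<std::pair<std::string, int64_t>> growth;
  for (const auto& entry : after) {
    auto it = before.find(entry.first);
    const int64_t old_count = (it == before.end()) ? 0 : it->second;
    if (entry.second > old_count)
      growth.emplace_back(entry.first, entry.second - old_count);
  }
  std::sort(growth.begin(), growth.end(),
            [](const auto& a, const auto& b) { return a.second > b.second; });
  return growth;
}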
Mar 24 2017
How confident are you in the methodology here? I sincerely doubt the memory allocated by CreateMessage can ever be leaked. It's always safely scoped, and the only way would be for this[1] logic to somehow cease (causing messages to be queued indefinitely), which would break the entire browser in very obvious ways. [1] https://cs.chromium.org/chromium/src/mojo/edk/system/node_controller.cc?rcl=4b613c4cd7c83a70d4f176bcb6394b2a800bbd4d&l=725
Mar 24 2017
This is the first bug we've filed on Windows using the native heap profiler. It is possible that there are bugs in the implementation [e.g. maybe we're not correctly tracking frees] *but* we're only observing this for Mojo, which is fairly suspicious. Etienne was hypothesizing that the leaks might be associated with tabs that are closed very early in the life cycle [before they get rendered].

> causing messages to be queued indefinitely

All messages sent to the browser process, or just messages from a specific renderer?
Mar 24 2017
All messages sent to the browser process or even messages transferred between pipes in the browser process. It's a shared queue which is pumped basically any time any kind of Mojo message passing operation occurs in the process.
Mar 24 2017
Let's say that we add logging to the browser process to count the total number of constructors and destructors called for Channel::Message. If we find that these numbers start to diverge [by the same amount that we're observing in native heap profiling], would that be indicative of a leak of Mojo messages?
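A minimal sketch of what that instrumentation could look like (this is not the patch that was actually written; Message here is a simplified stand-in for mojo::edk::Channel::Message and the counter is invented):

#include <atomic>
#include <cstdint>

#include "base/logging.h"

namespace {
// Net count of live Message objects (constructions minus destructions). If it
// keeps climbing across repeated open-tabs/close-tabs cycles, messages are
// being leaked.
std::atomic<int64_t> g_live_messages{0};
}  // namespace

class Message {
 public:
  Message() {
    const int64_t n = ++g_live_messages;
    if (n % 1024 == 0)  // Log occasionally rather than on every allocation.
      LOG(ERROR) << "Live messages: " << n;
  }
  ~Message() { --g_live_messages; }
};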
Mar 24 2017
Sure, that would be a trivial way to verify it. And actually I thought of another reasonable way messages could leak. If a message pipe handle is leaked by higher level code (i.e. never closed and never bound), any messages it receives will be queued indefinitely (in a different queue dedicated only to that endpoint) waiting to be read. Such leaks have happened in the past.
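To make the leaked-handle scenario concrete, here is a hypothetical illustration (not code from Chromium; it assumes the mojo/public/cpp/system API of this era) of how an unowned message pipe endpoint silently holds on to its incoming messages:

#include <cstdint>

#include "mojo/public/cpp/system/message_pipe.h"

// Hypothetical example: handle1 escapes its scoped wrapper and is never
// closed and never bound to anything, so every message written to handle0
// just accumulates in handle1's incoming queue.
void LeakOneEndpoint() {
  mojo::MessagePipe pipe;
  MojoHandle leaked = pipe.handle1.release().value();
  (void)leaked;  // Nobody ever calls MojoClose(leaked).

  const char kPayload[] = "never read";
  mojo::WriteMessageRaw(pipe.handle0.get(), kPayload,
                        static_cast<uint32_t>(sizeof(kPayload)), nullptr, 0,
                        MOJO_WRITE_MESSAGE_FLAG_NONE);
}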
Mar 27 2017
To reiterate what I said before, this is almost certainly not an internal Mojo leak. However, if some Mojo consumer is leaking a message pipe handle, then that handle's messages are indirectly leaked as well. I have just discovered that content::MessagePort instances appear to leak, and these in turn own some ref-counted state which in turn owns a message pipe handle. The repro case in comment #1 does not seem to use message ports though, so there is probably at least one other similar leak. I'll investigate this further.
Mar 27 2017
I've made this patch to take a look at allocated Mojo messages: https://codereview.chromium.org/2772373002/

We are observing an increasing number of messages:

[7032:13708:0327/160516.524:ERROR:channel.cc(75)] Message: 1024 [7032]
...
[7032:13708:0327/160528.185:ERROR:channel.cc(75)] Message: 2048 [7032]
...
[7032:13708:0327/160700.399:ERROR:channel.cc(75)] Message: 4096 [7032]
...
[7032:45992:0327/160738.919:ERROR:channel.cc(75)] Message: 5120 [7032]
...
[7032:13708:0327/160926.618:ERROR:channel.cc(75)] Message: 9216 [7032]
[7032:45992:0327/160926.700:ERROR:channel.cc(75)] Message: 10240 [7032]
...
[7032:45992:0327/161016.880:ERROR:channel.cc(75)] Message: 12284 [7032]

I want to point out that there is one kind of message that appears to be leaking, and as I understand Mojo I suspect it's the only one that leaks (IntroductionData):

mojo::edk::`...'::CreateMessage<mojo::edk::`...'::IntroductionData>
Mar 27 2017
CreateMessage, regardless of the message data type, produces a single scoped Message object which is always passed immediately to a Channel for writing. If these messages are leaking, it would be because the Channel object itself is leaking. That would be quite a surprise.
Mar 27 2017
After some further investigation, it doesn't look like any message pipes are leaking. The number of pipe handles in the browser process stays stable over dozens of complete render process lifecycles. I do see the same evidence of a Channel::Message leak, but leaking IntroductionData messages specifically does not make much sense. Still digging. The MessagePort stuff from comment #7 was a red herring btw. Just misinterpreted GC latency.
Mar 28 2017
I found the leak. It's this queue: https://cs.chromium.org/chromium/src/mojo/edk/system/node_controller.h?rcl=4967164ec03c538879d6c8131d5e807025b169e4&l=280

This is essentially a race between becoming disconnected from a process and processing messages which would otherwise have been routed to that process if we were still connected to it. This may be fairly tricky to fix, but I'm looking into it.
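A rough sketch of the failure mode, using made-up names rather than the real NodeController members (the actual bookkeeping is at the link above):

#include <map>
#include <memory>
#include <queue>
#include <string>
#include <utility>

// Invented stand-ins; the point is only the shape of the race: messages
// queued on behalf of a peer can outlive the peer itself if nothing clears
// the queue when the peer disconnects.
struct QueuedMessage {};
using PeerName = std::string;

class BrokerStateSketch {
 public:
  // Called when a message targets a peer whose channel is not ready yet.
  void QueueForPeer(const PeerName& peer, std::unique_ptr<QueuedMessage> msg) {
    pending_messages_[peer].push(std::move(msg));
  }

  // Called when the peer process goes away. Forgetting this erase is the kind
  // of oversight that keeps the queued messages (and any other per-peer
  // state) alive for the remaining lifetime of the browser process.
  void OnPeerLost(const PeerName& peer) { pending_messages_.erase(peer); }

 private:
  std::map<PeerName, std::queue<std::unique_ptr<QueuedMessage>>>
      pending_messages_;
};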
Mar 28 2017
Nice work! Two thoughts:

1) Can you give us an estimate of the impact of the leak? How many messages are we leaking per life cycle? How large is a message?

2) What do you think of creating a MemoryDumpProvider for Mojo that reports the total number of connections, handles, messages, etc.? The net/ team put together a MemoryDumpProvider after the first large network leak earlier this year, and it proved very helpful in diagnosing the next large leak: https://bugs.chromium.org/p/chromium/issues/detail?id=700617#c3
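For reference, a hedged sketch of what such a dump provider might look like. This is not an existing Mojo class; the CountLiveMessages() getter is invented, and it assumes the base::trace_event MemoryInfra interfaces (MemoryDumpProvider, ProcessMemoryDump, MemoryAllocatorDump) as they exist around this time:

#include <cstdint>

#include "base/trace_event/memory_allocator_dump.h"
#include "base/trace_event/memory_dump_provider.h"
#include "base/trace_event/process_memory_dump.h"

// Hypothetical provider: reports how many Mojo messages are currently live so
// that a leak like this one shows up directly in chrome://tracing MemoryInfra
// dumps instead of requiring a native heap profile diff.
class MojoMemoryDumpProvider : public base::trace_event::MemoryDumpProvider {
 public:
  bool OnMemoryDump(const base::trace_event::MemoryDumpArgs& args,
                    base::trace_event::ProcessMemoryDump* pmd) override {
    base::trace_event::MemoryAllocatorDump* dump =
        pmd->CreateAllocatorDump("mojo/messages");
    dump->AddScalar(base::trace_event::MemoryAllocatorDump::kNameObjectCount,
                    base::trace_event::MemoryAllocatorDump::kUnitsObjects,
                    CountLiveMessages());  // invented counter
    return true;
  }

 private:
  uint64_t CountLiveMessages();  // invented, not a real Mojo API
};

Registration would presumably go through base::trace_event::MemoryDumpManager::RegisterDumpProvider(), the same way the net/ provider is hooked up.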
Mar 28 2017
Ah, it's not as bad as I thought it would be. Just a minor oversight in the management of that queue. Fix is here: https://codereview.chromium.org/2780433003

I can verify that this seems to completely fix the Message leak while also addressing a handful of tiny related leaks. Please let me know if you see similar results in your tests.

Re comment #12:
1) Leaking approximately 8 messages per child process connection. Size varies based on content but it seems most of the messages are small control messages. Ballpark ~64 bytes per message, so ~512 bytes per render process.
2) I don't know what that means but it sounds incredibly useful so please file a bug for it and give me more documentation. :)
Mar 28 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/69726f4ba3947fd4ddbfb07fd9f1a596289cdae9

commit 69726f4ba3947fd4ddbfb07fd9f1a596289cdae9
Author: rockot <rockot@chromium.org>
Date: Tue Mar 28 03:38:28 2017

Fix leaks in mojo::edk::NodeController

There are a few instances where the broker process (e.g. the browser in process Chrome) tracks some state -- including outgoing messages -- pertaining to a client process connection. In some cases this tracked data is not cleaned up once the client process is disconnected. This fixes that.

BUG=704938
Review-Url: https://codereview.chromium.org/2780433003
Cr-Commit-Position: refs/heads/master@{#460001}

[modify] https://crrev.com/69726f4ba3947fd4ddbfb07fd9f1a596289cdae9/mojo/edk/system/node_controller.cc
Mar 28 2017
We should merge this fix back. I took some measurements of messages that would have leaked due to this bug. Measurements were taken while browsing several popular sites and just using the browser normally for a while. As it turns out, the simple static HTML repro is not really representative of the average behavior.

After 51 render process terminations:
- Message size is always 56 bytes; they were all internal control messages signaling port closure.
- The number of messages leaked varies wildly from process to process. It is ultimately proportional to the number of message pipes which were connected to the child process when it terminated.
- Mean # of leaked messages was 43 per process.
- Mean total memory leak per process was ~2.2 kB.
- Total leak was ~121 kB.
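As a rough cross-check of those figures (no new data, just arithmetic on the numbers above): 43 messages × 56 bytes ≈ 2.4 kB per process, and 51 processes × ~2.4 kB ≈ 122 kB, which lines up with the reported ~121 kB total; the ~2.2 kB per-process mean presumably reflects rounding of the mean message count.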
Mar 29 2017
Your change meets the bar and is auto-approved for M58. Please go ahead and merge the CL to branch 3029 manually. Please contact milestone owner if you have questions.

Owners: amineer@ (Android), cmasso@ (iOS), bhthompson@ (ChromeOS), govind@ (Desktop)

For more details visit https://www.chromium.org/issue-tracking/autotriage

- Your friendly Sheriffbot
Mar 29 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/97716adf5756137e3dfde3526e3d5007e75988a6

commit 97716adf5756137e3dfde3526e3d5007e75988a6
Author: Ken Rockot <rockot@chromium.org>
Date: Wed Mar 29 05:55:22 2017

Fix leaks in mojo::edk::NodeController

There are a few instances where the broker process (e.g. the browser in process Chrome) tracks some state -- including outgoing messages -- pertaining to a client process connection. In some cases this tracked data is not cleaned up once the client process is disconnected. This fixes that.

BUG=704938
Review-Url: https://codereview.chromium.org/2780433003
Cr-Commit-Position: refs/heads/master@{#460001}
(cherry picked from commit 69726f4ba3947fd4ddbfb07fd9f1a596289cdae9)

Review-Url: https://codereview.chromium.org/2784773002 .
Cr-Commit-Position: refs/branch-heads/3029@{#471}
Cr-Branched-From: 939b32ee5ba05c396eef3fd992822fcca9a2e262-refs/heads/master@{#454471}

[modify] https://crrev.com/97716adf5756137e3dfde3526e3d5007e75988a6/mojo/edk/system/node_controller.cc
Mar 29 2017
This is a pretty long-standing leak (probably present since ~M51), and I guess it's not huge (see comment #15 for details), so it may not be worth a merge to M57. Still adding the label to get a review, just in case. Thanks!
Mar 29 2017
Thank you rockot@. M57 #57.0.2987.133 went out today and there is no further M57 release planned unless something critical comes up. Also, per comment #18, this issue has existed for a while, so I'm rejecting the merge to M57. Please let me know if there is any concern here.
Mar 29 2017
That all sounds reasonable to me. Thanks for looking!