
Issue 704938

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
please use my google.com address
Closed: Mar 2017
Cc:
EstimatedDays: ----
NextAction: ----
OS: Windows
Pri: 3
Type: Bug




Potential memory leak in Mojo, mojo::edk::Core::Close

Project Member Reported by etienneb@chromium.org, Mar 24 2017

Issue description

I'm running an experiment to detect potential memory leaks in Chrome.
  Example: https://bugs.chromium.org/p/chromium/issues/detail?id=700617

The way I reproduced the potential leak is:
  1) Launch chrome
      % chrome --user-data-dir=<empty profile dir>  --enable-heap-profiling
  2) Grab a memory dump by navigating to chrome://tracing
      % disable all categories, turn on "MemoryInfra" category
      -> Keep the tracing tab open until the end of the process
      -> Save it on disk
  3) Open tons of tabs using a .bat file:
      % chrome www.perdu.com   [run this command about 100 times]
  4) Close all tabs, except chrome://tracing (using the "Close tabs to the right" sub-menu)
  5) Go back to 2) until you feel tired of doing it
  6) Use a custom python script to diff the objects allocated between two memory snapshots; the top potential leaks show up at the top of that diff (see the sketch after this list)
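
The diff script itself is not attached, so purely as an illustration of the idea, here is a minimal sketch in C++. It assumes the two snapshots have already been exported to text files with one "count<TAB>backtrace" line per stack, which is a hypothetical format, not the real script's input:

  // diff_snapshots.cc - toy diff of two heap snapshots.
  // Assumed (hypothetical) input: one "<object count>\t<backtrace>" line per stack.
  #include <algorithm>
  #include <fstream>
  #include <iostream>
  #include <map>
  #include <sstream>
  #include <string>
  #include <utility>
  #include <vector>

  // Parse a snapshot file into backtrace -> live object count.
  std::map<std::string, long long> LoadSnapshot(const std::string& path) {
    std::map<std::string, long long> counts;
    std::ifstream in(path);
    std::string line;
    while (std::getline(in, line)) {
      std::istringstream row(line);
      long long count = 0;
      std::string stack;
      if (row >> count && std::getline(row, stack))
        counts[stack] += count;
    }
    return counts;
  }

  int main(int argc, char** argv) {
    if (argc != 3) {
      std::cerr << "usage: diff_snapshots <before> <after>\n";
      return 1;
    }
    auto before = LoadSnapshot(argv[1]);
    auto after = LoadSnapshot(argv[2]);

    // Rank backtraces by growth in object count between the two snapshots.
    std::vector<std::pair<long long, std::string>> growth;
    for (const auto& entry : after) {
      long long delta = entry.second - before[entry.first];
      if (delta > 0)
        growth.emplace_back(delta, entry.first);
    }
    std::sort(growth.rbegin(), growth.rend());

    // Print the top 20 growing stacks; persistent leaders are leak suspects.
    for (size_t i = 0; i < growth.size() && i < 20; ++i)
      std::cout << "+" << growth[i].first << "  " << growth[i].second << "\n";
    return 0;
  }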


Configuration:
  OS: Win7 and Win10  /  64-bits
  chrome version: https://chromium.googlesource.com/chromium/src/+/34e20090704e62dbb27e402b116e5a0259c385ee


Top Leaked memory blocks by StackFrames:
  https://drive.google.com/open?id=0B2eVPm4kQyHoblRicmpvQmk5VTg

Example:
      malloc
      operator new
      mojo::edk::`...'::CreateMessage<mojo::edk::`...'::IntroductionData>
      mojo::edk::NodeChannel::CreatePortsMessage
      mojo::edk::PortsMessage::PortsMessage
      mojo::edk::NodeController::AllocMessage
      mojo::edk::ports::Node::NewInternalMessage_Helper
      mojo::edk::ports::Node::ClosePort
      mojo::edk::NodeController::ClosePort
      mojo::edk::MessagePipeDispatcher::CloseNoLock
      mojo::edk::MessagePipeDispatcher::Close
      mojo::edk::Core::Close
      mojo::Connector::CloseMessagePipe
      mojo::internal::MultiplexRouter::CloseMessagePipe
      mojo::internal::BindingStateBase::Close
      mojo::BindingSetBase<..., void>::OnConnectionError
      mojo::InterfaceEndpointClient::NotifyError
      mojo::internal::MultiplexRouter::ProcessNotifyErrorTask
      mojo::internal::MultiplexRouter::ProcessTasks
      mojo::internal::MultiplexRouter::OnPipeConnectionError
      mojo::Connector::HandleError
      mojo::SimpleWatcher::OnHandleReady
      mojo::SimpleWatcher::Context::Notify
      mojo::SimpleWatcher::Context::CallNotify
      mojo::edk::WatcherDispatcher::InvokeWatchCallback
      mojo::edk::Watch::InvokeCallback
      mojo::edk::RequestContext::~RequestContext
      mojo::edk::NodeChannel::OnChannelError
      mojo::edk::Channel::OnError
      mojo::edk::`anonymous namespace'::ChannelWin::OnIOCompleted
      base::MessagePumpForIO::DoRunLoop
      base::MessagePumpWin::Run
      base::RunLoop::Run
      content::BrowserThreadImpl::IOThreadRun
      content::BrowserThreadImpl::Run
      base::Thread::ThreadMain
      base::`anonymous namespace'::ThreadFunc
      BaseThreadInitThunk
      RtlUserThreadStart
      [Thread: Chrome_IOThread]


The number of allocated objects is increasing with time. This points to a potential leak when closing a tab.

The potential leaks found all have these two stack frames present, which may be a good starting point for figuring out where the memory leak is:

      mojo::edk::ports::Node::ClosePort
      mojo::edk::Core::Close

 
Cc: roc...@chromium.org

Comment 2 by roc...@chromium.org, Mar 24 2017

How confident are you in the methodology here?

I sincerely doubt the memory allocated by CreateMessage can ever be leaked. It's always safely scoped, and the only way would be for this[1] logic to somehow cease (causing messages to be queued indefinitely), which would break the entire browser in very obvious ways.

[1] https://cs.chromium.org/chromium/src/mojo/edk/system/node_controller.cc?rcl=4b613c4cd7c83a70d4f176bcb6394b2a800bbd4d&l=725
This is the first bug we've filed on Windows using the native heap profiler. It is possible that there are bugs in the implementation [e.g. maybe we're not correctly tracking frees], *but* we're only observing this for Mojo, which is fairly suspicious.

Etienne was hypothesizing that the leaks might be associated with tabs that are closed very early in the life cycle [before they get rendered]. 

> causing messages to be queued indefinitely
All messages sent to the browser process, or just messages from a specific renderer?

Comment 4 by roc...@chromium.org, Mar 24 2017

All messages sent to the browser process or even messages transferred between pipes in the browser process. It's a shared queue which is pumped basically any time any kind of Mojo message passing operation occurs in the process.
Let's say that we add logging to the browser process to count the total number of constructors and destructors called for Channel::Message. If we find that these numbers start to diverge [by the same amount that we're observing in native heap profiling], would that be indicative of a leak of Mojo messages?

Comment 6 by roc...@chromium.org, Mar 24 2017

Sure, that would be a trivial way to verify it.

And actually I thought of another reasonable way messages could leak. If a message pipe handle is leaked by higher level code (i.e. never closed and never bound), any messages it receives will be queued indefinitely (in a different queue dedicated only to that endpoint) waiting to be read.

Such leaks have happened in the past.
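
To illustrate the pattern being described (a hypothetical offender, not a known one, and obviously only meaningful inside the Chromium tree):

  #include "mojo/public/cpp/system/message_pipe.h"

  // Hypothetical example of the leak pattern: one end of a pipe escapes its
  // scoper and is never closed and never bound to an interface, so anything
  // written to the other end queues up on it forever.
  void LeakOneEndpoint() {
    mojo::MessagePipe pipe;
    // handle0 gets passed to some consumer or remote process (not shown).
    // handle1 is release()d out of its ScopedMessagePipeHandle and dropped:
    MojoHandle leaked = pipe.handle1.release().value();
    (void)leaked;  // Never MojoClose()d, never bound.
  }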

Comment 7 by roc...@chromium.org, Mar 27 2017

Cc: -roc...@chromium.org
Components: -Internals>Mojo
Owner: roc...@chromium.org
Status: Assigned (was: Untriaged)
To reiterate what I said before, this is almost certainly not an internal Mojo leak. However, if some Mojo consumer is leaking a message pipe handle, then that handle's messages are indirectly leaked as well.

I have just discovered that content::MessagePort instances appear to leak, and these in turn own some ref-counted state which in turn owns a message pipe handle.

The repro case in comment #1 does not seem to use message ports though, so there is probably at least one other similar leak. I'll investigate this further.
I've written this patch to take a look at allocated Mojo messages:
  https://codereview.chromium.org/2772373002/
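
(The CL above is the real instrumentation; stripped of Chromium specifics, the counting idea amounts to something like the sketch below. The class name and the report-every-1024 condition are assumptions based on the log output that follows.)

  #include <atomic>
  #include <cstdio>

  // Toy stand-in for mojo::edk::Channel::Message: track the number of live
  // instances and report it periodically. If constructions and destructions
  // diverge, the live count keeps climbing, as in the log below.
  class CountedMessage {
   public:
    CountedMessage() {
      int live = ++live_count_;
      if (live % 1024 == 0)
        std::fprintf(stderr, "Message: %d\n", live);
    }
    ~CountedMessage() { --live_count_; }

   private:
    static std::atomic<int> live_count_;
  };

  std::atomic<int> CountedMessage::live_count_{0};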

We are observing an increasing number of messages:

[7032:13708:0327/160516.524:ERROR:channel.cc(75)] Message: 1024 [7032]
...
[7032:13708:0327/160528.185:ERROR:channel.cc(75)] Message: 2048 [7032]
...
[7032:13708:0327/160700.399:ERROR:channel.cc(75)] Message: 4096 [7032]
...
[7032:45992:0327/160738.919:ERROR:channel.cc(75)] Message: 5120 [7032]
...
[7032:13708:0327/160926.618:ERROR:channel.cc(75)] Message: 9216 [7032]
[7032:45992:0327/160926.700:ERROR:channel.cc(75)] Message: 10240 [7032]
...
[7032:45992:0327/161016.880:ERROR:channel.cc(75)] Message: 12284 [7032]


I want to point out that there is one kind of message that is leaking.
As I understand Mojo, I suspect it's the only one leaking (IntroductionData):

   mojo::edk::`...'::CreateMessage<mojo::edk::`...'::IntroductionData>

Comment 9 by roc...@chromium.org, Mar 27 2017

CreateMessage, regardless of the message data type, produces a single scoped Message object which is always passed immediately to a Channel for writing.

If these messages are leaking it would be because the Channel object itself is leaking. That would be quite a surprise.
After some further investigation, it doesn't look like any message pipes are leaking. The number of pipe handles in the browser process stays stable over dozens of complete render process lifecycles.

I do see the same evidence of a Channel::Message leak, but leaking IntroductionData messages specifically does not make much sense. Still digging.

The MessagePort stuff from comment #7 was a red herring btw. Just misinterpreted GC latency.
I found the leak.

It's this queue: https://cs.chromium.org/chromium/src/mojo/edk/system/node_controller.h?rcl=4967164ec03c538879d6c8131d5e807025b169e4&l=280

This is essentially a race between becoming disconnected from a process, and processing messages which would have otherwise been routed to that process if we were still connected to it.

This may be fairly tricky to fix, but I'm looking into it.
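
Schematically (this is a deliberate simplification, not the real NodeController), the state involved looks something like the sketch below: the broker parks messages destined for a peer it has not finished introducing, and if that peer goes away first, the parked queue has to be dropped as well or its messages are never freed.

  #include <map>
  #include <memory>
  #include <queue>
  #include <string>

  struct Message {};  // Payload omitted; schematic only.

  class Broker {
   public:
    // Messages for a peer that is not connected yet are parked here.
    void QueueForPendingPeer(const std::string& peer,
                             std::unique_ptr<Message> message) {
      pending_peer_messages_[peer].push(std::move(message));
    }

    // Called when the peer process disconnects. Without this erase(), any
    // messages still parked for the peer stay queued forever, which is the
    // shape of the leak described above.
    void OnPeerLost(const std::string& peer) {
      pending_peer_messages_.erase(peer);
    }

   private:
    std::map<std::string, std::queue<std::unique_ptr<Message>>>
        pending_peer_messages_;
  };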
Cc: etienneb@chromium.org
Nice work! Two thoughts:

1) Can you give us an estimate of the impact of the leak? How many messages are we leaking per life cycle? How large is a message?

2) What do you think of creating a MemoryDumpProvider for Mojo that reports the total number of connections, handles, messages, etc.? The net/ team put together a MemoryDumpProvider after the first large network leak earlier this year, and it proved very helpful in diagnosing the next large leak (a rough sketch of such a provider follows this comment):
https://bugs.chromium.org/p/chromium/issues/detail?id=700617#c3
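
For what it's worth, a provider along those lines could be sketched roughly as below (Chromium-tree code; the "mojo" dump name and the counter are assumptions, there is no such provider yet):

  #include <cstdint>

  #include "base/trace_event/memory_allocator_dump.h"
  #include "base/trace_event/memory_dump_provider.h"
  #include "base/trace_event/process_memory_dump.h"

  // Sketch: report Mojo object counts into MemoryInfra dumps so that
  // chrome://tracing memory snapshots show them directly.
  class MojoMemoryDumpProvider : public base::trace_event::MemoryDumpProvider {
   public:
    bool OnMemoryDump(const base::trace_event::MemoryDumpArgs& args,
                      base::trace_event::ProcessMemoryDump* pmd) override {
      base::trace_event::MemoryAllocatorDump* dump =
          pmd->CreateAllocatorDump("mojo");
      // live_message_count_ is hypothetical; it would need to be fed by the
      // EDK's own bookkeeping (and similar scalars could cover handles,
      // channels, queued bytes, etc.).
      dump->AddScalar(base::trace_event::MemoryAllocatorDump::kNameObjectCount,
                      base::trace_event::MemoryAllocatorDump::kUnitsObjects,
                      live_message_count_);
      return true;
    }

   private:
    uint64_t live_message_count_ = 0;
  };

Such a provider would then be registered with base::trace_event::MemoryDumpManager::GetInstance()->RegisterDumpProvider(...) on a suitable thread.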
Ah, it's not as bad as I thought it would be. Just a minor oversight in the management of that queue.

Fix is here: https://codereview.chromium.org/2780433003

I can verify that this seems to completely fix the Message leak while also addressing a handful of tiny related leaks. Please let me know if you see similar results in your tests.

Re comment #12:

1) Leaking approximately 8 messages per child process connection. Size varies based on content but it seems most of the messages are small control messages. Ballpark ~64 bytes per message, so ~512 bytes per render process.

2) I don't know what that means but it sounds incredibly useful so please file a bug for it and give me more documentation. :)
Project Member

Comment 14 by bugdroid1@chromium.org, Mar 28 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/69726f4ba3947fd4ddbfb07fd9f1a596289cdae9

commit 69726f4ba3947fd4ddbfb07fd9f1a596289cdae9
Author: rockot <rockot@chromium.org>
Date: Tue Mar 28 03:38:28 2017

Fix leaks in mojo::edk::NodeController

There are a few instances where the broker process (e.g. the browser in
process Chrome) tracks some state -- including outgoing messages --
pertaining to a client process connection. In some cases this tracked data
is not cleaned up once the client process is disconnected. This fixes that.

BUG= 704938 

Review-Url: https://codereview.chromium.org/2780433003
Cr-Commit-Position: refs/heads/master@{#460001}

[modify] https://crrev.com/69726f4ba3947fd4ddbfb07fd9f1a596289cdae9/mojo/edk/system/node_controller.cc

Labels: Merge-Request-58
Status: Fixed (was: Assigned)
We should merge this fix back.

I took some measurements of messages that would have leaked due to this bug. Measurements were taken while browsing several popular sites and just using the browser normally for a while. As it turns out, the simple static HTML repro is not really representative of the average behavior.

After 51 render process terminations:

 - Message size is always 56 bytes - they were all internal control messages
   signaling port closure
 - The number of messages leaked varies wildly from process to process.
   It is ultimately proportional to the number of message pipes which were
   connected to the child process when terminated.
 - Mean # of leaked messages was 43 per process
 - Mean total memory leak per process ~2.2 kB
 - Total leak ~121 kB

Project Member

Comment 16 by sheriffbot@chromium.org, Mar 29 2017

Labels: -Merge-Request-58 Hotlist-Merge-Approved Merge-Approved-58
Your change meets the bar and is auto-approved for M58. Please go ahead and merge the CL to branch 3029 manually. Please contact milestone owner if you have questions.
Owners: amineer@(Android), cmasso@(iOS), bhthompson@(ChromeOS), govind@(Desktop)

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Project Member

Comment 17 by bugdroid1@chromium.org, Mar 29 2017

Labels: -merge-approved-58 merge-merged-3029
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/97716adf5756137e3dfde3526e3d5007e75988a6

commit 97716adf5756137e3dfde3526e3d5007e75988a6
Author: Ken Rockot <rockot@chromium.org>
Date: Wed Mar 29 05:55:22 2017

Fix leaks in mojo::edk::NodeController

There are a few instances where the broker process (e.g. the browser in
process Chrome) tracks some state -- including outgoing messages --
pertaining to a client process connection. In some cases this tracked data
is not cleaned up once the client process is disconnected. This fixes that.

BUG= 704938 

Review-Url: https://codereview.chromium.org/2780433003
Cr-Commit-Position: refs/heads/master@{#460001}
(cherry picked from commit 69726f4ba3947fd4ddbfb07fd9f1a596289cdae9)

Review-Url: https://codereview.chromium.org/2784773002 .
Cr-Commit-Position: refs/branch-heads/3029@{#471}
Cr-Branched-From: 939b32ee5ba05c396eef3fd992822fcca9a2e262-refs/heads/master@{#454471}

[modify] https://crrev.com/97716adf5756137e3dfde3526e3d5007e75988a6/mojo/edk/system/node_controller.cc

Labels: Merge-Request-57
This is a pretty long-standing (probably since ~M51) leak, and I guess it's not huge (see comment #15 for details), so it may not be worth a merge to M57. Still adding the label to get a review just in case. Thanks!
Labels: -Merge-Request-57 Merge-Rejected-57
Thank you rockot@.
M57 #57.0.2987.133 went out today and there is no M57 release planned anymore unless something critical comes up.

Also, per comment #18, this issue has existed for a while, so rejecting the merge to M57. Please let me know if there is any concern here.
That all sounds reasonable to me. Thanks for looking!
