New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 719774 link

Starred by 3 users

Issue metadata

Status: Fixed
Owner:
Last visit > 30 days ago
Closed: Jul 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Android
Pri: 3
Type: Bug

Blocking:
issue 674287



Sign in to add a comment

MultiplexRouter::tasks_ deque wastes memory on Android

Project Member Reported by dskiba@chromium.org, May 9 2017

Issue description

std::deque strikes again!

In one of our low end device-specific benchmarks MultiplexRouter::OnPipeConnectionError() allocates 1.1 MiB on two paths:

1247.2 KiB (3148) [Thread: Chrome_IOThread]
1247.2 KiB (3148) </system/lib/libc.so>
1247.2 KiB (3148) ThreadFunc
1247.2 KiB (3148) base::Thread::ThreadMain()
1247.2 KiB (3148) content::BrowserThreadImpl::Run(base::RunLoop*)
1247.2 KiB (3148) content::BrowserThreadImpl::IOThreadRun(base::RunLoop*)
1247.2 KiB (3148) base::RunLoop::Run()
1247.2 KiB (3148) base::MessagePumpLibevent::Run(base::MessagePump::Delegate*)
520.0 KiB (1204) base::MessageLoop::DoWork()
520.0 KiB (1204) base::MessageLoop::DeferOrRunPendingTask(base::PendingTask)
520.0 KiB (1204) base::MessageLoop::RunTask(base::PendingTask*)
520.0 KiB (1204) base::debug::TaskAnnotator::RunTask(char const* base::PendingTask*)
504.2 KiB (1100) mojo::SimpleWatcher::OnHandleReady(int unsigned int)
452.4 KiB (226) mojo::Connector::HandleError(bool bool)
452.4 KiB (226) mojo::internal::MultiplexRouter::OnPipeConnectionError()
452.4 KiB (226) std::__ndk1::deque<std::__ndk1::unique_ptr<mojo::internal::MultiplexRouter::Task std::__ndk1::default_delete<mojo::internal::MultiplexRouter::Task> > std::__ndk1::allocator<std::__ndk1::unique_ptr<mojo::internal::MultiplexRouter::Task std::__ndk1::default_delete<mojo::internal::MultiplexRouter::Task> > > >::push_back(std::__ndk1::unique_ptr<mojo::internal::MultiplexRouter::Task std::__ndk1::default_delete<mojo::internal::MultiplexRouter::Task> >&&)

1247.2 KiB (3148) [Thread: Chrome_IOThread]
1247.2 KiB (3148) </system/lib/libc.so>
1247.2 KiB (3148) ThreadFunc
1247.2 KiB (3148) base::Thread::ThreadMain()
1247.2 KiB (3148) content::BrowserThreadImpl::Run(base::RunLoop*)
1247.2 KiB (3148) content::BrowserThreadImpl::IOThreadRun(base::RunLoop*)
1247.2 KiB (3148) base::RunLoop::Run()
1247.2 KiB (3148) base::MessagePumpLibevent::Run(base::MessagePump::Delegate*)
727.1 KiB (1944) event_base_loop
727.1 KiB (1944) base::MessagePumpLibevent::OnLibeventNotification(int short void*)
727.1 KiB (1944) OnFileCanReadWithoutBlocking
727.1 KiB (1944) mojo::edk::Channel::OnReadComplete(unsigned int unsigned int*)
727.1 KiB (1944) mojo::edk::NodeChannel::OnChannelMessage(void const* unsigned int std::__ndk1::unique_ptr<std::__ndk1::vector<mojo::edk::PlatformHandle std::__ndk1::allocator<mojo::edk::PlatformHandle> > mojo::edk::PlatformHandleVectorDeleter>)
727.1 KiB (1944) mojo::edk::RequestContext::~RequestContext()
723.7 KiB (1858) mojo::edk::Watch::InvokeCallback(unsigned int mojo::HandleSignalsState const& unsigned int)
723.7 KiB (1858) mojo::edk::WatcherDispatcher::InvokeWatchCallback(unsigned int unsigned int mojo::HandleSignalsState const& unsigned int)
723.7 KiB (1858) mojo::SimpleWatcher::Context::CallNotify(unsigned int unsigned int MojoHandleSignalsState unsigned int)
723.7 KiB (1858) mojo::SimpleWatcher::Context::Notify(unsigned int MojoHandleSignalsState unsigned int)
723.7 KiB (1858) mojo::SimpleWatcher::OnHandleReady(int unsigned int)
660.6 KiB (330) mojo::Connector::HandleError(bool bool)
660.6 KiB (330) mojo::internal::MultiplexRouter::OnPipeConnectionError()
660.6 KiB (330) std::__ndk1::deque<std::__ndk1::unique_ptr<mojo::internal::MultiplexRouter::Task std::__ndk1::default_delete<mojo::internal::MultiplexRouter::Task> > std::__ndk1::allocator<std::__ndk1::unique_ptr<mojo::internal::MultiplexRouter::Task std::__ndk1::default_delete<mojo::internal::MultiplexRouter::Task> > > >::push_back(std::__ndk1::unique_ptr<mojo::internal::MultiplexRouter::Task std::__ndk1::default_delete<mojo::internal::MultiplexRouter::Task> >&&)

This is because libc implementation of std::deque is very memory-inefficient, and allocates minimum 4KiB, see issue 674287.

In this case we can see that 1113.0KiB was allocated in 556 allocations, i.e. 2KiB per allocation. That's because std::deque actually allocates twice: once for 4KiB block and second time for block pointer array. I.e. in this case we have 278 unique MultiplexRouters.

Please consider changing std::deque to std::vector in both tasks_ and sync_message_tasks_. It doesn't seem that MultiplexRouter relies on deque's pointer stability, so the change is safe.
 
In a different test OnPipeConnectionError() allocated 3.7 MiB: go/vsjkk

I wonder if we're leaking MultiplexRouters.
Cc: yzshen@chromium.org
A MultiplexRouter leak likely implies a binding endpoint leak, i.e. something who owns an InterfacePtr or Binding is itself leaking.

Killing use of std::deque seems like a good idea in any case.
I am happy to make it consume less memory. But maybe std::vector is not a great choice? The most common operations on these data structures are pushing to the end and popping the front.
Yes, but the payload is either unique_ptr or regular pointers - i.e. things that are cheap to move around.

Regarding endpoint leaks - is there a place where number of endpoints is known? I'm thinking of creating mojo MemoryDumpProvider to report that number so we could catch leaks in slow reports (go/slow-memory-reports). Are there other interesting mojo-specific numbers we can report there?
Components: -Internals>Mojo Internals>Mojo>Bindings
Owner: yzshen@chromium.org
Status: Assigned (was: Untriaged)
Maybe as a quick fix we could use std::unique_ptr<std::deque<...>> because in the majority case: non-associated interfaces without sync messages, there is no need to have a queue.

And then we could take our time to measure whether std::vector is a better alternative. (or maybe std::list?)

> Regarding endpoint leaks - is there a place where number of endpoints is known?
There is not a central place keeping track of this at the moment. We could track the number by maintaining a counter if necessary.


> I'm thinking of creating mojo MemoryDumpProvider to report that number so we could catch leaks in slow reports (go/slow-memory-reports). Are there other interesting mojo-specific numbers we can report there?

Will think about it and get back to you later.

Ken: Do you have any good suggestions?
std::unique_ptr<std::deque<...>> won't help, because the problem is that std::deque allocates 4 KiB to hold a single object, and it doesn't allocate until that. (BTW libstdc++ implementation (Linux) of std::deque allocates block in the ctor, but its block size is just 512 bytes).

std::list<> will allocate/free each node, which will outperformance std::vector only on large number of elements (this needs benchmarking).

But there is base::LinkedList, which I think fits better (almost perfectly). The only catch is lifetime management - you'll need to manually delete tasks after calling RemoveFromList() on them.
Thanks!

Chatted with Brett and Ken. We agreed that it seems a good idea to implement a deque (base::Deque?) that is backed by a std::vector. The std::vector is used as a circular buffer to reduce copying/moving and expands when necessary.

It shouldn't be too complex of a change.
I suggest base::RingBuffer or something, "deque" means pointer stability (to me at least).

Are you going to implement the whole thing with iterators, etc., or something minimal (push_back, pop_front and the like)?
After a second thought, maybe I should implement a less generic thing for MultiplexRouter first, and then worry about generalizing it to be a base utility. Then we can take our time considering the proper API and making sure it works efficiently in all cases.
Yeah, it will be useful without iterators - half of std::deque memory comes from std::queue / std::stack which use std::deque by default, but they only use a small subset of methods.

I.e. if we had base::RingBuffer, we could create base::Queue and base::Stack on top of it (specializing std::queue / std::stack), and use these to improve memory usage in places that currently use std::queue / std::stack.
Cc: yutak@chromium.org
Starting with a less generic thing for MultiplexRouter sounds good, but we should anyway aim at removing std::deque, std::queue and std::stack from the code base (i.e., we need to implement their equivalents in base/).


Comment 13 by yutak@chromium.org, May 15 2017

Blocking: 674287
See Issue 674287, too.
I think this is looking really good so far. Seeing a 500KiB drop in median on Memory.Browser.Large2 in the range where this CL was submitted: https://uma.googleplex.com/timeline_v2?sid=d63d1207b904d291bbf4a6b71f805b1e.
Wow, good to know! :) Thanks for the number.
 Issue 651103  has been merged into this issue.
Is there anything left to do on this issue?
Status: Fixed (was: Assigned)
I don't think so.

Brett is working on a more general deque solution in base/. We could migrate to that after it is ready. But that could be considered a separate issue.

Marked as fixed.


Sign in to add a comment