MultiplexRouter::tasks_ deque wastes memory on Android |
|||||
Issue descriptionstd::deque strikes again! In one of our low end device-specific benchmarks MultiplexRouter::OnPipeConnectionError() allocates 1.1 MiB on two paths: 1247.2 KiB (3148) [Thread: Chrome_IOThread] 1247.2 KiB (3148) </system/lib/libc.so> 1247.2 KiB (3148) ThreadFunc 1247.2 KiB (3148) base::Thread::ThreadMain() 1247.2 KiB (3148) content::BrowserThreadImpl::Run(base::RunLoop*) 1247.2 KiB (3148) content::BrowserThreadImpl::IOThreadRun(base::RunLoop*) 1247.2 KiB (3148) base::RunLoop::Run() 1247.2 KiB (3148) base::MessagePumpLibevent::Run(base::MessagePump::Delegate*) 520.0 KiB (1204) base::MessageLoop::DoWork() 520.0 KiB (1204) base::MessageLoop::DeferOrRunPendingTask(base::PendingTask) 520.0 KiB (1204) base::MessageLoop::RunTask(base::PendingTask*) 520.0 KiB (1204) base::debug::TaskAnnotator::RunTask(char const* base::PendingTask*) 504.2 KiB (1100) mojo::SimpleWatcher::OnHandleReady(int unsigned int) 452.4 KiB (226) mojo::Connector::HandleError(bool bool) 452.4 KiB (226) mojo::internal::MultiplexRouter::OnPipeConnectionError() 452.4 KiB (226) std::__ndk1::deque<std::__ndk1::unique_ptr<mojo::internal::MultiplexRouter::Task std::__ndk1::default_delete<mojo::internal::MultiplexRouter::Task> > std::__ndk1::allocator<std::__ndk1::unique_ptr<mojo::internal::MultiplexRouter::Task std::__ndk1::default_delete<mojo::internal::MultiplexRouter::Task> > > >::push_back(std::__ndk1::unique_ptr<mojo::internal::MultiplexRouter::Task std::__ndk1::default_delete<mojo::internal::MultiplexRouter::Task> >&&) 1247.2 KiB (3148) [Thread: Chrome_IOThread] 1247.2 KiB (3148) </system/lib/libc.so> 1247.2 KiB (3148) ThreadFunc 1247.2 KiB (3148) base::Thread::ThreadMain() 1247.2 KiB (3148) content::BrowserThreadImpl::Run(base::RunLoop*) 1247.2 KiB (3148) content::BrowserThreadImpl::IOThreadRun(base::RunLoop*) 1247.2 KiB (3148) base::RunLoop::Run() 1247.2 KiB (3148) base::MessagePumpLibevent::Run(base::MessagePump::Delegate*) 727.1 KiB (1944) event_base_loop 727.1 KiB (1944) base::MessagePumpLibevent::OnLibeventNotification(int short void*) 727.1 KiB (1944) OnFileCanReadWithoutBlocking 727.1 KiB (1944) mojo::edk::Channel::OnReadComplete(unsigned int unsigned int*) 727.1 KiB (1944) mojo::edk::NodeChannel::OnChannelMessage(void const* unsigned int std::__ndk1::unique_ptr<std::__ndk1::vector<mojo::edk::PlatformHandle std::__ndk1::allocator<mojo::edk::PlatformHandle> > mojo::edk::PlatformHandleVectorDeleter>) 727.1 KiB (1944) mojo::edk::RequestContext::~RequestContext() 723.7 KiB (1858) mojo::edk::Watch::InvokeCallback(unsigned int mojo::HandleSignalsState const& unsigned int) 723.7 KiB (1858) mojo::edk::WatcherDispatcher::InvokeWatchCallback(unsigned int unsigned int mojo::HandleSignalsState const& unsigned int) 723.7 KiB (1858) mojo::SimpleWatcher::Context::CallNotify(unsigned int unsigned int MojoHandleSignalsState unsigned int) 723.7 KiB (1858) mojo::SimpleWatcher::Context::Notify(unsigned int MojoHandleSignalsState unsigned int) 723.7 KiB (1858) mojo::SimpleWatcher::OnHandleReady(int unsigned int) 660.6 KiB (330) mojo::Connector::HandleError(bool bool) 660.6 KiB (330) mojo::internal::MultiplexRouter::OnPipeConnectionError() 660.6 KiB (330) std::__ndk1::deque<std::__ndk1::unique_ptr<mojo::internal::MultiplexRouter::Task std::__ndk1::default_delete<mojo::internal::MultiplexRouter::Task> > std::__ndk1::allocator<std::__ndk1::unique_ptr<mojo::internal::MultiplexRouter::Task std::__ndk1::default_delete<mojo::internal::MultiplexRouter::Task> > > >::push_back(std::__ndk1::unique_ptr<mojo::internal::MultiplexRouter::Task std::__ndk1::default_delete<mojo::internal::MultiplexRouter::Task> >&&) This is because libc implementation of std::deque is very memory-inefficient, and allocates minimum 4KiB, see issue 674287. In this case we can see that 1113.0KiB was allocated in 556 allocations, i.e. 2KiB per allocation. That's because std::deque actually allocates twice: once for 4KiB block and second time for block pointer array. I.e. in this case we have 278 unique MultiplexRouters. Please consider changing std::deque to std::vector in both tasks_ and sync_message_tasks_. It doesn't seem that MultiplexRouter relies on deque's pointer stability, so the change is safe.
,
May 9 2017
A MultiplexRouter leak likely implies a binding endpoint leak, i.e. something who owns an InterfacePtr or Binding is itself leaking. Killing use of std::deque seems like a good idea in any case.
,
May 9 2017
I am happy to make it consume less memory. But maybe std::vector is not a great choice? The most common operations on these data structures are pushing to the end and popping the front.
,
May 9 2017
Yes, but the payload is either unique_ptr or regular pointers - i.e. things that are cheap to move around. Regarding endpoint leaks - is there a place where number of endpoints is known? I'm thinking of creating mojo MemoryDumpProvider to report that number so we could catch leaks in slow reports (go/slow-memory-reports). Are there other interesting mojo-specific numbers we can report there?
,
May 9 2017
Maybe as a quick fix we could use std::unique_ptr<std::deque<...>> because in the majority case: non-associated interfaces without sync messages, there is no need to have a queue. And then we could take our time to measure whether std::vector is a better alternative. (or maybe std::list?) > Regarding endpoint leaks - is there a place where number of endpoints is known? There is not a central place keeping track of this at the moment. We could track the number by maintaining a counter if necessary. > I'm thinking of creating mojo MemoryDumpProvider to report that number so we could catch leaks in slow reports (go/slow-memory-reports). Are there other interesting mojo-specific numbers we can report there? Will think about it and get back to you later. Ken: Do you have any good suggestions?
,
May 9 2017
std::unique_ptr<std::deque<...>> won't help, because the problem is that std::deque allocates 4 KiB to hold a single object, and it doesn't allocate until that. (BTW libstdc++ implementation (Linux) of std::deque allocates block in the ctor, but its block size is just 512 bytes). std::list<> will allocate/free each node, which will outperformance std::vector only on large number of elements (this needs benchmarking). But there is base::LinkedList, which I think fits better (almost perfectly). The only catch is lifetime management - you'll need to manually delete tasks after calling RemoveFromList() on them.
,
May 9 2017
Thanks! Chatted with Brett and Ken. We agreed that it seems a good idea to implement a deque (base::Deque?) that is backed by a std::vector. The std::vector is used as a circular buffer to reduce copying/moving and expands when necessary. It shouldn't be too complex of a change.
,
May 9 2017
I suggest base::RingBuffer or something, "deque" means pointer stability (to me at least). Are you going to implement the whole thing with iterators, etc., or something minimal (push_back, pop_front and the like)?
,
May 9 2017
After a second thought, maybe I should implement a less generic thing for MultiplexRouter first, and then worry about generalizing it to be a base utility. Then we can take our time considering the proper API and making sure it works efficiently in all cases.
,
May 9 2017
Yeah, it will be useful without iterators - half of std::deque memory comes from std::queue / std::stack which use std::deque by default, but they only use a small subset of methods. I.e. if we had base::RingBuffer, we could create base::Queue and base::Stack on top of it (specializing std::queue / std::stack), and use these to improve memory usage in places that currently use std::queue / std::stack.
,
May 10 2017
,
May 10 2017
Starting with a less generic thing for MultiplexRouter sounds good, but we should anyway aim at removing std::deque, std::queue and std::stack from the code base (i.e., we need to implement their equivalents in base/).
,
May 15 2017
,
May 25 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/7a6de50a928be5382714446119f704df13973205 commit 7a6de50a928be5382714446119f704df13973205 Author: yzshen <yzshen@chromium.org> Date: Thu May 25 02:23:04 2017 Mojo C++ bindings: introduce mojo::internal::deque and use it in MultiplexRouter. The reason is that std::deque is memory inefficient. BUG= 719774 Review-Url: https://codereview.chromium.org/2900543002 Cr-Commit-Position: refs/heads/master@{#474527} [modify] https://crrev.com/7a6de50a928be5382714446119f704df13973205/mojo/public/cpp/bindings/BUILD.gn [add] https://crrev.com/7a6de50a928be5382714446119f704df13973205/mojo/public/cpp/bindings/lib/deque.h [modify] https://crrev.com/7a6de50a928be5382714446119f704df13973205/mojo/public/cpp/bindings/lib/multiplex_router.h [modify] https://crrev.com/7a6de50a928be5382714446119f704df13973205/mojo/public/cpp/bindings/tests/BUILD.gn [add] https://crrev.com/7a6de50a928be5382714446119f704df13973205/mojo/public/cpp/bindings/tests/deque_unittest.cc
,
Jun 1 2017
I think this is looking really good so far. Seeing a 500KiB drop in median on Memory.Browser.Large2 in the range where this CL was submitted: https://uma.googleplex.com/timeline_v2?sid=d63d1207b904d291bbf4a6b71f805b1e.
,
Jun 2 2017
Wow, good to know! :) Thanks for the number.
,
Jun 15 2017
Issue 651103 has been merged into this issue.
,
Jul 17 2017
Is there anything left to do on this issue?
,
Jul 17 2017
I don't think so. Brett is working on a more general deque solution in base/. We could migrate to that after it is ready. But that could be considered a separate issue. Marked as fixed. |
|||||
►
Sign in to add a comment |
|||||
Comment 1 by dskiba@chromium.org
, May 9 2017