base::RandBytes() takes up quite a significant amount of time during message pipe creation |
||
Issue descriptionI modified the testing params of "MultiThreadInterfacePassing/1" to test 10000 in-process calls, each carrying 8 interface requests: out/PerfRelease/ipc_perftests -test_filter=*MultiThreadInterfacePassing/1 It showed that 10.16% of the entire test is spent on base::RandBytes() in mojo::edk::ports::Node::CreateUninitializedPort(). Please see the attached pprof result. This is tested on Linux.
,
Aug 9 2017
I don't think there's any risk to maintaining an internal pool. I am curious how often this is going to be an issue outside of microbenchmarks though.
,
Aug 9 2017
RE #2: We have seen perf regression with some loading.desktop test cases when switching URLLoader associated interface to non-associated interface (i.e., using a separate message pipe). It is not uncommon to have dozens or more of URL requests for each page load. It seems possible that the regression is caused by the extra cost of setting up message pipes.
,
Aug 9 2017
In a quick empirical test of reading 50 MB from /dev/urandom in a tight loop, the work done is reduced by about 60-70% if we read 4 kB chunks instead of 16 byte chunks. The benefit seems to disappear once we reach page size though, and there's no difference between 4 kB reads vs, e.g., 8 kB or 16 kB or 1 MB reads. I'm observing a single navigation to youtube.com resulting in thousands of calls to RandBytes in the browser process, mostly to get 8 or 16 random bytes. The render process is constantly spamming RandBytes for something that's 16 bytes (though apparently -not- message pipe creation). Maybe it's actually worth doing this caching in the POSIX RandBytes implementation. WDYT?
,
Aug 9 2017
Seems reasonable to me, though I guess we'll need per-thread caches.
,
Aug 10 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/1dae18b210f14028a6fcea2343191bada2a30eb0 commit 1dae18b210f14028a6fcea2343191bada2a30eb0 Author: Ken Rockot <rockot@chromium.org> Date: Thu Aug 10 20:32:26 2017 Mojo: Optimize random port name generation Maintain a thread-safe cache of randomly generated port names from which to draw when new names are needed. This amortizes the per-call cost of base::RandBytes, which is otherwise called at very high frequency when many pipes are being created. MessagePipeTest.ClosePipesStressTest creates lots of pipes in a tight loop. On POSIX this change reduces the running time of that test by ~20%. On Windows there is also a marginal reduction of about ~4%, so this caching behavior is used unconditionally on all platforms. BUG= 754004 Change-Id: I3ad0cdc2705dab29aa2cb8aaaab77e7416400eb4 Reviewed-on: https://chromium-review.googlesource.com/610444 Commit-Queue: Ken Rockot <rockot@chromium.org> Reviewed-by: Yuzhu Shen <yzshen@chromium.org> Cr-Commit-Position: refs/heads/master@{#493522} [modify] https://crrev.com/1dae18b210f14028a6fcea2343191bada2a30eb0/mojo/edk/system/ports/node.cc
,
Aug 14 2017
Just to persist ongoing discussion around this, jschuh@ has also recommended the original approach taken by https://chromium-review.googlesource.com/c/609442. That approach has two problems: 1) unbounded lifetime of TLS caches, and 2) cache duplication on zygote fork. #1 could be solved using thread_local storage duration, but see https://groups.google.com/a/chromium.org/forum/#!topic/cxx/h7O5BdtWCZw #2 can be addressed by flushing the cache with explicit half-page-sized calls to base::RandBytes immediately before forking, which is subtle and fragile but possibly OK. It's still unclear whether thread_local can be relied upon.
,
Aug 21 2017
Calling this fixed in any case, since the concern here was with Mojo performance. We can revisit base::RandBytes separately if there's sufficient motivation. |
||
►
Sign in to add a comment |
||
Comment 1 by roc...@chromium.org
, Aug 9 2017Owner: roc...@chromium.org
Status: Assigned (was: Untriaged)