New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 754004 link

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
please use my google.com address
Closed: Aug 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux
Pri: 3
Type: Bug

Blocking:
issue 750187



Sign in to add a comment

base::RandBytes() takes up quite a significant amount of time during message pipe creation

Project Member Reported by yzshen@chromium.org, Aug 9 2017

Issue description


I modified the testing params of "MultiThreadInterfacePassing/1" to test 10000 in-process calls, each carrying 8 interface requests:

out/PerfRelease/ipc_perftests -test_filter=*MultiThreadInterfacePassing/1

It showed that 10.16% of the entire test is spent on base::RandBytes() in mojo::edk::ports::Node::CreateUninitializedPort().

Please see the attached pprof result. This is tested on Linux.


 
profile001.pdf
53.3 KB Download
Cc: -roc...@chromium.org jsc...@chromium.org dcheng@chromium.org
Owner: roc...@chromium.org
Status: Assigned (was: Untriaged)
Aha. Yeah on POSIX RandBytes() is just going to read /dev/urandom.

I think maybe we should maintain an internal queue of random port names and refill it in bulk rather than issuing a call to RandBytes every time we need a new 128-bit sequence.

Adding some security folks: is there any real risk to this? Like making addresses easier to guess?

For context, when we create a new message pipe we generate two new psuedo-random 128-bit addresses. We also generate a new random address any time an endpoint is sent across a process boundary. These operations are apparently a drag on performance if we do it such a high frequency as is common in microbenchmarks.
I don't think there's any risk to maintaining an internal pool. I am curious how often this is going to be an issue outside of microbenchmarks though.
RE #2:
We have seen perf regression with some loading.desktop test cases when switching URLLoader associated interface to non-associated interface (i.e., using a separate message pipe).

It is not uncommon to have dozens or more of URL requests for each page load. It seems possible that the regression is caused by the extra cost of setting up message pipes.

In a quick empirical test of reading 50 MB from /dev/urandom in a tight loop, the work done is reduced by about 60-70% if we read 4 kB chunks instead of 16 byte chunks. The benefit seems to disappear once we reach page size though, and there's no difference between 4 kB reads vs, e.g., 8 kB or 16 kB or 1 MB reads.

I'm observing a single navigation to youtube.com resulting in thousands of calls to RandBytes in the browser process, mostly to get 8 or 16 random bytes. The render process is constantly spamming RandBytes for something that's 16 bytes (though apparently -not- message pipe creation).

Maybe it's actually worth doing this caching in the POSIX RandBytes implementation. WDYT?

Seems reasonable to me, though I guess we'll need per-thread caches.
Project Member

Comment 6 by bugdroid1@chromium.org, Aug 10 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/1dae18b210f14028a6fcea2343191bada2a30eb0

commit 1dae18b210f14028a6fcea2343191bada2a30eb0
Author: Ken Rockot <rockot@chromium.org>
Date: Thu Aug 10 20:32:26 2017

Mojo: Optimize random port name generation

Maintain a thread-safe cache of randomly generated port names from which
to draw when new names are needed. This amortizes the per-call cost of
base::RandBytes, which is otherwise called at very high frequency when
many pipes are being created.

MessagePipeTest.ClosePipesStressTest creates lots of pipes in a tight
loop. On POSIX this change reduces the running time of that test by
~20%.

On Windows there is also a marginal reduction of about ~4%, so this
caching behavior is used unconditionally on all platforms.

BUG= 754004 

Change-Id: I3ad0cdc2705dab29aa2cb8aaaab77e7416400eb4
Reviewed-on: https://chromium-review.googlesource.com/610444
Commit-Queue: Ken Rockot <rockot@chromium.org>
Reviewed-by: Yuzhu Shen <yzshen@chromium.org>
Cr-Commit-Position: refs/heads/master@{#493522}
[modify] https://crrev.com/1dae18b210f14028a6fcea2343191bada2a30eb0/mojo/edk/system/ports/node.cc

Comment 7 by roc...@chromium.org, Aug 14 2017

Just to persist ongoing discussion around this, jschuh@ has also recommended the original approach taken by https://chromium-review.googlesource.com/c/609442. That approach has two problems: 1) unbounded lifetime of TLS caches, and 2) cache duplication on zygote fork.

#1 could be solved using thread_local storage duration, but see https://groups.google.com/a/chromium.org/forum/#!topic/cxx/h7O5BdtWCZw

#2 can be addressed by flushing the cache with explicit half-page-sized calls to base::RandBytes immediately before forking, which is subtle and fragile but possibly OK.

It's still unclear whether thread_local can be relied upon.

Comment 8 by roc...@chromium.org, Aug 21 2017

Status: Fixed (was: Assigned)
Calling this fixed in any case, since the concern here was with Mojo performance. We can revisit base::RandBytes separately if there's sufficient motivation.

Sign in to add a comment