New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 681493 link

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Mar 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug



Sign in to add a comment

ReadableStream, WritableStream get dramatically slower when queue grows to 56720 chunks

Project Member Reported by ricea@chromium.org, Jan 16 2017

Issue description

Load the attached test in the browser. It enqueues a number of chunks in start() and then uses reads them out using a default reader.

Expected behaviour: roughly constant time per read.

Actual behaviour:

3545 reads took 47.15ms, 0.01ms per read
7090 reads took 56.02ms, 0.01ms per read
14180 reads took 100.84ms, 0.01ms per read
28360 reads took 208.79ms, 0.01ms per read
56720 reads took 5016.03ms, 0.09ms per read
113440 reads took 14758.54ms, 0.13ms per read

I manually bisected to identify 56720 chunks as the threshold where performance changes. It gets from 6 to 50 times slower (consistent within a run, but variable between runs).

The same behaviour can be seen with a similar test against WritableStream.

I have checked the v8 source code to see if the magic number 56720 appears somewhere. It does not. I will try profiling next.
 
readablestream-linear.html
1.8 KB View Download

Comment 1 by ricea@chromium.org, Jan 16 2017

On my android tablet the threshold is at a higher value, 226880 or less.

Comment 2 by ricea@chromium.org, Jan 16 2017

Output from running Chrome with --no-sandbox  --js-flags="--prof --log-timer-events --log" and the crunching the isolate-0x1bbcd53e5000-v8.log file through linux-tick-processor:

Statistical profiling result from out/Release/isolate-0x1bbcd53e5000-v8.log, (54819 ticks, 171 unaccounted, 0 excluded).

 [Shared libraries]:
   ticks  total  nonlib   name
  41628   75.9%          /opt/google/chrome/chrome
   8281   15.1%          /lib/x86_64-linux-gnu/libc-2.19.so
     19    0.0%          /lib/x86_64-linux-gnu/libpthread-2.19.so
      1    0.0%          /lib/x86_64-linux-gnu/libm-2.19.so

 [JavaScript]:
   ticks  total  nonlib   name
     74    0.1%    1.5%  KeyedStoreIC: A keyed store IC from the snapshot
     58    0.1%    1.2%  Stub: FastNewClosureStub
     48    0.1%    1.0%  KeyedLoadIC: A keyed load IC from the snapshot {1}
     45    0.1%    0.9%  LazyCompile: ~CreateResolvingFunctions native promise.js:40:34
     44    0.1%    0.9%  Stub: FastNewFunctionContextStub
     42    0.1%    0.9%  Builtin: CompileLazy
     33    0.1%    0.7%  Stub: CEntryStub
     33    0.1%    0.7%  KeyedStoreIC: A keyed store IC from the snapshot {2}
     27    0.0%    0.6%  LazyCompile: *PerformPromiseThen native promise.js:292:28
     26    0.0%    0.5%  Builtin: CallFunction_ReceiverIsAny
     25    0.0%    0.5%  Handler: An IC handler from the snapshot {2}
     22    0.0%    0.4%  LazyCompile: ~DequeueValue native ReadableStream.js:621:24
     21    0.0%    0.4%  LazyCompile: *resolvePromise native promise.js:181:24
     20    0.0%    0.4%  Stub: StoreTransitionStub {1}
     20    0.0%    0.4%  Stub: RecordWriteStub {3}
     20    0.0%    0.4%  KeyedLoadIC: A keyed load IC from the snapshot {2}
     20    0.0%    0.4%  KeyedLoadIC: A keyed load IC from the snapshot
     18    0.0%    0.4%  Builtin: ObjectHasOwnProperty
     17    0.0%    0.3%  Stub: FastNewRestParameterStub
     16    0.0%    0.3%  Stub: FastNewObjectStub
     16    0.0%    0.3%  LazyCompile: *NewPromiseCapability native promise.js:257:30
     16    0.0%    0.3%  Handler: An IC handler from the snapshot {3}
     15    0.0%    0.3%  LazyCompile: *ReadableStreamDefaultControllerPull native ReadableStream.js:249:47
     14    0.0%    0.3%  Stub: StoreTransitionStub
     14    0.0%    0.3%  LazyCompile: ~y native promise.js:42:7
     14    0.0%    0.3%  LazyCompile: ~reader.read.then.result file:///usr/local/google/home/ricea/readablestream-linear.html:30:31
     14    0.0%    0.3%  LazyCompile: *PromiseHandle native promise.js:105:23
     13    0.0%    0.3%  LazyCompile: *then native promise.js:315:21
     13    0.0%    0.3%  LazyCompile: *createPromise native promise.js:178:23
     13    0.0%    0.3%  Builtin: ReflectApply
     13    0.0%    0.3%  Builtin: ArgumentsAdaptorTrampoline
     12    0.0%    0.2%  LazyCompile: *ReadableStreamDefaultControllerShouldCallPull native ReadableStream.js:527:57
     11    0.0%    0.2%  Stub: StrictEqualStub
     11    0.0%    0.2%  LazyCompile: *PromiseSet native promise.js:79:20
     11    0.0%    0.2%  Builtin: CallFunction_ReceiverIsNullOrUndefined
     10    0.0%    0.2%  LazyCompile: *resolve native promise.js:327:24
      9    0.0%    0.2%  LazyCompile: *read native ReadableStream.js:297:9
      9    0.0%    0.2%  LazyCompile: *FulfillPromise native promise.js:95:24
      9    0.0%    0.2%  Handler: An IC handler from the snapshot
      8    0.0%    0.2%  Stub: RecordWriteStub {6}
      8    0.0%    0.2%  Stub: CompareICStub {1}
      8    0.0%    0.2%  LazyCompile: *ReadableStreamDefaultReaderRead native ReadableStream.js:489:43
      8    0.0%    0.2%  LazyCompile: *PromiseInit native promise.js:92:21
      8    0.0%    0.2%  LazyCompile: *IsPromise native promise.js:175:19
      8    0.0%    0.2%  KeyedStoreIC: A keyed store IC from the snapshot {1}
      8    0.0%    0.2%  Builtin: Apply
      7    0.0%    0.1%  Stub: RecordWriteStub {1}
      7    0.0%    0.1%  LazyCompile: *ReadableStreamGetState native ReadableStream.js:381:34
      7    0.0%    0.1%  LazyCompile: *PromiseEnqueue native promise.js:117:24
      7    0.0%    0.1%  LazyCompile: *IsReadableStreamDefaultReader native ReadableStream.js:441:41
      6    0.0%    0.1%  Stub: RecordWriteStub {7}
      6    0.0%    0.1%  Stub: RecordWriteStub {4}
      6    0.0%    0.1%  LazyCompile: *ReadableStreamDefaultControllerCallPullIfNeeded native ReadableStream.js:500:59
      6    0.0%    0.1%  LazyCompile: *PromiseAttachCallbacks native promise.js:147:32
      6    0.0%    0.1%  Builtin: JSBuiltinsConstructStub
      5    0.0%    0.1%  Stub: StoreTransitionStub {2}
      5    0.0%    0.1%  Stub: StoreFieldStub
      5    0.0%    0.1%  Stub: RecordWriteStub {9}
      5    0.0%    0.1%  Stub: RecordWriteStub {2}
      5    0.0%    0.1%  LazyCompile: *SpeciesConstructor native runtime.js:29:28
      5    0.0%    0.1%  LazyCompile: *EnqueueValueWithSize native ReadableStream.js:626:32
      5    0.0%    0.1%  Handler: symbol("promise_state_symbol" hash 3870e4d2)
      5    0.0%    0.1%  Handler: symbol("promise_has_handler_symbol" hash 370821e9)
      5    0.0%    0.1%  Builtin: Construct
      5    0.0%    0.1%  Builtin: ArrayShift
      4    0.0%    0.1%  Stub: StoreFieldStub {5}
      4    0.0%    0.1%  Stub: StoreFieldStub {2}
      4    0.0%    0.1%  Stub: RecordWriteStub {5}
      4    0.0%    0.1%  Stub: CompareICStub
      4    0.0%    0.1%  LazyCompile: *ReadableStreamDefaultControllerEnqueue native ReadableStream.js:353:50
      4    0.0%    0.1%  LazyCompile: *<anonymous> native prologue.js:234:16
      4    0.0%    0.1%  Handler: symbol("promise_reject_reactions_symbol" hash 2811097e)
      4    0.0%    0.1%  Handler: An IC handler from the snapshot {1}
      4    0.0%    0.1%  Builtin: ToBoolean
      4    0.0%    0.1%  Builtin: Call_ReceiverIsNotNullOrUndefined
      4    0.0%    0.1%  Builtin: CallFunction_ReceiverIsNotNullOrUndefined
      3    0.0%    0.1%  Stub: StoreFieldStub {1}
      3    0.0%    0.1%  Stub: RecordWriteStub {11}
      3    0.0%    0.1%  Stub: JSEntryStub
      3    0.0%    0.1%  Stub: BinaryOpICStub
      3    0.0%    0.1%  LazyCompile: *readLoop file:///usr/local/google/home/ricea/readablestream-linear.html:29:20
      2    0.0%    0.0%  Stub: StoreFieldStub {3}
      2    0.0%    0.0%  Stub: RecordWriteStub {10}
      2    0.0%    0.0%  Stub: FastArrayPushStub
      2    0.0%    0.0%  Stub: CEntryStub {1}
      2    0.0%    0.0%  LazyCompile: ~IsPromise native promise.js:175:19
      2    0.0%    0.0%  LazyCompile: *start file:///usr/local/google/home/ricea/readablestream-linear.html:18:12
      2    0.0%    0.0%  LazyCompile: *enqueue native ReadableStream.js:212:12
      2    0.0%    0.0%  LazyCompile: *Promise native promise.js:58:23
      2    0.0%    0.0%  LazyCompile: *CreateIterResultObject native ReadableStream.js:677:34
      2    0.0%    0.0%  Handler: symbol("promise_state_symbol" hash 3870e4d2) {1}
      2    0.0%    0.0%  Handler: symbol("promise_fulfill_reactions_symbol" hash 35cbba9c) {1}
      2    0.0%    0.0%  Builtin: NumberConstructor
      2    0.0%    0.0%  Builtin: Call_ReceiverIsNullOrUndefined
      1    0.0%    0.0%  Stub: ToBooleanICStub
      1    0.0%    0.0%  Stub: StoreFieldStub {4}
      1    0.0%    0.0%  Stub: RecordWriteStub {8}
      1    0.0%    0.0%  Stub: RecordWriteStub
      1    0.0%    0.0%  LazyCompile: ~PromiseSet native promise.js:79:20
      1    0.0%    0.0%  LazyCompile: ~PromiseAttachCallbacks native promise.js:147:32
      1    0.0%    0.0%  LazyCompile: *IsReadableStreamLocked native ReadableStream.js:435:34
      1    0.0%    0.0%  LazyCompile: *IsReadableStreamDefaultController native ReadableStream.js:438:45
      1    0.0%    0.0%  LazyCompile: *<anonymous> native prologue.js:229:16
      1    0.0%    0.0%  Handler: symbol("promise_result_symbol" hash 1ffbbcfb) {1}
      1    0.0%    0.0%  Handler: symbol("promise_result_symbol" hash 1ffbbcfb)
      1    0.0%    0.0%  Handler: symbol("promise_fulfill_reactions_symbol" hash 35cbba9c)
      1    0.0%    0.0%  Handler: shift
      1    0.0%    0.0%  Builtin: ConstructFunction
      1    0.0%    0.0%  Builtin: Call_ReceiverIsAny

 [C++]:
   ticks  total  nonlib   name
   3399    6.2%   69.5%  __pthread_cond_timedwait
     97    0.2%    2.0%  __pthread_cond_wait
     51    0.1%    1.0%  write
      7    0.0%    0.1%  syscall
      5    0.0%    0.1%  madvise
      5    0.0%    0.1%  _init
      5    0.0%    0.1%  __GI___pthread_mutex_lock
      5    0.0%    0.1%  _IO_vfprintf
      3    0.0%    0.1%  _IO_default_xsputn
      2    0.0%    0.0%  strlen
      2    0.0%    0.0%  memset
      1    0.0%    0.0%  std::string::reserve(unsigned long)
      1    0.0%    0.0%  sem_wait
      1    0.0%    0.0%  char* std::string::_S_construct<char const*>(char const*, char const*, std::allocator<char> const&, std::forward_iterator_tag)
      1    0.0%    0.0%  __printf_fp
      1    0.0%    0.0%  __lll_unlock_wake
      1    0.0%    0.0%  __libc_malloc
      1    0.0%    0.0%  _IO_file_sync
      1    0.0%    0.0%  _IO_do_write

 [Summary]:
   ticks  total  nonlib   name
   1130    2.1%   23.1%  JavaScript
   3589    6.5%   73.4%  C++
    623    1.1%   12.7%  GC
  49929   91.1%          Shared libraries
    171    0.3%          Unaccounted

 [C++ entry points]:
   ticks    cpp   total   name
     44   73.3%    0.1%  write
      5    8.3%    0.0%  __GI___pthread_mutex_lock
      3    5.0%    0.0%  _init
      3    5.0%    0.0%  _IO_vfprintf
      1    1.7%    0.0%  syscall
      1    1.7%    0.0%  madvise
      1    1.7%    0.0%  __libc_malloc
      1    1.7%    0.0%  _IO_file_sync
      1    1.7%    0.0%  _IO_do_write

 [Bottom up (heavy) profile]:
  Note: percentage shows a share of a particular caller in the total
  amount of its parent calls.
  Callers occupying less than 2.0% are not shown.

   ticks parent  name
  41628   75.9%  /opt/google/chrome/chrome
  40828   98.1%    /opt/google/chrome/chrome
  39831   97.6%      LazyCompile: ~DequeueValue native ReadableStream.js:621:24
  39556   99.3%        LazyCompile: *ReadableStreamDefaultControllerPull native ReadableStream.js:249:47
  39556  100.0%          LazyCompile: *ReadableStreamDefaultReaderRead native ReadableStream.js:489:43
  39556  100.0%            LazyCompile: *read native ReadableStream.js:297:9

   8281   15.1%  /lib/x86_64-linux-gnu/libc-2.19.so
   8273   99.9%    /opt/google/chrome/chrome
   8273  100.0%      LazyCompile: ~DequeueValue native ReadableStream.js:621:24
   8241   99.6%        LazyCompile: *ReadableStreamDefaultControllerPull native ReadableStream.js:249:47
   8241  100.0%          LazyCompile: *ReadableStreamDefaultReaderRead native ReadableStream.js:489:43
   8241  100.0%            LazyCompile: *read native ReadableStream.js:297:9

   3399    6.2%  __pthread_cond_timedwait


The only useful thing I was able to infer from this was that 95% of the time was spent in DequeueValue in ReadableStream.js.

Fortunately that function only has three lines, so I'm pretty sure that this one is the problem:

  const result = controller[_queue].shift();

Comment 3 by ricea@chromium.org, Jan 16 2017

I have one possible fix at https://codereview.chromium.org/2637863002, using a linked-list of arrays to stop any individual array getting "too big".

Problems:
* I don't know the root cause. It doesn't behave like a quadratic algorithm, it behaves like a step function.
* I don't actually know what "too big" is. It's different on the two machines I've tested it on, so what's to stop it being 16384 somewhere else?
* I don't know how to automatically regression test it. If we add a layout test that queues 100,000 chunks it will take a second even on a fast machine. VMs, MSAN, etc. would take an order of magnitude or two longer.

Comment 4 by ricea@chromium.org, Jan 17 2017

A similar test for the [[readRequests]] internal slot. Sample results:

3545 reads took 46.40ms, 0.01ms per read
7090 reads took 67.56ms, 0.01ms per read
14180 reads took 69.30ms, 0.00ms per read
28360 reads took 148.44ms, 0.01ms per read
56720 reads took 55357.14ms, 0.98ms per read
113440 reads took 213609.48ms, 1.88ms per read
readablestream-readrequests-benchmark.html
2.6 KB View Download

Comment 5 by ricea@chromium.org, Jan 17 2017

The benchmark for WritableStream. Sample results:

3545 writes took 84.45ms, 0.02ms per write
7090 writes took 107.83ms, 0.02ms per write
14180 writes took 154.55ms, 0.01ms per write
28360 writes took 287.54ms, 0.01ms per write
56720 writes took 26662.79ms, 0.47ms per write
113440 writes took 46744.04ms, 0.41ms per write
writablestream-linear.html
2.1 KB View Download

Comment 6 by ricea@chromium.org, Jan 17 2017

Results for the original benchmark after applying the tentative fix from https://codereview.chromium.org/2637863002:

3545 reads took 63.67ms, 0.02ms per read
7090 reads took 90.00ms, 0.01ms per read
14180 reads took 108.91ms, 0.01ms per read
28360 reads took 242.88ms, 0.01ms per read
56720 reads took 444.55ms, 0.01ms per read
113440 reads took 929.88ms, 0.01ms per read
226880 reads took 1723.34ms, 0.01ms per read
453760 reads took 3075.35ms, 0.01ms per read
907520 reads took 7433.19ms, 0.01ms per read

Comment 7 by ricea@chromium.org, Jan 17 2017

I have a hypothesis that v8 is switching to a different array structure when the array gets larger than a certain size. The structure used for smaller arrays supports O(1) shift(), but the structure used for larger arrays has O(N) shift.

This is pure speculation based on the benchmark results.
I am curious what a 56720-queued-chunk stream looks like, to get a sense of how much of a real-world problem this is. How big is a typical chunk? E.g. if each chunk is 1 KB then yeah, I guess having 57 MB queued causing a slowdown is not great. But if it's 16 KB then I'm less worried about an extra 0.5 milliseconds when you already have ~900 MB queued.

Comment 9 by ricea@chromium.org, Jan 18 2017

I would expect that having 56720 chunks queued would be unusual in real-world applications. However, falling off a performance cliff would be unhelpful.

I have been thinking about token (as in string parsing) streams as a useful primitive. In that case 56720 chunks would only represent a few megabytes.

Even with 16KB chunks there's no way you wouldn't notice the slowdown. It feels like everything has stopped.

Comment 10 Deleted

Comment 11 by ricea@chromium.org, Jan 19 2017

Jakob Kummerow posted a great explanation of this behaviour to v8-users@googlegroups.com:

https://groups.google.com/d/msg/v8-users/Hj78A4yhQS4/edz6-NsyCAAJ

Comment 12 by ricea@chromium.org, Jan 19 2017

Results for readablestream-readrequests-benchmark.html after also using the new data structure for the [[readRequests]] slot:

3545 reads took 49.09ms, 0.01ms per read
7090 reads took 78.82ms, 0.01ms per read
14180 reads took 51.85ms, 0.00ms per read
28360 reads took 137.34ms, 0.00ms per read
56720 reads took 234.37ms, 0.00ms per read
113440 reads took 542.55ms, 0.00ms per read
Project Member

Comment 13 by bugdroid1@chromium.org, Jan 24 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/9106f36b383c3e366fa8bb9db1c5b9b1c426491a

commit 9106f36b383c3e366fa8bb9db1c5b9b1c426491a
Author: ricea <ricea@chromium.org>
Date: Tue Jan 24 14:27:38 2017

Fix ReadableStream scalability issue

ReadableStream becomes very slow when 56720 or more chunks are in the
queue. Instead of one InternalPackedArray as the queue, use a
single-linked list of InternalPackedArrays. If adding a chunk would
cause the array at the back to grow to more than 16384 chunks, add a
new node to the list instead.

The new Queue class hasn't had any particular optimisation, but
informal benchmarks show it slightly slower for small queues, faster
for medium-size queues and drastically faster where the old
implementation fell off the performance cliff.

Thanks to Jakob Kummerow for suggesting the cursor mechanism:
https://groups.google.com/d/msg/v8-users/Hj78A4yhQS4/edz6-NsyCAAJ

Alternatives considered:

* Array-backed circular buffer. Rejected because
  it would cause additional overhead in the case of a small queue.
* Polymorphically switching between old and new queue representations
  at some threshold. Rejected for complexity.

BUG= 681493 

Review-Url: https://codereview.chromium.org/2637863002
Cr-Commit-Position: refs/heads/master@{#445717}

[modify] https://crrev.com/9106f36b383c3e366fa8bb9db1c5b9b1c426491a/third_party/WebKit/Source/core/streams/ReadableStream.js

Project Member

Comment 14 by bugdroid1@chromium.org, Mar 21 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/e123782556b8f18f3a070722dd68ab617a930976

commit e123782556b8f18f3a070722dd68ab617a930976
Author: ricea <ricea@chromium.org>
Date: Tue Mar 21 06:26:29 2017

Use SimpleQueue in WritableStream implementation

Split out Queue into a separate file from ReadableStream and rename is
SimpleQueue to reduce the risk of name collisions on the binding object.

Add a peek() method as it is needed by WritableStream.

Modify WritableStream to use SimpleQueue.

BUG= 681493 

Review-Url: https://codereview.chromium.org/2752133003
Cr-Commit-Position: refs/heads/master@{#458324}

[modify] https://crrev.com/e123782556b8f18f3a070722dd68ab617a930976/.gn
[modify] https://crrev.com/e123782556b8f18f3a070722dd68ab617a930976/third_party/WebKit/Source/core/streams/ReadableStream.js
[add] https://crrev.com/e123782556b8f18f3a070722dd68ab617a930976/third_party/WebKit/Source/core/streams/SimpleQueue.js
[modify] https://crrev.com/e123782556b8f18f3a070722dd68ab617a930976/third_party/WebKit/Source/core/streams/WritableStream.js

Comment 15 by ricea@chromium.org, Mar 21 2017

Status: Fixed (was: Assigned)

Comment 16 Deleted

Comment 17 by ricea@chromium.org, Mar 21 2017

For the record, here's the results for writablestream-linear.html after the fix:

3545 writes took 60.60ms, 0.02ms per write
7090 writes took 47.81ms, 0.01ms per write
14180 writes took 74.96ms, 0.01ms per write
28360 writes took 146.51ms, 0.01ms per write
56720 writes took 295.42ms, 0.01ms per write
113440 writes took 635.59ms, 0.01ms per write

Sign in to add a comment