ReadableStream, WritableStream get dramatically slower when queue grows to 56720 chunks
Issue description

Load the attached test in the browser. It enqueues a number of chunks in start() and then reads them out using a default reader.

Expected behaviour: roughly constant time per read.

Actual behaviour:
3545 reads took 47.15ms, 0.01ms per read
7090 reads took 56.02ms, 0.01ms per read
14180 reads took 100.84ms, 0.01ms per read
28360 reads took 208.79ms, 0.01ms per read
56720 reads took 5016.03ms, 0.09ms per read
113440 reads took 14758.54ms, 0.13ms per read

I manually bisected to identify 56720 chunks as the threshold where performance changes. Reads become 6 to 50 times slower (consistent within a run, but variable between runs). The same behaviour can be seen with a similar test against WritableStream.

I have checked the v8 source code to see whether the magic number 56720 appears anywhere. It does not. I will try profiling next.
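The attachment is not reproduced here. For context, below is a minimal sketch of an equivalent test; it is an illustration only, not the attached file, and the chunk count and function names are assumptions.

  const COUNT = 56720;  // number of chunks to enqueue; varied between runs
  const rs = new ReadableStream({
    start(controller) {
      // Queue everything up front so the read loop only has to drain the queue.
      for (let i = 0; i < COUNT; i++) {
        controller.enqueue(i);
      }
      controller.close();
    }
  });
  const reader = rs.getReader();
  const startTime = performance.now();
  let reads = 0;
  function readLoop() {
    return reader.read().then(result => {
      if (result.done) {
        const elapsed = performance.now() - startTime;
        console.log(`${reads} reads took ${elapsed.toFixed(2)}ms, ` +
                    `${(elapsed / reads).toFixed(2)}ms per read`);
        return;
      }
      ++reads;
      return readLoop();
    });
  }
  readLoop();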
Jan 16 2017
Output from running Chrome with --no-sandbox --js-flags="--prof --log-timer-events --log" and then crunching the isolate-0x1bbcd53e5000-v8.log file through linux-tick-processor:
Statistical profiling result from out/Release/isolate-0x1bbcd53e5000-v8.log, (54819 ticks, 171 unaccounted, 0 excluded).
[Shared libraries]:
ticks total nonlib name
41628 75.9% /opt/google/chrome/chrome
8281 15.1% /lib/x86_64-linux-gnu/libc-2.19.so
19 0.0% /lib/x86_64-linux-gnu/libpthread-2.19.so
1 0.0% /lib/x86_64-linux-gnu/libm-2.19.so
[JavaScript]:
ticks total nonlib name
74 0.1% 1.5% KeyedStoreIC: A keyed store IC from the snapshot
58 0.1% 1.2% Stub: FastNewClosureStub
48 0.1% 1.0% KeyedLoadIC: A keyed load IC from the snapshot {1}
45 0.1% 0.9% LazyCompile: ~CreateResolvingFunctions native promise.js:40:34
44 0.1% 0.9% Stub: FastNewFunctionContextStub
42 0.1% 0.9% Builtin: CompileLazy
33 0.1% 0.7% Stub: CEntryStub
33 0.1% 0.7% KeyedStoreIC: A keyed store IC from the snapshot {2}
27 0.0% 0.6% LazyCompile: *PerformPromiseThen native promise.js:292:28
26 0.0% 0.5% Builtin: CallFunction_ReceiverIsAny
25 0.0% 0.5% Handler: An IC handler from the snapshot {2}
22 0.0% 0.4% LazyCompile: ~DequeueValue native ReadableStream.js:621:24
21 0.0% 0.4% LazyCompile: *resolvePromise native promise.js:181:24
20 0.0% 0.4% Stub: StoreTransitionStub {1}
20 0.0% 0.4% Stub: RecordWriteStub {3}
20 0.0% 0.4% KeyedLoadIC: A keyed load IC from the snapshot {2}
20 0.0% 0.4% KeyedLoadIC: A keyed load IC from the snapshot
18 0.0% 0.4% Builtin: ObjectHasOwnProperty
17 0.0% 0.3% Stub: FastNewRestParameterStub
16 0.0% 0.3% Stub: FastNewObjectStub
16 0.0% 0.3% LazyCompile: *NewPromiseCapability native promise.js:257:30
16 0.0% 0.3% Handler: An IC handler from the snapshot {3}
15 0.0% 0.3% LazyCompile: *ReadableStreamDefaultControllerPull native ReadableStream.js:249:47
14 0.0% 0.3% Stub: StoreTransitionStub
14 0.0% 0.3% LazyCompile: ~y native promise.js:42:7
14 0.0% 0.3% LazyCompile: ~reader.read.then.result file:///usr/local/google/home/ricea/readablestream-linear.html:30:31
14 0.0% 0.3% LazyCompile: *PromiseHandle native promise.js:105:23
13 0.0% 0.3% LazyCompile: *then native promise.js:315:21
13 0.0% 0.3% LazyCompile: *createPromise native promise.js:178:23
13 0.0% 0.3% Builtin: ReflectApply
13 0.0% 0.3% Builtin: ArgumentsAdaptorTrampoline
12 0.0% 0.2% LazyCompile: *ReadableStreamDefaultControllerShouldCallPull native ReadableStream.js:527:57
11 0.0% 0.2% Stub: StrictEqualStub
11 0.0% 0.2% LazyCompile: *PromiseSet native promise.js:79:20
11 0.0% 0.2% Builtin: CallFunction_ReceiverIsNullOrUndefined
10 0.0% 0.2% LazyCompile: *resolve native promise.js:327:24
9 0.0% 0.2% LazyCompile: *read native ReadableStream.js:297:9
9 0.0% 0.2% LazyCompile: *FulfillPromise native promise.js:95:24
9 0.0% 0.2% Handler: An IC handler from the snapshot
8 0.0% 0.2% Stub: RecordWriteStub {6}
8 0.0% 0.2% Stub: CompareICStub {1}
8 0.0% 0.2% LazyCompile: *ReadableStreamDefaultReaderRead native ReadableStream.js:489:43
8 0.0% 0.2% LazyCompile: *PromiseInit native promise.js:92:21
8 0.0% 0.2% LazyCompile: *IsPromise native promise.js:175:19
8 0.0% 0.2% KeyedStoreIC: A keyed store IC from the snapshot {1}
8 0.0% 0.2% Builtin: Apply
7 0.0% 0.1% Stub: RecordWriteStub {1}
7 0.0% 0.1% LazyCompile: *ReadableStreamGetState native ReadableStream.js:381:34
7 0.0% 0.1% LazyCompile: *PromiseEnqueue native promise.js:117:24
7 0.0% 0.1% LazyCompile: *IsReadableStreamDefaultReader native ReadableStream.js:441:41
6 0.0% 0.1% Stub: RecordWriteStub {7}
6 0.0% 0.1% Stub: RecordWriteStub {4}
6 0.0% 0.1% LazyCompile: *ReadableStreamDefaultControllerCallPullIfNeeded native ReadableStream.js:500:59
6 0.0% 0.1% LazyCompile: *PromiseAttachCallbacks native promise.js:147:32
6 0.0% 0.1% Builtin: JSBuiltinsConstructStub
5 0.0% 0.1% Stub: StoreTransitionStub {2}
5 0.0% 0.1% Stub: StoreFieldStub
5 0.0% 0.1% Stub: RecordWriteStub {9}
5 0.0% 0.1% Stub: RecordWriteStub {2}
5 0.0% 0.1% LazyCompile: *SpeciesConstructor native runtime.js:29:28
5 0.0% 0.1% LazyCompile: *EnqueueValueWithSize native ReadableStream.js:626:32
5 0.0% 0.1% Handler: symbol("promise_state_symbol" hash 3870e4d2)
5 0.0% 0.1% Handler: symbol("promise_has_handler_symbol" hash 370821e9)
5 0.0% 0.1% Builtin: Construct
5 0.0% 0.1% Builtin: ArrayShift
4 0.0% 0.1% Stub: StoreFieldStub {5}
4 0.0% 0.1% Stub: StoreFieldStub {2}
4 0.0% 0.1% Stub: RecordWriteStub {5}
4 0.0% 0.1% Stub: CompareICStub
4 0.0% 0.1% LazyCompile: *ReadableStreamDefaultControllerEnqueue native ReadableStream.js:353:50
4 0.0% 0.1% LazyCompile: *<anonymous> native prologue.js:234:16
4 0.0% 0.1% Handler: symbol("promise_reject_reactions_symbol" hash 2811097e)
4 0.0% 0.1% Handler: An IC handler from the snapshot {1}
4 0.0% 0.1% Builtin: ToBoolean
4 0.0% 0.1% Builtin: Call_ReceiverIsNotNullOrUndefined
4 0.0% 0.1% Builtin: CallFunction_ReceiverIsNotNullOrUndefined
3 0.0% 0.1% Stub: StoreFieldStub {1}
3 0.0% 0.1% Stub: RecordWriteStub {11}
3 0.0% 0.1% Stub: JSEntryStub
3 0.0% 0.1% Stub: BinaryOpICStub
3 0.0% 0.1% LazyCompile: *readLoop file:///usr/local/google/home/ricea/readablestream-linear.html:29:20
2 0.0% 0.0% Stub: StoreFieldStub {3}
2 0.0% 0.0% Stub: RecordWriteStub {10}
2 0.0% 0.0% Stub: FastArrayPushStub
2 0.0% 0.0% Stub: CEntryStub {1}
2 0.0% 0.0% LazyCompile: ~IsPromise native promise.js:175:19
2 0.0% 0.0% LazyCompile: *start file:///usr/local/google/home/ricea/readablestream-linear.html:18:12
2 0.0% 0.0% LazyCompile: *enqueue native ReadableStream.js:212:12
2 0.0% 0.0% LazyCompile: *Promise native promise.js:58:23
2 0.0% 0.0% LazyCompile: *CreateIterResultObject native ReadableStream.js:677:34
2 0.0% 0.0% Handler: symbol("promise_state_symbol" hash 3870e4d2) {1}
2 0.0% 0.0% Handler: symbol("promise_fulfill_reactions_symbol" hash 35cbba9c) {1}
2 0.0% 0.0% Builtin: NumberConstructor
2 0.0% 0.0% Builtin: Call_ReceiverIsNullOrUndefined
1 0.0% 0.0% Stub: ToBooleanICStub
1 0.0% 0.0% Stub: StoreFieldStub {4}
1 0.0% 0.0% Stub: RecordWriteStub {8}
1 0.0% 0.0% Stub: RecordWriteStub
1 0.0% 0.0% LazyCompile: ~PromiseSet native promise.js:79:20
1 0.0% 0.0% LazyCompile: ~PromiseAttachCallbacks native promise.js:147:32
1 0.0% 0.0% LazyCompile: *IsReadableStreamLocked native ReadableStream.js:435:34
1 0.0% 0.0% LazyCompile: *IsReadableStreamDefaultController native ReadableStream.js:438:45
1 0.0% 0.0% LazyCompile: *<anonymous> native prologue.js:229:16
1 0.0% 0.0% Handler: symbol("promise_result_symbol" hash 1ffbbcfb) {1}
1 0.0% 0.0% Handler: symbol("promise_result_symbol" hash 1ffbbcfb)
1 0.0% 0.0% Handler: symbol("promise_fulfill_reactions_symbol" hash 35cbba9c)
1 0.0% 0.0% Handler: shift
1 0.0% 0.0% Builtin: ConstructFunction
1 0.0% 0.0% Builtin: Call_ReceiverIsAny
[C++]:
ticks total nonlib name
3399 6.2% 69.5% __pthread_cond_timedwait
97 0.2% 2.0% __pthread_cond_wait
51 0.1% 1.0% write
7 0.0% 0.1% syscall
5 0.0% 0.1% madvise
5 0.0% 0.1% _init
5 0.0% 0.1% __GI___pthread_mutex_lock
5 0.0% 0.1% _IO_vfprintf
3 0.0% 0.1% _IO_default_xsputn
2 0.0% 0.0% strlen
2 0.0% 0.0% memset
1 0.0% 0.0% std::string::reserve(unsigned long)
1 0.0% 0.0% sem_wait
1 0.0% 0.0% char* std::string::_S_construct<char const*>(char const*, char const*, std::allocator<char> const&, std::forward_iterator_tag)
1 0.0% 0.0% __printf_fp
1 0.0% 0.0% __lll_unlock_wake
1 0.0% 0.0% __libc_malloc
1 0.0% 0.0% _IO_file_sync
1 0.0% 0.0% _IO_do_write
[Summary]:
ticks total nonlib name
1130 2.1% 23.1% JavaScript
3589 6.5% 73.4% C++
623 1.1% 12.7% GC
49929 91.1% Shared libraries
171 0.3% Unaccounted
[C++ entry points]:
ticks cpp total name
44 73.3% 0.1% write
5 8.3% 0.0% __GI___pthread_mutex_lock
3 5.0% 0.0% _init
3 5.0% 0.0% _IO_vfprintf
1 1.7% 0.0% syscall
1 1.7% 0.0% madvise
1 1.7% 0.0% __libc_malloc
1 1.7% 0.0% _IO_file_sync
1 1.7% 0.0% _IO_do_write
[Bottom up (heavy) profile]:
Note: percentage shows a share of a particular caller in the total
amount of its parent calls.
Callers occupying less than 2.0% are not shown.
ticks parent name
41628 75.9% /opt/google/chrome/chrome
40828 98.1% /opt/google/chrome/chrome
39831 97.6% LazyCompile: ~DequeueValue native ReadableStream.js:621:24
39556 99.3% LazyCompile: *ReadableStreamDefaultControllerPull native ReadableStream.js:249:47
39556 100.0% LazyCompile: *ReadableStreamDefaultReaderRead native ReadableStream.js:489:43
39556 100.0% LazyCompile: *read native ReadableStream.js:297:9
8281 15.1% /lib/x86_64-linux-gnu/libc-2.19.so
8273 99.9% /opt/google/chrome/chrome
8273 100.0% LazyCompile: ~DequeueValue native ReadableStream.js:621:24
8241 99.6% LazyCompile: *ReadableStreamDefaultControllerPull native ReadableStream.js:249:47
8241 100.0% LazyCompile: *ReadableStreamDefaultReaderRead native ReadableStream.js:489:43
8241 100.0% LazyCompile: *read native ReadableStream.js:297:9
3399 6.2% __pthread_cond_timedwait
The only useful thing I was able to infer from this was that 95% of the time was spent in DequeueValue in ReadableStream.js.
Fortunately that function only has three lines, so I'm pretty sure this line is the problem:
const result = controller[_queue].shift();
Jan 16 2017
I have one possible fix at https://codereview.chromium.org/2637863002, using a linked list of arrays to stop any individual array getting "too big". Problems:
* I don't know the root cause. It doesn't behave like a quadratic algorithm, it behaves like a step function.
* I don't actually know what "too big" is. It's different on the two machines I've tested it on, so what's to stop it being 16384 somewhere else?
* I don't know how to automatically regression test it. If we add a layout test that queues 100,000 chunks it will take a second even on a fast machine. VMs, MSAN, etc. would take an order of magnitude or two longer.
Jan 17 2017
A similar test for the [[readRequests]] internal slot. Sample results:
3545 reads took 46.40ms, 0.01ms per read
7090 reads took 67.56ms, 0.01ms per read
14180 reads took 69.30ms, 0.00ms per read
28360 reads took 148.44ms, 0.01ms per read
56720 reads took 55357.14ms, 0.98ms per read
113440 reads took 213609.48ms, 1.88ms per read
Jan 17 2017
The benchmark for WritableStream. Sample results:
3545 writes took 84.45ms, 0.02ms per write
7090 writes took 107.83ms, 0.02ms per write
14180 writes took 154.55ms, 0.01ms per write
28360 writes took 287.54ms, 0.01ms per write
56720 writes took 26662.79ms, 0.47ms per write
113440 writes took 46744.04ms, 0.41ms per write
Jan 17 2017
Results for the original benchmark after applying the tentative fix from https://codereview.chromium.org/2637863002:
3545 reads took 63.67ms, 0.02ms per read
7090 reads took 90.00ms, 0.01ms per read
14180 reads took 108.91ms, 0.01ms per read
28360 reads took 242.88ms, 0.01ms per read
56720 reads took 444.55ms, 0.01ms per read
113440 reads took 929.88ms, 0.01ms per read
226880 reads took 1723.34ms, 0.01ms per read
453760 reads took 3075.35ms, 0.01ms per read
907520 reads took 7433.19ms, 0.01ms per read
Jan 17 2017
I have a hypothesis that v8 is switching to a different array structure when the array gets larger than a certain size. The structure used for smaller arrays supports O(1) shift(), but the structure used for larger arrays has O(N) shift. This is pure speculation based on the benchmark results.
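That hypothesis can be probed with a plain-array micro-benchmark, independent of streams. The sketch below is illustrative only; the sizes are borrowed from the stream test, and any threshold will vary by machine and V8 version.

  // Time repeated Array.prototype.shift() calls at various array lengths to
  // see whether the per-shift cost stays constant or jumps at some size.
  function timeShifts(n) {
    const a = [];
    for (let i = 0; i < n; i++) a.push(i);
    const start = performance.now();
    while (a.length > 0) a.shift();
    const elapsed = performance.now() - start;
    console.log(`${n} shifts took ${elapsed.toFixed(2)}ms, ` +
                `${(elapsed / n).toFixed(4)}ms per shift`);
  }
  for (const n of [3545, 7090, 14180, 28360, 56720, 113440]) timeShifts(n);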
Jan 17 2017
I am curious what a 56720-queued-chunk stream looks like, to get a sense of how much of a real-world problem this is. How big is a typical chunk? E.g. if each chunk is 1 KB then yeah, I guess having 57 MB queued causing a slowdown is not great. But if it's 16 KB then I'm less worried about an extra 0.5 milliseconds when you already have ~900 MB queued.
Jan 18 2017
I would expect that having 56720 chunks queued would be unusual in real-world applications. However, falling off a performance cliff would be unhelpful. I have been thinking about token (as in string parsing) streams as a useful primitive. In that case 56720 chunks would only represent a few megabytes. Even with 16KB chunks there's no way you wouldn't notice the slowdown. It feels like everything has stopped.
Jan 19 2017
Jakob Kummerow posted a great explanation of this behaviour to v8-users@googlegroups.com: https://groups.google.com/d/msg/v8-users/Hj78A4yhQS4/edz6-NsyCAAJ
Jan 19 2017
Results for readablestream-readrequests-benchmark.html after also using the new data structure for the [[readRequests]] slot:
3545 reads took 49.09ms, 0.01ms per read
7090 reads took 78.82ms, 0.01ms per read
14180 reads took 51.85ms, 0.00ms per read
28360 reads took 137.34ms, 0.00ms per read
56720 reads took 234.37ms, 0.00ms per read
113440 reads took 542.55ms, 0.00ms per read
Jan 24 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/9106f36b383c3e366fa8bb9db1c5b9b1c426491a

commit 9106f36b383c3e366fa8bb9db1c5b9b1c426491a
Author: ricea <ricea@chromium.org>
Date: Tue Jan 24 14:27:38 2017

Fix ReadableStream scalability issue

ReadableStream becomes very slow when 56720 or more chunks are in the queue. Instead of one InternalPackedArray as the queue, use a single-linked list of InternalPackedArrays. If adding a chunk would cause the array at the back to grow to more than 16384 chunks, add a new node to the list instead.

The new Queue class hasn't had any particular optimisation, but informal benchmarks show it slightly slower for small queues, faster for medium-size queues and drastically faster where the old implementation fell off the performance cliff.

Thanks to Jakob Kummerow for suggesting the cursor mechanism: https://groups.google.com/d/msg/v8-users/Hj78A4yhQS4/edz6-NsyCAAJ

Alternatives considered:
* Array-backed circular buffer. Rejected because it would cause additional overhead in the case of a small queue.
* Polymorphically switching between old and new queue representations at some threshold. Rejected for complexity.

BUG= 681493
Review-Url: https://codereview.chromium.org/2637863002
Cr-Commit-Position: refs/heads/master@{#445717}

[modify] https://crrev.com/9106f36b383c3e366fa8bb9db1c5b9b1c426491a/third_party/WebKit/Source/core/streams/ReadableStream.js
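For readers who don't want to dig into the patch, here is a simplified sketch of the idea (my own illustration, not the code that landed): a linked list of bounded arrays, plus a cursor into the front node so that shift() never has to move elements.

  const QUEUE_MAX_ARRAY_SIZE = 16384;  // per-node limit from the commit message

  class QueueSketch {
    constructor() {
      this._front = { elements: [], next: undefined };
      this._back = this._front;
      this._cursor = 0;  // index of the oldest unread element in the front node
      this._size = 0;
    }

    get length() { return this._size; }

    push(element) {
      // Start a fresh node instead of letting the back array grow without bound.
      if (this._back.elements.length === QUEUE_MAX_ARRAY_SIZE) {
        const newBack = { elements: [], next: undefined };
        this._back.next = newBack;
        this._back = newBack;
      }
      this._back.elements.push(element);
      ++this._size;
    }

    // Assumes the caller has checked that length > 0.
    shift() {
      // If the front node has been fully consumed, move on to the next one.
      if (this._cursor === this._front.elements.length &&
          this._front.next !== undefined) {
        this._front = this._front.next;
        this._cursor = 0;
      }
      const element = this._front.elements[this._cursor];
      this._front.elements[this._cursor] = undefined;  // let the chunk be GC'd
      ++this._cursor;
      --this._size;
      return element;
    }
  }

Advancing a cursor instead of calling shift() on the per-node arrays is what keeps each operation O(1); the only extra cost is dropping a node once it has been drained.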
Mar 21 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/e123782556b8f18f3a070722dd68ab617a930976

commit e123782556b8f18f3a070722dd68ab617a930976
Author: ricea <ricea@chromium.org>
Date: Tue Mar 21 06:26:29 2017

Use SimpleQueue in WritableStream implementation

Split out Queue into a separate file from ReadableStream and rename it SimpleQueue to reduce the risk of name collisions on the binding object. Add a peek() method as it is needed by WritableStream. Modify WritableStream to use SimpleQueue.

BUG= 681493
Review-Url: https://codereview.chromium.org/2752133003
Cr-Commit-Position: refs/heads/master@{#458324}

[modify] https://crrev.com/e123782556b8f18f3a070722dd68ab617a930976/.gn
[modify] https://crrev.com/e123782556b8f18f3a070722dd68ab617a930976/third_party/WebKit/Source/core/streams/ReadableStream.js
[add] https://crrev.com/e123782556b8f18f3a070722dd68ab617a930976/third_party/WebKit/Source/core/streams/SimpleQueue.js
[modify] https://crrev.com/e123782556b8f18f3a070722dd68ab617a930976/third_party/WebKit/Source/core/streams/WritableStream.js
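The peek() mentioned in the commit is the natural extension of the sketch in the earlier comment: read the element at the cursor without advancing it. Again just an illustration of the idea, assuming a non-empty queue, not the landed SimpleQueue.js code.

    // Added to the sketch above: return the oldest element without removing it.
    // (Assumes the caller has checked that length > 0.)
    peek() {
      let front = this._front;
      let cursor = this._cursor;
      // Mirror shift(): skip over a fully consumed front node.
      if (cursor === front.elements.length && front.next !== undefined) {
        front = front.next;
        cursor = 0;
      }
      return front.elements[cursor];
    }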
Mar 21 2017
For the record, here are the results for writablestream-linear.html after the fix:
3545 writes took 60.60ms, 0.02ms per write
7090 writes took 47.81ms, 0.01ms per write
14180 writes took 74.96ms, 0.01ms per write
28360 writes took 146.51ms, 0.01ms per write
56720 writes took 295.42ms, 0.01ms per write
113440 writes took 635.59ms, 0.01ms per write