Issue 689753

Starred by 5 users

Issue metadata

Status: Fixed
Owner: ----
Closed: Oct 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: All
Pri: 2
Type: Bug



Need an efficient way to decode a subarray of Uint8Array of utf-8 bytes

Project Member Reported by updogliu@google.com, Feb 8 2017

Issue description

UserAgent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.21 Safari/537.36

Steps to reproduce the problem:
Run the benchmark
https://jsperf.com/utf-8-textdecoder2

What is the expected behavior?
textdecoder.decode(uint8bytes.subarray(0, byteLength)) is much faster than the hand-written JS decoder.

What went wrong?
textdecoder.decode(uint8bytes.subarray(0, byteLength)) is very slow, which may indicate room for improvement in two places:
1. implementation of textdecoder.decode()
2. implementation of uint8bytes.subarray()
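
For readers without access to the benchmark, the pattern under discussion looks roughly like the sketch below. The buffer size and sample string are assumptions made up for illustration; only the `decode(uint8bytes.subarray(0, byteLength))` call comes from the report.

```javascript
// Hypothetical setup: a large reusable buffer in which only the first
// `byteLength` bytes hold valid UTF-8 data.
const decoder = new TextDecoder('utf-8');
const encoder = new TextEncoder();

const uint8bytes = new Uint8Array(1024);
const payload = encoder.encode('héllo, wörld');
uint8bytes.set(payload, 0);
const byteLength = payload.length;

// subarray() creates a new view on the same ArrayBuffer (no byte copy),
// but it still allocates a new typed array object on every call.
const decoded = decoder.decode(uint8bytes.subarray(0, byteLength));
console.log(decoded);
```

TextDecoder.decode() takes no offset or length arguments, so creating a view like this is the usual way to decode only part of a buffer.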

Did this work before? N/A 

Does this work in other browsers? N/A

Chrome version: 57.0.2987.21  Channel: beta
OS Version: Ubuntu 14.04
Flash Version:

 

Comment 1 by kbr@chromium.org, Feb 8 2017

Cc: jsb...@chromium.org
Components: Blink>TextEncoding

Comment 2 by ajha@chromium.org, Feb 8 2017

Labels: Needs-Triage-M57

Comment 3 Deleted

Cc: bmeu...@chromium.org
Benedikt, do you have someone in mind who could work on Uint8Array.subarray? The current implementation does not seem ideal.
Cc: ca...@igalia.com
Status: Available (was: Unconfirmed)
The TextDecoder implementation does not use ICU for UTF-8. Like the rest of Blink, it uses a hand-rolled decoder in WTF: third_party/WebKit/Source/wtf/text/TextCodecUTF8.h

UTF-8 decoding is mechanically quite simple and I suspect a JS implementation can be heavily optimized by V8, getting close to native speed as indicated by the test results (on my machine: 1,350,873ops/sec for native vs. 690,288ops/sec for JS).
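
To illustrate how mechanical the decoding is, here is a minimal sketch of a hand-written decoder that takes start and end offsets directly. It is not the code from the benchmark or from TextCodecUTF8.h; the name `decodeUtf8` is hypothetical, and the sketch assumes well-formed input, whereas a conforming decoder must also replace malformed sequences with U+FFFD.

```javascript
// Minimal UTF-8 decoder sketch (hypothetical helper, not the benchmark's code).
// Assumes well-formed input: no checks for bad continuation bytes,
// overlong encodings, or truncated sequences.
function decodeUtf8(bytes, start, end) {
  let out = '';
  let i = start;
  while (i < end) {
    const b0 = bytes[i++];
    let cp;
    if (b0 < 0x80) {
      // 1 byte: 0xxxxxxx
      cp = b0;
    } else if (b0 < 0xe0) {
      // 2 bytes: 110xxxxx 10xxxxxx
      cp = ((b0 & 0x1f) << 6) | (bytes[i++] & 0x3f);
    } else if (b0 < 0xf0) {
      // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
      cp = ((b0 & 0x0f) << 12) | ((bytes[i++] & 0x3f) << 6) | (bytes[i++] & 0x3f);
    } else {
      // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
      cp = ((b0 & 0x07) << 18) | ((bytes[i++] & 0x3f) << 12) |
           ((bytes[i++] & 0x3f) << 6) | (bytes[i++] & 0x3f);
    }
    out += String.fromCodePoint(cp);
  }
  return out;
}
```

Because the offsets are plain arguments, a decoder of this shape never needs a subarray() call, which is one reason a JS version can look competitive in this kind of benchmark.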

Given the profiling results, it looks like subarray() is where the time is going (as hinted at in #c4). I'd push on optimizing that before we extend the API surface area.

Also, spec feature requests can be filed at: https://github.com/whatwg/encoding/issues
Labels: -Needs-Triage-M57 M58
As it's already being worked on, and it seems the TE team can't triage it, removing the Needs-Triage-M57 label and marking it M58.

Comment 8 by updogliu@google.com, Mar 21 2017

Any update on this?

Comment 9 by ca...@igalia.com, Mar 21 2017

Cc: loorong...@gmail.com
I had taken a look at this, but constructing the new typed array was very slow in C++ or CSA builtins, due to SpeciesConstructor. I haven't gotten around to speeding it up as is done for the Array builtins, and that would probably not help use cases in Blink anyway. So, there was no real benefit.
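
As background on the SpeciesConstructor point: subarray() has to create its result through the species lookup, so it cannot simply allocate a plain Uint8Array. The sketch below only shows the behaviour observable from script, not V8 internals; `TaggedBytes` is a hypothetical subclass invented for the example.

```javascript
// Hypothetical subclass, used only to make the species lookup visible.
class TaggedBytes extends Uint8Array {}

const tagged = new TaggedBytes([1, 2, 3, 4]);

// subarray() consults the species constructor, so the result is a
// TaggedBytes view rather than a plain Uint8Array.
const part = tagged.subarray(1, 3);

console.log(part instanceof TaggedBytes);   // same class as the receiver
console.log(part.buffer === tagged.buffer); // same underlying ArrayBuffer
```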

Another developer has been spending a lot more time on TypedArray code recently, and may be interested to take this on soon.
cwhan.tunz@ implemented TypedArrayCreate and TypedArraySpeciesCreate, which are needed for %TypedArray%.prototype.subarray, in https://codereview.chromium.org/2763473002/

Can someone add him to /cc list as well?
Cc: cwhan.t...@gmail.com
Components: Blink>JavaScript>Runtime
Labels: -OS-Linux OS-All
Labels: Needs-Feedback
Can the OP try Chrome Canary and see if the performance with native subarray() is closer to expected?

(I'll give it a whirl myself when I'm not on a chromebook)
FYI, numbers from latest canary (60.0.3096.0, Mac OS on a newish MacBook Pro)

JavaScript: 2.2Mops/sec
JavaScript (Array): 2.3Mops/sec
TextDecoder: 2.2Mops/sec
TextDecoder w/ Uint8Array conversion: 0.55Mops/sec
TextDecoder Subarray: 1.0Mops/sec
JSON: 4.5Mops/sec

So subarray() is 2x faster than newing a Uint8Array, which... is pretty awesome.

JS and native decoding are about equivalent on this machine - not a big surprise.

Taking a slice with subarray() to pass to the native function is still slower. At least part of that must be GC.
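
For anyone unsure what subarray() does and does not allocate, here is a small sketch contrasting a view with a copy. The sample bytes are made up, and this is not necessarily what the benchmark's "Uint8Array conversion" case does.

```javascript
const source = new Uint8Array([104, 105, 33, 0, 0, 0]); // "hi!" plus padding

// View: new typed array object, same ArrayBuffer, no bytes copied.
const view = source.subarray(0, 3);

// Copy: new typed array object AND a new ArrayBuffer holding copied bytes.
const copy = source.slice(0, 3);

// Equivalent view built with the constructor instead of subarray().
const manualView = new Uint8Array(source.buffer, source.byteOffset, 3);

source[0] = 72; // 'H': visible through both views, not through the copy

console.log(new TextDecoder().decode(view)); // decodes the view's 3 bytes
console.log(new TextDecoder().decode(copy)); // decodes the earlier snapshot
```

Even the no-copy view allocates a short-lived wrapper object per call, which fits the guess that GC accounts for part of the cost.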

For kicks I made a variation that compares JS vs. native decoders, and passing the array vs. using subarray:

JavaScript: 2.1Mops/sec
TextDecoder: 2.1Mops/sec
JavaScript w/ subarray: 1.7Mops/sec
TextDecoder w/ subarray: 1.1Mops/sec

That delta is unexpected. I'm guessing it's down to bindings?
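
A rough way to reproduce this kind of comparison outside jsperf is sketched below. The `timeIt` helper, the sample string, and the iteration count are all assumptions for illustration, and a loop this naive is far less careful than a real benchmark harness, so treat any numbers it prints as indicative only.

```javascript
// Use performance.now() where available, otherwise fall back to Date.now().
const now = (typeof performance !== 'undefined')
  ? () => performance.now()
  : () => Date.now();

// Hypothetical helper: runs fn `iterations` times and returns elapsed ms.
// `sink` accumulates result lengths so the calls cannot be optimized away.
function timeIt(fn, iterations) {
  const t0 = now();
  let sink = 0;
  for (let i = 0; i < iterations; i++) sink += fn().length;
  return { ms: now() - t0, sink };
}

const bytes = new TextEncoder().encode('The quick brown fox jumps over the lazy dog');
const dec = new TextDecoder();
const N = 100000;

const whole = timeIt(() => dec.decode(bytes), N);
const sliced = timeIt(() => dec.decode(bytes.subarray(0, bytes.length)), N);

console.log(`whole array: ${whole.ms.toFixed(1)} ms, with subarray: ${sliced.ms.toFixed(1)} ms`);
```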

All that said... a question for the OP is: is the current performance acceptable (i.e. can we close this out) or is there a need to push farther (at the expense of other investment in Chrome/V8)?



Thank you for the improvements, jsbell@! I am fine with stopping here if further investigation has a higher bar and/or is unlikely to pay off.
Labels: -M58 M-58
Project Member

Comment 17 by sheriffbot@chromium.org, Jul 27 2017

Labels: Hotlist-Google
Oops, thanks for following up!

Closing for now, then. It's a legitimate feature request against the Encoding spec's API, but there hasn't been a steady stream of requests for it.
Status: Fixed (was: Available)
