Need an efficient way to decode a subarray of a Uint8Array of UTF-8 bytes
Issue description
UserAgent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.21 Safari/537.36
Steps to reproduce the problem:
Run the benchmark https://jsperf.com/utf-8-textdecoder2
What is the expected behavior?
textdecoder.decode(uint8bytes.subarray(0, byteLength)) should be much faster than the hand-written JS decoder.
What went wrong?
textdecoder.decode(uint8bytes.subarray(0, byteLength)) is very slow, which may indicate room for improvement in two places:
1. the implementation of textdecoder.decode()
2. the implementation of uint8bytes.subarray()
Did this work before? N/A
Does this work in other browsers? N/A
Chrome version: 57.0.2987.21 Channel: beta
OS Version: Ubuntu 14.04
Flash Version:
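For readers unfamiliar with the pattern, here is a minimal sketch of the kind of call being benchmarked, assuming a reusable scratch buffer in which only the first byteLength bytes hold valid UTF-8. The names scratch and decodePrefix are hypothetical and not taken from the jsperf page. subarray() creates a new view over the same ArrayBuffer without copying bytes, but it still allocates a fresh Uint8Array object on every call.

```javascript
// Minimal sketch: decode only the valid prefix of a larger, reusable buffer.
// `scratch` and `decodePrefix` are hypothetical names for illustration.
const decoder = new TextDecoder("utf-8");
const scratch = new Uint8Array(1024); // reusable buffer, mostly unused

function decodePrefix(bytes, byteLength) {
  // subarray() returns a view over the same ArrayBuffer (no byte copy),
  // but a new Uint8Array object is still allocated on each call.
  return decoder.decode(bytes.subarray(0, byteLength));
}

const encoded = new TextEncoder().encode("héllo, wörld"); // 14 bytes
scratch.set(encoded);
console.log(decodePrefix(scratch, encoded.length)); // prints: héllo, wörld
```

Decoding the whole scratch buffer instead would append the unused zero bytes as U+0000 characters, which is why the prefix has to be cut out somehow before calling decode().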
Feb 8 2017
Feb 8 2017
Benedikt, do you have someone in mind who could work on Uint8Array.subarray? The current implementation does not seem ideal.
Feb 8 2017
Feb 8 2017
The TextDecoder implementation does not use ICU for UTF-8. Like the rest of Blink, it uses a hand-rolled decoder in WTF: third_party/WebKit/Source/wtf/text/TextCodecUTF8.h. UTF-8 decoding is mechanically quite simple, and I suspect a JS implementation can be heavily optimized by V8, getting close to native speed, as the test results indicate (on my machine: 1,350,873 ops/sec for native vs. 690,288 ops/sec for JS). Given the profiling results, it looks like subarray() is where the time is going (as hinted at in #c4). I'd push on optimizing that before we extend the API surface area. Also, spec feature requests can be filed at: https://github.com/whatwg/encoding/issues
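To illustrate how simple the mechanics are, here is a minimal hand-written decoder sketch. It assumes well-formed input: unlike TextDecoder, which substitutes U+FFFD for malformed sequences, it does not check for overlong forms, surrogate code points, or truncated sequences. It takes explicit start/end indices, so no subarray() view is needed. decodeUtf8 is a hypothetical name, not the code from the jsperf page.

```javascript
// Minimal UTF-8 decoder sketch. ASSUMES well-formed input: no validation of
// overlong encodings, surrogates, stray continuation bytes, or truncation.
function decodeUtf8(bytes, start, end) {
  const units = []; // UTF-16 code units
  let i = start;
  while (i < end) {
    const b0 = bytes[i++];
    let cp;
    if (b0 < 0x80) {
      cp = b0; // 1 byte: 0xxxxxxx
    } else if (b0 < 0xe0) {
      cp = ((b0 & 0x1f) << 6) | (bytes[i++] & 0x3f); // 2 bytes
    } else if (b0 < 0xf0) {
      cp = ((b0 & 0x0f) << 12) | ((bytes[i++] & 0x3f) << 6) | (bytes[i++] & 0x3f); // 3 bytes
    } else {
      cp = ((b0 & 0x07) << 18) | ((bytes[i++] & 0x3f) << 12) |
           ((bytes[i++] & 0x3f) << 6) | (bytes[i++] & 0x3f); // 4 bytes
    }
    if (cp > 0xffff) {
      // Code points above the BMP become a UTF-16 surrogate pair.
      cp -= 0x10000;
      units.push(0xd800 | (cp >> 10), 0xdc00 | (cp & 0x3ff));
    } else {
      units.push(cp);
    }
  }
  // Convert in chunks to stay well below engine argument-count limits.
  let out = "";
  for (let j = 0; j < units.length; j += 4096) {
    out += String.fromCharCode.apply(null, units.slice(j, j + 4096));
  }
  return out;
}

const bytes = new TextEncoder().encode("xa€😀é");
console.log(decodeUtf8(bytes, 1, bytes.length)); // prints: a€😀é
```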
Feb 20 2017
As it's already being worked on and the TE team can't seem to triage it, removing the Needs-Triage_M57 label and marking it for M58.
Mar 21 2017
Any update on this?
Mar 21 2017
I had taken a look at this, but constructing the new TypedArray was very slow in C++ or CSA builtins, due to SpeciesConstructor. I haven't gotten around to speeding it up as was done for the Array builtins, and that probably wouldn't help use cases in Blink anyway. So there was no real benefit. Another developer has been spending a lot more time on TypedArray code recently and may be interested in taking this on soon.
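For context: per the ECMAScript spec, %TypedArray%.prototype.subarray creates its result through TypedArraySpeciesCreate, which reads the receiver's constructor and that constructor's Symbol.species. These lookups are observable from JS, as the sketch below shows, which is presumably part of why a fast path has to guard against modified constructors. TaggedBytes and PlainResult are hypothetical subclasses for illustration.

```javascript
// Sketch: subarray() builds its result via the species constructor, so
// subclasses (and Symbol.species overrides) affect what comes back.
class TaggedBytes extends Uint8Array {} // hypothetical subclass

const base = new TaggedBytes([1, 2, 3, 4, 5]);
const view = base.subarray(1, 4);

console.log(view instanceof TaggedBytes);   // true: species is TaggedBytes
console.log(view.buffer === base.buffer);   // true: shares memory, no copy
console.log(view.byteOffset, view.length);  // 1 3

view[0] = 42;
console.log(base[1]); // 42: writes through the view are visible in base

// Overriding Symbol.species changes the class of the result.
class PlainResult extends Uint8Array {      // hypothetical subclass
  static get [Symbol.species]() { return Uint8Array; }
}
console.log(new PlainResult(4).subarray(1) instanceof PlainResult); // false
```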
Mar 21 2017
cwhan.tunz@ implemented TypedArrayCreate and TypedArraySpeciesCreate, which are needed for %TypedArray%.prototype.subarray, in https://codereview.chromium.org/2763473002/. Can someone add him to the Cc list as well?
Mar 21 2017
May 10 2017
Can the OP try Chrome Canary and see if the performance with native subarray() is closer to expected?
May 10 2017
(I'll give it a whirl myself when I'm not on a Chromebook.)
May 11 2017
FYI, numbers from the latest Canary (60.0.3096.0, Mac OS on a newish MacBook Pro):
JavaScript: 2.2 Mops/sec
JavaScript (Array): 2.3 Mops/sec
TextDecoder: 2.2 Mops/sec
TextDecoder w/ Uint8Array conversion: 0.55 Mops/sec
TextDecoder Subarray: 1.0 Mops/sec
JSON: 4.5 Mops/sec
So subarray() is 2x faster than newing a Uint8Array, which... is pretty awesome. JS and native decoding are about equivalent on this machine, which is not a big surprise. Taking a slice with subarray() to pass to the native function is still slower; at least part of that must be GC.
For kicks I made a variation that compares JS vs. native decoders, and passing the array vs. using subarray:
JavaScript: 2.1 Mops/sec
TextDecoder: 2.1 Mops/sec
JavaScript w/ subarray: 1.7 Mops/sec
TextDecoder w/ subarray: 1.1 Mops/sec
That delta is unexpected. I'm guessing it's down to bindings?
All that said, a question for the OP: is the current performance acceptable (i.e. can we close this out), or is there a need to push further (at the expense of other investment in Chrome/V8)?
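A rough sketch of the kind of comparison described above (not the actual jsperf cases): decoding an exactly sized Uint8Array versus decoding a subarray() view of a larger, reusable buffer. opsPerSec is a hypothetical helper. Absolute numbers depend heavily on engine, hardware, and GC behavior, so only the relative difference means anything. It assumes a runtime with global TextEncoder/TextDecoder and `??` support (modern browsers or Node 14+), and falls back to Date.now() where performance is unavailable.

```javascript
// Rough micro-benchmark sketch; numbers are noisy and engine-dependent.
const now = () => (globalThis.performance ?? Date).now();

// Hypothetical helper: runs fn repeatedly and logs throughput in Mops/sec.
function opsPerSec(label, fn, iterations = 100000) {
  let sink = 0; // consume results so the work is not optimized away
  const t0 = now();
  for (let i = 0; i < iterations; i++) sink += fn().length;
  const ms = now() - t0;
  console.log(`${label}: ${(iterations / (ms / 1000) / 1e6).toFixed(2)} Mops/sec`);
  return sink;
}

const decoder = new TextDecoder();
const exact = new TextEncoder().encode('{"id":42,"name":"héllo"}');
const padded = new Uint8Array(exact.length + 64); // larger reusable buffer
padded.set(exact);

// Passing the array directly vs. creating a subarray() view per call.
opsPerSec("TextDecoder", () => decoder.decode(exact));
opsPerSec("TextDecoder w/ subarray", () => decoder.decode(padded.subarray(0, exact.length)));
```

Both calls decode the same bytes; the only difference is the per-call view allocation (plus any bindings overhead for the extra object).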
May 17 2017
Thank you for the improvements, jsbell@! I am fine with stopping here if further investigation has a higher bar and/or is unlikely to pay off.
Jun 9 2017
Jul 27 2017
Oct 9 2017
Oops, thanks for following up! Closing for now, then. It's a legitimate feature request against the Encoding spec's API, but there hasn't been a steady stream of requests for it.
Oct 9 2017
Comment 1 by kbr@chromium.org, Feb 8 2017
Components: Blink>TextEncoding