Issue 796697

Starred by 1 user

Issue metadata

Status: Available
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux, Android, Windows, Chrome, Mac, Fuchsia
Pri: 3
Type: Bug




TextDecoder: streaming result for an invalid sequence is "delayed" ?

Project Member Reported by js...@chromium.org, Dec 20 2017

Issue description

rsk@google.com observed the following and wanted to check if I should ignore it or if it's a symptom of a larger issue.


const decoder = new TextDecoder();
let data = decoder.decode(Uint8Array.of(0xE2), {stream: true});
// data is "" as expected
data = decoder.decode(Uint8Array.of("1".codePointAt()), {stream: true});
// data is "" while I would expect "�1"
data = decoder.decode(Uint8Array.of("1".codePointAt()), {stream: true});
// now it realizes that first byte is incomplete and returns "�11"


 

Comment 1 by jsb...@chromium.org, Dec 20 2017

Labels: Hotlist-Interop
Firefox behaves as expected ("", "�1", "1") so it's an interop issue at least.
I may be misreading this, but I think the error-signalling should be faster here based on reading https://encoding.spec.whatwg.org/#utf-8-decoder
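
For reference, here is a rough sketch of how I read the handler steps in that spec section, written as a tiny streaming decoder. The class name SketchUtf8Decoder is made up for illustration, it skips BOM handling and the fatal option, and it is not how TextCodecUTF8 is implemented. The point is the out-of-range branch: the state is reset, U+FFFD is emitted right away, and the offending byte is processed again as a fresh lead byte.

```javascript
// Hypothetical illustration of the WHATWG UTF-8 decoder handler steps.
// Not Blink code; no BOM handling, no "fatal" mode.
class SketchUtf8Decoder {
  constructor() {
    this.reset();
  }

  reset() {
    this.codePoint = 0;
    this.bytesSeen = 0;
    this.bytesNeeded = 0;
    this.lower = 0x80; // lower boundary for the next continuation byte
    this.upper = 0xBF; // upper boundary for the next continuation byte
  }

  decode(bytes = new Uint8Array(0), {stream = false} = {}) {
    let out = '';
    for (let i = 0; i < bytes.length; i++) {
      const b = bytes[i];
      if (this.bytesNeeded === 0) {
        if (b <= 0x7F) {
          out += String.fromCharCode(b);
        } else if (b >= 0xC2 && b <= 0xDF) {
          this.bytesNeeded = 1;
          this.codePoint = b & 0x1F;
        } else if (b >= 0xE0 && b <= 0xEF) {
          if (b === 0xE0) this.lower = 0xA0;
          if (b === 0xED) this.upper = 0x9F;
          this.bytesNeeded = 2;
          this.codePoint = b & 0x0F;
        } else if (b >= 0xF0 && b <= 0xF4) {
          if (b === 0xF0) this.lower = 0x90;
          if (b === 0xF4) this.upper = 0x8F;
          this.bytesNeeded = 3;
          this.codePoint = b & 0x07;
        } else {
          out += '\uFFFD'; // byte can never start a sequence
        }
        continue;
      }
      if (b < this.lower || b > this.upper) {
        // Not a valid continuation byte: signal the error immediately
        // and reprocess this byte as the start of a new sequence.
        this.reset();
        out += '\uFFFD';
        i--;
        continue;
      }
      this.lower = 0x80;
      this.upper = 0xBF;
      this.codePoint = (this.codePoint << 6) | (b & 0x3F);
      this.bytesSeen++;
      if (this.bytesSeen === this.bytesNeeded) {
        out += String.fromCodePoint(this.codePoint);
        this.reset();
      }
    }
    if (!stream) {
      // End of stream: an unfinished sequence becomes one U+FFFD.
      if (this.bytesNeeded !== 0) out += '\uFFFD';
      this.reset();
    }
    return out;
  }
}

const sketch = new SketchUtf8Decoder();
console.log([0xE2, 0x31, 0x31].map(
    (b) => sketch.decode(Uint8Array.of(b), {stream: true})));
// logs the same three strings as Firefox: "", "\uFFFD1", "1"
```

If that reading is right, the second call has everything it needs to emit "�1" without waiting for more input.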

Comment 3 by jsb...@chromium.org, Dec 20 2017

Yeah, this is likely an expectation mismatch between TextDecoder and TextCodecUTF8 - I don't think we expose partial decodes in the platform anywhere else, so it wouldn't have been a "bug" in TextCodecUTF8 when originally written.

(Also FYI I verified it still repros in ToT)

Fix in TextCodecUTF8 will just involve pawing through the state machine a bit, and being performance sensitive. Plus a new WPT case. :)
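
A rough sketch of what the test could check, kept free of testharness.js so it can be pasted into a console. The helper names (decodeByteAtATime, findMismatches) are made up here, and the expected arrays are the ones from this report plus a trailing "" for the final flush, which is my assumption about what a conforming decoder returns once nothing is pending. In an actual WPT file the comparison would presumably go through assert_array_equals inside a test() call instead.

```javascript
// Feed `byteValues` to a fresh decoder one byte per decode() call with
// {stream: true}, then flush. Returns the string produced by each call.
function decodeByteAtATime(byteValues, makeDecoder = () => new TextDecoder()) {
  const decoder = makeDecoder();
  const outputs = byteValues.map(
      (b) => decoder.decode(Uint8Array.of(b), {stream: true}));
  outputs.push(decoder.decode()); // final flush; stream defaults to false
  return outputs;
}

// Expectations from this report; the last '' in each array is the flush.
const cases = [
  {input: [0xE2, 0x31, 0x31], expected: ['', '\uFFFD1', '1', '']},
  {input: [0xF0, 0x31, 0x32, 0x33], expected: ['', '\uFFFD1', '2', '3', '']},
];

// Returns the cases whose per-call output differs from the expectation.
function findMismatches(makeDecoder) {
  return cases.filter(({input, expected}) =>
      JSON.stringify(decodeByteAtATime(input, makeDecoder)) !==
      JSON.stringify(expected));
}
```

Calling findMismatches() with no argument runs the cases against the platform TextDecoder; an empty array means every chunk came out in the step where its input byte arrived.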

I think this isn't exposing the partial decode; it's just not outputting the '1' (and the preceding U+FFFD) in the same decoding step where the corresponding input byte appeared, even though the appearance of '1' in the input stream immediately forces the error and '1' does not indicate the start of a multi-byte sequence. In other words, it behaves as though our decoder still honors the UTF-8 multi-byte sequence length stated by the 0xE2 lead byte even after encountering a following byte that is not allowed in the sequence.

The same thing happens for longer sequences:

const decoder = new TextDecoder();
const data = [];
data.push(decoder.decode(Uint8Array.of(0xF0), {stream: true}));
// data is [""] as expected
data.push(decoder.decode(Uint8Array.of('1'.charCodeAt()), {stream: true}));
// data is ["", ""] while I would expect ["", "�1"]
data.push(decoder.decode(Uint8Array.of('2'.charCodeAt()), {stream: true}));
// data is ["", "", ""] while I would expect ["", "�1", "2"]
data.push(decoder.decode(Uint8Array.of('3'.charCodeAt()), {stream: true}));
// data is ["", "", "", "�123"] while I would expect ["", "�1", "2", "3"]
// only now does it realize that the first byte is incomplete, returning "�123"
data
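
Both observations fit a toy model in which the decoder holds bytes back until it has as many as the lead byte announced, and only then decodes them. The sketch below is that model only, with made-up names (statedLength, DelayedModelDecoder); it is an assumption about the visible behavior, not a description of what TextCodecUTF8 does internally. It leans on a non-streaming TextDecoder to decode each held-back group.

```javascript
// Sequence length announced by a UTF-8 lead byte (1 for anything else).
function statedLength(lead) {
  if (lead >= 0xC2 && lead <= 0xDF) return 2;
  if (lead >= 0xE0 && lead <= 0xEF) return 3;
  if (lead >= 0xF0 && lead <= 0xF4) return 4;
  return 1;
}

// Toy model of the delayed output: buffer until the announced length is
// reached, then decode the whole group in one go.
class DelayedModelDecoder {
  constructor() {
    this.pending = [];
  }

  decode(bytes = new Uint8Array(0), {stream = false} = {}) {
    let out = '';
    for (const b of bytes) {
      this.pending.push(b);
      if (this.pending.length >= statedLength(this.pending[0])) {
        out += this.flushPending();
      }
    }
    if (!stream && this.pending.length > 0) {
      out += this.flushPending();
    }
    return out;
  }

  flushPending() {
    const text = new TextDecoder().decode(Uint8Array.from(this.pending));
    this.pending = [];
    return text;
  }
}

const model = new DelayedModelDecoder();
console.log([0xF0, 0x31, 0x32, 0x33].map(
    (b) => model.decode(Uint8Array.of(b), {stream: true})));
// logs "", "", "", "\uFFFD123", matching the delayed output reported above
```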




Labels: -OS-iOS
iOS doesn't use Blink, so removing that platform.

Comment 6 by jsb...@chromium.org, Jan 19 2018

Labels: Hotlist-GoodFirstBug
Status: Available (was: Untriaged)
Marking "GoodFirstBug" because in theory this can be solved without much more context: (1) the test is straightforward (2) the code change will be constrained to TextCodecUTF8 and (3) there are plenty of potential reviewers.


Project Member

Comment 7 by sheriffbot@chromium.org, Yesterday (47 hours ago)

Labels: Hotlist-Recharge-Cold
Status: Untriaged (was: Available)
This issue has been Available for over a year. If it's no longer important or seems unlikely to be fixed, please consider closing it out. If it is important, please re-triage the issue.

Sorry for the inconvenience if the bug really should have been left as Available.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot

Comment 8 by jsbell@google.com, Today (22 hours ago)

Cc: -bsittler@chromium.org domfarolino@gmail.com
Status: Available (was: Untriaged)
Still a good bug.

(cc: domfarolino@ in case there's interest in pursuing a fix here)

Comment 9 by jsbell@google.com, Today (22 hours ago)

Cc: ricea@chromium.org
