UTF-8 decoding produces incorrect results when an erroneous byte sequence is split into multiple chunks
Issue description

In the original bug, this web site crashes V8: https://www.dallascounty.org/services/record-search/

The problem is in this script: https://www.dallascounty.org/web_resources/cm/common/js/perc_common_ui.js

Near byte position 180224 (= 44 × 4096, i.e. a chunk boundary), an invalid byte sequence occurs and is split between two chunks (of size 4096):

0b11100000 << lead << 0xe0
0b10100101 << cont << 0xa5
0b00111111 << ascii << 0x3f

The bug is that TextCodecUTF8::HandlePartialSequence calls TextCodecUTF8::HandleError, which assumes that each error consumes exactly one byte from the byte stream and produces one invalid character. That ignores the fact that an error may need to consume multiple bytes (i.e., the maximal subpart). Here, 0xe0 0xa5 is a valid prefix of a three-byte sequence that the ASCII byte 0x3f cuts short, so those two bytes together should produce a single replacement character. Consuming only 0xe0 leaves 0xa5 to be decoded as a stray continuation byte, which yields one extra character.

This bug is probably very old, but it was only recently exposed because a new V8 feature CHECKs that function positions are what we'd expect them to be. Normally, stuff works just fine even if function positions are off by one.
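To make the off-by-one concrete, here is a minimal Python sketch of a chunked UTF-8 decoder that replaces each maximal subpart with one U+FFFD, even when the subpart straddles a chunk boundary. It is not Blink's TextCodecUTF8: the names `ChunkedUtf8Decoder`, `lead_info`, `decode_chunks`, and the `buggy_partial` flag are hypothetical. The allowed first-continuation-byte ranges follow the Unicode table of well-formed UTF-8 sequences (e.g. 0xE0 must be followed by 0xA0..0xBF). Setting `buggy_partial=True` is a simplified model of the reported bug: an error inside bytes carried over from the previous chunk consumes only one byte.

```python
# Minimal sketch (hypothetical names, not Blink's TextCodecUTF8): a chunked
# UTF-8 decoder that replaces each maximal subpart of an ill-formed sequence
# with one U+FFFD, even when the subpart straddles a chunk boundary.

REPLACEMENT = "\ufffd"


def lead_info(b):
    """For a non-ASCII byte b, return (continuation_count, lo, hi), where
    lo..hi is the allowed range of the FIRST continuation byte (later ones
    must be 0x80..0xBF), or None if b can never start a sequence."""
    if 0xC2 <= b <= 0xDF:
        return 1, 0x80, 0xBF
    if b == 0xE0:
        return 2, 0xA0, 0xBF  # rules out overlong encodings
    if 0xE1 <= b <= 0xEC or 0xEE <= b <= 0xEF:
        return 2, 0x80, 0xBF
    if b == 0xED:
        return 2, 0x80, 0x9F  # rules out UTF-16 surrogates
    if b == 0xF0:
        return 3, 0x90, 0xBF
    if 0xF1 <= b <= 0xF3:
        return 3, 0x80, 0xBF
    if b == 0xF4:
        return 3, 0x80, 0x8F  # caps code points at U+10FFFF
    return None  # 0x80..0xC1 and 0xF5..0xFF


class ChunkedUtf8Decoder:
    def __init__(self, buggy_partial=False):
        # buggy_partial=True models the reported bug: an error inside bytes
        # carried over from the previous chunk consumes only one byte.
        self.buggy_partial = buggy_partial
        self.pending = b""  # valid-but-incomplete prefix from the last chunk

    def decode(self, chunk, final=False):
        carried = len(self.pending)
        buf = self.pending + chunk
        self.pending = b""
        out = []
        i = 0
        while i < len(buf):
            b = buf[i]
            if b < 0x80:
                out.append(chr(b))
                i += 1
                continue
            info = lead_info(b)
            if info is None:
                out.append(REPLACEMENT)  # stray continuation or bad lead
                i += 1
                continue
            need, lo, hi = info
            cp = b & (0x3F >> need)  # payload bits of the lead byte
            j = i + 1
            while j - i <= need:
                if j == len(buf):  # input ends inside a valid prefix
                    if not final:
                        self.pending = buf[i:]  # wait for the next chunk
                        return "".join(out)
                    out.append(REPLACEMENT)  # truncated at end of stream
                    i = j
                    break
                c = buf[j]
                ok_lo, ok_hi = (lo, hi) if j == i + 1 else (0x80, 0xBF)
                if not ok_lo <= c <= ok_hi:
                    out.append(REPLACEMENT)
                    # Correct: skip the whole maximal subpart buf[i:j].
                    # Buggy model: skip one byte if it came from `pending`.
                    i = i + 1 if self.buggy_partial and i < carried else j
                    break
                cp = (cp << 6) | (c & 0x3F)
                j += 1
            else:  # all continuation bytes were valid
                out.append(chr(cp))
                i = j
        return "".join(out)


def decode_chunks(chunks, **kwargs):
    """Feed byte chunks through one decoder; the last chunk is final."""
    decoder = ChunkedUtf8Decoder(**kwargs)
    parts = [decoder.decode(c) for c in chunks[:-1]]
    parts.append(decoder.decode(chunks[-1], final=True))
    return "".join(parts)


# The byte sequence from the report, split across a chunk boundary.
print(ascii(decode_chunks([b"\xe0\xa5", b"?"])))  # '\ufffd?'
print(ascii(decode_chunks([b"\xe0\xa5", b"?"], buggy_partial=True)))  # '\ufffd\ufffd?'
```

The correct path yields one U+FFFD followed by "?", while the one-byte model yields an extra U+FFFD, shifting every later character position by one. Only a valid-but-incomplete prefix is carried in `pending`; anything already known to be invalid is replaced immediately, so in this sketch the output does not depend on where the chunk boundaries fall.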

Oct 10 2017

Oct 11 2017

Oct 11 2017

Oct 11 2017
Code is old enough that WebKit likely has the same lurking issue: https://trac.webkit.org/browser/webkit/trunk/Source/WebCore/platform/text/TextCodecUTF8.cpp Point them at a fix once we have one?

Oct 12 2017
https://chromium-review.googlesource.com/c/chromium/src/+/711846 is the fix; I'm trying to land it right now.

Oct 12 2017
https://bugs.webkit.org/show_bug.cgi?id=178207 << WebKit bug submitted.

Oct 12 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/79806338f4762c005b99808c8e97edf4d176b621

commit 79806338f4762c005b99808c8e97edf4d176b621
Author: Marja Hölttä <marja@chromium.org>
Date: Thu Oct 12 13:10:07 2017

Fix UTF-8 decoding (invalid byte sequences crossing chunks).

Bug: 773320, 758236
Change-Id: Iac00d898d1ace857a98e635f3aebdd2f384755df
Reviewed-on: https://chromium-review.googlesource.com/711846
Reviewed-by: Yuta Kitamura <yutak@chromium.org>
Commit-Queue: Marja Hölttä <marja@chromium.org>
Cr-Commit-Position: refs/heads/master@{#508326}

[add] https://crrev.com/79806338f4762c005b99808c8e97edf4d176b621/third_party/WebKit/LayoutTests/fast/encoding/resources/utf-8-invalid-chars-at-chunk-boundary.js
[add] https://crrev.com/79806338f4762c005b99808c8e97edf4d176b621/third_party/WebKit/LayoutTests/fast/encoding/utf-8-invalid-chars-at-chunk-boundary-expected.txt
[add] https://crrev.com/79806338f4762c005b99808c8e97edf4d176b621/third_party/WebKit/LayoutTests/fast/encoding/utf-8-invalid-chars-at-chunk-boundary.html
[modify] https://crrev.com/79806338f4762c005b99808c8e97edf4d176b621/third_party/WebKit/Source/platform/wtf/text/TextCodecUTF8.cpp
[modify] https://crrev.com/79806338f4762c005b99808c8e97edf4d176b621/third_party/WebKit/Source/platform/wtf/text/TextCodecUTF8.h

Oct 13 2017

Oct 14 2017
> https://bugs.webkit.org/show_bug.cgi?id=178207 << WebKit bug submitted.

Thanks for the bug!

Comment 1 by marja@chromium.org, Oct 10 2017