Script streaming: invalid handling for U+FEFF inside UTF-8 scripts |
|||
Issue description
If the script in question doesn't contain a BOM, but contains the character U+FEFF ("zero-width no-break space") as the first non-ascii character of the first chunk, it's handled incorrectly.
This happens in Utf8ChunkSource::ConvertToUtf16: first there are some ascii chars which are copied to the buffer. Then kUtf8Bom occurs, and ConvertToUtf16 happily skips it because it doesn't realize we're not at the beginning.
The fix is trivial.
This bug was only exposed because FLAG_experimental_preparser_scope_analysis CHECKs exact source positions - normally if positions are off-by-one in situations where the relevant tokens are surrounded with whitespace, nobody notices. :/
,
Aug 24 2017
,
Aug 25 2017
commit 4e45342994fb8d526a25c404a06c56ec7e5b2b9c Author: Marja Hölttä <marja@chromium.org> Date: Thu Aug 24 13:31:03 2017 +0200 [script streaming] Fix U+feff handling. U+feff is the UTF BOM but if it occurs inside the text, it's a "zero-width no-break space". However, the UTF-8 decoder in script streaming still thought it's a BOM and skipped it. The correct way to handle it would be to create a U+feff code point instead - the Scanner will then handle it as whitespace. This is a discrepancy between the Blink UTF-8 decoder and the V8 UTF-8 decoder, and caused the source positions be off by one. This bug went unnoticed, since normally off-by-one in this situation doesn't make the code to break. BUG= chromium:758508 ,chromium:758236 Change-Id: Ib92a3ee65c402e21b77e42537db2a021cff55379 Reviewed-on: https://chromium-review.googlesource.com/632096 Reviewed-by: Adam Klein <adamk@chromium.org> Commit-Queue: Marja Hölttä <marja@chromium.org> Cr-Commit-Position: refs/heads/master@{#47583} |
|||
►
Sign in to add a comment |
|||
Comment 1 by marja@chromium.org
, Aug 24 2017