Accents in ANSI encoded javascript provoke Uncaught SyntaxError: Unexpected identifier in following javascript
Reported by
cdmaho...@gmail.com,
Sep 29 2016
Issue description
UserAgent: Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2874.0 Safari/537.36
Steps to reproduce the problem:
1 Load a javascript file which includes accents in comments and saved with ANSI encoding. For example:
// Añadidos las opciones tileColumns, tileRows, showWarning
function TPrintMap(options)
{
}
2 Execute code from a separate javascript file:
var printMap = new TPrintMap({...});
What is the expected behavior?
Up until yesterday Canary executed the code without problems.
What went wrong?
Canary developer tools show the error "Uncaught SyntaxError: Unexpected identifier" for the line of javascript code following the accent. In the above example "Añadidos" appears as "A�adidos" in developer tools, and the error is marked (squiggly red underline) for the text "Map(options)"
Did this work before? Yes Up until today (previous builds of Canary.)
Chrome version: 55.0.2874.0 Channel: canary
OS Version: 6.1 (Windows 7, Windows Server 2008 R2)
Flash Version:
Saving the javascript file as UTF-8 solves the problem. Not sure if Chrome's behaviour is correct when processing incorrectly encoded javascript. Report problem just in case!
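The "A�adidos" display can be reproduced outside of DevTools. The following is a minimal sketch, assuming the file was saved with ñ as the single Latin-1 byte 0xF1 and that the browser then decoded it as UTF-8; the variable names are illustrative only.

```javascript
// Bytes of "// Añadidos" as an ANSI (Latin-1) editor would save them:
// ñ is the single byte 0xF1, which is not a complete UTF-8 sequence.
const ansiBytes = Uint8Array.from([
  0x2f, 0x2f, 0x20, 0x41, 0xf1, 0x61, 0x64, 0x69, 0x64, 0x6f, 0x73,
]);

// Decoding as UTF-8 replaces the stray 0xF1 with U+FFFD (the replacement character).
const asUtf8 = new TextDecoder("utf-8").decode(ansiBytes);

// Latin-1 bytes map one-to-one onto code points U+0000..U+00FF,
// so they can be decoded without any encoding table.
const asLatin1 = String.fromCharCode(...ansiBytes);

console.log(asUtf8);   // "// A\uFFFDadidos"
console.log(asLatin1); // "// Añadidos"
```

Saving the file as UTF-8 stores ñ as the two bytes 0xC3 0xB1 instead, which is why that avoids the problem.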
,
Sep 30 2016
The problem started yesterday (Thursday) with v55.0.2874.0, presumably downloaded on Wednesday. I don't know what the previous version was, but it will have been the one downloaded on Tuesday, assuming there was the usual daily update. I've reproduced the problem on v55.0.2875.0 using a stripped-down page and javascript.
The curious thing is that the error only happens when the javascript file size reaches 30720 bytes: any smaller and everything works as expected (though in some cases I have been able to load slightly larger files if the first js file executed successfully...)
I've attached the three files I've used for testing. The html loads two javascript files (ANSI encoding, including accents) and calls code in each of them. The js files are mostly padding; the slightly smaller file provokes no problems, while the larger file provokes the unexpected error message. Changing the accented char for an unaccented one (in this case ñ for n) also stops the problem.
Tested on Windows 7 with IIS 7.5. Headers on the served js are:
HTTP/1.1 200 OK
Content-Type: application/x-javascript
Last-Modified: Fri, 30 Sep 2016 08:46:28 GMT
Accept-Ranges: bytes
ETag: "fb9b022f71ad21:0"
Vary: Accept-Encoding
Server: Microsoft-IIS/7.5
X-Powered-By: ASP.NET
Date: Fri, 30 Sep 2016 08:46:33 GMT
Content-Length: 30721
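For anyone without the attachments, padded test files of this kind could be generated roughly as follows. This is a sketch, not the reporter's actual files: `buildLatin1Script` is a hypothetical helper, the file contents are assumed from the issue description, and the two sizes (one byte either side of 30720) are an assumption based on the sizes mentioned in this thread.

```javascript
// Hypothetical helper: returns a Latin-1 encoded script of exactly `size` bytes.
// The script starts with the accented comment from the report and is padded
// with spaces, which are harmless trailing whitespace in JavaScript.
function buildLatin1Script(size) {
  const header =
    "// A\u00f1adidos las opciones tileColumns, tileRows, showWarning\n" +
    "function TPrintMap(options)\n{\n}\n";
  // Every character here is below U+0100, so char code == Latin-1 byte value.
  const headerBytes = Uint8Array.from(header, (ch) => ch.charCodeAt(0));
  if (size < headerBytes.length) {
    throw new RangeError("size is smaller than the script header");
  }
  const out = new Uint8Array(size).fill(0x20); // pad with spaces
  out.set(headerBytes, 0);
  return out;
}

const SIZE_LIMIT = 30720; // size at which the reporter saw the error start
const okFile = buildLatin1Script(SIZE_LIMIT - 1);   // assumed "works" size
const failFile = buildLatin1Script(SIZE_LIMIT + 1); // matches Content-Length: 30721

// In Node.js the buffers could then be written out, for example:
// require("fs").writeFileSync("js/testansiFail.js", failFile);
console.log(okFile.length, failFile.length);
```

The files must be written as raw bytes; writing the string through a UTF-8 encoder would turn ñ into two bytes and hide the problem.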
,
Sep 30 2016
,
Sep 30 2016
BISECT: 419248 (known good) - 419263 (first known bad). https://chromium.googlesource.com/chromium/src/+log/284a2b2247742cfa9e7039dbce4bf75900272bd6..343264820eb254bdab3dbeef0082e8e1bb307532?pretty=fuller Suspecting https://codereview.chromium.org/2265873002
,
Oct 3 2016
Tested the same on Win10 with Chrome versions 55.0.2874.0 and 55.0.2878.0 and observed the error as shown in the screenshot. Could you please let us know if I am missing something in reproducing the issue? A screenshot would be helpful.
,
Oct 3 2016
Attached is screenshot of error as seen in developer tools (version 55.0.2876.0 canary (64-bit)). Below is the message copied in text format.
Note that the error occurs when loading the page from a virtual folder. Loading the page as a file does not provoke the error.
testansiOK.js:4 printMapANSIOK() undefined
2016-10-03 10:49:51.816 testansiFail.js:2 Uncaught SyntaxError: Unexpected identifier
at testencoding.html:14
(anonymous) @ testencoding.html:14
,
Oct 3 2016
https://codereview.chromium.org/2265873002 couldn't cause this. Not only is net/ not response-body-encoding-aware, but that's a websockets change, and none of the test files even use websockets.
,
Oct 3 2016
@mmenke, then it might be the V8 update (the last one in bisect log) that has many parser-related changes: https://chromium.googlesource.com/v8/v8/+log/7f777213..66c91bb5?pretty=fuller
,
Oct 3 2016
Adding Javascript label, removing myself from bug.
,
Oct 4 2016
,
Oct 4 2016
Thanks for the repro. Reproduces fine on current Mac Canary 55.0.2880.0 canary (64-bit). Repro steps:
1.) Download the three files mentioned in #2
a.) files with JS file ending go to ./js directory
b.) HTML file goes to ./ directory
2.) "python -m SimpleHTTPServer 8000" in the ./ directory
3.) Open "localhost:8000" on Canary
4.) Open HTML file
5.) Open DevTools->Console
6.) See error message "Uncaught SyntaxError: Unexpected identifier
at testencoding.html:14"
Please bisect.
,
Oct 4 2016
,
Oct 4 2016
FWIW I've posted the bisect in #4, using the (obvious) steps mentioned in #11.
,
Oct 5 2016
#12: Thanks for cc:. I think this is indeed mine.
#0 / #2 / #4 / #11: Thanks for repro.
#4 / #13: Thanks for bisect. I disagree w/ the suspected CL, though.
The V8 roll in the bisect range contains crrev.com/2314663002, which changes the V8 code that deals specifically w/ Latin1 input handling. The threshold for script streaming is 30 * 1024 bytes. Since the 'ok' file is one byte less and the 'fail' file is one byte more, this looks like a combination of Latin1 + script streaming, which is what was changed in that V8 roll.
,
Oct 5 2016
Ah, this bug is fun. And with "fun" I mean terrible. What happens is:
- Those resources are read as utf-8. But they're really Latin-1.
- The accented character (lower-case n w/ tilde) encoded as Latin-1 is invalid utf-8.
- The utf-8 decoder(s) replace it with the Unicode replacement character (0xFFFD).
- In the streaming case I took great care to decode utf-8 incrementally. And here's the bug:
  - Only when reading the byte *after* the accented character can it find out that the utf-8 is invalid. The new code correctly returns 0xFFFD, but it also consumes the current character where it notices it, meaning the character after the n-tilde disappears.
  - In other words:
    - source: Añad
    - w/o streaming: 65 65533 97 100
    - w/ streaming: 65 65533 100
- This is still fine, except now all character positions are off between the streaming parse & the final parse, meaning that sometime later the compiler tries to read the source and is one byte off. That causes the "Unexpected identifier" error message. (Which is correct from the compiler's point of view.)
An aside: Mis-declaring Latin1 as Utf-8 and hoping it works is a "don't do that" thing, but we should handle it gracefully. But whichever real world use case this is derived from should probably double-check their declared character encodings.
Not sure how to fix this, yet. The incremental utf-8 decoding is meant to return (at most) one character per byte position; but this means that in the byte following the invalid byte it would need to return two, namely the replacement character AND the following one.
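The two outputs above can be modelled with a toy decoder. This is only an illustrative sketch and not V8's code: it decodes a whole buffer at once rather than incrementally, skips overlong and surrogate checks, and uses a hypothetical `dropByteAfterError` flag to switch between the correct behaviour (re-process the byte that revealed the error) and the described bug (consume it).

```javascript
// Number of continuation bytes announced by a lead byte (simplified);
// -1 means the byte cannot start a UTF-8 sequence.
function continuationCount(byte) {
  if (byte < 0x80) return 0;
  if (byte >= 0xc2 && byte <= 0xdf) return 1;
  if (byte >= 0xe0 && byte <= 0xef) return 2;
  if (byte >= 0xf0 && byte <= 0xf4) return 3;
  return -1;
}

// Toy UTF-8 decoder returning an array of code points.
function decodeUtf8(bytes, dropByteAfterError) {
  const out = [];
  let i = 0;
  while (i < bytes.length) {
    const need = continuationCount(bytes[i]);
    if (need <= 0) {
      // ASCII byte, or a byte that is invalid on its own.
      out.push(need === 0 ? bytes[i] : 0xfffd);
      i += 1;
      continue;
    }
    let codePoint = bytes[i] & (0xff >> (need + 2)); // payload bits of the lead byte
    let j = i + 1;
    let valid = true;
    for (let k = 0; k < need; k++, j++) {
      if (j >= bytes.length || (bytes[j] & 0xc0) !== 0x80) {
        valid = false; // bytes[j] is where the error is noticed
        break;
      }
      codePoint = (codePoint << 6) | (bytes[j] & 0x3f);
    }
    if (valid) {
      out.push(codePoint);
      i = j;
    } else {
      out.push(0xfffd);
      // Correct: resume at the byte that revealed the error.
      // Buggy: that byte is swallowed together with the bad sequence.
      i = dropByteAfterError ? j + 1 : j;
    }
  }
  return out;
}

const source = Uint8Array.from([0x41, 0xf1, 0x61, 0x64]); // "Añad" in Latin-1
console.log(decodeUtf8(source, false).join(" ")); // 65 65533 97 100
console.log(decodeUtf8(source, true).join(" "));  // 65 65533 100
```

In the buggy variant every later character sits one position earlier than in the correct decode, which is the kind of position mismatch between the two parses described above.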
,
Oct 5 2016
,
Oct 5 2016
The following revision refers to this bug:
https://chromium.googlesource.com/v8/v8.git/+/138127a60895be2c1d7b6bea9be194d307e5b91e

commit 138127a60895be2c1d7b6bea9be194d307e5b91e
Author: vogelheim <vogelheim@chromium.org>
Date: Wed Oct 05 17:18:36 2016

Fix bad-char handling in utf-8 streaming streams.
Also add test.

R=jochen@chromium.org
BUG=chromium:651333, v8:4947
Review-Url: https://codereview.chromium.org/2391273002
Cr-Commit-Position: refs/heads/master@{#40004}

[modify] https://crrev.com/138127a60895be2c1d7b6bea9be194d307e5b91e/src/parsing/scanner-character-streams.cc
[modify] https://crrev.com/138127a60895be2c1d7b6bea9be194d307e5b91e/src/unicode.cc
[modify] https://crrev.com/138127a60895be2c1d7b6bea9be194d307e5b91e/src/unicode.h
[modify] https://crrev.com/138127a60895be2c1d7b6bea9be194d307e5b91e/test/cctest/parsing/test-scanner-streams.cc
,
Oct 6 2016
Fixed on tip of tree.
,
Oct 20 2016
Sorry, hadn't looked at this for a while but can now confirm that the problem no longer occurs with Version 56.0.2895.0 canary (64-bit). Thanks!
Comment 1 by schenney@chromium.org, Sep 29 2016
Labels: Needs-Feedback