New issue
Advanced search Search tips

Issue 651333 link

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Closed: Oct 2016
Components:
EstimatedDays: ----
NextAction: ----
OS: All
Pri: 1
Type: Bug



Sign in to add a comment

Accents in ANSI encoded javascript provoke Uncaught SyntaxError: Unexpected identifier in follwing javascript

Reported by cdmaho...@gmail.com, Sep 29 2016

Issue description

UserAgent: Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2874.0 Safari/537.36

Steps to reproduce the problem:
1 Load a javascript file which includes accents in comments and saved with ANSI encoding. For example:
// Añadidos las opciones tileColumns, tileRows, showWarning
function TPrintMap(options)
{
}

2 Execute code from a separate javascript file:
var printMap = new TPrintMap({...});

What is the expected behavior?
Up until yesterday Canary executed the code without problems. 

What went wrong?
Canary developer tools show the error "Uncaught SyntaxError: Unexpected identifier" for the line of javascript code following the accent. In the above example "Añadidos" appears as "A�adidos" in developer tools, and the error is marked (squiggly red underline) for the text "Map(options)"

Did this work before? Yes Up until today (previous builds of Canary.)

Chrome version: 55.0.2874.0  Channel: canary
OS Version: 6.1 (Windows 7, Windows Server 2008 R2)
Flash Version: 

Saving the javascript file as UTF-8 solves the problem. Not sure if Chrome's behaviour is correct when processing incorrectly encoded javascript. Report problem just in case!
 
Components: -Blink Blink>JavaScript Blink>Loader
Labels: Needs-Feedback
Could you please provide us with a test case that may be used to isolate the change in Chrome? Or maybe you know which version of Chrome it worked in before (even what day would be useful).

Your report suggests it is very recent but I could find no plausible candidate between the most recent 2 Canary on Windows.

Comment 2 by cdmaho...@gmail.com, Sep 30 2016

Problem started yesterday (Thursday) with v55.0.2874.0, presumably downloaded on Wednesday. I don't know what the previous version was but it will have been the one downloaded on Tuesday, assuming there was the usual dail update.

I've reproduced the problem on v55.0.2875.0 using a stripped down page and javascript. The curious thing is the error only happens when the javascript file size reaches 30720 bytes - any smaller and everything works as expected (though in some cases I have been able to load slightly larger files if the first js file executed succesfully...)

I've attached the three files I've used for testing. The html loads two javascript files (ANSI encoding, including accents) and calls code in each of them. The js files are mostly padding, the slightly smaller files provokes no problems, the larger file provokes the unexpected error message. Changing the accented char for unaccented (in the case ñ for n) also stops the problem.

Tested on Windows 7 with IIS 7.5. Headers on the served js are:
HTTP/1.1 200 OK
Content-Type: application/x-javascript
Last-Modified: Fri, 30 Sep 2016 08:46:28 GMT
Accept-Ranges: bytes
ETag: "fb9b022f71ad21:0"
Vary: Accept-Encoding
Server: Microsoft-IIS/7.5
X-Powered-By: ASP.NET
Date: Fri, 30 Sep 2016 08:46:33 GMT
Content-Length: 30721
testansiOK.js
30.0 KB View Download
testansiFail.js
30.0 KB View Download
testencoding.html
483 bytes View Download
Labels: -Needs-Feedback Needs-Bisect
Cc: mmenke@chromium.org maksim.s...@intel.com
Labels: Needs-Feedback
Tested the same on win10 chrome version 55.0.2874.0 and 55.0.2878.0 - Observed the error as shown in the screenshot

Could you please let us know if i am missing something in reproducing the issue. A screenshot would be helpful.
651333.png
23.2 KB View Download
Attached is screenshot of error as seen in developer tools (version 55.0.2876.0 canary (64-bit)). Below is the message copied in text format.

Note that the error occurs when loading the page from a virtual folder. loading the page as a file does not provoke the error.


testansiOK.js:4 printMapANSIOK() undefined
2016-10-03 10:49:51.816 testansiFail.js:2 Uncaught SyntaxError: Unexpected identifier
    at testencoding.html:14
(anonymous) @ testencoding.html:14

Unexpected Identifier.png
33.8 KB View Download
https://codereview.chromium.org/2265873002 couldn't cause this.  Not only is net/ not response-body-encoding-aware, but that's a websockets change, and none of the test files even use websockets.

Comment 8 by woxxom@gmail.com, Oct 3 2016

@mmenke, then it might be the V8 update (the last one in bisect log) that has many parser-related changes:
https://chromium.googlesource.com/v8/v8/+log/7f777213..66c91bb5?pretty=fuller

Cc: -maksim.s...@intel.com -mmenke@chromium.org
Adding Javascript label, removing myself from bug.
Labels: TE-NeedsTriageFromMTV
Labels: -Pri-2 ReleaseBlock-Stable M-55 Pri-1
Status: Available (was: Unconfirmed)
Thanks for the repro. Reproduces fine on current Mac Canary 55.0.2880.0 canary (64-bit). Repro steps:

1.) Download the three files mentioned in #2
   a.) files with JS file ending go to ./js directory
   b.) HTML file goes to ./ directory
2.) "python -m SimpleHTTPServer 8000" in the ./ directory
3.) Open "localhost:8000" on Canary
4.) Open HTML file
5.) Open DevTools->Console
6.) See error message "Uncaught SyntaxError: Unexpected identifier
    at testencoding.html:14"

Please bisect.
Cc: vogelheim@chromium.org

Comment 13 by woxxom@gmail.com, Oct 4 2016

FWIW I've posted the bisect in #4, using the (obvious) steps mentioned in #11.
Cc: -vogelheim@chromium.org
Owner: vogelheim@chromium.org
Status: Started (was: Available)
#12: Thanks for cc:. I think this is indeed mine.

#0 / #2 / #4 / #11: Thanks for repro.

#4 / #13: Thanks for bisect. I disagree w/ the suspected CL, though. The V8 roll in the bisect range contains crrev.com/2314663002, which changes code that deals specifically w/ Latin1 input handling to V8.

The threshold for script streaming is 30 * 1024B. And since the 'ok' file is one byte less while the 'fail' is one byte more, this looks like a combination of Latin1 + script streaming. Which was changed in that V8 roll.


Ah, this bug is fun. And with "fun" I mean terrible. What happens is:

- Those resources are read as utf-8. But they're really Latin-1.
- The accented character (lower-case n w/ tilde) encoded as Latin 1 is invalid utf-8.
- The utf-8 decoder(s) replace it with the Unicode replacement character (0xFFFD)
- In the streaming case I took great care to decode utf-8 incrementally. And here's the bug:
- Only when reading the byte *after* the accented character can it find out that the utf-8 is invalid. The new code correctly returns 0xFFFD, but it also consumes the current character where it notices it, meaning the character after the n-tilde disappears.
- In other words:
  - source: Añad
  - w/o streaming: 65 65533 97 100
  - w/ streaming: 65 65533 100
- This is still fine, except now all character positions are off between the streaming parse & the final parse, meaning that sometimes later the compiler tries to read the source and is one byte off. That causes the "identifier expected" error message. (Which is correct from the compiler's point of view.)


An aside: Mis-declaring Latin1 as Utf-8 and hoping it works is a "don't do that" thing, but we should handle it gracefully. But whichever real world use case this is derived from should probably double-check their declared character encodings.


Not sure how to fix this, yet. The incremental utf-8 decoding is meant to return (at most) one character per byte position; but this means that in the byte following the invalid byte it would need to return two, namely the replacement character AND the following one.
Components: -Blink>Loader
Labels: -OS-Windows -TE-NeedsTriageFromMTV -Needs-Feedback -Arch-x86_64 -Needs-Bisect Arch-All OS-All
Status: Fixed (was: Started)
Fixed on tip of tree.
Sorry, hadn't looked at this for a while but can now confirm that the problem no longer occurrs with Version 56.0.2895.0 canary (64-bit).

Thanks!

Sign in to add a comment