Javascript file view displays garbled characters
Reported by
hamay1...@gmail.com,
Apr 12 2017
Issue description

UserAgent: Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36

Steps to reproduce the problem:
1. Open URL https://hamayapp.appspot.com/static/sp_interpreter_oldZ_v708.js
2. Search for the word '0x41' (Ctrl-F) and go to the 2nd search result (2/5).

What is the expected behavior?
Japanese kanji characters should be displayed correctly.

What went wrong?
Garbled Japanese kanji characters are displayed from line 1098, column 51.

Did this work before? N/A
Does this work in other browsers? Yes

Chrome version: 57.0.2987.133 Channel: stable
OS Version: 6.3
Flash Version:

I could not create a minimal case. Lines 1 to 1097 display without problems. From line 1098 onward, all Japanese kanji characters are garbled.
,
Apr 21 2017
Tested this issue on Windows 10 using the reported Chrome version 57.0.2987.133 (stable) and the latest stable version 58.0.3029.81 by following the steps below, and was unable to reproduce it.

Repro steps:
1. Navigated to URL https://hamayapp.appspot.com/static/sp_interpreter_oldZ_v708.js
2. Using Ctrl+F, searched for the word '0x41' and went to the 2nd search result (2/5).

hamay1010@ Could you please look at the attachment and confirm whether anything was missed in triaging the issue? Please try upgrading to the latest stable 58.0.3029.81 and update the thread if the issue still exists. Thanks!
,
Apr 21 2017
I can reproduce the issue. Windows 8.1 (64bit) (Language setting : Japanese) Chrome : 58.0.3029.81 (64-bit)
,
Apr 21 2017
Thank you for providing more feedback. Adding requester "jbanavatu@chromium.org" to the cc list and removing "Needs-Feedback" label. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Apr 25 2017
Able to reproduce the issue on Windows 7, Mac 10.12.4 and Ubuntu 14.04 using Chrome stable 58.0.3029.81 and canary 60.0.3079.0. This is a regression, broken in M54.

Manual bisect:
--------------
Good build: 54.0.2803.0 (revision 406716)
Bad build: 54.0.2804.0 (revision 407025)

Per-revision bisect tool info:
------------------------------
https://chromium.googlesource.com/chromium/src/+log/79f7b784a97cbb22f11064a05b621b0def87eab3..f0829bf6d80a9109b399580fe48d8c3e1c66eeed
Review-Url: https://codereview.chromium.org/1894913002

jinsukkim@ Kindly take a look, and please help us reassign this issue to the right owner if it is not related to this change. Thanks!
,
Apr 26 2017
The screenshots in https://bugs.chromium.org/p/chromium/issues/detail?id=710861#c2 differ from the originally reported behavior: all the Japanese characters in the document are broken, not just those from the second 0x41 onward. In this sense, the reported bug is gone. I suspect it is the same as Issue 698605, which was fixed in 58.0.3029.83.

All characters being broken is a different matter, an unfortunate side effect of an intended change. Please see https://bugs.chromium.org/p/chromium/issues/detail?id=691985#c3 for the background. Apologies for the inconvenience this may cause, but I believe the overall direction, which encourages web publishers to specify the text encoding (preferably in the HTTP header), is desirable.

There are a couple of ways to view unlabelled (i.e. without charset info) UTF-8:
1) Install a Chrome text encoding extension (Google 'chrome encoding extension')
2) Save it locally and open the file - local files' encoding can be detected without issue
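
For publishers who want to label their files as suggested above, here is a minimal sketch of a static file server that sends a charset in the Content-Type header for .js files. It uses only Python's standard http.server module; the class name Utf8JsHandler and the choice of "text/javascript" as the MIME type are assumptions for illustration, and this is not how the reported appspot.com site is actually served.

```python
from http.server import SimpleHTTPRequestHandler


class Utf8JsHandler(SimpleHTTPRequestHandler):
    """Hypothetical handler that labels .js responses as UTF-8."""

    # guess_type() consults extensions_map first, so overriding the ".js"
    # entry changes the Content-Type header sent for JavaScript files.
    extensions_map = {
        **SimpleHTTPRequestHandler.extensions_map,
        ".js": "text/javascript; charset=utf-8",
    }


# To try it locally (serves the current directory; the port is arbitrary):
#   from http.server import HTTPServer
#   HTTPServer(("127.0.0.1", 8000), Utf8JsHandler).serve_forever()
```

With an explicit charset in the header, the browser should not need to guess the encoding from the content at all.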
,
Apr 27 2017
1) I installed a Chrome text encoding extension and selected UTF-8 from its menu, but the problem is the same as in #3.
2) Saving the file locally and opening it executes the JavaScript file, which is dangerous for security. Saving it locally and dragging and dropping the file into Chrome works well, but I don't want to do that every time.

I think a single file should be guessed as a single encoding. If the guess is wrong, the user should be able to select the encoding (using an extension is OK). Currently Chrome changes the encoding automatically in the middle of a file, so the user cannot do anything to solve the problem.
,
Apr 27 2017
Reopening since the bug is reported to persist even with the extension.
,
Apr 27 2017
I found a similar issue: https://bugs.chromium.org/p/chromium/issues/detail?id=698078 Apologies if this turns out not to be an encoding auto-detection problem.
,
Apr 27 2017
hamary10101@ Would you mind trying out the attached text file at https://bugs.chromium.org/p/chromium/issues/detail?id=698605#c5 and see if the problem described there happens as well - i.e. the last 10 characters on line 207 return ÛÛßßÜÜÛллл ? That will help debug the issue.
,
Apr 27 2017
One more thing I'd like your help with - would you also check whether a different choice of fonts changes anything? I just want to rule out what is being looked into in Issue 698078.

My testing with 59.0.3029.81 on Windows 10 shows:
- All the chars are broken, not just some of them. I'm not able to reproduce what you see.
- Only one extension (Encoding Menu) works, but not the other one (Set Character Encoding). I don't know the internals of the extensions, so this doesn't surprise me.
,
Apr 28 2017
I tried #10 and attached a screenshot. (Line 207 seems to have no problem, but the font is different.) I also attached the rendered font information. Next, I will try #11.
,
Apr 28 2017
I tried #11 and attached screen shots. (the problem is same as #3.)
,
Apr 28 2017
I found that when Chrome's language setting is 'English (United States)', there is no problem. When Chrome's language setting is 'Japanese', the problem occurs.
,
May 6 2017
I found that Chrome has been updated and the problem can now be reproduced only in an incognito window. (58.0.3029.96 (64-bit) stable, automatically updated) In a normal window, there seems to be no problem. I don't know what has changed since 57.0.2987.133 ... In the incognito window, if the page was cached, pressing Shift + F5 reproduces the problem.
,
May 10 2017
Unfortunately no one except the reporter has been able to reproduce the issue yet. All the Japanese text gets garbled, and that is expected regardless of the Chrome version (57.0.2987 or 58.0.3029). I'm still clueless. cc'ing tkent@ in case he can shed some light or even reproduce the issue if he has a Windows 10 machine available. tkent, could you help? It may not be encoding-related, though.
,
May 11 2017
I couldn't reproduce this with Windows 10, Japanese environment, the same extension, and Incognito window. This might be related to network speed. Accessing *.appspot.com from Google offices may be much faster than usual.
,
Jun 14 2017
Lowering the priority as this doesn't have a milestone.
,
Jul 23 2017
The problem still occurs. Chrome: 59.0.3071.115 (stable) (64bit) OS: Windows 8.1 (64bit) (Language Setting: Japanese)
,
Jul 23 2017
The problem occurs on android phone too. Chrome: 59.0.3071.125 OS: Android 6.0.1; SO-02J Build/34.1.B.2.32
,
Jul 23 2017
In the developer tools (Elements tab), the pre tag is split into several blocks, and the garbled characters begin with the second block.
,
Jul 28 2017
The problem still occurs in the following environment. Chrome: 60.0.3112.78 (stable) (64bit) OS: Windows 8.1 (64bit) (Language Setting: Japanese)
,
Sep 17 2017
The problem still occurs in the following environment. Chrome: 61.0.3163.91 (stable) (64bit) OS: Windows 8.1 (64bit) (Language Setting: Japanese)
,
Nov 9 2017
The problem still occurs in the following environments.
Chrome: 62.0.3202.89 (stable) (64bit)
OS: Windows 8.1 (64bit) (Language Setting: Japanese)
Chrome: 62.0.3202.84
OS: Android 6.0.1; SO-02J Build/34.1.B.2.32

Six months have passed since the first report. Recently, I traveled to Kyushu in Japan, and the same problem occurred there. I wonder whether issues 597488 and 244358 should be reopened.
,
Feb 21 2018
The problem still occurs in the following environment. Chrome: 64.0.3282.167 (stable) (64bit) OS: Windows 8.1 (64bit) (Language Setting: Japanese) Chrome: 64.0.3282.137 OS: Android 6.0.1; SO-02J Build/34.1.B.2.32
,
May 9 2018
The problem still occurs in the following environments.
Chrome: 66.0.3359.139 (stable) (64bit)
OS: Windows 8.1 (64bit) (Language Setting: Japanese)
Chrome: 66.0.3359.126
OS: Android 6.0.1; SO-02J Build/34.1.B.2.32

One year has passed since the first report. I can reproduce the problem reliably (100%). I wonder whether issues 597488 and 244358 should be reopened.
,
Jul 10
The problem still occurs in the following environments.
Chrome: 67.0.3396.99 (stable) (64bit)
OS: Windows 8.1 (64bit) (Language Setting: Japanese)
Chrome: 67.0.3396.87
OS: Android 6.0.1; SO-02J Build/34.1.B.2.32

One year and three months have passed since the first report. I can reproduce the problem reliably (100%). I wonder whether issues 597488 and 244358 should be reopened.
,
Jul 11
Will take another look. Having this problem on Android as well means it has nothing to do with the Windows OS or the Japanese locale on it.
,
Jul 11
I can't reproduce the issue on Android with the latest version of Chrome as reported, but I found a clue.

TL;DR: http://crrev.com/2697213002 has a side effect that could cause the reported bug. Will upload a fix.

TextResourceDecoder is a per-document object responsible for decoding the associated document, which is fed to TextResourceDecoder::Decode in chunks. The decoder performs content sniffing to detect the encoding if necessary. Detection is run against the first chunk (at most 1K of it if the chunk is larger). If detection fails (i.e. the API |DetectEncoding| returns false) for any reason, the decoder keeps using the default encoding and tries again on the following chunks. Otherwise it switches to the returned encoding and uses it for the rest of the document.

The CL above introduced a bug in this behavior by treating auto-detected UTF-8 as a detection failure (with the rationale described in the linked bug thread). As a result, TextResourceDecoder, when given an unlabelled UTF-8 document, keeps trying to sniff beyond the first chunk. The reported URL (a js file) is unlabelled UTF-8, which meets all the conditions leading to this corner case.

This is harmless for most documents, since the detector works quite well and returns consistent results over all the subsequent chunks. But things can take an unexpected turn if the detector makes a wrong guess for, say, the 10th chunk. The decoder will switch to the wrong encoding, and the rest of the document will be garbled. I believe that's what happens with the reported URL. It may be 100% reproducible for the reporter but not always for others in a different environment.

My test on Android gives a js file in which all the Japanese letters are garbled, not just those from a certain offset. The first chunk, with all ASCII letters, was detected as UTF-8 (correct), but the next one was detected as GB18030 (wrong), which explains what I see. I think the reported bug is possible if the detector returns GB18030 for the chunk where the garbled letters first appear, and UTF-8 for all the chunks before that. I don't know why the detector makes a wrong guess - it's all based on probabilities, which wouldn't guarantee a 100% correct result anyway.

So the remedy is to restore the old behavior, which stops sniffing once the detection returns a meaningful encoding, even for unlabelled UTF-8 documents. The only case that lets the sniffing continue should be when the detector returns the 'unknown' encoding, which really indicates that detection failed.
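
The chunk-by-chunk behavior described above can be illustrated with a small Python model. This is only a sketch of the logic as described in this comment, not Chromium's actual code: decode_document, scripted_detector, SNIFF_LIMIT and the stop_after_first_guess switch are hypothetical names, the detector is scripted to return UTF-8 for the first chunk and GB18030 for the second, and the default encoding is assumed to be UTF-8 so that the effect is easy to see.

```python
from typing import Callable, Iterable, List, Optional

# Hypothetical detector type: returns an encoding name, or None for "unknown".
Detector = Callable[[bytes], Optional[str]]

SNIFF_LIMIT = 1024  # the comment above says at most about 1K of a chunk is sniffed


def decode_document(chunks: Iterable[bytes], detect: Detector,
                    default_encoding: str, stop_after_first_guess: bool) -> str:
    """Decode chunks in order, sniffing until detection is considered complete."""
    encoding = default_encoding
    detection_completed = False
    out: List[str] = []
    for chunk in chunks:
        if not detection_completed:
            guess = detect(chunk[:SNIFF_LIMIT])
            if guess is not None and guess != "utf-8":
                # A non-UTF-8 guess: switch to it and stop sniffing.
                encoding = guess
                detection_completed = True
            elif guess == "utf-8" and stop_after_first_guess:
                # Fixed behavior: keep the default encoding, but stop sniffing.
                detection_completed = True
            # Otherwise (unknown, or UTF-8 in the regressed mode) keep sniffing.
        out.append(chunk.decode(encoding, errors="replace"))
    return "".join(out)


def scripted_detector(guesses: List[Optional[str]]) -> Detector:
    """Return a fake detector that yields the given guesses, one per call."""
    remaining = iter(guesses)
    return lambda _data: next(remaining, None)


# Two chunks of an unlabelled UTF-8 file: ASCII first, Japanese second.
chunks = ["var a = 0x41;\n".encode("utf-8"),
          "// 漢字のコメント\n".encode("utf-8")]
source = b"".join(chunks).decode("utf-8")

regressed = decode_document(chunks, scripted_detector(["utf-8", "gb18030"]),
                            "utf-8", stop_after_first_guess=False)
fixed = decode_document(chunks, scripted_detector(["utf-8", "gb18030"]),
                        "utf-8", stop_after_first_guess=True)
```

In this model, `regressed` keeps the ASCII first chunk intact but garbles the second chunk because the later GB18030 guess is accepted, while `fixed` never consults the detector again after the first guess and matches `source`.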
,
Jul 12
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/efbdada3f735a2631e9dfeed5c5e3f880245ccd3

commit efbdada3f735a2631e9dfeed5c5e3f880245ccd3
Author: Jinsuk Kim <jinsukkim@chromium.org>
Date: Thu Jul 12 04:40:43 2018

Stop content sniffing after successful encoding detection

Content sniffing should stop once the detector makes a valid guess. http://crrev.com/2697213002 introduced a side effect that has the detector continue to sniff the content, therefore opens the possibility of returning an encoding different from the first guess. It leads to a document decoded with multiple encoding schemes, one of which may not be correct.

This CL addresses it by defining a new flag |detection_completed_| to tell the TextResourceDecoder to stop sniffing, even if detector returns false for unlabelled UTF-8 documents. Also added a test verifying the behavior.

Bug: 710861
Change-Id: Ic07de3ae08fbb742aa3c24f1e18055348d6acbd8
Reviewed-on: https://chromium-review.googlesource.com/1132904
Reviewed-by: Kent Tamura <tkent@chromium.org>
Commit-Queue: Jinsuk Kim <jinsukkim@chromium.org>
Cr-Commit-Position: refs/heads/master@{#574489}

[modify] https://crrev.com/efbdada3f735a2631e9dfeed5c5e3f880245ccd3/third_party/blink/renderer/core/html/parser/text_resource_decoder.cc
[modify] https://crrev.com/efbdada3f735a2631e9dfeed5c5e3f880245ccd3/third_party/blink/renderer/core/html/parser/text_resource_decoder.h
[modify] https://crrev.com/efbdada3f735a2631e9dfeed5c5e3f880245ccd3/third_party/blink/renderer/core/html/parser/text_resource_decoder_test.cc
[modify] https://crrev.com/efbdada3f735a2631e9dfeed5c5e3f880245ccd3/third_party/blink/renderer/platform/text/text_encoding_detector.cc
[modify] https://crrev.com/efbdada3f735a2631e9dfeed5c5e3f880245ccd3/third_party/blink/renderer/platform/wtf/text/text_encoding.cc
[modify] https://crrev.com/efbdada3f735a2631e9dfeed5c5e3f880245ccd3/third_party/blink/renderer/platform/wtf/text/text_encoding.h
,
Jul 12
The fix has landed. hamay1010@ could you please test it to see if it really addresses the issue? The canary version will have the fix in a couple of days.
,
Jul 13
> The fix got landed. hamay1010@ could you please test it to see if it really addresses the issue? The canary version will have the fix in a couple of days.
Thank you. I will try it.

> It may be 100% reproducible for the reporter but not always for others in different environment. My test on Android gives a js file in which all the Japanese letters are garbled, not partially from a certain offset.
Uhm, I guess that you don't have any Japanese fonts (e.g. MS Gothic) on your PC, do you?

Anyway, thanks for your investigation and fix. I'm glad to see this progress!
,
Jul 13
> Uhm, I guess that you don't have any Japanese fonts (e.g. MS Gothic) in your PC, do you?
Android comes with the Japanese locale and fonts preinstalled. The problem occurs because the detector sees the text as encoded in GB18030 rather than UTF-8, which is unfortunate.
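
To show that this kind of garbling happens at the decoding step and is independent of installed fonts, here is a small Python sketch. The sample string is an assumption chosen for illustration, not text from the reported file.

```python
# Hypothetical sample text; any Japanese string encoded as UTF-8 behaves similarly.
text = "日本語のコメント"
raw = text.encode("utf-8")

# Decoding the same bytes with the wrong encoding yields different characters.
garbled = raw.decode("gb18030", errors="replace")

# Decoding with the right encoding restores the original text.
restored = raw.decode("utf-8")
```

The bytes are identical in both cases; only the encoding chosen by the decoder differs, so no choice of font can recover the original text from `garbled`.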
,
Jul 16
I confirmed that the issue was fixed in the following environments.
Chrome: 69.0.3493.0 (Official Build) canary (64bit)
OS: Windows 8.1 (64bit) (Language Setting: Japanese)
Chrome: 69.0.3491.0 canary
OS: Android 6.0.1; SO-02J Build/34.1.B.2.32

Thanks! Really!
Comment 1 by nyerramilli@chromium.org
, Apr 17 2017