Issue 710861

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Jul 12
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux , Windows , Mac
Pri: 3
Type: Bug-Regression



Javascript file view displays garbled characters

Reported by hamay1...@gmail.com, Apr 12 2017

Issue description

UserAgent: Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36

Steps to reproduce the problem:
1. Open the URL
https://hamayapp.appspot.com/static/sp_interpreter_oldZ_v708.js
2. Search for the word '0x41' (with Ctrl+F) and go to the 2nd search result (2/5)

What is the expected behavior?
Japanese kanji characters should be displayed correctly.

What went wrong?
Garbled Japanese kanji characters are displayed from line 1098, column 51 onward.

Did this work before? N/A 

Does this work in other browsers? Yes

Chrome version: 57.0.2987.133  Channel: stable
OS Version: 6.3
Flash Version: 

I could not create a minimal test case.
Lines 1 to 1097 have no problem.
From line 1098 on, all Japanese kanji characters are garbled.
 
image2.png (61.5 KB)
Labels: Needs-Triage-M57 Needs-Bisect
Cc: jbanavatu@chromium.org
Labels: Needs-Feedback
Tested this issue on Windows 10 using the reported Chrome version 57.0.2987.133 (stable) and the latest stable version 58.0.3029.81 by following the steps below, and was unable to reproduce this issue.

Repro Steps:
1. Navigated to the URL
https://hamayapp.appspot.com/static/sp_interpreter_oldZ_v708.js
2. Using Ctrl+F, searched for the word '0x41' and went to the 2nd search result (2/5)


hamay1010@ Could you please check the attachment and confirm whether anything was missed in triaging the issue? Please try upgrading to the latest stable version, 58.0.3029.81, and update the thread if the issue still exists.

Thanks!
Screenshot (16).png (440 KB)

Comment 3 by hamay1...@gmail.com, Apr 21 2017

I can reproduce the issue.

Windows 8.1 (64bit) (Language setting : Japanese)
Chrome : 58.0.3029.81 (64-bit)

image0012.png (54.5 KB)
image0013.png (49.7 KB)
image0014.png (71.5 KB)
Project Member

Comment 4 by sheriffbot@chromium.org, Apr 21 2017

Labels: -Needs-Feedback
Thank you for providing more feedback. Adding requester "jbanavatu@chromium.org" to the cc list and removing "Needs-Feedback" label.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Cc: sureshkumari@chromium.org
Labels: -Type-Bug -Pri-2 -Needs-Bisect -Needs-Triage-M57 hasbisect-per-revision M-60 OS-Linux OS-Mac Pri-1 Type-Bug-Regression
Owner: jinsuk...@chromium.org
Status: Assigned (was: Unconfirmed)
Able to reproduce the issue on Windows 7, Mac 10.12.4 and Linux Ubuntu 14.04 using Chrome stable 58.0.3029.81 and canary 60.0.3079.0. This is a regression, broken in M54.
Manual Bisect:
--------------
Good build: 54.0.2803.0 (revision 406716)
Bad build: 54.0.2804.0 (revision 407025)
Per revision bisect Tool Info:
------------------------------
https://chromium.googlesource.com/chromium/src/+log/79f7b784a97cbb22f11064a05b621b0def87eab3..f0829bf6d80a9109b399580fe48d8c3e1c66eeed

Review-Url: https://codereview.chromium.org/1894913002
jinsukkim@ Kindly take a look, and please help us reassign this issue to the right owner if it is not related to this change.
Thanks!
Status: WontFix (was: Assigned)
The screenshots in https://bugs.chromium.org/p/chromium/issues/detail?id=710861#c2 are different from the reported one in that all the Japanese characters in the document (not just those from the second 0x41 onward) are broken. In this sense, the reported bug is gone. I suspect it is the same as Issue 698605, which was fixed in 58.0.3029.83.

The issue of all the characters being broken is a different matter, an unfortunate side effect of an intended change. Please see https://bugs.chromium.org/p/chromium/issues/detail?id=691985#c3 for the background. Apologies for the inconvenience it may cause, but I believe the overall direction of encouraging web publishers to specify the text encoding (preferably in the HTTP header) is desirable.

There are a couple of ways to view the unlabelled (i.e. without charset info) UTF-8:
1) Install a Chrome text encoding extension (search for 'chrome encoding extension')
2) Save it locally and open the file; the encoding of local files can be detected without issue
 

Comment 7 by hamay1...@gmail.com, Apr 27 2017

1) I installed the Chrome text encoding extension and selected UTF-8 from its menu, but the problem is the same as in #3.

2) Saving it locally and opening the file executes the JavaScript file, which is dangerous for security.
   Saving it locally and dragging and dropping the file into Chrome works well, but I don't want to do that every time.

I think a single file should be detected as a single encoding.

If the guess is wrong, the user should be able to select the encoding (using an extension is OK).

Currently, Chrome changes the encoding automatically in the middle of a file,
so the user cannot do anything to solve the problem.

Status: Assigned (was: WontFix)
Reopening since the bug is reported to persist even with the extension.

Comment 9 by hamay1...@gmail.com, Apr 27 2017

I found a similar issue.
https://bugs.chromium.org/p/chromium/issues/detail?id=698078

I'm sorry if it is not an encoding auto-detection problem.

hamay1010@ Would you mind trying out the attached text file at https://bugs.chromium.org/p/chromium/issues/detail?id=698605#c5 and seeing whether the problem described there happens as well, i.e. the last 10 characters on line 207 are rendered as ÛÛßßÜÜÛллл? That will help debug the issue.
One more thing I'd like your help with: could you also check whether a different choice of fonts changes anything? I just want to rule out what is being looked into in Issue 698078.

My testing with 59.0.3029.81 on Windows 10 shows:
 - all the characters are broken, not just some of them. I'm not able to reproduce what you see.
 - only one extension (Encoding Menu) works, not the other one (Set Character Encoding). I don't know the internals of the extensions, so this doesn't surprise me.


I tried #10 and attached a screenshot.
(It seems that line 207 has no problem, but the font is different.)

I attached the rendered font information too.

Next, I will try #11.

image0021.png (74.2 KB)
image0022.png (83.2 KB)
I tried #11 and attached screenshots.
(The problem is the same as in #3.)

image0031.png (78.6 KB)
image0032.png (123 KB)
image0033.png (185 KB)
image0034.png (78.7 KB)
I found that when Chrome's language setting is 'English (United States)',
there is no problem.

When Chrome's language setting is 'Japanese', the problem occurs.

image0041.png (65.8 KB)
image0042.png (58.2 KB)
image0043.png (124 KB)
image0044.png (63.0 KB)
I found that Chrome has been updated, and now the problem can be reproduced only in an incognito window.
(58.0.3029.96 (64-bit) stable, automatically updated)

In a normal window, there seems to be no problem.
I don't know what has changed since 57.0.2987.133...

In the incognito window, if the page was cached, pressing Shift + F5 reproduces the problem.

image0051.png (58.9 KB)
image0052.png (48.2 KB)
Cc: tkent@chromium.org
Unfortunately, no one except the reporter has been able to reproduce the issue yet. All the Japanese text gets garbled, which is expected regardless of the Chrome version (57.0.2987 or 58.0.3029). I'm still clueless.

cc'ing tkent@ in case he can shed some light or even reproduce the issue if he has a Windows 10 machine available. tkent@, could you help? It may not be encoding-related, though.

Comment 17 by tkent@chromium.org, May 11 2017

Components: -Blink>ViewSource Blink>TextEncoding
I couldn't reproduce this with Windows 10, Japanese environment, the same extension, and Incognito window.

This might be related to network speed. Accessing *.appspot.com from Google offices may be much faster than usual.

Labels: -Pri-1 -M-60 Pri-3
Lowering the priority since this doesn't have a milestone.
The problem still occurs.

Chrome: 59.0.3071.115 (stable) (64bit)
OS: Windows 8.1 (64bit) (Language Setting: Japanese)

image0071.png (55.4 KB)
The problem occurs on an Android phone too.

Chrome: 59.0.3071.125
OS: Android 6.0.1; SO-02J Build/34.1.B.2.32

image0081.png (187 KB)
In developer tools (Elements tab), the pre tag is split into several blocks,
and the garbled characters begin with the second block.

image0072.png (71.5 KB)

The problem still occurs in the following environment.

Chrome: 60.0.3112.78 (stable) (64bit)
OS: Windows 8.1 (64bit) (Language Setting: Japanese)

image0077.png (54.7 KB)
The problem still occurs in the following environment.

Chrome: 61.0.3163.91 (stable) (64bit)
OS: Windows 8.1 (64bit) (Language Setting: Japanese)

image0082.png (56.2 KB)
The problem still occurs in the following environment.

Chrome: 62.0.3202.89 (stable) (64bit)
OS: Windows 8.1 (64bit) (Language Setting: Japanese)

Chrome: 62.0.3202.84
OS: Android 6.0.1; SO-02J Build/34.1.B.2.32

Six months have passed since the first report.

Recently, I traveled to Kyushu in Japan, and the same problem occurred.

I wonder whether issue 597488 and issue 244358 should be reopened.

image0091.png (49.3 KB)
image0092.png (177 KB)
The problem still occurs in the following environment.

Chrome: 64.0.3282.167 (stable) (64bit)
OS: Windows 8.1 (64bit) (Language Setting: Japanese)

Chrome: 64.0.3282.137
OS: Android 6.0.1; SO-02J Build/34.1.B.2.32

image0101.png (56.7 KB)
image0102.png (182 KB)
The problem still occurs in the following environment.

Chrome: 66.0.3359.139 (stable) (64bit)
OS: Windows 8.1 (64bit) (Language Setting: Japanese)

Chrome: 66.0.3359.126
OS: Android 6.0.1; SO-02J Build/34.1.B.2.32

One year has passed since the first report.
I can reproduce the problem stably (100%).
I wonder whether issue 597488 and issue 244358 should be reopened.

image0111.png (52.7 KB)
image0112.png (184 KB)
The problem still occurs in the following environment.

Chrome: 67.0.3396.99 (stable) (64bit)
OS: Windows 8.1 (64bit) (Language Setting: Japanese)

Chrome: 67.0.3396.87
OS: Android 6.0.1; SO-02J Build/34.1.B.2.32

One year and three months have passed since the first report.
I can reproduce the problem stably (100%).
I wonder whether issue 597488 and issue 244358 should be reopened.

image0121.png (49.8 KB)
image0122.png (184 KB)
Status: Started (was: Assigned)
Will take another look. Having this problem on Android too means it has nothing to do with Windows or the Japanese locale on it.
I can't reproduce the issue on Android with the latest version of Chrome as reported, but I found a clue. TL;DR: http://crrev.com/2697213002 has a side effect that could cause the reported bug. Will upload a fix.

TextResourceDecoder is a per-document object responsible for decoding the associated document, which is fed to TextResourceDecoder::Decode in chunks. The decoder performs content sniffing to detect the encoding if necessary.

Encoding detection by content sniffing is done on the first chunk (at most 1K of it if the chunk is bigger than that). If detection fails (i.e. the API |DetectEncoding| returns false) for any reason, the decoder keeps using the default encoding and makes another attempt on the next chunk. Otherwise it switches to the returned encoding and uses it for the rest of the document.
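The sniffing loop described above can be modeled in a few lines of Python. This is a simplified sketch, not Blink's TextResourceDecoder: SniffingDecoder, SNIFF_LIMIT, and the scripted detector are hypothetical names, the windows-1252 default is an assumption (Blink's actual default depends on settings such as locale), and each chunk is decoded independently, whereas a real decoder carries partial multi-byte sequences across chunk boundaries.

```python
from typing import Callable, Iterable, List, Optional

# Hypothetical detector signature: returns an encoding name, or None on failure.
Detector = Callable[[bytes], Optional[str]]

SNIFF_LIMIT = 1024  # at most the first 1K of a chunk is examined


class SniffingDecoder:
    """Toy model of a per-document decoder that sniffs until detection succeeds."""

    def __init__(self, detect: Detector, default_encoding: str = "windows-1252") -> None:
        self.detect = detect
        self.encoding = default_encoding  # used until detection succeeds
        self.sniffing = True
        self.encodings_used: List[str] = []  # encoding applied to each chunk

    def decode(self, chunk: bytes) -> str:
        if self.sniffing:
            guess = self.detect(chunk[:SNIFF_LIMIT])
            if guess is not None:
                # Success: adopt the guess and stop sniffing for this document.
                self.encoding = guess
                self.sniffing = False
            # Failure: keep the current encoding and retry on the next chunk.
        self.encodings_used.append(self.encoding)
        # Simplification: each chunk is decoded on its own.
        return chunk.decode(self.encoding, errors="replace")


def scripted(results: Iterable[Optional[str]]) -> Detector:
    """Hypothetical test detector that replays a fixed sequence of guesses."""
    it = iter(results)
    return lambda _sample: next(it)


decoder = SniffingDecoder(scripted([None, "utf-8", "gb18030"]))
chunks = [b"var x = 1;\n", "漢字".encode("utf-8"), "漢字".encode("utf-8")]
decoded = [decoder.decode(c) for c in chunks]
print(decoder.encodings_used)  # ['windows-1252', 'utf-8', 'utf-8']
```

The third scripted guess ('gb18030') is never consulted: once the second chunk yields a usable encoding, sniffing stops and that encoding is kept for the rest of the document.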

The CL above introduced a bug in this behavior by treating auto-detected UTF-8 as a detection failure (with the rationale described in the linked bug thread). This allows TextResourceDecoder, when given an unlabelled UTF-8 document, to keep trying to sniff it beyond the first chunk. The reported URL (a JS file) is unlabelled UTF-8, which meets all the conditions leading to this corner case.

This is fine for most documents, since the detector works quite well and returns consistent results across all subsequent chunks. But things can take an unexpected turn if the detector makes a wrong guess for, say, the 10th chunk: the decoder will switch to the wrong encoding, and the rest of the document will be garbled. I believe that's what happens with the reported URL.

It may be 100% reproducible for the reporter but not always for others in a different environment. My test on Android gives a JS file in which all the Japanese letters are garbled, not just those from a certain offset onward. The first chunk, containing only ASCII letters, was detected as UTF-8 (correct), but the next one was detected as GB18030 (wrong), which explains what I see. I think the reported bug is possible if the detector returns GB18030 for the chunk where the garbled letters first appear, and UTF-8 for all the chunks before it. I don't know why the detector makes a wrong guess; it's all based on probabilities, which can't guarantee a 100% correct result anyway.
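The garbling itself is ordinary mojibake: the bytes are valid UTF-8, but once the decoder has switched to GB18030 they are reinterpreted as GB18030 byte sequences. A short Python illustration follows; the sample string is arbitrary and is not taken from the reported file.

```python
# Arbitrary Japanese sample text, stored as UTF-8 like the reported .js file.
original = "日本語の漢字"
raw = original.encode("utf-8")

# A correct UTF-8 guess round-trips the text.
assert raw.decode("utf-8") == original

# A wrong GB18030 guess reinterprets the same bytes, producing unrelated
# characters; errors="replace" substitutes U+FFFD for any invalid sequence.
garbled = raw.decode("gb18030", errors="replace")
print(garbled)
```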

So the remedy is to restore the old behavior of stopping sniffing once detection returns a meaningful encoding, including for unlabelled UTF-8 documents as before. The only case that lets sniffing continue should be when the detector returns the 'unknown' encoding, which really indicates that detection failed.
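The difference between the regressed and the restored behavior can be sketched in Python. This is a simplified model, not Blink code: decode_document, UNKNOWN, the utf8_continues_sniffing switch, and the scripted detector are hypothetical, and the model assumes that the regressed path still applies a UTF-8 guess to the current chunk (which matches the reporter's symptom of correct text up to a certain point); the thread does not spell out exactly what Blink did with that guess. The actual fix, per the commit below, adds a |detection_completed_| flag to TextResourceDecoder.

```python
from typing import Callable, Iterable, List, Tuple

UNKNOWN = "unknown"  # hypothetical marker for a genuine detection failure
Detector = Callable[[bytes], str]


def decode_document(
    chunks: Iterable[bytes],
    detect: Detector,
    utf8_continues_sniffing: bool,
    default_encoding: str = "windows-1252",
) -> Tuple[str, List[str]]:
    """Decode chunks, returning the text and the encoding used for each chunk.

    utf8_continues_sniffing=True models the regression (a UTF-8 guess is
    applied but treated like a failure, so sniffing goes on); False models
    the remedy (any guess other than UNKNOWN completes detection).
    """
    encoding = default_encoding
    sniffing = True
    used: List[str] = []
    parts: List[str] = []
    for chunk in chunks:
        if sniffing:
            guess = detect(chunk[:1024])
            if guess != UNKNOWN:
                encoding = guess
                # Only the regressed path keeps sniffing after a UTF-8 guess.
                sniffing = utf8_continues_sniffing and guess == "utf-8"
        used.append(encoding)
        parts.append(chunk.decode(encoding, errors="replace"))
    return "".join(parts), used


def scripted(results: Iterable[str]) -> Detector:
    """Hypothetical detector replaying fixed guesses, one per sniffed chunk."""
    it = iter(results)
    return lambda _sample: next(it)


chunks = [b"// ascii header\n", "漢字\n".encode("utf-8"), "漢字\n".encode("utf-8")]
guesses = ["utf-8", "utf-8", "gb18030"]  # the third guess is wrong

buggy_text, buggy_used = decode_document(chunks, scripted(guesses), True)
fixed_text, fixed_used = decode_document(chunks, scripted(guesses), False)
print(buggy_used)  # ['utf-8', 'utf-8', 'gb18030']
print(fixed_used)  # ['utf-8', 'utf-8', 'utf-8']
```

In the regressed run, the wrong guess on the third chunk switches the encoding mid-document, so only the tail is garbled; in the fixed run, the first UTF-8 guess ends sniffing and the later wrong guess is never consulted.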

Project Member

Comment 36 by bugdroid1@chromium.org, Jul 12

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/efbdada3f735a2631e9dfeed5c5e3f880245ccd3

commit efbdada3f735a2631e9dfeed5c5e3f880245ccd3
Author: Jinsuk Kim <jinsukkim@chromium.org>
Date: Thu Jul 12 04:40:43 2018

Stop content sniffing after successful encoding detection

Content sniffing should stop once the detector makes a valid
guess. http://crrev.com/2697213002 introduced a side effect
that has the detector continue to sniff the content, therefore
opens the possibility of returning an encoding different from
the first guess. It leads to a document decoded with multiple
encoding schemes, one of which may not be correct.

This CL addresses it by defining a new flag |detection_completed_|
to tell the TextResourceDecoder to stop sniffing, even if
detector returns false for unlabelled UTF-8 documents. Also added
a test verifying the behavior.

Bug:  710861 
Change-Id: Ic07de3ae08fbb742aa3c24f1e18055348d6acbd8
Reviewed-on: https://chromium-review.googlesource.com/1132904
Reviewed-by: Kent Tamura <tkent@chromium.org>
Commit-Queue: Jinsuk Kim <jinsukkim@chromium.org>
Cr-Commit-Position: refs/heads/master@{#574489}
[modify] https://crrev.com/efbdada3f735a2631e9dfeed5c5e3f880245ccd3/third_party/blink/renderer/core/html/parser/text_resource_decoder.cc
[modify] https://crrev.com/efbdada3f735a2631e9dfeed5c5e3f880245ccd3/third_party/blink/renderer/core/html/parser/text_resource_decoder.h
[modify] https://crrev.com/efbdada3f735a2631e9dfeed5c5e3f880245ccd3/third_party/blink/renderer/core/html/parser/text_resource_decoder_test.cc
[modify] https://crrev.com/efbdada3f735a2631e9dfeed5c5e3f880245ccd3/third_party/blink/renderer/platform/text/text_encoding_detector.cc
[modify] https://crrev.com/efbdada3f735a2631e9dfeed5c5e3f880245ccd3/third_party/blink/renderer/platform/wtf/text/text_encoding.cc
[modify] https://crrev.com/efbdada3f735a2631e9dfeed5c5e3f880245ccd3/third_party/blink/renderer/platform/wtf/text/text_encoding.h

Status: Fixed (was: Started)
The fix got landed. hamay1010@ could you please test it to see if it really addresses the issue? The canary version will have the fix in a couple of days.
> The fix got landed. hamay1010@ could you please test it to see if it really addresses the issue? The canary version will have the fix in a couple of days.

Thank you. I will try it.

> It may be 100% reproducible for the reporter but not always for others in different environment. My test on Android gives a js file in which all the Japanese letters are garbled, not partially from a certain offset.

Uhm, I guess that you don't have any Japanese fonts (e.g. MS Gothic) in your PC, do you?

Anyway, thanks for your investigation and fix.

I'm glad about this progress!

> Uhm, I guess that you don't have any Japanese fonts (e.g. MS Gothic) in your PC, do you?

Android comes with Japanese locale/fonts preinstalled. The problem is that the detector sees the text as encoded in GB18030 rather than UTF-8, which is unfortunate.
I confirmed that the issue was fixed in the following environment.

Chrome: 69.0.3493.0 (Official Build) canary (64bit)
OS: Windows 8.1 (64bit) (Language Setting: Japanese)

Chrome: 69.0.3491.0 canary
OS: Android 6.0.1; SO-02J Build/34.1.B.2.32

Thanks! Really!
