New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 682978 link

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Closed: Apr 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug



Sign in to add a comment

Wrong encoding detection for chart.fisco.co.jp

Project Member Reported by tkent@chromium.org, Jan 20 2017

Issue description

Chrome Version: 55+
OS: All

Reported in Japanese help forum:
https://productforums.google.com/forum/?utm_medium=email&utm_source=footer#!msg/chrome-ja/IOEIO92rnRI/Obyc7HnRCQAJ

What steps will reproduce the problem?
(1) Open http://chart.fisco.co.jp/fisco/cgi-bin/index.cgi
(2) Observe

What is the expected result?
Readable Japanese text. The page should be decoded as "EUC-JP".

What happens instead?
Garbled text. The page is decoded as "GBK".

Please use labels and text to provide additional information.

This is not a regression. However users can't choose the correct encoding due to the removal of encoding menu.

I confirmed this can be resolved by passing the URL as |url_hint| parameter or passing JAPANESE as |language_hint| parameter to CompactEncDet::DetectEncoding().


 
actual.png
27.1 KB View Download
expected.png
26.4 KB View Download

Comment 1 by tkent@chromium.org, Jan 20 2017

Owner: tkent@chromium.org
Status: Started (was: Untriaged)
Oh I just checked the forum thread to find that the link was already shared among people.
Project Member

Comment 4 by bugdroid1@chromium.org, Jan 23 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/269b6bc66eb96f69d61d814ea7ec1ffef695eee3

commit 269b6bc66eb96f69d61d814ea7ec1ffef695eee3
Author: tkent <tkent@chromium.org>
Date: Mon Jan 23 03:19:54 2017

Pass more hints to encoding detector.

http://chart.fisco.co.jp/fisco/cgi-bin/index.cgi was incorrectly detected as GBK
though it should be EUC-JP. Passing URL or user language "ja" fixes this issue.

* Add |hintUrl| and |hintUserLanguage| arguments to detectTextEncoding().

* TextResourceDecoder passes defaultLanguage() to detectTextEncoding().

* TextResourceDecoderBuilder passes URL information to TextResourceDecoder, and
  TextResourceDecoder passes it to detectTextEncoding().

BUG= 682978 

Review-Url: https://codereview.chromium.org/2648703003
Cr-Commit-Position: refs/heads/master@{#445317}

[modify] https://crrev.com/269b6bc66eb96f69d61d814ea7ec1ffef695eee3/third_party/WebKit/Source/core/html/parser/TextResourceDecoder.cpp
[modify] https://crrev.com/269b6bc66eb96f69d61d814ea7ec1ffef695eee3/third_party/WebKit/Source/core/html/parser/TextResourceDecoder.h
[modify] https://crrev.com/269b6bc66eb96f69d61d814ea7ec1ffef695eee3/third_party/WebKit/Source/core/html/parser/TextResourceDecoderForFuzzing.h
[modify] https://crrev.com/269b6bc66eb96f69d61d814ea7ec1ffef695eee3/third_party/WebKit/Source/core/loader/TextResourceDecoderBuilder.cpp
[modify] https://crrev.com/269b6bc66eb96f69d61d814ea7ec1ffef695eee3/third_party/WebKit/Source/platform/text/TextEncodingDetector.cpp
[modify] https://crrev.com/269b6bc66eb96f69d61d814ea7ec1ffef695eee3/third_party/WebKit/Source/platform/text/TextEncodingDetector.h
[modify] https://crrev.com/269b6bc66eb96f69d61d814ea7ec1ffef695eee3/third_party/WebKit/Source/platform/text/TextEncodingDetectorTest.cpp

Comment 5 by tkent@chromium.org, Jan 23 2017

Status: Fixed (was: Started)

Comment 6 by tkent@chromium.org, Jan 23 2017

Labels: Hotlist-Japan

Comment 7 by tkent@chromium.org, Jan 26 2017

Labels: Merge-Request-57
Project Member

Comment 8 by sheriffbot@chromium.org, Jan 26 2017

Labels: -Merge-Request-57 Hotlist-Merge-Approved Merge-Approved-57
Your change meets the bar and is auto-approved for M57. Please go ahead and merge the CL to branch 2987 manually. Please contact milestone owner if you have questions.
Owners: amineer@(clank), cmasso@(bling), ketakid@(cros), govind@(desktop)

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Project Member

Comment 9 by bugdroid1@chromium.org, Jan 26 2017

Labels: -merge-approved-57 merge-merged-2987
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/0ad98a1fcf9a049d325c235fd07d317cf31bcf37

commit 0ad98a1fcf9a049d325c235fd07d317cf31bcf37
Author: Kent Tamura <tkent@chromium.org>
Date: Thu Jan 26 04:38:42 2017

Merge "Pass more hints to encoding detector." to M57 branch

http://chart.fisco.co.jp/fisco/cgi-bin/index.cgi was incorrectly detected as GBK
though it should be EUC-JP. Passing URL or user language "ja" fixes this issue.

* Add |hintUrl| and |hintUserLanguage| arguments to detectTextEncoding().

* TextResourceDecoder passes defaultLanguage() to detectTextEncoding().

* TextResourceDecoderBuilder passes URL information to TextResourceDecoder, and
  TextResourceDecoder passes it to detectTextEncoding().

BUG= 682978 

Review-Url: https://codereview.chromium.org/2648703003
Cr-Commit-Position: refs/heads/master@{#445317}
(cherry picked from commit 269b6bc66eb96f69d61d814ea7ec1ffef695eee3)

Review-Url: https://codereview.chromium.org/2655203002 .
Cr-Commit-Position: refs/branch-heads/2987@{#99}
Cr-Branched-From: ad51088c0e8776e8dcd963dbe752c4035ba6dab6-refs/heads/master@{#444943}

[modify] https://crrev.com/0ad98a1fcf9a049d325c235fd07d317cf31bcf37/third_party/WebKit/Source/core/html/parser/TextResourceDecoder.cpp
[modify] https://crrev.com/0ad98a1fcf9a049d325c235fd07d317cf31bcf37/third_party/WebKit/Source/core/html/parser/TextResourceDecoder.h
[modify] https://crrev.com/0ad98a1fcf9a049d325c235fd07d317cf31bcf37/third_party/WebKit/Source/core/html/parser/TextResourceDecoderForFuzzing.h
[modify] https://crrev.com/0ad98a1fcf9a049d325c235fd07d317cf31bcf37/third_party/WebKit/Source/core/loader/TextResourceDecoderBuilder.cpp
[modify] https://crrev.com/0ad98a1fcf9a049d325c235fd07d317cf31bcf37/third_party/WebKit/Source/platform/text/TextEncodingDetector.cpp
[modify] https://crrev.com/0ad98a1fcf9a049d325c235fd07d317cf31bcf37/third_party/WebKit/Source/platform/text/TextEncodingDetector.h
[modify] https://crrev.com/0ad98a1fcf9a049d325c235fd07d317cf31bcf37/third_party/WebKit/Source/platform/text/TextEncodingDetectorTest.cpp

Cc: aelias@chromium.org
Status: Untriaged (was: Fixed)
Let me reopen this to ask for another look at it. URL as a hint looks fine, but language (system locale) may need reconsideration, since it goes against one of the principles (deterministic encoding detection output). I see that there's a benefit for those who have the right locale on theirs but at the same time it will give confusion to users as they will get different output from device to device they're on.

Is URL hint enough for this case? Should we consider not passing locale as a hint to encoding detector?

https://bugs.chromium.org/p/chromium/issues/detail?id=691985#c17

> Is URL hint enough for this case? Should we consider not passing locale as a hint to encoding detector?

Yes, URL hint is enough.  IMO, language hint for file: resources is necessary practically even if we aim for deterministic detection.

Cc: -jinsuk...@chromium.org tkent@chromium.org
Owner: jinsuk...@chromium.org
Status: Started (was: Untriaged)
Project Member

Comment 13 by bugdroid1@chromium.org, Apr 6 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/9e51d07999515ab5f8567ba8b446e316137eb0b8

commit 9e51d07999515ab5f8567ba8b446e316137eb0b8
Author: jinsukkim <jinsukkim@chromium.org>
Date: Thu Apr 06 10:17:04 2017

Avoid using language hint in encoding detection

In general, avoid passing language hint to encoding detector.
This helps obtain more deterministic detection result regardless
of user system locale. Local file resources still benefit from
the hint, which is made an exception.

BUG= 682978 

Review-Url: https://codereview.chromium.org/2803563004
Cr-Commit-Position: refs/heads/master@{#462412}

[modify] https://crrev.com/9e51d07999515ab5f8567ba8b446e316137eb0b8/third_party/WebKit/Source/platform/text/TextEncodingDetector.cpp
[modify] https://crrev.com/9e51d07999515ab5f8567ba8b446e316137eb0b8/third_party/WebKit/Source/platform/text/TextEncodingDetectorTest.cpp

Status: Fixed (was: Started)

Sign in to add a comment