Wrong encoding detection for chart.fisco.co.jp |
|||||||||
Issue descriptionChrome Version: 55+ OS: All Reported in Japanese help forum: https://productforums.google.com/forum/?utm_medium=email&utm_source=footer#!msg/chrome-ja/IOEIO92rnRI/Obyc7HnRCQAJ What steps will reproduce the problem? (1) Open http://chart.fisco.co.jp/fisco/cgi-bin/index.cgi (2) Observe What is the expected result? Readable Japanese text. The page should be decoded as "EUC-JP". What happens instead? Garbled text. The page is decoded as "GBK". Please use labels and text to provide additional information. This is not a regression. However users can't choose the correct encoding due to the removal of encoding menu. I confirmed this can be resolved by passing the URL as |url_hint| parameter or passing JAPANESE as |language_hint| parameter to CompactEncDet::DetectEncoding().
,
Jan 20 2017
A Chrome extension https://chrome.google.com/webstore/detail/set-character-encoding/bpojelgakakmcfmjfilgdlmhefphglae can come as a quick help.
,
Jan 20 2017
Oh I just checked the forum thread to find that the link was already shared among people.
,
Jan 23 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/269b6bc66eb96f69d61d814ea7ec1ffef695eee3 commit 269b6bc66eb96f69d61d814ea7ec1ffef695eee3 Author: tkent <tkent@chromium.org> Date: Mon Jan 23 03:19:54 2017 Pass more hints to encoding detector. http://chart.fisco.co.jp/fisco/cgi-bin/index.cgi was incorrectly detected as GBK though it should be EUC-JP. Passing URL or user language "ja" fixes this issue. * Add |hintUrl| and |hintUserLanguage| arguments to detectTextEncoding(). * TextResourceDecoder passes defaultLanguage() to detectTextEncoding(). * TextResourceDecoderBuilder passes URL information to TextResourceDecoder, and TextResourceDecoder passes it to detectTextEncoding(). BUG= 682978 Review-Url: https://codereview.chromium.org/2648703003 Cr-Commit-Position: refs/heads/master@{#445317} [modify] https://crrev.com/269b6bc66eb96f69d61d814ea7ec1ffef695eee3/third_party/WebKit/Source/core/html/parser/TextResourceDecoder.cpp [modify] https://crrev.com/269b6bc66eb96f69d61d814ea7ec1ffef695eee3/third_party/WebKit/Source/core/html/parser/TextResourceDecoder.h [modify] https://crrev.com/269b6bc66eb96f69d61d814ea7ec1ffef695eee3/third_party/WebKit/Source/core/html/parser/TextResourceDecoderForFuzzing.h [modify] https://crrev.com/269b6bc66eb96f69d61d814ea7ec1ffef695eee3/third_party/WebKit/Source/core/loader/TextResourceDecoderBuilder.cpp [modify] https://crrev.com/269b6bc66eb96f69d61d814ea7ec1ffef695eee3/third_party/WebKit/Source/platform/text/TextEncodingDetector.cpp [modify] https://crrev.com/269b6bc66eb96f69d61d814ea7ec1ffef695eee3/third_party/WebKit/Source/platform/text/TextEncodingDetector.h [modify] https://crrev.com/269b6bc66eb96f69d61d814ea7ec1ffef695eee3/third_party/WebKit/Source/platform/text/TextEncodingDetectorTest.cpp
,
Jan 23 2017
,
Jan 23 2017
,
Jan 26 2017
,
Jan 26 2017
Your change meets the bar and is auto-approved for M57. Please go ahead and merge the CL to branch 2987 manually. Please contact milestone owner if you have questions. Owners: amineer@(clank), cmasso@(bling), ketakid@(cros), govind@(desktop) For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Jan 26 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/0ad98a1fcf9a049d325c235fd07d317cf31bcf37 commit 0ad98a1fcf9a049d325c235fd07d317cf31bcf37 Author: Kent Tamura <tkent@chromium.org> Date: Thu Jan 26 04:38:42 2017 Merge "Pass more hints to encoding detector." to M57 branch http://chart.fisco.co.jp/fisco/cgi-bin/index.cgi was incorrectly detected as GBK though it should be EUC-JP. Passing URL or user language "ja" fixes this issue. * Add |hintUrl| and |hintUserLanguage| arguments to detectTextEncoding(). * TextResourceDecoder passes defaultLanguage() to detectTextEncoding(). * TextResourceDecoderBuilder passes URL information to TextResourceDecoder, and TextResourceDecoder passes it to detectTextEncoding(). BUG= 682978 Review-Url: https://codereview.chromium.org/2648703003 Cr-Commit-Position: refs/heads/master@{#445317} (cherry picked from commit 269b6bc66eb96f69d61d814ea7ec1ffef695eee3) Review-Url: https://codereview.chromium.org/2655203002 . Cr-Commit-Position: refs/branch-heads/2987@{#99} Cr-Branched-From: ad51088c0e8776e8dcd963dbe752c4035ba6dab6-refs/heads/master@{#444943} [modify] https://crrev.com/0ad98a1fcf9a049d325c235fd07d317cf31bcf37/third_party/WebKit/Source/core/html/parser/TextResourceDecoder.cpp [modify] https://crrev.com/0ad98a1fcf9a049d325c235fd07d317cf31bcf37/third_party/WebKit/Source/core/html/parser/TextResourceDecoder.h [modify] https://crrev.com/0ad98a1fcf9a049d325c235fd07d317cf31bcf37/third_party/WebKit/Source/core/html/parser/TextResourceDecoderForFuzzing.h [modify] https://crrev.com/0ad98a1fcf9a049d325c235fd07d317cf31bcf37/third_party/WebKit/Source/core/loader/TextResourceDecoderBuilder.cpp [modify] https://crrev.com/0ad98a1fcf9a049d325c235fd07d317cf31bcf37/third_party/WebKit/Source/platform/text/TextEncodingDetector.cpp [modify] https://crrev.com/0ad98a1fcf9a049d325c235fd07d317cf31bcf37/third_party/WebKit/Source/platform/text/TextEncodingDetector.h [modify] https://crrev.com/0ad98a1fcf9a049d325c235fd07d317cf31bcf37/third_party/WebKit/Source/platform/text/TextEncodingDetectorTest.cpp
,
Apr 6 2017
Let me reopen this to ask for another look at it. URL as a hint looks fine, but language (system locale) may need reconsideration, since it goes against one of the principles (deterministic encoding detection output). I see that there's a benefit for those who have the right locale on theirs but at the same time it will give confusion to users as they will get different output from device to device they're on. Is URL hint enough for this case? Should we consider not passing locale as a hint to encoding detector? https://bugs.chromium.org/p/chromium/issues/detail?id=691985#c17
,
Apr 6 2017
> Is URL hint enough for this case? Should we consider not passing locale as a hint to encoding detector? Yes, URL hint is enough. IMO, language hint for file: resources is necessary practically even if we aim for deterministic detection.
,
Apr 6 2017
,
Apr 6 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/9e51d07999515ab5f8567ba8b446e316137eb0b8 commit 9e51d07999515ab5f8567ba8b446e316137eb0b8 Author: jinsukkim <jinsukkim@chromium.org> Date: Thu Apr 06 10:17:04 2017 Avoid using language hint in encoding detection In general, avoid passing language hint to encoding detector. This helps obtain more deterministic detection result regardless of user system locale. Local file resources still benefit from the hint, which is made an exception. BUG= 682978 Review-Url: https://codereview.chromium.org/2803563004 Cr-Commit-Position: refs/heads/master@{#462412} [modify] https://crrev.com/9e51d07999515ab5f8567ba8b446e316137eb0b8/third_party/WebKit/Source/platform/text/TextEncodingDetector.cpp [modify] https://crrev.com/9e51d07999515ab5f8567ba8b446e316137eb0b8/third_party/WebKit/Source/platform/text/TextEncodingDetectorTest.cpp
,
Apr 6 2017
|
|||||||||
►
Sign in to add a comment |
|||||||||
Comment 1 by tkent@chromium.org
, Jan 20 2017Status: Started (was: Untriaged)