Encoding aliases with -html suffix are web-exposed |
|||||
Issue description
The way chrome's ICU is set up to be compatible with "HTML5" (i.e. HTML and the Encoding standard, etc), we end up with "*-html" aliases for ICU-provided codecs and these are exposed to the web:
Repro:
1. Navigate to this URL:
data:text/html;charset=windows-874-html,<script>document.write(document.characterSet)</script>
Expected: windows-1252
Actual: windows-874
The '-html' suffix should make this an unrecognized encoding label and result in the default encoding for HTML (windows-1252), which is the behavior if something like 'no-such-encoding' is used. Instead, it is recognized as an alias for windows-874 so that shows up. Another repro:
1. Run this on the console:
new TextDecoder('windows-874-html').encoding
Works as expected in Firefox.
The list of such nonstandard aliases is:
big5-html
euc-jp-html
euc-kr-html
ibm866-html
iso-8859-10-html
iso-8859-13-html
iso-8859-14-html
iso-8859-15-html
iso-8859-16-html
iso-8859-2-html
iso-8859-3-html
iso-8859-4-html
iso-8859-5-html
iso-8859-6-html
iso-8859-7-html
iso-8859-8-html
koi8-r-html
koi8-u-html
macintosh-html
shift_jis-html
windows-1250-html
windows-1251-html
windows-1252-html
windows-1253-html
windows-1254-html
windows-1255-html
windows-1256-html
windows-1257-html
windows-1258-html
windows-874-html
x-mac-cyrillic-html
,
Jul 21 2017
,
Jul 21 2017
,
Jul 25 2017
I'm afraid I have to keep '-html' in ICU's converter list. So, we have to filter them out in Blink.
,
Jul 27 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/c998cf3857d76d0773b330adb2d9098453c53050 commit c998cf3857d76d0773b330adb2d9098453c53050 Author: Joshua Bell <jsbell@chromium.org> Date: Thu Jul 27 23:40:53 2017 Text Encodings: Add test comparing supported vs. specified encodings Adds a new window.internals API to get the list of supported encoding labels, since that is not web-exposed. A test compares that against the set of encoding labels from the Encoding Standard [1] using a resource file from web-platform-tests. We support all of the standardized labels, but have some extras: * Deviations from the standard for GBK/GB18030 (crbug.com/339862) * Extra UTF-8 aliases from TextCodecUTF8 (crbug.com/747562) * Extra UTF-16 aliases from TextCodecUTF16 (crbug.com/747562) * '-html' suffix aliases for standard encodings ( crbug.com/747558 ) [1] https://encoding.spec.whatwg.org/ Change-Id: I165b6c2aed2595cb9a87bd148a322b488087ff85 Bug: 339862, 747558 ,747562 Change-Id: I165b6c2aed2595cb9a87bd148a322b488087ff85 Reviewed-on: https://chromium-review.googlesource.com/581936 Commit-Queue: Joshua Bell <jsbell@chromium.org> Reviewed-by: Jungshik Shin <jshin@chromium.org> Reviewed-by: Kent Tamura <tkent@chromium.org> Cr-Commit-Position: refs/heads/master@{#490119} [add] https://crrev.com/c998cf3857d76d0773b330adb2d9098453c53050/third_party/WebKit/LayoutTests/fast/encoding/supported-encodings-expected.txt [add] https://crrev.com/c998cf3857d76d0773b330adb2d9098453c53050/third_party/WebKit/LayoutTests/fast/encoding/supported-encodings.html [modify] https://crrev.com/c998cf3857d76d0773b330adb2d9098453c53050/third_party/WebKit/Source/core/testing/Internals.cpp [modify] https://crrev.com/c998cf3857d76d0773b330adb2d9098453c53050/third_party/WebKit/Source/core/testing/Internals.h [modify] https://crrev.com/c998cf3857d76d0773b330adb2d9098453c53050/third_party/WebKit/Source/core/testing/Internals.idl [modify] https://crrev.com/c998cf3857d76d0773b330adb2d9098453c53050/third_party/WebKit/Source/platform/wtf/text/TextEncodingRegistry.cpp [modify] https://crrev.com/c998cf3857d76d0773b330adb2d9098453c53050/third_party/WebKit/Source/platform/wtf/text/TextEncodingRegistry.h
,
Jul 28 2017
Does it make sense to add a failing test (it has "FAIL" in its expectation file) and not mark it as failed (not marked as failed in TextExpectations)?
,
Jul 28 2017
Re: #6 - isn't that what we have? https://cs.chromium.org/chromium/src/third_party/WebKit/LayoutTests/fast/encoding/supported-encodings-expected.txt?q=supported-enc&sq=package:chromium&l=1 Am I missing something?
,
Jul 28 2017
Or, if you are referring to "we have to filter them out in Blink" - the plan is to actually fix the issue but just on the Blink side (i.e. in platform/wtf/text/TextEncodingRegistry or something) rather than ICU.
,
Jul 28 2017
No, I am referring exactly to what you have. Since you know the results are wrong, why does the test pass? I would expect the -expected.txt file to contain the right results and the test to fail (and so have an entry in TextExpectations), rather than the -expected.txt file to contain the wrong result and not mark the test as one that fails.
,
Jul 28 2017
Oh, got it. For most layout tests we prefer reference expectations checked into the tree. Especially for script-based layout tests (those that contain assertions and record PASS/FAIL for test cases) this lets us more accurately understand why there are failures and detect changes with more granularity - if we fix 2 out of 10 failures we'll note that. Similarly, if we introduce 1 new failure when there are are already 10 FAIL entries we'll catch that, which we would not if there's just a TestExpectations entry. Also, files that use testharness.js don't even have an -expected.txt file if everything passes. We only need it to annotate which cases are failing. So for most newer script-based layout tests - and anything from web-platform-tests - the mere presence of an -expected.txt file signifies some failure.
,
Jul 28 2017
In other words, the test "passes" because Blink is behaving as expected (no-one has introduced an unexpected behavior change), not because it is conforming to the spec.
,
Jul 28 2017
#12 - got it. Thank you. :)
,
Aug 25 2017
,
Aug 25 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/cb1d5d3cf025cf1112c8aa01ebb6b5ce06294a19 commit cb1d5d3cf025cf1112c8aa01ebb6b5ce06294a19 Author: Joshua Bell <jsbell@chromium.org> Date: Fri Aug 25 16:09:06 2017 Text Encodings: Don't expose non-standard *-html aliases Chromium's build of ICU includes *-html aliases to manage the encoding labels defined in the Encoding Standard, but these should not be exposed to the web. Filter them out. Bug: 747558 Change-Id: I70dd034ca0a108ac8963243c0ac9e87090457614 Reviewed-on: https://chromium-review.googlesource.com/633902 Reviewed-by: Kent Tamura <tkent@chromium.org> Commit-Queue: Joshua Bell <jsbell@chromium.org> Cr-Commit-Position: refs/heads/master@{#497427} [modify] https://crrev.com/cb1d5d3cf025cf1112c8aa01ebb6b5ce06294a19/third_party/WebKit/LayoutTests/fast/encoding/supported-encodings-expected.txt [modify] https://crrev.com/cb1d5d3cf025cf1112c8aa01ebb6b5ce06294a19/third_party/WebKit/LayoutTests/fast/encoding/supported-encodings.html [modify] https://crrev.com/cb1d5d3cf025cf1112c8aa01ebb6b5ce06294a19/third_party/WebKit/Source/platform/wtf/text/TextCodecICU.cpp
,
Aug 25 2017
,
Aug 25 2017
I hope no one uses those encodings in the wild, wild web...
,
Aug 25 2017
They've never been in other browsers, documented anywhere, or implemented in ICU, and just showed up in Chrome in the last couple of years. Since there's no way to enumerate the aliases via APIs, someone would have needed to find them via inspecting Chrome's modified copy of the ICU tables, knowing it would break in other browsers. So... unlikely to be a real compat issue. Issue 747562 on the other hand deals with encodings that were also exposed in Safari (via webkit) for a long time, so that one requires more careful thought.
,
Aug 25 2017
#18 - I am thinking about code that was compiled to JavaScript that used the ICU API that was mapped to TextEncoder or something like that. I agree that it reasonable that it is not used at all, but hopefully this is not breaking anyone, or a lot of websites.
,
Aug 29 2017
The chance of somebody using this would approach zero, I guess. Why would anybody use 'foo-html' on purpose when 'foo' works just fine?
,
Aug 29 2017
#20 - existing code that was ported to the web, or whatever. Or, in other words, for the same reason those exist in ICU. :) But I agree with you the chance approaches zero. |
|||||
►
Sign in to add a comment |
|||||
Comment 1 by jsb...@chromium.org
, Jul 21 2017