New issue
Advanced search Search tips

Issue 747558 link

Starred by 3 users

Issue metadata

Status: Fixed
Owner: ----
Closed: Aug 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug



Sign in to add a comment

Encoding aliases with -html suffix are web-exposed

Project Member Reported by jsb...@chromium.org, Jul 21 2017

Issue description

The way chrome's ICU is set up to be compatible with "HTML5" (i.e. HTML and the Encoding standard, etc), we end up with "*-html" aliases for ICU-provided codecs and these are exposed to the web:

Repro:

1. Navigate to this URL:

data:text/html;charset=windows-874-html,<script>document.write(document.characterSet)</script>

Expected: windows-1252
Actual: windows-874

The '-html' suffix should make this an unrecognized encoding label and result in the default encoding for HTML (windows-1252), which is the behavior if something like 'no-such-encoding' is used. Instead, it is recognized as an alias for windows-874 so that shows up. Another repro:

1. Run this on the console:

new TextDecoder('windows-874-html').encoding

Works as expected in Firefox.

The list of such nonstandard aliases is:

big5-html
euc-jp-html
euc-kr-html
ibm866-html
iso-8859-10-html
iso-8859-13-html
iso-8859-14-html
iso-8859-15-html
iso-8859-16-html
iso-8859-2-html
iso-8859-3-html
iso-8859-4-html
iso-8859-5-html
iso-8859-6-html
iso-8859-7-html
iso-8859-8-html
koi8-r-html
koi8-u-html
macintosh-html
shift_jis-html
windows-1250-html
windows-1251-html
windows-1252-html
windows-1253-html
windows-1254-html
windows-1255-html
windows-1256-html
windows-1257-html
windows-1258-html
windows-874-html
x-mac-cyrillic-html

 

Comment 1 by jsb...@chromium.org, Jul 21 2017

Summary: Encoding aliases with -html suffix are web-exposed (was: Encoding aliases with -html are web-exposed)

Comment 2 by jsb...@chromium.org, Jul 21 2017

Blocking: 544228

Comment 3 by jsb...@chromium.org, Jul 21 2017

Blocking: -544228

Comment 4 by js...@chromium.org, Jul 25 2017

I'm afraid I have to keep '-html' in ICU's converter list. So, we have to filter them out in Blink. 
Project Member

Comment 5 by bugdroid1@chromium.org, Jul 27 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/c998cf3857d76d0773b330adb2d9098453c53050

commit c998cf3857d76d0773b330adb2d9098453c53050
Author: Joshua Bell <jsbell@chromium.org>
Date: Thu Jul 27 23:40:53 2017

Text Encodings: Add test comparing supported vs. specified encodings

Adds a new window.internals API to get the list of supported encoding
labels, since that is not web-exposed. A test compares that against
the set of encoding labels from the Encoding Standard [1] using
a resource file from web-platform-tests.

We support all of the standardized labels, but have some extras:
* Deviations from the standard for GBK/GB18030 (crbug.com/339862)
* Extra UTF-8 aliases from TextCodecUTF8 (crbug.com/747562)
* Extra UTF-16 aliases from TextCodecUTF16 (crbug.com/747562)
* '-html' suffix aliases for standard encodings ( crbug.com/747558 )

[1] https://encoding.spec.whatwg.org/

Change-Id: I165b6c2aed2595cb9a87bd148a322b488087ff85

Bug: 339862, 747558 ,747562
Change-Id: I165b6c2aed2595cb9a87bd148a322b488087ff85
Reviewed-on: https://chromium-review.googlesource.com/581936
Commit-Queue: Joshua Bell <jsbell@chromium.org>
Reviewed-by: Jungshik Shin <jshin@chromium.org>
Reviewed-by: Kent Tamura <tkent@chromium.org>
Cr-Commit-Position: refs/heads/master@{#490119}
[add] https://crrev.com/c998cf3857d76d0773b330adb2d9098453c53050/third_party/WebKit/LayoutTests/fast/encoding/supported-encodings-expected.txt
[add] https://crrev.com/c998cf3857d76d0773b330adb2d9098453c53050/third_party/WebKit/LayoutTests/fast/encoding/supported-encodings.html
[modify] https://crrev.com/c998cf3857d76d0773b330adb2d9098453c53050/third_party/WebKit/Source/core/testing/Internals.cpp
[modify] https://crrev.com/c998cf3857d76d0773b330adb2d9098453c53050/third_party/WebKit/Source/core/testing/Internals.h
[modify] https://crrev.com/c998cf3857d76d0773b330adb2d9098453c53050/third_party/WebKit/Source/core/testing/Internals.idl
[modify] https://crrev.com/c998cf3857d76d0773b330adb2d9098453c53050/third_party/WebKit/Source/platform/wtf/text/TextEncodingRegistry.cpp
[modify] https://crrev.com/c998cf3857d76d0773b330adb2d9098453c53050/third_party/WebKit/Source/platform/wtf/text/TextEncodingRegistry.h

Comment 6 by phistuck@gmail.com, Jul 28 2017

Does it make sense to add a failing test (it has "FAIL" in its expectation file) and not mark it as failed (not marked as failed in TextExpectations)?

Comment 7 Deleted

Comment 9 by jsb...@chromium.org, Jul 28 2017

Or, if you are referring to "we have to filter them out in Blink" - the plan is to actually fix the issue but just on the Blink side (i.e. in platform/wtf/text/TextEncodingRegistry or something) rather than ICU.

Comment 10 by phistuck@gmail.com, Jul 28 2017

No, I am referring exactly to what you have. Since you know the results are wrong, why does the test pass? I would expect the -expected.txt file to contain the right results and the test to fail (and so have an entry in TextExpectations), rather than the -expected.txt file to contain the wrong result and not mark the test as one that fails.
Oh, got it.

For most layout tests we prefer reference expectations checked into the tree. Especially for script-based layout tests (those that contain assertions and record PASS/FAIL for test cases) this lets us more accurately understand why there are failures and detect changes with more granularity - if we fix 2 out of 10 failures we'll note that. Similarly, if we introduce 1 new failure when there are are already 10 FAIL entries we'll catch that, which we would not if there's just a TestExpectations entry.

Also, files that use testharness.js don't even have an -expected.txt file if everything passes. We only need it to annotate which cases are failing. So for most newer script-based layout tests - and anything from web-platform-tests -
 the mere presence of an -expected.txt file signifies some failure.

In other words, the test "passes" because Blink is behaving as expected (no-one has introduced an unexpected behavior change), not because it is conforming to the spec.

Comment 13 by phistuck@gmail.com, Jul 28 2017

#12 - got it. Thank you. :)
Status: Started (was: Available)
Project Member

Comment 15 by bugdroid1@chromium.org, Aug 25 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/cb1d5d3cf025cf1112c8aa01ebb6b5ce06294a19

commit cb1d5d3cf025cf1112c8aa01ebb6b5ce06294a19
Author: Joshua Bell <jsbell@chromium.org>
Date: Fri Aug 25 16:09:06 2017

Text Encodings: Don't expose non-standard *-html aliases

Chromium's build of ICU includes *-html aliases to manage the encoding
labels defined in the Encoding Standard, but these should not be
exposed to the web. Filter them out.

Bug:  747558 
Change-Id: I70dd034ca0a108ac8963243c0ac9e87090457614
Reviewed-on: https://chromium-review.googlesource.com/633902
Reviewed-by: Kent Tamura <tkent@chromium.org>
Commit-Queue: Joshua Bell <jsbell@chromium.org>
Cr-Commit-Position: refs/heads/master@{#497427}
[modify] https://crrev.com/cb1d5d3cf025cf1112c8aa01ebb6b5ce06294a19/third_party/WebKit/LayoutTests/fast/encoding/supported-encodings-expected.txt
[modify] https://crrev.com/cb1d5d3cf025cf1112c8aa01ebb6b5ce06294a19/third_party/WebKit/LayoutTests/fast/encoding/supported-encodings.html
[modify] https://crrev.com/cb1d5d3cf025cf1112c8aa01ebb6b5ce06294a19/third_party/WebKit/Source/platform/wtf/text/TextCodecICU.cpp

Status: Fixed (was: Started)

Comment 17 by phistuck@gmail.com, Aug 25 2017

I hope no one uses those encodings in the wild, wild web...
They've never been in other browsers, documented anywhere, or implemented in ICU, and just showed up in Chrome in the last couple of years. Since there's no way to enumerate the aliases via APIs, someone would have needed to find them via inspecting Chrome's modified copy of the ICU tables, knowing it would break in other browsers.

So... unlikely to be a real compat issue. 

Issue 747562 on the other hand deals with encodings that were also exposed in Safari (via webkit) for a long time, so that one requires more careful thought.

Comment 19 by phistuck@gmail.com, Aug 25 2017

#18 - I am thinking about code that was compiled to JavaScript that used the ICU API that was mapped to TextEncoder or something like that.
I agree that it reasonable that it is not used at all, but hopefully this is not breaking anyone, or a lot of websites.

Comment 20 by js...@chromium.org, Aug 29 2017

The chance of somebody using this would approach zero, I guess. Why would anybody use 'foo-html' on purpose when 'foo' works just fine? 

Comment 21 by phistuck@gmail.com, Aug 29 2017

#20 - existing code that was ported to the web, or whatever.
Or, in other words, for the same reason those exist in ICU. :)

But I agree with you the chance approaches zero.

Sign in to add a comment