
Issue 747562

Starred by 3 users

Issue metadata

Status: Available
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug




TextCodecUTF8 and TextCodecUTF16 register nonstandard labels

Reported by jsb...@chromium.org, Jul 21 2017

Issue description

The Encoding Standard specifies what labels should be supported:

https://encoding.spec.whatwg.org/#names-and-labels
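The lookup the standard describes ("get an encoding") trims leading and trailing ASCII whitespace from the label, ASCII-lowercases it, and matches the result against the label table. Any label not in the table is a failure. Below is a minimal sketch of that lookup. The `LABELS` map is an assumed, illustrative excerpt of the table, not the full list, and the function names are hypothetical.

```javascript
// Illustrative excerpt (assumption): a few entries from the standard's
// names-and-labels table, mapping each label to its canonical encoding name.
// The authoritative, complete table is at the URL above.
const LABELS = new Map([
  ["unicode-1-1-utf-8", "UTF-8"],
  ["utf-8", "UTF-8"],
  ["utf8", "UTF-8"],
  ["utf-16be", "UTF-16BE"],
  ["utf-16", "UTF-16LE"],
  ["utf-16le", "UTF-16LE"],
]);

// Remove leading and trailing ASCII whitespace (TAB, LF, FF, CR, SPACE).
function stripAsciiWhitespace(s) {
  return s.replace(/^[\t\n\f\r ]+|[\t\n\f\r ]+$/g, "");
}

// Lowercase ASCII letters only; non-ASCII characters are left unchanged.
function asciiLowercase(s) {
  return s.replace(/[A-Z]/g, (c) => String.fromCharCode(c.charCodeAt(0) + 32));
}

// Returns the canonical encoding name for `label`, or null on failure
// (an unrecognized label).
function getEncoding(label) {
  const key = asciiLowercase(stripAsciiWhitespace(label));
  return LABELS.has(key) ? LABELS.get(key) : null;
}

console.log(getEncoding("  UTF8\n"));      // → UTF-8
console.log(getEncoding("unicode11utf8")); // → null (not in this excerpt)
```

An engine that registers extra aliases behaves as if its table had more rows than the standard's. That is why a label like `unicode11utf8` resolves instead of failing.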

We have extras for a handful of codecs:

TextCodecUTF16::RegisterEncodingNames:

* csunicode
* ucs-2
* unicode
* iso-10646-ucs-2
* unicodefeff
* unicodefffe

TextCodecUTF8::RegisterEncodingNames:

* unicode11utf8
* unicode20utf8
* x-unicode20utf8

These are web-exposed, e.g. navigate to:

data:text/html;charset=unicode11utf8,<script>document.write(document.characterSet)</script>

Expected: windows-1252
Actual: UTF-8
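To repeat the navigation check for other labels, the repro URL can be generated. The sketch below uses a hypothetical helper (`makeCharsetRepro` is not part of any API) that percent-encodes the markup so characters such as `<`, `>` and `#` survive in the URL. It assumes the label is a plain token that needs no escaping.

```javascript
// Hypothetical helper: builds a data: URL whose declared charset is `label`
// and whose body writes back the encoding the browser actually chose.
function makeCharsetRepro(label) {
  const html = "<script>document.write(document.characterSet)</script>";
  return "data:text/html;charset=" + label + "," + encodeURIComponent(html);
}

// Print one repro URL per UTF-8 alias listed above.
for (const label of ["unicode11utf8", "unicode20utf8", "x-unicode20utf8"]) {
  console.log(makeCharsetRepro(label));
}
```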

Or run this on the console:

new TextDecoder('unicode11utf8').encoding

Expected: throws
Actual: 'utf-8'
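The console check can be extended to every extra label listed above. This sketch assumes an environment with a global `TextDecoder` (a browser, or Node.js 11+). The constructor throws a `RangeError` for labels it does not recognize, so the probe records either the resolved `encoding` or `null`. Results depend on the engine and on which revision of the standard it implements, so no particular output is assumed here.

```javascript
// Labels registered beyond the Encoding Standard, as listed in this report.
const EXTRA_LABELS = [
  "csunicode", "ucs-2", "unicode", "iso-10646-ucs-2", "unicodefeff", "unicodefffe",
  "unicode11utf8", "unicode20utf8", "x-unicode20utf8",
];

// Map each label to the encoding name TextDecoder resolves it to,
// or null if the constructor rejects the label.
function probeLabels(labels) {
  const result = {};
  for (const label of labels) {
    try {
      result[label] = new TextDecoder(label).encoding;
    } catch (e) {
      if (!(e instanceof RangeError)) throw e; // unexpected failure: surface it
      result[label] = null;
    }
  }
  return result;
}

console.table(probeLabels(EXTRA_LABELS));
```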

We should remove or standardize these additional labels.


Comment 1 by js...@chromium.org, Jul 25 2017

Let's just kill them in TextCodecUTF16.  

There might be a handful of web pages using those names, but the compat impact would be really small, especially for UTF-16.



Comment 2 by bugdroid1@chromium.org, Jul 27 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/c998cf3857d76d0773b330adb2d9098453c53050

commit c998cf3857d76d0773b330adb2d9098453c53050
Author: Joshua Bell <jsbell@chromium.org>
Date: Thu Jul 27 23:40:53 2017

Text Encodings: Add test comparing supported vs. specified encodings

Adds a new window.internals API to get the list of supported encoding
labels, since that is not web-exposed. A test compares that against
the set of encoding labels from the Encoding Standard [1] using
a resource file from web-platform-tests.

We support all of the standardized labels, but have some extras:
* Deviations from the standard for GBK/GB18030 (crbug.com/339862)
* Extra UTF-8 aliases from TextCodecUTF8 (crbug.com/747562)
* Extra UTF-16 aliases from TextCodecUTF16 (crbug.com/747562)
* '-html' suffix aliases for standard encodings (crbug.com/747558)

[1] https://encoding.spec.whatwg.org/

Change-Id: I165b6c2aed2595cb9a87bd148a322b488087ff85

Bug: 339862, 747558, 747562
Change-Id: I165b6c2aed2595cb9a87bd148a322b488087ff85
Reviewed-on: https://chromium-review.googlesource.com/581936
Commit-Queue: Joshua Bell <jsbell@chromium.org>
Reviewed-by: Jungshik Shin <jshin@chromium.org>
Reviewed-by: Kent Tamura <tkent@chromium.org>
Cr-Commit-Position: refs/heads/master@{#490119}
[add] https://crrev.com/c998cf3857d76d0773b330adb2d9098453c53050/third_party/WebKit/LayoutTests/fast/encoding/supported-encodings-expected.txt
[add] https://crrev.com/c998cf3857d76d0773b330adb2d9098453c53050/third_party/WebKit/LayoutTests/fast/encoding/supported-encodings.html
[modify] https://crrev.com/c998cf3857d76d0773b330adb2d9098453c53050/third_party/WebKit/Source/core/testing/Internals.cpp
[modify] https://crrev.com/c998cf3857d76d0773b330adb2d9098453c53050/third_party/WebKit/Source/core/testing/Internals.h
[modify] https://crrev.com/c998cf3857d76d0773b330adb2d9098453c53050/third_party/WebKit/Source/core/testing/Internals.idl
[modify] https://crrev.com/c998cf3857d76d0773b330adb2d9098453c53050/third_party/WebKit/Source/platform/wtf/text/TextEncodingRegistry.cpp
[modify] https://crrev.com/c998cf3857d76d0773b330adb2d9098453c53050/third_party/WebKit/Source/platform/wtf/text/TextEncodingRegistry.h
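The comparison this commit's test performs amounts to a set difference between the labels the registry supports and the labels the standard specifies. The sketch below uses small hard-coded arrays as stand-ins for the `window.internals` list and the web-platform-tests resource file. Their contents here are an assumed excerpt, and the real formats are not shown in this thread.

```javascript
// Stand-in for the supported labels (assumed excerpt, not the real registry dump).
const supported = [
  "utf-8", "utf8", "unicode-1-1-utf-8", "unicode11utf8", "unicode20utf8", "x-unicode20utf8",
  "utf-16", "utf-16le", "utf-16be", "ucs-2", "unicode",
];

// Stand-in for the specified labels (assumed excerpt of the Encoding Standard).
const specified = ["utf-8", "utf8", "unicode-1-1-utf-8", "utf-16", "utf-16le", "utf-16be"];

// Labels present in `a` but not in `b`, sorted for stable output.
function difference(a, b) {
  const inB = new Set(b);
  return a.filter((x) => !inB.has(x)).sort();
}

console.log("extra (supported, not specified):", difference(supported, specified));
console.log("missing (specified, not supported):", difference(specified, supported));
```

Checking both directions distinguishes the two failure modes. "Extra" entries are the nonstandard aliases this bug is about, and "missing" entries would be standard labels the engine fails to support.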

Comment 3 by jsb...@chromium.org, Aug 25 2017

Note that WebKit still has the UTF-8 and UTF-16 aliases:

https://github.com/WebKit/webkit/blob/5277f6fb92b0c03958265d24a7692142f7bdeaf8/Source/WebCore/platform/text/TextCodecUTF8.cpp

https://github.com/WebKit/webkit/blob/5277f6fb92b0c03958265d24a7692142f7bdeaf8/Source/WebCore/platform/text/TextCodecUTF16.cpp

("Perhaps we can prove some are not used on the web and remove them." is noted with the UTF-8 ones.)

I didn't have Edge handy, but IE11 does not support the UTF-8 aliases. (The UTF-16 aliases are harder to test, and the data: trick doesn't work in IE/Edge.)


Comment 4 by js...@chromium.org, Aug 29 2017

If IE11 does not support any of the UTF-8 aliases, I wouldn't worry about Edge (the chance of Edge supporting them is almost zero).

As for the UTF-16 aliases, the share (and even the number) of UTF-16 pages is extremely small to start with. Multiply that by the chance of those pages using the 'interesting' labels listed here, and we're talking about a negligible (if non-zero) number of web pages.

We can get the stats, but I don't think it's worth our time. 

Anyway, whether we keep them or not is not terribly important. Either way is fine (standardizing them, or removing them to comply with the current standard).

Comment 5 by sheriffbot@chromium.org, Aug 30

Labels: Hotlist-Recharge-Cold
Status: Untriaged (was: Available)
This issue has been Available for over a year. If it's no longer important or seems unlikely to be fixed, please consider closing it out. If it is important, please re-triage the issue.

Sorry for the inconvenience if the bug really should have been left as Available.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Status: Available (was: Untriaged)
Labels: -Hotlist-Recharge-Cold
