Issue 645783

Starred by 2 users

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Android
Pri: 2
Type: Bug




Non-standard PUA in GB18030 mapping leads to tofus on Android

Reported by arthur20...@gmail.com, Sep 10 2016

Issue description

Example URL:
https://rawgit.com/Artoria2e5/misc/HEAD/gb18030-24.html

Steps to reproduce the problem:
See https://github.com/whatwg/encoding/issues/27#issuecomment-246105676.

1. Open a webpage encoded in GB18030 that contains the mentioned 24 characters.
2. Harvest the delicious blocks of tofu.

What is the expected behavior?
It should work in all platforms.

What went wrong?
As detailed in the linked issue, the official mapping table for GB18030:2005 contains 24 PUA code points, all of which have had well-established standard Unicode code points for as long as 10 years. These GB18030-defined PUA code points are missing from fonts on many OSes because of their private nature. The PUA characters in the issue may look perfectly normal on CrOS and Windows, both of which include fonts for GB 18030 compliance, but Android has no such fonts (as shown by the PUA tofus on the UTF-8-encoded issue page).
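
To check whether a given page is affected, a rough sketch like the following can list the PUA code points left in text after decoding. The helper names `find_pua` and `decode_and_report` are made up for illustration, and the sketch assumes Python's built-in `gb18030` codec, whose mapping table may not match the one Chrome uses. It only covers the BMP Private Use Area (U+E000 to U+F8FF).

```python
# BMP Private Use Area as defined by the Unicode Standard.
PUA_START, PUA_END = 0xE000, 0xF8FF


def find_pua(text):
    """Return (index, 'U+XXXX') for every BMP PUA code point in text."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if PUA_START <= ord(ch) <= PUA_END
    ]


def decode_and_report(data, encoding="gb18030"):
    """Decode raw bytes (hypothetical helper) and list PUA hits in the result."""
    text = data.decode(encoding)
    return text, find_pua(text)
```

Running `decode_and_report` on the raw bytes of the example page should show which of the 24 characters end up in the PUA with that particular decoder. Any hit is a character that renders as tofu unless an installed font happens to cover it.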

Does it occur on multiple sites: Yes

Is it a problem with a plugin? No 

Did this work before? No 

Does this work in other browsers? No. Basically every browser is affected (see "other comments").

Chrome version: 52.0.2743.98  Channel: stable
OS Version: 5.1
Flash Version: 

The natural fix for this bug is not standards-compliant until either WHATWG or CNSAC updates its standard. Hence this bug report is only a note on a long-standing problem caused by historical baggage in a certain standard.
 

Comment 1 by kochi@chromium.org, Sep 12 2016

Cc: kochi@chromium.org
Owner: js...@chromium.org
Status: Assigned (was: Unconfirmed)
Jungshik, could you advise on who would be a good assignee?

Comment 2 by kochi@chromium.org, Sep 12 2016

Components: -Blink Blink>Fonts

Comment 3 by js...@chromium.org, Sep 16 2016

Very interesting: I have recently been arguing in the w3c bug tracker (encoding spec) for getting rid of the mapping to PUA for GB 18030.

I wanted to do that last December/early January, thinking that GB18030:2005 had got rid of all those PUA mappings (just like HKSCS did around the same time). It turned out that GB 18030 did NOT do that, even though the characters mapped to PUA code points are ALL encoded as regular Unicode characters.

Anyway, see https://github.com/whatwg/encoding/issues/27#issuecomment-246105676
and two other bugs referenced therein. 

Comment 4 by js...@chromium.org, Sep 16 2016

Cc: jsb...@chromium.org

Comment 5 by e...@chromium.org, Sep 8 2017

Components: -Blink>Fonts Blink>Loader
Blink uses UTF-16 internally; transcoding happens earlier.
