Non-standard PUA in GB18030 mapping leads to tofus on Android
Reported by arthur20...@gmail.com, Sep 10 2016
Issue description

Example URL: https://rawgit.com/Artoria2e5/misc/HEAD/gb18030-24.html

Steps to reproduce the problem:
See https://github.com/whatwg/encoding/issues/27#issuecomment-246105676.
1. Open a webpage encoded in GB18030 that contains the mentioned 24 characters.
2. Harvest the delicious blocks of tofu.

What is the expected behavior?
It should work on all platforms.

What went wrong?
As detailed in the issue, there are 24 PUA code points in the official mapping table for GB18030:2005, all of which have had well-established regular Unicode code points for as long as 10 years. These GB18030-defined PUA code points are missing from the fonts on many OSes due to their private nature. While the PUA characters in the issue might look perfectly normal on CrOS and Windows, which both include fonts for GB 18030 compliance, Android does not have such fonts (as shown by the PUA tofus on the UTF-8-encoded issue page).

Does it occur on multiple sites: Yes
Is it a problem with a plugin? No
Did this work before? No
Does this work in other browsers? No
Basically every browser (see "other comments")

Chrome version: 52.0.2743.98
Channel: stable
OS Version: 5.1
Flash Version:

The obvious fix for this bug is not standards-compliant until either WHATWG or CNSAC updates its standard. Hence this bug report is only a note on a long-standing problem caused by historical problems in a certain standard.
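To check whether a decoded page would hit this problem, one option is to scan the decoded text for Private Use Area code points. The following is only a minimal sketch, not what Chrome or any spec does: the helper names `find_pua` and `find_pua_in_gb18030` are hypothetical, the sample string is hand-written for illustration, and the result for real GB18030 bytes depends on the mapping table of whichever decoder is used (here, Python's built-in `gb18030` codec).

```python
# BMP Private Use Area range, as defined by Unicode.
PUA_START, PUA_END = 0xE000, 0xF8FF


def find_pua(text):
    """Return (index, 'U+XXXX') for every BMP PUA code point in text."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if PUA_START <= ord(ch) <= PUA_END
    ]


def find_pua_in_gb18030(data):
    """Decode GB18030 bytes with Python's built-in codec, then scan for PUA.

    Which byte sequences end up in the PUA depends on the codec's own
    mapping table, so this reports what *this* decoder produces.
    """
    return find_pua(data.decode("gb18030", errors="replace"))


if __name__ == "__main__":
    # Hand-written sample: U+E78D is just an arbitrary PUA code point here.
    sample = "ok \ue78d ok"
    print(find_pua(sample))  # [(3, 'U+E78D')]
```

Any hit means the text will only render where an installed font happens to cover that private code point, which is the situation that produces tofu on Android.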
,
Sep 12 2016
,
Sep 16 2016
Very interesting, because I have recently been arguing in the W3C bug tracker (encoding spec) for getting rid of the PUA mappings for GB 18030. I wanted to do that last December or early January, thinking that GB18030:2005 had gotten rid of all those PUA mappings (just as HKSCS did around the same time). It turned out that GB 18030 did NOT do that, even though all of the characters mapped to PUA code points are encoded as regular Unicode characters. Anyway, see https://github.com/whatwg/encoding/issues/27#issuecomment-246105676 and two other bugs referenced therein.
,
Sep 16 2016
,
Sep 8 2017
Blink uses UTF-16 internally; transcoding happens earlier.
Comment 1 by kochi@chromium.org, Sep 12 2016
Owner: js...@chromium.org
Status: Assigned (was: Unconfirmed)