Script Detection not working with ISO-639-3 langcodes (zho, yue, etc.)
Reported by
arthur20...@gmail.com,
May 13 2016
Issue description

UserAgent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.94 Safari/537.36

Example URL: https://zh-classical.wikipedia.org/wiki/%E7%B6%AD%E5%9F%BA%E5%A4%A7%E5%85%B8:%E5%8D%B7%E9%A6%96

Steps to reproduce the problem:
1. (Optional) Install Advanced Font Settings and set some easy-to-spot fonts for the target script. For lzh and yue, the target script is Hant (Traditional Han).
2. Open a page whose lang attribute uses one of these language codes.
3. Check whether the text is rendered with the font set for the target script.

What is the expected behavior?
The page should be rendered with the font specified for the target script.

What went wrong?
The page is rendered with the font for "Zyyy", which on Windows unfortunately triggers an awful FontLink fallback from the Sans font to the Serif face "SimSun". On other OSs this would not look that bad, but it is still a problem.

Does it occur on multiple sites: Yes
Is it a problem with a plugin? No
Did this work before? N/A
Does this work in other browsers? N/A

Chrome version: 50.0.2661.94
Channel: beta
OS Version: 10.0
Flash Version: Shockwave Flash 21.0 r0

Check out the ISO-639-3 macrolanguage list to find some non-zho three-letter languages.
,
May 16 2016
,
Jul 25 2016
This is not about encoding (which is UTF-8 here); it is about scripts (i.e. font selection).
,
Jul 25 2016
,
Jul 26 2016
,
Jul 26 2016
,
Jul 26 2016
Gecko bug for wuu: https://bugzilla.mozilla.org/show_bug.cgi?id=1244404
,
Jul 26 2016
So... there are 14 languages in the Chinese macrolanguage: http://www-01.sil.org/iso639-3/documentation.asp?id=zho
I have no idea which script we should pick for these. Sent a query to W3C: http://lists.w3.org/Archives/Public/public-i18n-cjk/2016JulSep/0000.html
,
Jul 26 2016
Great to see this going. I am considering sending some technical suggestions to the Village Pump so the Wikipedias can use a middle-ground alternative such as zh-hant-zho. Would that be a valid language attribute value?
,
Jul 27 2016
#9: There has been a lot of discussion at W3C and we're close to conclusions. Please feel free to jump in if you're interested. http://lists.w3.org/Archives/Public/public-i18n-cjk/2016JulSep/thread.html

> I am considering to send some technical suggestions at Village Pump so the Wikipedias can use some middle ground alternatives

Yes, that'd be great, since interoperability is not guaranteed for this. I hope W3C publishes a note or Q&A to encourage it, but even then it is still not guaranteed.

> as zh-hant-zho or so. Will that be a valid language attribute value?

Close, but no. W3C provides a tool to check the validity: http://r12a.github.io/apps/subtags/?check=zh-hans-yue

The languages listed here: http://www-01.sil.org/iso639-3/documentation.asp?id=zho are called "extlang", which has to come after "lang" and before "script", so the correct order is: "zh-yue-hans"

I understand it is a bit confusing, since "region" (such as "TW" or "HK") comes after "script", as in "zh-hans-TW", but "extlang" has to come before "script" (such as "hans").
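The subtag order described above (language, then extlang, then script, then region) can be sketched as a small helper. This is only an illustration: `buildLanguageTag` and its parameter names are hypothetical, it does not validate subtags against the registry, and the casing (title-case script, upper-case region) follows the conventional BCP 47 presentation, although tags are case-insensitive.

```javascript
// Hypothetical helper: joins subtags in the order language-extlang-script-region.
// It does NOT check that the subtags are registered; use the W3C checker for that.
function buildLanguageTag({ language, extlang, script, region }) {
  if (!language) {
    throw new Error("a primary language subtag is required");
  }
  const parts = [language.toLowerCase()];
  if (extlang) {
    // extlang (e.g. "yue") comes right after the primary language.
    parts.push(extlang.toLowerCase());
  }
  if (script) {
    // script (e.g. "Hans") comes after extlang and before region.
    parts.push(script[0].toUpperCase() + script.slice(1).toLowerCase());
  }
  if (region) {
    // region (e.g. "TW") comes after script.
    parts.push(region.toUpperCase());
  }
  return parts.join("-");
}

// Example: Cantonese written in Simplified Han.
const tag = buildLanguageTag({ language: "zh", extlang: "yue", script: "hans" });
console.log(tag);
```

Passing the subtags as named fields means a caller cannot accidentally produce the "zh-hans-yue" ordering that the thread flags as invalid.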
,
Jul 28 2016
This is the currently proposed default: http://lists.w3.org/Archives/Public/public-i18n-cjk/2016JulSep/0022.html
The WG is discussing how to publish this list. It is likely to become part of CLREQ: http://w3c.github.io/clreq/ but there are no conclusions yet.
,
Aug 2 2016
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/3ec109dc729a647b433136b2328e9da01ba9c8b8

commit 3ec109dc729a647b433136b2328e9da01ba9c8b8
Author: kojii <kojii@chromium.org>
Date: Tue Aug 02 08:27:00 2016

More LayoutLocale refactor with additional Chinese support

Following the initial LayoutLocale refactoring CL[1], this patch:
1. Support 14 encompassed languages within the Chinese macrolanguage[2].
2. Add "mo" (Macau) as "Traditional by default", as pointed out by W3C I18N WG and match to Firefox.
3. Better and more spec conformance to parse BCP-47 language tags[3].
4. Change "und-Zsye" (Emoji) priority from the lowest to the highest.
5. Unify the logic for disambiguation of the Unified Han Ideographs for Linux/Android and Windows.
6. Merge duplicated code in AcceptLanguagesResolver to LayoutLocale.
7. Centralize locale-related methods more to LayoutLocale for better discoverability, caching, and code sharing.

[1] https://codereview.chromium.org/2161683002
[2] http://www-01.sil.org/iso639-3/documentation.asp?id=zho
[3] https://tools.ietf.org/html/bcp47

BUG= 586517 , 611817
Review-Url: https://codereview.chromium.org/2192703002
Cr-Commit-Position: refs/heads/master@{#409157}

[modify] https://crrev.com/3ec109dc729a647b433136b2328e9da01ba9c8b8/third_party/WebKit/Source/platform/LayoutLocale.cpp
[modify] https://crrev.com/3ec109dc729a647b433136b2328e9da01ba9c8b8/third_party/WebKit/Source/platform/LayoutLocale.h
[modify] https://crrev.com/3ec109dc729a647b433136b2328e9da01ba9c8b8/third_party/WebKit/Source/platform/LayoutLocaleTest.cpp
[modify] https://crrev.com/3ec109dc729a647b433136b2328e9da01ba9c8b8/third_party/WebKit/Source/platform/blink_platform.gypi
[modify] https://crrev.com/3ec109dc729a647b433136b2328e9da01ba9c8b8/third_party/WebKit/Source/platform/fonts/AcceptLanguagesResolver.cpp
[modify] https://crrev.com/3ec109dc729a647b433136b2328e9da01ba9c8b8/third_party/WebKit/Source/platform/fonts/AcceptLanguagesResolver.h
[modify] https://crrev.com/3ec109dc729a647b433136b2328e9da01ba9c8b8/third_party/WebKit/Source/platform/fonts/AcceptLanguagesResolverTest.cpp
[modify] https://crrev.com/3ec109dc729a647b433136b2328e9da01ba9c8b8/third_party/WebKit/Source/platform/fonts/skia/FontCacheSkia.cpp
[modify] https://crrev.com/3ec109dc729a647b433136b2328e9da01ba9c8b8/third_party/WebKit/Source/platform/fonts/win/FontCacheSkiaWin.cpp
[modify] https://crrev.com/3ec109dc729a647b433136b2328e9da01ba9c8b8/third_party/WebKit/Source/platform/fonts/win/FontFallbackWin.cpp
[modify] https://crrev.com/3ec109dc729a647b433136b2328e9da01ba9c8b8/third_party/WebKit/Source/platform/fonts/win/FontFallbackWin.h
[modify] https://crrev.com/3ec109dc729a647b433136b2328e9da01ba9c8b8/third_party/WebKit/Source/platform/text/LocaleToScriptMapping.cpp
[modify] https://crrev.com/3ec109dc729a647b433136b2328e9da01ba9c8b8/third_party/WebKit/Source/platform/text/LocaleToScriptMapping.h
[delete] https://crrev.com/7095e1641402c57f32d2cc952b4357af13ff8dfc/third_party/WebKit/Source/platform/text/LocaleToScriptMappingTest.cpp
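To illustrate what a "Traditional by default" region means for script selection, here is a rough JavaScript sketch. It is not the LayoutLocale code from this commit: `defaultHanScript` and `TRADITIONAL_REGIONS` are hypothetical names, the treatment of TW and HK alongside MO is an assumption based on common usage, and falling back to Hans for a bare "zh" is likewise an assumption made for the example.

```javascript
// Hypothetical region list; only "mo" is named in the commit message above.
const TRADITIONAL_REGIONS = new Set(["tw", "hk", "mo"]);

// Returns "Hans", "Hant", or null when the tag's primary language is not "zh".
function defaultHanScript(tag) {
  const subtags = tag.toLowerCase().split("-");
  if (subtags[0] !== "zh") {
    return null;
  }
  const rest = subtags.slice(1);
  // An explicit script subtag always wins over a region-based guess.
  if (rest.includes("hans")) return "Hans";
  if (rest.includes("hant")) return "Hant";
  // Otherwise fall back to the region, if it is a Traditional-by-default one.
  if (rest.some((s) => TRADITIONAL_REGIONS.has(s))) return "Hant";
  // Assumption for this sketch: a bare "zh" defaults to Simplified.
  return "Hans";
}

console.log(defaultHanScript("zh-MO"));
console.log(defaultHanScript("zh-yue-Hans"));
```

Checking the script subtag before the region keeps a tag like "zh-Hans-TW" on Simplified even though its region alone would suggest Traditional.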
,
Aug 2 2016
Comment 1 by arthur20...@gmail.com, May 13 2016

To see the correct behavior, just force the lang attribute to something that Chrome interprets correctly, like "zh-hant". I guess I will do some MediaWiki userscript hack for now: `for (const i of document.querySelectorAll("[lang=lzh]")) i.lang='zh-hant'` works just fine.

Test cases for other zho languages:
yue: https://zh-yue.wikipedia.org/wiki/%E9%A0%AD%E7%89%88 (Hant)
wuu: https://wuu.wikipedia.org/wiki/%E5%B0%81%E9%9D%A2 (mostly Hans, some Hant)

You can also construct examples for other zho languages like cnm.
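The one-line workaround above can be generalized into a small userscript sketch. The names `LANG_REMAP`, `remapLang`, and `applyRemap` are hypothetical, and the table only covers lzh and yue, the two codes this report identifies as Hant; wuu is left out because its pages mix Hans and Hant.

```javascript
// Hypothetical remap table: ISO 639-3 code -> a tag Chrome already maps to a script.
const LANG_REMAP = new Map([
  ["lzh", "zh-Hant"], // Classical Chinese, Hant per the report
  ["yue", "zh-Hant"], // Cantonese, Hant per the report
]);

// Returns the replacement tag, or the original value when no remap is known.
function remapLang(lang) {
  return LANG_REMAP.get(lang.toLowerCase()) ?? lang;
}

// Rewrites the lang attribute of every element under `root` that has one.
function applyRemap(root) {
  for (const el of root.querySelectorAll("[lang]")) {
    el.lang = remapLang(el.lang);
  }
}

// Only touch the page when running in a browser.
if (typeof document !== "undefined") {
  applyRemap(document);
}
```

Keeping the lookup in a pure function makes it easy to extend the table once a default script for the remaining zho languages is agreed on.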