
Issue metadata

Status: Fixed
Closed: Aug 2016
EstimatedDays: ----
NextAction: ----
OS: All
Pri: 2
Type: Bug


Issue 611817: Script Detection not working with ISO-639-3 langcodes (zho, yue, etc.)

Reported by, May 13 2016

Issue description

UserAgent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.94 Safari/537.36

Example URL:

Steps to reproduce the problem:
1. (Optional) Install Advanced Font Settings and set some easy-to-spot fonts for the target script. For lzh and yue, the target script is Hant (Traditional Han).
2. Open a page whose lang attributes specify one of these language codes.
3. Check whether the text is rendered with the font set for the target script.

What is the expected behavior?
The page should be rendered with the font specified for the target script.

What went wrong?
The page is rendered with the font for "Zyyy" (the Common script), which on Windows unfortunately triggers an awful FontLink fallback from the sans-serif font to the serif face "SimSun". On other operating systems this does not look as bad, but it is still a problem.

Does it occur on multiple sites: Yes

Is it a problem with a plugin? No 

Did this work before? N/A 

Does this work in other browsers? N/A 

Chrome version: 50.0.2661.94  Channel: beta
OS Version: 10.0
Flash Version: Shockwave Flash 21.0 r0

Check the ISO-639-3 macrolanguage list to find other three-letter Chinese language codes besides zho.

Comment 1 by, May 13 2016

To see the correct behavior, just force the lang attribute to a value that Chrome interprets correctly, such as "zh-hant". For now I will probably use a MediaWiki userscript hack; `for (const i of document.querySelectorAll("[lang=lzh]")) i.lang = 'zh-hant'` works just fine.
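The one-line workaround can be generalized into a small userscript. This is a sketch, not a tested MediaWiki gadget: the mapping is an assumption that covers only lzh and yue, the two codes this report names as Traditional Han, and the function names `workaroundLang` and `applyLangWorkarounds` are hypothetical.

```javascript
// Hypothetical mapping for illustration: the ISO 639-3 codes this report
// names as Traditional Han, rewritten to a tag Chrome 50 already resolves.
const LANG_WORKAROUNDS = new Map([
  ["lzh", "zh-hant"],
  ["yue", "zh-hant"],
]);

// Return the replacement for `lang`, or null if no workaround applies.
// Like the original [lang=lzh] selector, this matches whole tags only,
// so "yue-HK" is left alone; the comparison ignores case.
function workaroundLang(lang) {
  return LANG_WORKAROUNDS.get(lang.toLowerCase()) ?? null;
}

// Rewrite the lang attribute of every matching element under `root`
// and return how many elements were changed.
function applyLangWorkarounds(root) {
  let changed = 0;
  for (const el of root.querySelectorAll("[lang]")) {
    const replacement = workaroundLang(el.lang);
    if (replacement !== null) {
      el.lang = replacement;
      changed += 1;
    }
  }
  return changed;
}

// Only touch the DOM when running in a page (e.g. as a userscript).
if (typeof document !== "undefined") {
  applyLangWorkarounds(document);
}
```

Rewriting the tag also discards the original language information, which other features (screen readers, hyphenation) may rely on, so this is only a rendering stopgap.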

Test cases for other zho languages:

yue: (Hant)
wuu: (mostly Hans, some Hant)

You can also construct examples for other zho languages like cnm.

Comment 2 by, May 16 2016

Components: -Blink Blink>TextEncoding

Comment 3 by, Jul 25 2016

Components: -Blink>TextEncoding Blink>Fonts
This is not an encoding issue (the page is UTF-8); it is about scripts, i.e. font selection.

Comment 4 by, Jul 25 2016

Status: Assigned (was: Unconfirmed)

Comment 6 by, Jul 26 2016

Labels: -OS-Windows OS-All

Comment 8 by, Jul 26 2016

So... there are 14 languages in the Chinese macrolanguage:

I have no idea which script we should pick for these, so I sent a query to the W3C:

Comment 9 by, Jul 26 2016

Great to see progress on this. I am considering sending some technical suggestions to the Village Pump so the Wikipedias can use a middle-ground alternative such as zh-hant-zho. Would that be a valid lang attribute value?

Comment 10 by, Jul 27 2016

#9: There has been a lot of discussion at W3C and we're close to conclusions. Please feel free to jump in if you're interested.

> I am considering to send some technical suggestions at Village Pump so the Wikipedias can use some middle ground alternatives

Yes, that would be great, since interoperability is not guaranteed here. I hope the W3C publishes a note or Q&A to encourage this, but even then it will not be guaranteed.

> as zh-hant-zho or so. Will that be a valid language attribute value?

Close, but no. W3C provides a tool to check the validity:

Languages listed here:
are called "extlang", which has to come after "lang" and before "script", so the correct order is:

I understand it is a bit confusing, since "region" (such as "TW" or "HK") comes after the script:
but "extlang" must come before "script" (such as "hans").
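The ordering rule can be made concrete with a simplified parser. This is a sketch, not a full RFC 5646 implementation: it handles only a two- or three-letter language, up to three extlangs, an optional script, an optional region and variants, and ignores extensions, private-use and grandfathered tags. It checks syntax only, not whether a subtag is registered; `parseLangTag` is a hypothetical name.

```javascript
// Simplified BCP 47 (RFC 5646) langtag grammar:
//   language(2-3 alpha) *3("-" extlang(3 alpha)) ["-" script(4 alpha)]
//   ["-" region(2 alpha / 3 digit)] *("-" variant)
// Extensions, private-use and grandfathered tags are deliberately omitted.
const LANGTAG =
  /^([a-z]{2,3})((?:-[a-z]{3}){0,3})(?:-([a-z]{4}))?(?:-([a-z]{2}|[0-9]{3}))?((?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*)$/i;

// Split a tag into its subtags, or return null if it is syntactically
// invalid under the simplified grammar above.
function parseLangTag(tag) {
  const m = LANGTAG.exec(tag);
  if (m === null) return null;
  const split = (s) => s.split("-").filter(Boolean);
  return {
    language: m[1],
    extlangs: split(m[2]),
    script: m[3] ?? null,
    region: m[4] ?? null,
    variants: split(m[5]),
  };
}

// extlang precedes script; region follows script.
console.log(parseLangTag("zh-yue-Hant-HK"));
// A three-letter subtag after the script is neither a region
// (2 letters or 3 digits) nor a variant, so this is rejected.
console.log(parseLangTag("zh-hant-zho"));
```

Under this grammar "zh-yue-Hant-HK" splits into language zh, extlang yue, script Hant and region HK, while the "zh-hant-zho" form asked about in #9 parses to null.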

Comment 11 by, Jul 28 2016

This is the currently proposed default:

and the WG is discussing how to publish this list. It will likely become part of CLREQ:
but there are no conclusions yet.

Comment 12 by, Aug 2 2016

Project Member
The following revision refers to this bug:

commit 3ec109dc729a647b433136b2328e9da01ba9c8b8
Author: kojii <>
Date: Tue Aug 02 08:27:00 2016

More LayoutLocale refactor with additional Chinese support

Following the initial LayoutLocale refactoring CL[1], this patch:

1. Support 14 encompassed languages within the Chinese macrolanguage[2].
2. Add "mo" (Macau) as "Traditional by default", as pointed out by W3C
   I18N WG and match to Firefox.
3. Better and more spec conformance to parse BCP-47 language tags[3].
4. Change "und-Zsye" (Emoji) priority from the lowest to the highest.
5. Unify the logic for disambiguation of the Unified Han Ideographs for
   Linux/Android and Windows.
6. Merge duplicated code in AcceptLanguagesResolver to LayoutLocale.
7. Centralize locale-related methods more to LayoutLocale for better
   discoverability, caching, and code sharing.


BUG= 586517 ,  611817 

Cr-Commit-Position: refs/heads/master@{#409157}
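Item 5 of the commit message refers to disambiguating the Unified Han Ideographs, i.e. deciding whether a Han code point shared by several locales should use a Simplified (Hans) or Traditional (Hant) font. The sketch below shows one way such a decision could be made from a lang tag; it is not Chromium's LayoutLocale code. The precedence order and tables are assumptions: an explicit script subtag wins, then the regions TW, HK and MO (Macau, per item 2), then a per-language default in which lzh and yue are Traditional as this report expects and the other listed Chinese codes are Simplified. `hanScriptForTag`, `TRADITIONAL_REGIONS` and `HAN_DEFAULTS` are hypothetical names.

```javascript
// Hypothetical defaults for illustration only; not Chromium's actual table.
const TRADITIONAL_REGIONS = new Set(["tw", "hk", "mo"]);
const HAN_DEFAULTS = new Map([
  ["zh", "Hans"],
  ["zho", "Hans"],
  ["wuu", "Hans"], // assumption: comment 1 says wuu text is "mostly Hans"
  ["lzh", "Hant"], // per the bug report
  ["yue", "Hant"], // per the bug report
]);

// Pick "Hans" or "Hant" for a language tag, or null when the tag expresses
// no Han preference (the case that falls back to the Zyyy/Common font).
function hanScriptForTag(tag) {
  const subtags = tag.toLowerCase().split("-");
  let language = subtags[0];
  let i = 1;
  // An extlang ("zh-yue") is treated as the language it names ("yue").
  while (i <= 3 && i < subtags.length && /^[a-z]{3}$/.test(subtags[i])) {
    language = subtags[i];
    i += 1;
  }
  let script = null;
  if (i < subtags.length && /^[a-z]{4}$/.test(subtags[i])) {
    script = subtags[i];
    i += 1;
  }
  const region =
    i < subtags.length && /^(?:[a-z]{2}|[0-9]{3})$/.test(subtags[i])
      ? subtags[i]
      : null;

  // 1. An explicit script subtag wins; a non-Han script means no preference.
  if (script !== null) {
    if (script === "hans") return "Hans";
    if (script === "hant") return "Hant";
    return null;
  }
  // 2. Only Chinese languages get a Han default at all.
  if (!HAN_DEFAULTS.has(language)) return null;
  // 3. Regions that are Traditional by default.
  if (region !== null && TRADITIONAL_REGIONS.has(region)) return "Hant";
  // 4. Otherwise use the per-language default.
  return HAN_DEFAULTS.get(language);
}
```

In terms of this sketch, the original bug behaves as if lzh and yue had no table entry: they fall through to null, and the page ends up with the Zyyy font.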


Comment 13 by, Aug 2 2016

Status: Fixed (was: Assigned)
