This was reported upstream to Mozilla at https://bugzilla.mozilla.org/show_bug.cgi?id=1364283, which currently has restricted access. Shall we keep this one restricted until they open theirs?
Below are the details as originally reported to Mozilla:
Steps to reproduce:
VULNERABILITY DETAILS
Firefox should not render domain names that mix characters from the “Canadian Syllabics” Unicode block with characters from other Unicode blocks. This was observed in data from the Certificate Transparency log while I was trying to quantify the IDN impersonation/phishing problem (raw data attached).
REPRODUCTION CASE
There is a series of characters in the “CANADIAN SYLLABICS” Unicode block that can be used to impersonate other domains. I believe mixing this block with other Unicode blocks should be disallowed, with the punycode value displayed instead. The characters in this block (some of which I believe could be abused) are listed at:
http://www.fileformat.info/info/unicode/block/unified_canadian_aboriginal_syllabics/list.htm
(I do not know the registration status of any of the domains below)
http://xn--youtue-084a.com/ -- youtuᖯe.com -- example domain
http://xn--youtbe-z72a.com/ -- youtᑌbe.com -- example domain
http://xn--uny-8wq.com/ -- ᑭuny.com -- example domain
http://xn--oor-hxq.com -- ᑯoor.com -- example domain
http://xn--ego-73q.com/ -- ᒪego.com -- example domain
http://xn--fc-lym.com/ -- fcᒿ.com -- example domain; not fc2.com (Alexa top 1M #97). This is likely the hardest one to spot (with the fonts I’m using)
http://xn--ulu-7sr.com/ -- ᕼulu.com -- example domain
http://invalid.xn--acebook-yp9a.com/ -- ᖴacebook.com -- example domain
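The mapping between the ACE hostnames in the list above and their displayed forms can be checked with Python's built-in `punycode` codec. This is a minimal sketch: `to_unicode` and `ucas_chars` are hypothetical helpers, and decoding with the raw codec skips the IDNA validation a browser would also perform. Only four of the listed examples are decoded here.

```python
import unicodedata


def to_unicode(hostname: str) -> str:
    """Decode each xn-- label of a hostname with the raw punycode codec (no IDNA checks)."""
    labels = []
    for label in hostname.split("."):
        if label.lower().startswith("xn--"):
            # Strip the ACE prefix and decode the remaining punycode.
            label = label[4:].encode("ascii").decode("punycode")
        labels.append(label)
    return ".".join(labels)


def ucas_chars(text: str) -> list[tuple[str, str, str]]:
    """List (char, code point, Unicode name) for characters in the main UCAS block."""
    return [
        (c, f"U+{ord(c):04X}", unicodedata.name(c))
        for c in text
        if 0x1400 <= ord(c) <= 0x167F
    ]


for ace in ["xn--uny-8wq.com", "xn--oor-hxq.com", "xn--ego-73q.com", "xn--ulu-7sr.com"]:
    decoded = to_unicode(ace)
    print(ace, "->", decoded, ucas_chars(decoded))
```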
Tested with Firefox Nightly 55.0a1 (2017-05-11) (32-bit)
This issue has also been reported to Chromium.
Actual results:
Unicode domain names are displayed when characters from the “CANADIAN SYLLABICS” Unicode block are mixed with other Unicode characters.
Expected results:
Punycode values should be displayed when characters from the “CANADIAN SYLLABICS” Unicode block are mixed with other Unicode characters.
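A minimal sketch of the expected behavior, in Python: a label that contains characters from the Unified Canadian Aboriginal Syllabics block (U+1400..U+167F) together with characters from any other block is shown in its punycode (ACE) form. Treating ASCII digits and the hyphen as neutral, and covering only the main block (not the U+18B0..U+18FF extension), are assumptions of this sketch; `mixes_ucas` and `display_label` are hypothetical names, not Firefox's actual nsIDNService code.

```python
UCAS_START, UCAS_END = 0x1400, 0x167F  # Unified Canadian Aboriginal Syllabics (main block only)
NEUTRAL = set("0123456789-")  # assumption: ASCII digits and hyphen never count as "mixing"


def is_ucas(ch: str) -> bool:
    """True if the character lies in the main UCAS block."""
    return UCAS_START <= ord(ch) <= UCAS_END


def mixes_ucas(label: str) -> bool:
    """True if the label has UCAS characters together with characters from another block."""
    chars = [c for c in label if c not in NEUTRAL]
    has_ucas = any(is_ucas(c) for c in chars)
    has_other = any(not is_ucas(c) for c in chars)
    return has_ucas and has_other


def display_label(label: str) -> str:
    """Return the Unicode label, or its ACE (xn--) form if it mixes UCAS with other blocks."""
    if mixes_ucas(label):
        return "xn--" + label.encode("punycode").decode("ascii")
    return label


for label in ["ᑭuny", "ᒪego", "youtube", "ᑭᑯ"]:
    print(label, "->", display_label(label))
```

Under this rule ᑭuny would be shown as xn--uny-8wq, while a label written entirely in syllabics would still be displayed in Unicode.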
---- background ----
(please excuse the length of this report)
To build the attached lists, I cross-referenced the Google CT Pilot log with the Alexa top 1 million domains (.com domains only).
There are a fair number of false positives (lookalike domains that are not abusive, or Python unidecode failures), but I chose not to remove them manually.
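The cross-referencing step described above can be sketched as follows. The original analysis used the third-party `unidecode` package; this stdlib-only version swaps in a simpler ASCII "skeleton" (NFKD decomposition, dropping combining marks, then a small hand-written map for letters that have no decomposition, such as ı, ł and ĸ). `skeleton`, `LOOKALIKES` and `find_impersonations` are hypothetical names, and the map is illustrative rather than complete.

```python
import unicodedata

# Illustrative, incomplete map for lookalike letters that NFKD does not decompose.
LOOKALIKES = {"ı": "i", "ł": "l", "ĸ": "k"}


def skeleton(domain: str) -> str:
    """Approximate ASCII form: strip diacritics via NFKD, then apply LOOKALIKES."""
    decomposed = unicodedata.normalize("NFKD", domain)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return "".join(LOOKALIKES.get(c, c) for c in stripped)


def find_impersonations(idn_domains, top_domains):
    """Yield (idn, target) pairs where a non-ASCII domain's skeleton equals a top domain."""
    targets = set(top_domains)
    for idn in idn_domains:
        if idn.isascii():
            continue  # only internationalized names are of interest
        target = skeleton(idn)
        if target in targets:
            yield idn, target


ct_sample = ["twìttèr.com", "reddıt.com", "słack.com", "example.com"]
alexa_sample = ["twitter.com", "reddit.com", "slack.com", "example.com"]
print(list(find_impersonations(ct_sample, alexa_sample)))
```

Like the original approach, any skeleton-based match will also flag legitimate lookalikes, which is one source of the false positives mentioned above.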
---- Other unicode characters observed ----
ĸ, 22, 0x138, "LATIN SMALL LETTER KRA"
96074858, 1509667199, xn--faceboo-jhb.com, facebooĸ.com , ĸ, facebook.com, 3, 1
86142753, 1507679999, xn--autodes-jhb.com, autodesĸ.com , ĸ, autodesk.com, 697, 1
ł, 5, 0x142, "LATIN SMALL LETTER L WITH STROKE"
94011919, 1524055021, xn--ppe-8ka60c.com, àppłe.com , àł, apple.com, 69, 1
94724468, 1500291180, xn--sack-01a.com, słack.com , ł, slack.com, 205, 1
ı, 100, 0x131, "LATIN SMALL LETTER DOTLESS I"
18331655, 1488327078, xn--reddt-q4a.com, reddıt.com , ı, reddit.com, 7, 1
95900673, 1500493680, xn--t-fka.com, tı.com , ı, ti.com, 3235, 1
84518766, 1497998760, xn--gml-kua34j.com, gmȧıl.com , ȧı, gmail.com, 22463, 1
95900424, 1500493860, xn--fat-jua.com, fıat.com , ı, fiat.com, 54102, 1
94504694, 1509148799, xn--curacao-egamng-hgc.com, curacao-egamıng.com , ı, curacao-egaming.com, 524456, 1
94724500, 1500493920, xn--suzu-kza.com, ısuzu.com , ı, isuzu.com, 866480, 1
ì, 25, 0xec, "LATIN SMALL LETTER I WITH GRAVE"
95900680, 1500670920, xn--twttr-7raz.com, twìttèr.com , ìè, twitter.com, 11, 1
85019386, 1507161599, xn--polonex-3ya.com, polonìex.com , ì, poloniex.com, 1595, 1
83724035, 1497798600, xn--gma-pma40b.com, gmaìĺ.com , ìĺ, gmail.com, 22463, 1
---- Special case observed ----
Two interesting domains were observed that bypass Chromium's checks by using only Cyrillic characters:
07022746, 1443571199, xn--80aac5cct.com, таобао.com , таобао, taobao.com, 10, 1
10303999, 1461542399, xn--e1anr4f.com, тіме.com , тіме, time.com, 817, 1
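These two names are not caught by a mixed-script rule because every letter is Cyrillic. Detecting such whole-script confusables needs a second step: confirm the label is single-script, then compare a Latin-lookalike form against known domains. In this sketch, the first word of each character's Unicode name stands in for its script (Python's standard library has no Script property), and `CYRILLIC_TO_LATIN` is a small hypothetical map covering only the letters in these two examples.

```python
import unicodedata
from typing import Optional

# Hypothetical, deliberately tiny map of Cyrillic letters that resemble Latin ones.
CYRILLIC_TO_LATIN = {"т": "t", "а": "a", "о": "o", "б": "b", "і": "i", "м": "m", "е": "e"}


def scripts(label: str) -> set[str]:
    """Crude script detection: first word of each letter's Unicode name."""
    return {unicodedata.name(c).split()[0] for c in label if c.isalpha()}


def whole_script_lookalike(label: str, targets: set[str]) -> Optional[str]:
    """Return the Latin target an all-Cyrillic label imitates, if any."""
    if scripts(label) != {"CYRILLIC"}:
        return None
    latin = "".join(CYRILLIC_TO_LATIN.get(c, c) for c in label)
    return latin if latin in targets else None


known = {"taobao", "time"}
for label in ["таобао", "тіме"]:
    print(label, scripts(label), whole_script_lookalike(label, known))
```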
This issue has been raised on the internal UTC mailing list (so I presume Mark Davis has seen it). After some discussion, Mozilla engineers provided the following updates:
"""
It turns out the latest proposed update to UTS#31, aimed at Unicode 10.0, drops the "aspirational scripts" category and merges them into "limited use":
http://www.unicode.org/reports/tr31/tr31-26.html
This will make the UCAS characters no longer eligible to be mixed with Latin in labels, per the "Moderately Restrictive" profile in UTS#39.
I'd suggest, therefore, that we go ahead and make such a change in nsIDNService, in anticipation of the upcoming Unicode changes
"""
Comment 1 by elawrence@chromium.org, May 23 2017