Script mixing policy: switch from Moderately Restrictive to Highly Restrictive
Issue description

Two Armenian letters (o-like and g-like) get special treatment. Depending on the font, other Armenian characters can also look like Latin letters. In the Unicode code chart ( http://www.unicode.org/charts/PDF/U0530.pdf ), none of them look like Latin. On the other hand, at http://unicode.org/cldr/utility/confusables.jsp?a=abcdefghijklmnopqrstuvwxyz0123456789&r=IDNA2008 , they do look like Latin.

Block them from mixing with Latin, and block their Latin counterparts from mixing with Armenian. For top domains, this does not matter because we use the confusability skeleton to detect look-alike domain names and block them. This bug is filed to facilitate merging to the 59 branch, because the 59 branch does not have a top-domain skeleton-match mechanism.
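For illustration only, here is a minimal Python sketch of the kind of check described above. It is deliberately coarser than the proposal: it flags any label that contains both ASCII Latin letters and Armenian letters, whereas the proposal targets only the Latin-lookalike Armenian letters, which are not listed in this report. The function name `mixes_latin_and_armenian` and the character ranges are assumptions made for this sketch, not Chrome's actual 'dangerous_patterns' regex.

```python
import re

# Assumption: ASCII a-z stands in for "Latin"; Armenian capital letters
# (U+0531..U+0556) and small letters (U+0561..U+0586) stand in for
# "Armenian". U+0587 is left out because it is already restricted.
LATIN = "a-z"
ARMENIAN = "\u0531-\u0556\u0561-\u0586"

# Match a Latin letter followed anywhere later by an Armenian letter,
# or the other way around.
MIXED = re.compile(
    f"[{LATIN}].*[{ARMENIAN}]|[{ARMENIAN}].*[{LATIN}]"
)


def mixes_latin_and_armenian(label: str) -> bool:
    """Return True if the label contains both Latin and Armenian letters."""
    return MIXED.search(label.lower()) is not None


if __name__ == "__main__":
    # "go" + ARMENIAN SMALL LETTER OH (U+0585) + "gle"
    print(mixes_latin_and_armenian("go\u0585gle"))
    # A pure Armenian label
    print(mixes_latin_and_armenian("\u0570\u0561\u0575"))
```

A label made up of a single script passes this check; only the mixture is flagged.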
,
May 27 2017
> Unicode Armenian letters except the letter 'և' (0587).

U+0587 has Identifier_Status=Restricted, so it's blocked already. No action is necessary.
,
May 28 2017
Like .հայ (the Armenian IDN TLD), .ไทย (the Thai IDN TLD; http://www.thnic.or.th/dot-thai-policy-en/ ) does not allow mixing of Latin and Thai. Verisign also has a similar policy (its policy of allowing a large number of script=Common and script=Inherited characters needs to be addressed by revising Chrome's IDN policy): https://www.verisign.com/en_US/channel-resources/domain-registry-products/idn/idn-policy/registration-rules/index.xhtml ; see section 3.

I suspect that a lot of ccTLD-like IDNs have a similar policy. The exceptions are the IDN ccTLDs of Japan, Korea, China, Taiwan and Hong Kong, which do allow mixing of ASCII Latin and their native scripts. That means that even though Chrome's IDN policy allows mixing of Latin and another script (other than Greek and Cyrillic), effectively there is no TLD+1 domain that would 'benefit' from it.

Given that only CJK IDN ccTLDs allow mixing of their native scripts and Latin, we can just do the same (i.e. use the Highly Restrictive policy instead of Moderately Restrictive; the latter allows mixing of Latin and any script other than Greek and Cyrillic). We switched to 'Moderately Restrictive' from 'Highly Restrictive' to sync with Firefox. Recently, alarmed by bug 719199 and bug 722639, we blocked a few cases of script mixing separately without knowing Verisign's policy.

Given the policies of various national NICs on IDN ccTLDs, switching back to the 'Highly Restrictive' profile is not likely to hurt any "innocent" domains (because domain names that would be blocked by switching back cannot be registered anyway). Obviously, this would affect domain labels beyond TLD+1.

Switching back to 'Highly Restrictive' would make the individual script + Latin mixture blocking in the 'dangerous_patterns' regex unnecessary (as was done for Canadian Syllabics and Tifinagh; I also plan to do that for Armenian). This would simplify our code. At the same time, it may block some 'innocent' labels beyond TLD+1, but the chance is pretty low.
,
May 29 2017
Hebrew domain name policy (Israel): does not allow mixing Hebrew characters and Latin.
http://www.isoc.org.il/files/docs/ISOC-IL_Registration_Rules_v1.5_ENGLISH_-_26.6.2016.pdf
https://www.icann.org/sites/default/files/packages/lgr/lgr-second-level-hebrew-30aug16-en.html

---------
The Chinese 2nd-level LGR has this (the Japanese and Korean 2nd-level LGRs have a similar provision):

Unlike many other non-Latin 2nd level reference LGRs, the Chinese LGR includes the basic ASCII Latin set (a to z) because it is common practice in Chinese text to mix Han and ASCII. Therefore it does not create confusability or additional security risks in the context of a second level LGR for the Chinese language. It is also supported by current IDNA practice, see [700], [701], and [702].

----------
https://www.icann.org/resources/pages/second-level-lgr-2015-06-21-en

--------------
Indian IDN policy (not sure if it's the latest; it's from 2009): http://meity.gov.in/writereaddata/files/India-IDN-Policy.pdf

Section 3.B has this:

B. NOT PERMISSIBLE
1. CODE-PAGE MIXING
No mixing of scripts at a given level will NOT be allowed

A Latin-Devanagari mixed label is given as an example. In addition, native Indic digits are not allowed. Interestingly, it also disallows ZWJ/ZWNJ (that does not mean that other countries would do the same). Moreover, it was published before IDNA 2008 was finalized.

https://registry.in/Internationalized_Domain_Names_IDNs has a list of newer IDN policy documents, but each of them is a tar.gz with a lot of gzipped files inside. I haven't managed to go through the multiple layers of compression/archiving (e.g. https://registry.in/system/files/DEVANAGARI.tar_.gz has a lot of gzipped files inside).
,
May 29 2017
Removing the view restriction. Thanks to the existing IDN policy, there is no risk in opening up this bug to the public.
,
May 30 2017
,
Jun 21 2017
Issue 735210 has been merged into this issue.
,
Jun 21 2017
Based on comment 3, it would be nice to suggest to Mozilla that they also switch.
,
Aug 29 2017
,
Aug 29 2017
,
Aug 29 2017
,
Sep 14 2017
Filed https://bugzilla.mozilla.org/show_bug.cgi?id=1399939 to sync up with Mozilla.
,
Sep 28 2017
https://chromium-review.googlesource.com/c/chromium/src/+/688825 is a draft CL. I'll add more tests from bugs blocked by this bug.
,
Sep 28 2017
The CL in comment 13 is out for review. In the meantime, Mozilla also made the switch in ToT (see the Mozilla bug in comment 12).
,
Oct 4 2017
,
Oct 4 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/fd34ee82420c5e5cb04459d6e381944979d8e571

commit fd34ee82420c5e5cb04459d6e381944979d8e571
Author: Jungshik Shin <jshin@chromium.org>
Date: Wed Oct 04 23:25:49 2017

Change the script mixing policy to highly restrictive

The current script mixing policy (moderately restrictive) allows mixing of Latin-ASCII and one non-Latin script (unless the non-Latin script is Cyrillic or Greek).

This CL tightens up the policy to block mixing of Latin-ASCII and a non-Latin script unless the non-Latin script is Chinese (Hanzi, Bopomofo), Japanese (Kanji, Hiragana, Katakana) or Korean (Hangul, Hanja).

Major gTLDs (.net/.org/.com) do not allow the registration of a domain that has both Latin and a non-Latin script. The only exception is names with Latin + Chinese/Japanese/Korean scripts. The same is true of ccTLDs with IDNs.

Given the above registration rules of major gTLDs and ccTLDs, allowing mixing of Latin and non-Latin other than CJK has no practical effect. In the meantime, domain names in TLDs with a laxer policy on script mixing would be subject to a potential spoofing attempt with the current moderately restrictive script mixing policy. To protect users from those risks, there are a few ad-hoc rules in place.

By switching to highly restrictive those ad-hoc rules can be removed simplifying the IDN display policy implementation a bit.

This is also coordinated with Mozilla. See https://bugzilla.mozilla.org/show_bug.cgi?id=1399939 .

BUG=726950, 756226, 756456, 756735, 770465
TEST=components_unittests --gtest_filter=*IDN*

Change-Id: Ib96d0d588f7fcda38ffa0ce59e98a5bd5b439116
Reviewed-on: https://chromium-review.googlesource.com/688825
Reviewed-by: Brett Wilson <brettw@chromium.org>
Reviewed-by: Lucas Garron <lgarron@chromium.org>
Commit-Queue: Jungshik Shin <jshin@chromium.org>
Cr-Commit-Position: refs/heads/master@{#506561}

[modify] https://crrev.com/fd34ee82420c5e5cb04459d6e381944979d8e571/components/url_formatter/idn_spoof_checker.cc
[modify] https://crrev.com/fd34ee82420c5e5cb04459d6e381944979d8e571/components/url_formatter/url_formatter_unittest.cc
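
The difference between the two policies described in the commit message can be sketched roughly as follows. This is not the code in idn_spoof_checker.cc; it is a simplified Python illustration. The script table `RANGES` is a hypothetical, heavily abbreviated stand-in for real Unicode script data, the function names are made up for this sketch, and characters outside the table (digits, hyphens, other scripts) are simply ignored.

```python
# Hypothetical, abbreviated script table: (first, last, script).
# Real implementations use full Unicode script data instead.
RANGES = [
    (0x0041, 0x005A, "Latin"), (0x0061, 0x007A, "Latin"),
    (0x0370, 0x03FF, "Greek"),
    (0x0400, 0x04FF, "Cyrillic"),
    (0x0531, 0x0587, "Armenian"),
    (0x0E01, 0x0E5B, "Thai"),
    (0x3041, 0x309F, "Hiragana"),
    (0x30A1, 0x30FF, "Katakana"),
    (0x3105, 0x312F, "Bopomofo"),
    (0x4E00, 0x9FFF, "Han"),
    (0xAC00, 0xD7A3, "Hangul"),
]

# Script combinations that may be mixed with Latin under both policies.
CJK_SETS = [
    {"Latin", "Han", "Hiragana", "Katakana"},  # Japanese
    {"Latin", "Han", "Bopomofo"},              # Chinese
    {"Latin", "Han", "Hangul"},                # Korean
]


def scripts_of(label):
    """Collect the scripts used in a label (unknown characters are skipped)."""
    found = set()
    for ch in label:
        cp = ord(ch)
        for first, last, script in RANGES:
            if first <= cp <= last:
                found.add(script)
                break
    return found


def highly_restrictive_ok(label):
    """Single script, or Latin mixed only with one CJK combination."""
    s = scripts_of(label)
    return len(s) <= 1 or any(s <= allowed for allowed in CJK_SETS)


def moderately_restrictive_ok(label):
    """Also allows Latin + one other script, except Greek and Cyrillic."""
    if highly_restrictive_ok(label):
        return True
    s = scripts_of(label)
    others = s - {"Latin"}
    return ("Latin" in s and len(others) == 1
            and not others & {"Greek", "Cyrillic"})
```

Under this sketch, a Latin + Thai or Latin + Armenian label passes the moderately restrictive check but fails the highly restrictive one, while Latin + Han passes both and Latin + Cyrillic fails both.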
,
Oct 4 2017
,
Oct 10 2017
Issue 773051 has been merged into this issue.
,
Oct 16 2017
,
Oct 20 2017
,
Nov 14 2017
Issue 756886 has been merged into this issue.
,
Nov 14 2017
Issue 756866 has been merged into this issue.
,
Nov 14 2017
Issue 756977 has been merged into this issue.
,
Nov 14 2017
Issue 756947 has been merged into this issue.
,
Nov 14 2017
Issue 756893 has been merged into this issue.
,
Nov 14 2017
Issue 757180 has been merged into this issue.
Comment 1 by js...@chromium.org
, May 27 2017