Starred by 8 users

Issue metadata

Status: Fixed
Owner:
Closed: Oct 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug
Team-Security-UX



Issue 726950: script mixing policy : switch from Moderately Restrictive to Highly Restrictive

Reported by js...@chromium.org, May 27 2017 Project Member

Issue description

Two Armenian letters (o-like and g-like) get special treatment. Depending on the font, other Armenian characters can also look like Latin letters.

In the Unicode code chart ( http://www.unicode.org/charts/PDF/U0530.pdf ), none of them look like Latin letters.
On the other hand, in the CLDR confusables utility ( http://unicode.org/cldr/utility/confusables.jsp?a=abcdefghijklmnopqrstuvwxyz0123456789&r=IDNA2008 ), they do look like Latin letters.


Block these Armenian characters from mixing with Latin, and block their Latin counterparts from mixing with Armenian.

For top domains, this does not matter because we use confusability skeletons to detect look-alike domain names and block them.
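A minimal Python sketch of the skeleton idea: the candidate label and each top domain are reduced to a UTS #39-style skeleton (NFD, map each character to its confusable prototype, NFD again), and a label whose skeleton equals a top domain's skeleton while its text differs is flagged as a look-alike. The four-entry prototype table and the `TOP_DOMAINS` list are illustrative assumptions, not Chrome's data; real code would use the full confusables.txt data or ICU's spoof checker (e.g. `uspoof_getSkeleton`).

```python
import unicodedata

# Illustrative excerpt only (assumed): a few mappings of the kind listed in
# Unicode's confusables.txt. Real code would load the full table.
CONFUSABLE_PROTOTYPES = {
    "\u0585": "o",  # ARMENIAN SMALL LETTER OH
    "\u0581": "g",  # ARMENIAN SMALL LETTER CO
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
}

# Hypothetical stand-in for the list of top domains.
TOP_DOMAINS = ["google", "example"]


def skeleton(label: str) -> str:
    """UTS #39-style skeleton: NFD, map to prototypes, then NFD again."""
    decomposed = unicodedata.normalize("NFD", label)
    mapped = "".join(CONFUSABLE_PROTOTYPES.get(ch, ch) for ch in decomposed)
    return unicodedata.normalize("NFD", mapped)


# Precompute skeleton -> original top domain.
TOP_DOMAIN_SKELETONS = {skeleton(d): d for d in TOP_DOMAINS}


def matching_top_domain(label: str):
    """Return the top domain that `label` imitates, or None."""
    target = TOP_DOMAIN_SKELETONS.get(skeleton(label))
    if target is None or target == label:
        # No skeleton match, or the label *is* the real top domain.
        return None
    return target
```

With this table, `matching_top_domain("\u0581\u0585\u0585gle")` (Armenian co and oh followed by Latin "gle") reports `"google"`, while the genuine `"google"` is not flagged.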

This bug is filed to facilitate merging to the M59 branch, because that branch does not have a top-domain skeleton-matching mechanism.
 

Comment 1 by js...@chromium.org, May 27 2017

Summary: Tighten up IDN policy on Armenian + Latin (was: Tighten up IDN policy on Armenian + Latin and hyphen-like characters)
Hyphen-like characters were dealt with earlier this year. 

https://www.amnic.net/policy/en/ has the following policy:

.am domain name might contain only Latin 0-9 numbers, '-' (dash) and ASCII English letters.
.հայ domain name might contain only Latin 0-9 numbers, '-' (dash) and Unicode Armenian letters except the letter 'և' (0587).

Basically, for .հայ, Amnic does not allow mixing Latin and Armenian. So my CL ( https://codereview.chromium.org/2895103003/ ) should not affect .հայ, because it is less restrictive than Amnic's policy.
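The two Amnic rules quoted above can be expressed as per-label character whitelists. A minimal Python sketch, assuming the label has already been lowercased by IDNA processing (so only the lowercase Armenian letters U+0561..U+0586 are listed, with U+0587 'և' left out per the rule); other IDNA constraints such as hyphen placement are not checked here.

```python
import re

# .am: ASCII digits, hyphen and ASCII letters only.
AM_LABEL = re.compile(r"[0-9a-z-]+")

# .հայ: ASCII digits, hyphen and lowercase Armenian letters U+0561..U+0586
# (assumed letter range; U+0587 'և' is deliberately excluded).
HAY_LABEL = re.compile(r"[0-9\u0561-\u0586-]+")


def allowed_in_am(label: str) -> bool:
    return AM_LABEL.fullmatch(label) is not None


def allowed_in_hay(label: str) -> bool:
    return HAY_LABEL.fullmatch(label) is not None
```

Because neither whitelist contains characters from the other's script, a Latin + Armenian label such as `"g\u0585"` is rejected under both, which is why a browser-side Latin + Armenian block cannot affect any registrable .հայ name.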

Comment 2 by js...@chromium.org, May 27 2017

> Unicode Armenian letters except the letter 'և' (0587).

U+0587 has Identifier_Status=Restricted, so it is already blocked. No action is necessary.

Comment 3 by js...@chromium.org, May 28 2017

Cc: kenrb@chromium.org emilyschechter@chromium.org lgar...@chromium.org
Summary: Review script mixing policy : Moderately Restrictive vs Highly Restrictive (was: Tighten up IDN policy on Armenian + Latin )
Like .հայ (the Armenian IDN TLD), .ไทย (the Thai IDN TLD; http://www.thnic.or.th/dot-thai-policy-en/ ) does not allow mixing Latin and Thai.

Verisign also has a similar policy (see section 3 of https://www.verisign.com/en_US/channel-resources/domain-registry-products/idn/idn-policy/registration-rules/index.xhtml ). Its policy of allowing a large number of script=Common and script=Inherited characters needs to be addressed by revising Chrome's IDN policy.


I suspect that many ccTLD-like IDN TLDs have a similar policy. The exceptions are the IDN ccTLDs of Japan, Korea, China, Taiwan and Hong Kong, which do allow mixing ASCII Latin with their native scripts.

That means that even though Chrome's IDN policy allows mixing Latin with another script (other than Greek and Cyrillic), in practice there is no TLD+1 domain that would 'benefit' from it.


Given that only the CJK IDN ccTLDs allow mixing their native scripts with Latin, we can just do the same (i.e. use the Highly Restrictive policy instead of Moderately Restrictive; the latter allows mixing Latin with any script other than Greek and Cyrillic).
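To make the difference between the two UTS #39 restriction levels concrete, here is a minimal Python sketch. It assumes a hypothetical, deliberately coarse `script_of()` helper that classifies characters by Unicode block range and only knows ASCII Latin; real code should use the Script/Script_Extensions properties (e.g. via ICU). Highly Restrictive accepts a single script or a subset of Latin+Han+Hiragana+Katakana, Latin+Han+Bopomofo, or Latin+Han+Hangul; Moderately Restrictive additionally accepts Latin plus any one other script except Cyrillic and Greek.

```python
# Hypothetical, coarse script lookup by block range (an approximation).
_BLOCKS = [
    (0x0061, 0x007A, "Latin"),  # ASCII a-z only (labels are lowercased)
    (0x0370, 0x03FF, "Greek"),
    (0x0400, 0x04FF, "Cyrillic"),
    (0x0530, 0x058F, "Armenian"),
    (0x0590, 0x05FF, "Hebrew"),
    (0x0E00, 0x0E7F, "Thai"),
    (0x3040, 0x309F, "Hiragana"),
    (0x30A0, 0x30FF, "Katakana"),
    (0x3100, 0x312F, "Bopomofo"),
    (0x4E00, 0x9FFF, "Han"),
    (0xAC00, 0xD7A3, "Hangul"),
]


def script_of(ch: str) -> str:
    if ch in "0123456789-":
        return "Common"  # digits and hyphen do not count toward mixing
    cp = ord(ch)
    for lo, hi, name in _BLOCKS:
        if lo <= cp <= hi:
            return name
    return "Other"  # simplification: every unknown script lumped together


# Multi-script combinations that UTS #39 "Highly Restrictive" allows.
_HIGHLY_RESTRICTIVE_SETS = [
    {"Latin", "Han", "Hiragana", "Katakana"},
    {"Latin", "Han", "Bopomofo"},
    {"Latin", "Han", "Hangul"},
]

LEVELS = ["single script", "highly restrictive",
          "moderately restrictive", "less restrictive"]


def restriction_level(label: str) -> str:
    scripts = {script_of(ch) for ch in label} - {"Common"}
    if len(scripts) <= 1:
        return "single script"
    if any(scripts <= allowed for allowed in _HIGHLY_RESTRICTIVE_SETS):
        return "highly restrictive"
    others = scripts - {"Latin"}
    # Moderately Restrictive: Latin plus exactly one other script,
    # provided that script is neither Cyrillic nor Greek.
    if ("Latin" in scripts and len(others) == 1
            and not others & {"Cyrillic", "Greek"}):
        return "moderately restrictive"
    return "less restrictive"


def passes(label: str, policy: str) -> bool:
    """True if `label` meets `policy` (one of LEVELS) or something stricter."""
    return LEVELS.index(restriction_level(label)) <= LEVELS.index(policy)
```

Under this sketch a Latin + Armenian label such as `"g\u0585\u0585gle"` passes the moderately restrictive policy but not the highly restrictive one, while Latin + Han still passes both.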

We switched to 'Moderately Restrictive' from 'Highly Restrictive' to sync with Firefox. 

Recently, alarmed by bug 719199 and bug 722639, we blocked a few cases of script mixing individually, without knowing about Verisign's policy.

Given the various national NICs' policies on IDN ccTLDs, switching back to the 'highly restrictive' profile is unlikely to hurt any "innocent" domains (because domain names that would be blocked by switching back cannot be registered anyway). It would, of course, still affect domain labels beyond TLD+1.

Switching back to 'highly restrictive' would make the individual script + Latin mixture rules in the 'dangerous_patterns' regex unnecessary (such rules were added for Canadian Syllabics and Tifinagh, and I also plan to add one for Armenian). This would simplify our code.
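To illustrate why per-script rules do not scale, here is a hypothetical Python sketch of rules in that style: one regex per script that rejects a label mixing ASCII Latin letters with that script. The block ranges approximate the Script property, and these patterns are not Chromium's actual 'dangerous_patterns' entries.

```python
import re

# Hypothetical per-script "no mixing with ASCII Latin" rules, keyed by
# script name; values are Unicode block ranges used inside a char class.
SCRIPT_BLOCKS = {
    "Armenian": "\u0530-\u058f",
    "Canadian Syllabics": "\u1400-\u167f",
    "Tifinagh": "\u2d30-\u2d7f",
}


def _latin_mix_rule(block):
    # A Latin letter somewhere before a character from the block, or a
    # character from the block somewhere before a Latin letter.
    return re.compile(f"[a-z].*[{block}]|[{block}].*[a-z]")


AD_HOC_RULES = {name: _latin_mix_rule(block)
                for name, block in SCRIPT_BLOCKS.items()}


def ad_hoc_violations(label: str) -> list:
    """Names of the per-script rules that `label` trips."""
    return [name for name, rule in AD_HOC_RULES.items() if rule.search(label)]
```

A Latin + Thai label trips none of these rules; every additional script needs its own entry, whereas a single Highly Restrictive check covers all such combinations at once.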

On the other hand, it may block some 'innocent' labels beyond TLD+1, but the chance of that is pretty low.

Comment 4 by js...@chromium.org, May 29 2017

Hebrew domain name policy (Israel): does not allow mixing Hebrew characters and Latin. 

http://www.isoc.org.il/files/docs/ISOC-IL_Registration_Rules_v1.5_ENGLISH_-_26.6.2016.pdf

https://www.icann.org/sites/default/files/packages/lgr/lgr-second-level-hebrew-30aug16-en.html

---------

Chinese 2nd-level LGR has this (Japanese and Korean 2nd-level LGRs have a similar provision):  

Unlike many other non-Latin 2nd level reference LGRs, the Chinese LGR includes the basic ASCII Latin set (a to z) because it is common practice in Chinese text to mix Han and ASCII. Therefore it does not create confusability or additional security risks in the context of a second level LGR for the Chinese language. It is also supported by current IDNA practice, see [700], [701], and [702].

----------

https://www.icann.org/resources/pages/second-level-lgr-2015-06-21-en 


--------------

Indian IDN policy (from 2009; not sure whether it's the latest):

http://meity.gov.in/writereaddata/files/India-IDN-Policy.pdf

3.B has this:
B. NOT PERMISSIBLE
1. CODE-PAGE MIXING
No mixing of scripts at a given level will NOT be allowed

As an example, a mixed Latin-Devanagari label is given.

In addition, native Indic digits are not allowed. Interestingly, it also disallows ZWJ/ZWNJ (which does not mean that other countries would do the same). Moreover, it was published before IDNA 2008 was finalized.
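The three restrictions summarized above (no script mixing within a label, no native digits, no ZWJ/ZWNJ) are simple per-label checks. A minimal Python sketch, restricted for simplicity to Devanagari and ASCII Latin; the function name and rule labels are hypothetical.

```python
# Devanagari block U+0900..U+097F and DEVANAGARI DIGIT ZERO..NINE (U+0966..U+096F).
DEVANAGARI = range(0x0900, 0x0980)
DEVANAGARI_DIGITS = range(0x0966, 0x0970)
ZWNJ, ZWJ = "\u200c", "\u200d"


def policy_violations(label: str) -> list:
    """Return which of the three summarized rules `label` breaks."""
    problems = []
    has_devanagari = any(ord(ch) in DEVANAGARI for ch in label)
    has_latin = any("a" <= ch <= "z" for ch in label)
    if has_devanagari and has_latin:
        problems.append("code-page mixing")
    if any(ord(ch) in DEVANAGARI_DIGITS for ch in label):
        problems.append("native digits")
    if ZWNJ in label or ZWJ in label:
        problems.append("ZWJ/ZWNJ")
    return problems
```

For example, a pure Devanagari label such as `"\u092d\u093e\u0930\u0924"` yields no violations, while `"abc\u092d"` is reported as code-page mixing.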

https://registry.in/Internationalized_Domain_Names_IDNs has a list of newer IDN policy documents, but each of them is a tar.gz archive with many gzipped files inside. I haven't managed to get through the multiple layers of compression/archiving (e.g. https://registry.in/system/files/DEVANAGARI.tar_.gz contains many gzipped files).

Comment 5 by js...@chromium.org, May 29 2017

Cc: markda...@google.com
Labels: -Restrict-View-SecurityTeam
Removing the view restriction. Thanks to the existing IDN policy, there is no risk in opening this bug up to the public.

Comment 6 by lgar...@chromium.org, May 30 2017

Components: UI>Security>UrlFormatting

Comment 7 by dominickn@chromium.org, Jun 21 2017

Issue 735210 has been merged into this issue.

Comment 8 by pkasting@chromium.org, Jun 21 2017

Based on comment 3, it would be nice to suggest to Mozilla that they also switch.

Comment 9 by js...@chromium.org, Aug 29 2017

Blocking: 756735

Comment 10 by js...@chromium.org, Aug 29 2017

Blocking: 756456

Comment 11 by js...@chromium.org, Aug 29 2017

Blocking: 756226

Comment 12 by js...@chromium.org, Sep 14 2017

Comment 13 by js...@chromium.org, Sep 28 2017

https://chromium-review.googlesource.com/c/chromium/src/+/688825 is a draft CL. I'll add more tests from bugs blocked by this bug.

Comment 14 by js...@chromium.org, Sep 28 2017

The CL in comment 13 is out for review. In the meantime, Mozilla has also made the switch in ToT (see the Mozilla bug in comment 12).

Comment 15 by js...@chromium.org, Oct 4 2017

Blocking: 770465

Comment 16 by bugdroid1@chromium.org, Oct 4 2017

Project Member
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/fd34ee82420c5e5cb04459d6e381944979d8e571

commit fd34ee82420c5e5cb04459d6e381944979d8e571
Author: Jungshik Shin <jshin@chromium.org>
Date: Wed Oct 04 23:25:49 2017

Change the script mixing policy to highly restrictive

The current script mixing policy (moderately restrictive) allows
mixing of Latin-ASCII and one non-Latin script (unless the non-Latin
script is Cyrillic or Greek).

This CL tightens up the policy to block mixing of Latin-ASCII and
a non-Latin script unless the non-Latin script is Chinese (Hanzi,
Bopomofo), Japanese (Kanji, Hiragana, Katakana) or Korean (Hangul,
Hanja).

Major gTLDs (.net/.org/.com) do not allow the registration of
a domain that has both Latin and a non-Latin script. The only
exception is names with Latin + Chinese/Japanese/Korean scripts.
The same is true of ccTLDs with IDNs.

Given the above registration rules of major gTLDs and ccTLDs, allowing
mixing of Latin and non-Latin other than CJK has no practical effect. In
the meantime, domain names in TLDs with a laxer policy on script mixing
would be subject to a potential spoofing attempt with the current
moderately restrictive script mixing policy. To protect users from those
risks, there are a few ad-hoc rules in place.

By switching to highly restrictive, those ad-hoc rules can be removed,
simplifying the IDN display policy implementation a bit.

This is also coordinated with Mozilla. See
https://bugzilla.mozilla.org/show_bug.cgi?id=1399939 .

BUG=726950, 756226, 756456, 756735, 770465
TEST=components_unittests --gtest_filter=*IDN*

Change-Id: Ib96d0d588f7fcda38ffa0ce59e98a5bd5b439116
Reviewed-on: https://chromium-review.googlesource.com/688825
Reviewed-by: Brett Wilson <brettw@chromium.org>
Reviewed-by: Lucas Garron <lgarron@chromium.org>
Commit-Queue: Jungshik Shin <jshin@chromium.org>
Cr-Commit-Position: refs/heads/master@{#506561}
[modify] https://crrev.com/fd34ee82420c5e5cb04459d6e381944979d8e571/components/url_formatter/idn_spoof_checker.cc
[modify] https://crrev.com/fd34ee82420c5e5cb04459d6e381944979d8e571/components/url_formatter/url_formatter_unittest.cc

Comment 17 by js...@chromium.org, Oct 4 2017

Status: Fixed (was: Started)
Summary: script mixing policy : switch from Moderately Restrictive to Highly Restrictive (was: Review script mixing policy : Moderately Restrictive vs Highly Restrictive )

Comment 18 by js...@chromium.org, Oct 10 2017

Issue 773051 has been merged into this issue.

Comment 19 by awhalley@google.com, Oct 16 2017

Labels: reward-topanel

Comment 20 by awhalley@google.com, Oct 20 2017

Labels: -reward-topanel reward-0

Comment 21 by js...@chromium.org, Nov 14 2017

Issue 756886 has been merged into this issue.

Comment 22 by js...@chromium.org, Nov 14 2017

Issue 756866 has been merged into this issue.

Comment 23 by js...@chromium.org, Nov 14 2017

Issue 756977 has been merged into this issue.

Comment 24 by js...@chromium.org, Nov 14 2017

Issue 756947 has been merged into this issue.

Comment 25 by js...@chromium.org, Nov 14 2017

Issue 756893 has been merged into this issue.

Comment 26 by js...@chromium.org, Nov 14 2017

Issue 757180 has been merged into this issue.
