Issue 843352

Starred by 4 users

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux, Android, Windows, iOS, Chrome, Mac, Fuchsia
Pri: 3
Type: Bug
Team-Security-UX




IDN spoofing guard: characters that look like multiple characters (font/platform variations)

Project Member Reported by js...@chromium.org, May 15 2018

Issue description

Spun off from bug 817247.

Some characters can have multiple look-alike characters. 

For instance, U+0153 (œ) can arguably be mapped to 'ae', 'oe' or 'ce'. U+04CF (ӏ) can be mapped to 'i', 'l' or '1'.

At the moment, U+04CF is mapped to both 'i' and 'l' (and '1' indirectly because 'l' and '1' (digit) share the spoofing skeleton). 

If there is more than one of those characters with multiple 'skeletons', we don't have a good solution. What I tried does not work
(https://chromium-review.googlesource.com/c/chromium/src/+/974165/6#message-af0b0cffc6cba6bee7713fd2fc4b8532d0a0a1ba and comments thereafter).
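To illustrate why several such characters are hard to handle, here is a minimal Python sketch that enumerates every ASCII reading of a label. It is not the approach from the linked review; the CANDIDATES table and both function names are hypothetical, filled only with the two examples from this report.

```python
from itertools import product

# Hypothetical table: each ambiguous character maps to every ASCII string
# it might be read as. Entries come from the examples in this report.
CANDIDATES = {
    "\u0153": ("ae", "oe", "ce"),  # œ
    "\u04cf": ("i", "l", "1"),     # ӏ (Cyrillic palochka)
}


def candidate_readings(label):
    """Return the set of ASCII readings of `label` under CANDIDATES."""
    # Characters not in the table have exactly one reading: themselves.
    options = [CANDIDATES.get(ch, (ch,)) for ch in label]
    return {"".join(parts) for parts in product(*options)}


def reading_count(label):
    """Count the readings without building them (product of option counts)."""
    count = 1
    for ch in label:
        count *= len(CANDIDATES.get(ch, (ch,)))
    return count
```

With this table, a label containing five copies of U+04CF already has 3**5 = 243 readings, so checking each reading against a list of known domains grows multiplicatively with every ambiguous character.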


From bug 817247 comment 8:

[\u0131\u0269\u026A\u03B9\u0456\u04CF\u13A5\uA647\U000118C3] & [:IdentifierStatus=Allowed:]
=>


 ı 	U+0131	LATIN SMALL LETTER DOTLESS I
 ι 	U+03B9	GREEK SMALL LETTER IOTA
 і 	U+0456	CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
 ӏ 	U+04CF	CYRILLIC SMALL LETTER PALOCHKA

Three more characters that may need a similar treatment. 

They're currently folded to 'i'. In addition to that, we can map them to 'l' (lowercase L) for the second check and calculate the skeleton. Then it'd match the digit 1 as well, because the skeleton of digit 1 is lowercase L (see bug 820068).
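The two-pass idea above can be sketched roughly as follows in Python. This is a toy model, not Chrome's implementation: FOLDABLE, SKELETON, and all three function names are hypothetical, and the skeleton table contains only the single digit-1 entry mentioned in this report.

```python
# Hypothetical toy data. The four characters are the ones listed above.
FOLDABLE = {"\u0131", "\u03b9", "\u0456", "\u04cf"}
# Per this report, the digit 1 has lowercase L as its skeleton.
SKELETON = {"1": "l"}


def skeleton(text):
    """Map each character to its (toy) prototype; unknown ones map to themselves."""
    return "".join(SKELETON.get(ch, ch) for ch in text)


def fold(text, target):
    """Replace every foldable character with `target` ('i' or 'l')."""
    return "".join(target if ch in FOLDABLE else ch for ch in text)


def looks_like(label, known):
    """First check folds to 'i'; second check folds to 'l'. Match on skeletons."""
    known_skeleton = skeleton(known)
    first = skeleton(fold(label, "i"))
    second = skeleton(fold(label, "l"))
    return known_skeleton in (first, second)
```

In this sketch, "examp\u04cfe" matches "example" on the second check, and "\u04cf23" matches "123" because both reduce to "l23". A label such as "f\u04cf\u04cfm" still does not match "film", since each pass folds every occurrence the same way; that is one reading of the multi-character problem described in this issue.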



 

Comment 1 by js...@chromium.org, May 15 2018

U+0525 (ԥ) is another example. In some fonts, it can look more like 'll' than 'n'. 

Comment 2 by sffc@google.com, May 18 2018

There is no such thing as one character having multiple skeletons.  Every character has exactly 1 skeleton (technically called "prototype") according to the TR 39 specification:

http://unicode.org/reports/tr39/tr39-1.html

The characters should be added to the same equivalence class (same prototype).

Mark has some ideas on how to add more flexibility, but that's still in the early design phases.
Owner: js...@chromium.org
Status: Assigned (was: Untriaged)
Assigning to jshin to get out of Enamel triage queue. Please either find a good owner for this or set back to untriaged.

Comment 4 by js...@chromium.org, Jun 1 2018

Cc: est...@chromium.org
Owner: ----
Status: Untriaged (was: Assigned)
Summary: IDN spoofing guard: characters that look like multiple characters (font/platform variations) (was: IDN spoofing guard: mapping one character to multiple characters)
> There is no such thing as one character having multiple skeletons

I know that the current spoofing data does not allow that. This bug is about how to tackle the cases in the bug report (comment 0): by a mapping data change (e.g. mapping all i-like, l-like and 1-like characters into a single skeleton would be one way, but I'm not sure of its ramifications), by changing the mapping format/structure, or by handling it at a 'higher' level (a spoofing detection implementation change, or a change in its users, such as Chrome).
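As a rough Python sketch of the first option (a single skeleton for all i-like, l-like and 1-like characters), assuming a hypothetical ONE_CLASS table that stands in for real confusables data:

```python
# Hypothetical: put 'i', 'l', '1' and the look-alikes from comment 0
# into one equivalence class whose prototype is lowercase L.
ONE_CLASS = {ch: "l" for ch in "il1\u0131\u03b9\u0456\u04cf"}


def merged_skeleton(text):
    """Skeleton under the merged class; other characters map to themselves."""
    return "".join(ONE_CLASS.get(ch, ch) for ch in text)


def confusable(a, b):
    """Two strings are treated as confusable when their skeletons are equal."""
    return merged_skeleton(a) == merged_skeleton(b)
```

Under this toy table, "f\u04cf\u04cfm" and "film" share a skeleton, so the mixed case is caught. One ramification is that plain ASCII labels such as "mail" and "mall" also collapse to the same skeleton, so any caller would need a way to avoid flagging them.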

Given my recent change, I'm sorry I can't work on this any more. 

Comment 5 by est...@chromium.org, Jun 29 2018

Owner: mea...@chromium.org
Status: Assigned (was: Untriaged)
Issue 901578 has been merged into this issue.
