
Issue 724018

Starred by 2 users

Issue metadata

Status: Available
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux , Android , Windows , Chrome , Mac
Pri: 3
Type: Bug

Blocked on:
issue 804688

Blocking:
issue 660384




Apply IDNA ToASCII even when the input is ASCII

Project Member Reported by annevank...@gmail.com, May 18 2017

Issue description

If ToASCII is not applied to all input (even ASCII), the rules are not enforced uniformly. For example, Unicode labels can be at most 63 code points long after conversion to Punycode, whereas ASCII labels have no limit.
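To make the inconsistency concrete, here is a minimal Python sketch. It is not Chrome's code: label_to_ascii(), host_with_fast_path() and host_uniform() are hypothetical names, and the conversion uses Python's built-in "punycode" codec with none of the UTS #46 mapping or validation steps.

```python
MAX_LABEL_LENGTH = 63  # limit applied to a label after conversion to Punycode


def label_to_ascii(label: str) -> str:
    """Hypothetical helper: convert one label to ASCII, with no mapping or validation."""
    if label.isascii():
        return label
    # Python's "punycode" codec only implements the RFC 3492 encoding step.
    return "xn--" + label.encode("punycode").decode("ascii")


def host_uniform(host: str) -> str:
    """Apply the same conversion and length check to every label."""
    labels = [label_to_ascii(label) for label in host.lower().split(".")]
    for label in labels:
        if len(label) > MAX_LABEL_LENGTH:
            raise ValueError(f"label too long: {len(label)} > {MAX_LABEL_LENGTH}")
    return ".".join(labels)


def host_with_fast_path(host: str) -> str:
    """Mimic an ASCII fast path: all-ASCII input skips the length check."""
    if host.isascii():
        return host.lower()
    return host_uniform(host)
```

With these definitions, a 64-character ASCII label is accepted by host_with_fast_path() but raises in host_uniform(), while a 64-character non-ASCII label raises in both.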

Making it uniform likely requires removing some rules for non-ASCII input, as the web depends on being able to place hyphens in the 3rd and 4th place of a label, and likely also depends on leading and trailing hyphens.

An update to Unicode's UTS #46 likely makes more of this configurable: http://www.unicode.org/reports/tr46/tr46-18.html.

https://url.spec.whatwg.org/#idna already requires UseSTD3ASCIIRules and VerifyDnsLength to be set to false. I propose that the URL Standard also sets CheckHyphens (needed for compatibility) and CheckJoiners (seems silly to restrict a subset of emojis) to false and continues to require applying ToASCII (domain to ASCII as the URL Standard calls it) to all input.
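As a rough illustration of what CheckHyphens covers, the following Python sketch encodes the two hyphen rules from UTS #46 and gates them behind a flag. The function names are hypothetical, and the sketch ignores every other validity criterion (including how "xn--" labels are decoded before validation).

```python
def violates_check_hyphens(label: str) -> bool:
    """Return True if the label would fail the UTS #46 CheckHyphens rules."""
    # Rule 1: hyphen-minus in both the third and fourth positions, e.g. "ab--cd".
    if label[2:4] == "--":
        return True
    # Rule 2: leading or trailing hyphen-minus.
    return label.startswith("-") or label.endswith("-")


def label_passes(label: str, check_hyphens: bool) -> bool:
    """Hypothetical validator that applies only the hyphen rules, if enabled."""
    return not (check_hyphens and violates_check_hyphens(label))
```

For example, label_passes("r3---example", check_hyphens=True) is False, but the same label passes with check_hyphens=False, which is the behavior proposed above.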

Tests: https://github.com/w3c/web-platform-tests/pull/5976.
 
Labels: OS-Android OS-Chrome OS-Fuchsia OS-Linux OS-Windows
Status: Untriaged (was: Unconfirmed)

Comment 2 by ricea@chromium.org, May 23 2017

I don't concretely understand what needs to be done here. Would it be best to wait for the new tests to land?
I think it would be best to review the tests and share your feedback on them. The main issue is that Chrome (and other browsers) currently have an "ASCII fast path" of sorts that leads to all kinds of inconsistencies.
VerifyDnsLength is enforced inside ICU.
- UTS46::nameToASCII() sets UIDNA_ERROR_DOMAIN_NAME_TOO_LONG
- UTS46::process() sets UIDNA_ERROR_LABEL_TOO_LONG
It looks like we cannot turn it off. Given that "domain to ASCII" in https://url.spec.whatwg.org/#idna sets it to false, we need to either change it in ICU or implement it ourselves.

For all-ASCII data, DoSimpleHost() is used, which doesn't have the limit.

Empty labels are disallowed for the same reason.
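For reference, the checks that VerifyDnsLength implies can be sketched in a few lines of Python. This is only an illustration of the UTS #46 length rules behind a switch; dns_length_errors() is a hypothetical name, and the error strings are descriptive rather than ICU's actual error constants.

```python
from typing import List


def dns_length_errors(ascii_domain: str, verify_dns_length: bool) -> List[str]:
    """Return descriptive errors for an already-ASCII domain (hypothetical helper)."""
    if not verify_dns_length:
        # With the flag off, no length rule is applied at all.
        return []
    errors = []
    # The root label and its dot are excluded from the checks.
    name = ascii_domain[:-1] if ascii_domain.endswith(".") else ascii_domain
    # The whole name must be 1 to 253 characters long.
    if not 1 <= len(name) <= 253:
        errors.append("domain name length out of range")
    # Each label must be 1 to 63 characters long.
    for label in name.split("."):
        if len(label) == 0:
            errors.append("empty label")
        elif len(label) > 63:
            errors.append("label too long")
    return errors
```

With verify_dns_length=True, "a..b" reports an empty label and a 64-character label reports "label too long"; with the flag set to False, as the URL Standard requires, both inputs produce no errors.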

---

uidna_openUTS46() is not called with UIDNA_USE_STD3_RULES.

---

Turning off leading/trailing hyphen validation is not yet done on the spec side?
https://github.com/jsdom/whatwg-url/pull/90

---

Not sure why setting .host to 'xn--a' is not rejected.
Status: Available (was: Untriaged)
> Turning off leading/trailing hyphen validation is not yet done on the spec side?

That change has been made and deployed. The URL you point to is the repository of a JavaScript implementation of the standard.
Labels: -OS-Fuchsia
Blocking: 660384
Blockedon: 804688
Cc: timothygu@chromium.org mgiuca@chromium.org
