
Issue 765922

Starred by 4 users

Issue metadata

Status: Available
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux , Windows , Mac
Pri: 2
Type: Bug

Blocking:
issue 660384




Inconsistency in URL-parsing punycode handling

Reported by jfkth...@gmail.com, Sep 16 2017

Issue description

UserAgent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:57.0) Gecko/20100101 Firefox/57.0

Steps to reproduce the problem:
In DevTools console, compare the behaviors of:

(1) new URL("http://xn--google.com")
(2) new URL("http://䕮䕵䕶䕱.com")
(3) new URL("http://xn--x.com")
(4) new URL("http://xn--x.xn--google.com")
(5) new URL("http://xn--x.䕮䕵䕶䕱.com")

Also try navigating to these URLs via the address bar.
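
For convenience, the console comparison can be run in one go. The following is a minimal sketch, not part of the original report: `tryParse` is a hypothetical helper name, and the results depend on the browser or runtime in which it is executed, so no output is shown.

```javascript
// The five inputs from the steps above, in the same order.
const inputs = [
  "http://xn--google.com",
  "http://䕮䕵䕶䕱.com",
  "http://xn--x.com",
  "http://xn--x.xn--google.com",
  "http://xn--x.䕮䕵䕶䕱.com",
];

// Hypothetical helper: parse one input and record either the resulting
// hostname or the error thrown by the URL constructor.
function tryParse(input) {
  try {
    return { input, hostname: new URL(input).hostname };
  } catch (e) {
    return { input, error: String(e) };
  }
}

// One row per example; a row has either a hostname or an error.
const results = inputs.map(tryParse);
console.table(results);
```

Comparing the rows for (4) and (5) is the interesting part: the report expects them to have the same hostname.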

What is the expected behavior?
All 5 examples should create a URL object.

Note that (1) and (2) result in identical URLs, because (1) is simply the punycode representation of (2); in both cases, the resulting URL has its hostname set to "xn--google.com".

Example (3) also looks like punycode, but is not actually a valid punycode label. This does not prevent 'new URL()' from parsing it, with a resulting hostname of "xn--x.com". The difference from (1) shows up when navigating: (1) displays as 䕮䕵䕶䕱.com in the address bar (and resolves to a parked-site page), whereas (3) results in a web search because it cannot be resolved.

Example (4) shows that the presence of the invalid-punycode label "xn--x" as a subdomain does not interfere with parsing the URL as a whole, nor with navigating to the site: this will also lead to a parked-site page for 䕮䕵䕶䕱.com.

Example (5) should behave identically to (4), just like (2) behaves identically to (1).

What went wrong?
Example (5) in the DevTools console results in failure:

> Uncaught TypeError: Failed to construct 'URL': Invalid URL

I believe this is incorrect, AFAICT from reading the URL parsing algorithm[1].

The algorithm depends on a "host parser"[2] which in turn uses a "domain to ASCII"[3] algorithm based on Unicode's ToASCII[4]. This basically splits the domain on dots, and then punycode-encodes any labels that contain non-ASCII characters; but I don't see anything that requires an invalid-ACE label like "xn--accountlogin" to result in a validation failure, nor any justification for treating this differently depending on whether a separate label within the domain contained non-ASCII chars (and therefore was punycode-encoded by ToASCII).

[1] https://url.spec.whatwg.org/#concept-basic-url-parser
[2] https://url.spec.whatwg.org/#concept-host-parser
[3] https://url.spec.whatwg.org/#concept-domain-to-ascii
[4] http://www.unicode.org/reports/tr46/#ToASCII
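
The label-by-label reading described above can be sketched as follows. This is only an illustration of that reading, not the actual UTS #46 ToASCII: it performs no mapping, normalization, or label validation, and ASCII-only labels (including "xn--x") are passed through unchanged. The names `punycodeEncode` and `domainToAsciiSketch` are hypothetical. The encoder follows the RFC 3492 Punycode procedure.

```javascript
// RFC 3492 Punycode parameters.
const BASE = 36, T_MIN = 1, T_MAX = 26, SKEW = 38, DAMP = 700;
const INITIAL_BIAS = 72, INITIAL_N = 128;

// Map a digit value 0..35 to 'a'..'z', '0'..'9'.
function digitToChar(d) {
  return String.fromCharCode(d < 26 ? 97 + d : 22 + d);
}

// Bias adaptation from RFC 3492, section 6.1.
function adapt(delta, numPoints, firstTime) {
  delta = Math.floor(delta / (firstTime ? DAMP : 2));
  delta += Math.floor(delta / numPoints);
  let k = 0;
  while (delta > Math.floor(((BASE - T_MIN) * T_MAX) / 2)) {
    delta = Math.floor(delta / (BASE - T_MIN));
    k += BASE;
  }
  return k + Math.floor(((BASE - T_MIN + 1) * delta) / (delta + SKEW));
}

// Encode one label (without the "xn--" prefix), per RFC 3492, section 6.3.
function punycodeEncode(label) {
  const input = Array.from(label, (ch) => ch.codePointAt(0));
  const basic = input.filter((cp) => cp < 0x80);
  let output = String.fromCodePoint(...basic);
  let handled = basic.length;
  if (handled > 0) output += "-";
  let n = INITIAL_N, delta = 0, bias = INITIAL_BIAS;
  while (handled < input.length) {
    // Smallest code point that has not been encoded yet.
    const m = Math.min(...input.filter((cp) => cp >= n));
    delta += (m - n) * (handled + 1);
    n = m;
    for (const cp of input) {
      if (cp < n) delta++;
      if (cp === n) {
        // Emit delta as a variable-length integer.
        let q = delta;
        for (let k = BASE; ; k += BASE) {
          const t = k <= bias ? T_MIN : k >= bias + T_MAX ? T_MAX : k - bias;
          if (q < t) break;
          output += digitToChar(t + ((q - t) % (BASE - t)));
          q = Math.floor((q - t) / (BASE - t));
        }
        output += digitToChar(q);
        bias = adapt(delta, handled + 1, handled === basic.length);
        delta = 0;
        handled++;
      }
    }
    delta++;
    n++;
  }
  return output;
}

// Split on dots; encode only the labels that contain non-ASCII characters.
function domainToAsciiSketch(domain) {
  return domain
    .split(".")
    .map((label) =>
      /^[\x00-\x7f]*$/.test(label) ? label : "xn--" + punycodeEncode(label)
    )
    .join(".");
}

console.log(domainToAsciiSketch("䕮䕵䕶䕱.com")); // xn--google.com
console.log(domainToAsciiSketch("xn--x.䕮䕵䕶䕱.com")); // xn--x.xn--google.com
```

Under this simplified reading, examples (4) and (5) end up with the same ASCII hostname. Whether a conforming implementation must also accept the "xn--x" label is the question the report raises, and the sketch does not settle it.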

Did this work before? N/A 

Does this work in other browsers? Yes

Chrome version: 60.0.3112.90 (Official Build) (64-bit)  Channel: stable
OS Version: OS X 10.12
Flash Version: Shockwave Flash 23.0 r0

Note that Safari behaves as expected here (examples 4 and 5 both parse to identical URLs), as does Firefox now that the fix for Mozilla bug 1399540 (which addresses a couple of somewhat different but related issues) has just landed.
 
Labels: Needs-Bisect Needs-Triage-M61 OS-Windows
Cc: susanjuniab@chromium.org
Labels: Needs-Feedback
jfkthame@ Thanks for the issue.

Tested this on Windows 7 and Mac OS 10.12.6 using the latest Canary 63.0.3218.0 and the latest Stable 61.0.3163.91 with the steps below.

1. Launched Chrome and opened the URLs given above.
2. Opened the Console in DevTools on each page; no Uncaught TypeError appears.

Please find the attached screencast for reference.

Tried the same on Firefox and observed the same behavior.

Could you please attach a screencast of the expected behavior, for a better understanding of the issue?

Thanks.
765922.webm (7.1 MB)

Comment 3 by jfkth...@gmail.com, Sep 19 2017

Thanks for the feedback. I'm not set up to easily record a screencast right now, but am attaching a screenshot that shows the TypeError in devtools (using current Chrome stable on macOS 10.12). This results from simply entering successive "new URL(...)" commands in the console and observing the results returned, as shown in the image.

chrome-url-error.png (298 KB)
Project Member

Comment 4 by sheriffbot@chromium.org, Sep 19 2017

Labels: -Needs-Feedback
Thank you for providing more feedback. Adding requester "susanjuniab@chromium.org" to the cc list and removing "Needs-Feedback" label.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Labels: -Needs-Bisect M-63 OS-Linux
Status: Untriaged (was: Unconfirmed)
Able to reproduce this issue on Mac 10.12.6, Win-10 and Ubuntu 14.04 using the reported Chrome version 61.0.3163.91 and the latest Canary 63.0.3219.0.

This is not a regression, as it is also observed in old builds going back to M50.

Hence, marking it as Untriaged to get more input from the dev team.

Thanks!
Cc: js...@chromium.org
Components: -Blink>Network
jshin@, can you take a look?
Components: Internals>Network
Cc: brettw@chromium.org mkwst@chromium.org
Components: Internals>Core
I don't think this is a networking issue, but rather an issue with our URL parser. I wrote a quick unit test:

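TEST(GURLTest, Punycode) {
  // A Unicode host and its punycode form should canonicalize identically.
  EXPECT_EQ(GURL("http://xn--google.com"), GURL("http://䕮䕵䕶䕱.com"));
  // The same should hold when the invalid-punycode label "xn--x" precedes it.
  EXPECT_EQ(GURL("http://xn--x.xn--google.com"),
            GURL("http://xn--x.䕮䕵䕶䕱.com"));
}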

The first expectation succeeds. The second fails.

../../url/gurl_unittest.cc:68: Failure
      Expected: GURL("http://xn--x.xn--google.com")
      Which is: http://xn--x.xn--google.com/
To be equal to: GURL("http://xn--x.䕮䕵䕶䕱.com")
      Which is: http://xn--x.%E4%95%AE%E4%95%B5%E4%95%B6%E4%95%B1.com/

Maybe //url OWNERS have ideas.
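
As a side note on the failure output above: the host in the second "Which is:" line matches the percent-encoded UTF-8 bytes of the non-ASCII label, which suggests the label was escaped rather than converted to punycode. A small sketch to check this, using only the standard `encodeURIComponent` function (the variable names are illustrative):

```javascript
// The non-ASCII label from the test case.
const label = "䕮䕵䕶䕱";

// Percent-encode the label's UTF-8 bytes.
const escaped = encodeURIComponent(label);

// Compare against the host label printed in the failure output.
console.log(escaped === "%E4%95%AE%E4%95%B5%E4%95%B6%E4%95%B1"); // true
```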
Components: -Internals>Network
Blocking: 660384
Status: Available (was: Untriaged)
