URI parser/Omnibox accept obscure and misleading numeric IPv4 addresses
Reported by
linde.ph...@gmail.com,
Nov 21 2017
Issue description
UserAgent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36

Example URL: http://001122334455

Steps to reproduce the problem:
1. Enter http://001122334455 into address bar
2. Press enter

What is the expected behavior?
The browser should interpret the host part of the URI as an RFC 1123 compliant host name.

What went wrong?
The browser parses the host portion of the address as the octal representation of an IP address, i.e. 001122334455 as 9.73.185.45

Did this work before? N/A
Chrome version: 62.0.3202.75
Channel: n/a
OS Version: SMP Debian 3.16.43-2+deb8u5 (2017-09-19)
Flash Version:

According to RFC 3986, the dotted parts of an IPv4 address may be octal or hexadecimal, but the address in this case is neither dotted nor an IP literal.
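The arithmetic behind the reported result can be checked with a short sketch. It assumes that the leading zero causes the host to be read as a single base-8 integer, which is then split into four bytes; the function name is made up for illustration and is not taken from Chrome.

```python
import ipaddress


def octal_host_to_dotted_quad(host: str) -> str:
    """Read an all-digit host as one octal number and format it as a dotted quad."""
    value = int(host, 8)  # assumption: a leading "0" selects base 8
    return str(ipaddress.IPv4Address(value))


print(octal_host_to_dotted_quad("001122334455"))  # 9.73.185.45
```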
,
Nov 21 2017
I disagree that an entirely numeric hostname is invalid. RFC 952 specifies that at least the first character of a hostname has to be a letter, but in RFC 1123 section 2.1 this requirement is explicitly removed, and it states that host software must support the more liberal syntax where the first character may also be a digit. The way that this surprised me is not so much that it isn't interpreted as a valid hostname (that throws a lot of software off), but that it's interpreted as an octal representation of an IP address. This could be some baggage from older browser implementations, but it's not a standard interpretation of a URI. That said, I agree that it might not be worth the effort if people already depend on the non-standard behavior of the address bar for things like http://0/ and given that all-digit hostnames are pretty rare.
,
Nov 22 2017
palmer@ has expressed interest in being a bit more draconian in our address parsing. Perhaps adding some metrics would be a good first step?
,
Nov 22 2017
I blogged about this (https://noncombatant.org/2017/11/07/problems-of-urls/#DeprecateAndRemoveWeirdHostAddressRepresentations; test code https://noncombatant.org/2017/11/07/problems-of-urls/ipv4-parser.c). The problem is due to inet_aton being a bit of a DWIM interface. I don't think there's a reason to still behave that way now. I'd be amazed if we got "a lot" of complaints from people mapping 0.0.0.0 to localhost and expecting http://0/ to work. I'd call removing this old DWIM feature to be a feature request, moreso than a bug.
,
Nov 22 2017
As Chris notes, this is an artifact of BSD's inet_aton (which infected Windows' inet_addr with the same bug). URL implementations inherited this bug because instead of strictly checking the grammar from RFC 3986, they simply passed it on to inet_aton (and friends) to test if the authority contained an IP, and thus this issue was born. The WHATWG URL Standard defines these as 'invalid' IPv4 addresses, and formalizes the state machine parsing ( see https://url.spec.whatwg.org/#concept-ipv4-parser ). Note that a validation error doesn't necessarily mean 'rejected' - see https://url.spec.whatwg.org/#validation-error - it's just an internal state that the URL was ugly. +1 to palmer's clarification of feature request - it'd be an I2D to deprecate that behaviour, and other user agents that are concerned about either RFC 3986 or URL Standard behaviour 'should' already have the necessary infrastructure to support that deprecation in their clients as well.
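The difference between handing the host to an inet_aton-style routine and strictly checking the RFC 3986 grammar can be sketched roughly as follows. This is a simplified approximation written for illustration, not glibc's, Windows' or Chrome's actual code: `lenient_ipv4` and `strict_ipv4` are hypothetical names, and the lenient rules (1 to 4 parts, each decimal, octal or hex, with the last part filling the remaining bytes) are an assumption about the classic behaviour.

```python
import ipaddress
import re

# Number forms accepted by classic inet_aton-style parsers (assumed here).
_HEX = re.compile(r"0[xX][0-9a-fA-F]+")
_OCT = re.compile(r"0[0-7]*")
_DEC = re.compile(r"[1-9][0-9]*")

# RFC 3986 dec-octet: decimal 0-255 with no leading zeros.
_DEC_OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])"
_STRICT = re.compile(rf"{_DEC_OCTET}(?:\.{_DEC_OCTET}){{3}}")


def _parse_number(part):
    """Return the integer value of one part, or None if it is not a number."""
    if _HEX.fullmatch(part):
        return int(part[2:], 16)
    if _OCT.fullmatch(part):
        return int(part, 8)
    if _DEC.fullmatch(part):
        return int(part, 10)
    return None


def lenient_ipv4(host):
    """Approximate inet_aton: 1 to 4 parts, the last one fills the remaining bytes."""
    parts = host.split(".")
    if not 1 <= len(parts) <= 4:
        return None
    numbers = [_parse_number(p) for p in parts]
    if any(n is None for n in numbers):
        return None
    *leading, last = numbers
    # Leading parts are single bytes; the last part must fit in what is left.
    if any(n > 255 for n in leading) or last >= 256 ** (5 - len(numbers)):
        return None
    value = last
    for index, n in enumerate(leading):
        value += n << (8 * (3 - index))
    return str(ipaddress.IPv4Address(value))


def strict_ipv4(host):
    """Accept only four dotted decimal octets, as in the RFC 3986 IPv4address rule."""
    return _STRICT.fullmatch(host) is not None
```

Under these assumptions, `lenient_ipv4("001122334455")` gives `"9.73.185.45"` and `lenient_ipv4("0")` gives `"0.0.0.0"`, while `strict_ipv4` rejects both and accepts only forms such as `"9.73.185.45"`.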
,
Nov 27 2017
linde.philip: Per RFC 113, "at least the highest-level component label will be alphabetic.", so a domain can't be purely numeric. The first character can be a number, but the TLD can't be.
,
Nov 27 2017
Sorry, that should be RFC 1123.
,
Nov 28 2017
Regardless of what is defined as a valid hostname, per #5, the URL standard defines whether any given URL string is interpreted as a domain or an IPv4 address. A careful reading of the host parsing section (https://url.spec.whatwg.org/#host-parsing) suggests that *any* host consisting of 1--4 runs of ASCII digits separated by periods is parsed as an IPv4 address (possibly invalid). Specifically, any host section consisting only of ASCII digits with no periods is interpreted as an IPv4 address. If the value n is < 256, it is considered a valid IP address 0.0.0.n. If 256 <= n < 2^32, it is a validation error, but still returns an IP address of the four bytes of that 32-bit integer. If n >= 2^32, the entire URL is invalid (it is not treated as a domain name). Note: This is just my reading of the spec, not necessarily the behaviour of Chrome, but they usually put wacky stuff like this in the spec due to compatibility with a majority of implementations.
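The three ranges in this reading can be written down as a small sketch. It only covers the dotless, single-number case and takes the already-parsed integer n as input, so how the digit string becomes n (decimal or octal) is left out. The function name and the result labels are invented for illustration, and treating n = 2^32 as out of range is an assumption based on that value not fitting in 32 bits.

```python
import ipaddress


def classify_numeric_host(n: int):
    """Classify a dotless all-digit host value per the reading in this comment."""
    if n >= 2**32:
        # Does not fit in 32 bits: the whole URL is invalid.
        return ("failure", None)
    address = str(ipaddress.IPv4Address(n))
    if n < 256:
        return ("valid", address)
    # Flagged as a validation error, but an address is still returned.
    return ("validation error", address)
```

For example, `classify_numeric_host(7)` returns `("valid", "0.0.0.7")` and `classify_numeric_host(256)` returns `("validation error", "0.0.1.0")`.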
,
Nov 28 2017
re: Comment 8 - yeah, that portion of the spec was retro-spec'd from implementations that either implicitly had that behaviour (Windows, by virtue of WinHttpCrackURL/InternetCrackURL + inet_addr, which itself was mirroring Mosaic's non-strict URL parsing) or explicitly had that behaviour (Firefox's URL class, which was intentionally mimicking IE's behaviour, and which Chrome then mimicked). My point in Comment #5 was that, because this behaviour has been incorporated into the WHATWG spec and UA implementations have since tried to align on the spec, the argument for "This is the lower layer's problem" doesn't hold anymore (as it originally did in the Windows/Mosaic/Netscape parsing case), and it's possible to change the spec and implementations without adding extra 'implementation overhead'.
,
Nov 30 2017
mmenke: Dots are only allowed in domain names, a subset of valid host names for which RFC 920 introduces a limited set of top level components. I don't think that the passage you quoted should be understood as a general requirement for host names. My interpretation is that it's rather a note that a dotted decimal IP address can not also be a valid host name, which when it consists of multiple components must always end in a top level domain like com, org, edu, local, horse etc.
,
Jun 29 2018
Unfortunately, I'm not going to get to this any time soon. Maybe an Enamel friend could zap it? Or an open source contributor?
,
Jul 25
Don't think this is a good first bug, given the issues involved in deciding if this is a good idea.
,
Nov 2
Issue 901398 has been merged into this issue.
,
Nov 2
,
Jan 15
Comment 1 by mmenke@chromium.org
, Nov 21 2017