
Issue 25714

Starred by 14 users

Issue metadata

Status: Fixed
Closed: Oct 2009
EstimatedDays: ----
NextAction: ----
OS: All
Pri: 1
Type: Bug-Regression

  • Only users with Commit permission may comment.


Cannot type in a URL with an underscore in the hostname

Reported by, Oct 24 2009

Issue description

Chrome Version       : (Official Build 29563)
URLs (if applicable) :
OS version               : 10.5.8
Behavior in Safari 3.x/4.x (if applicable): It works
Behavior in Firefox 3.x (if applicable): It works
Behavior in Chrome for Windows: Don't have any Windows machine to test 
it on.

What steps will reproduce the problem?
1. Type in a URL with an _ in the host name

What is the expected result?
The URL is loaded

What happens instead?
It searches on google for the URL

Comment 1 by, Oct 26 2009

Labels: -OS-Mac OS-All Regression
Status: Untriaged
Probably introduced in .

Technically, underscores are not allowed in URLs (search for "hostport"), but they are commonly used.

Comment 2 by, Oct 26 2009

Labels: -Pri-2 Pri-1
"Regressions are P1"

Comment 3 by, Oct 26 2009

We have an internal bug where we're discussing this; I'll forward the info from there 
when I get in to work.

Comment 5 by, Oct 26 2009

I kind of wonder if it would make sense to always parse "http://..." as a URL.  It's 
not like that's the beginning of a search term, and it would help people who have 
added their own TLDs.  (I think we should fix the underscore thing separately too.  
Just a thought.)

Comment 6 by, Oct 26 2009

Yes, please. And everything with a trailing slash too. Up until now, when Chrome 
searched for something when I wanted to navigate, I just typed an extra '/' at the end to 
make Chrome do what I want.

Comment 7 by, Oct 26 2009

CL specifically was made to stop treating 
everything with "http://" as a URL, though I wonder why. Do we get enough valid 
searches starting with this? I am not sure. Seems logical to revert the CL, though it 
would be great if Peter elaborated about it.

Comment 8 by Deleted ...@, Oct 26 2009

Underscores are invalid in *host* names, but are valid in *domain* names, so is a perfectly valid authority-part in an URL.  If Chrome 
is going to look for underscores, it has to confine its search to the leftmost 
component of the URL authority-part.

Comment 9 by, Oct 26 2009

Google search results include (and my browser will happily navigate to) a URL with an 
underscore in the hostname:

This isn't about valid versus invalid, it's about what people actually do and what 
sites expect to work.

Comment 10 by, Oct 26 2009

RFC 1738, 
   Thus, only alphanumerics, the special characters "$-_.+!*'(),", and
   reserved characters used for their reserved purposes may be used
   unencoded within a URL.

That refers to the "path" part I think. From that same RFC:

httpurl = "http://" hostport [ "/" hpath [ "?" search ]]
hostport = host [ ":" port ]
host = hostname | hostnumber
hostname = *[ domainlabel "." ] toplabel
domainlabel = alphadigit | alphadigit *[ alphadigit | "-" ] alphadigit
toplabel = alpha | alpha *[ alphadigit | "-" ] alphadigit
alphadigit = alpha | digit

To me, that looks as if underscores in the host/domain part of the url
are not valid.
(But as Evan said, that's not really related to this bug :-P)
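
For readers who want to check the grammar quoted above mechanically, here is a minimal sketch that transcribes the `hostname` production into a regular expression. The function name `is_rfc1738_hostname` is made up for illustration; it covers only `hostname` (not `hostnumber` or `port`) and says nothing about how Chrome actually implements its check.

```python
import re

# Direct transcription of the RFC 1738 productions quoted in this comment.
# domainlabel = alphadigit | alphadigit *[ alphadigit | "-" ] alphadigit
DOMAINLABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?"
# toplabel = alpha | alpha *[ alphadigit | "-" ] alphadigit
TOPLABEL = r"[A-Za-z](?:[A-Za-z0-9-]*[A-Za-z0-9])?"
# hostname = *[ domainlabel "." ] toplabel
HOSTNAME_RE = re.compile(rf"(?:{DOMAINLABEL}\.)*{TOPLABEL}")


def is_rfc1738_hostname(host: str) -> bool:
    """Return True if host matches the RFC 1738 hostname grammar (hypothetical helper)."""
    return HOSTNAME_RE.fullmatch(host) is not None
```

Under this strict reading a hyphen is accepted inside a label but an underscore is rejected anywhere in the name, which matches the conclusion drawn above.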
Labels: -Area-Misc Area-BrowserUI
Status: Assigned
@5: The principle behind the change was that anything that can't be navigated to 
shouldn't be attempted, since it's really frustrating when that happens.  And 
technically it's possible to search for a URL with a scheme, it's just rare.

In any case, it's clear that at the very least we need to support underscores in 
hostnames, perhaps with the same rules as for hyphens.

Comment 14 by, Oct 26 2009

I have requested access to some internal Google URL databases so that I can do some 
analyses on what we should allow.
@8: Underscores are no more valid in "domains" than in "hosts".  Read RFC 1738 section 

Again, that's not me saying WONTFIX, that's just me noting what the specs (which 
clearly do not match real life) say.
BTW, this affects HTML5 too, which relies on this construction to validate <input 
type=email>.  We need to find out what the "real" rules are so we can correct both 
Chrome and HTML5.

Comment 17 by, Oct 26 2009

Relevant comments from the now-obsolete internal bug.

1) A comment from bradfitz (who ran a site that involved underscores in hostnames):
"I don't remember the details, other than collecting observations at the time and 
finding that all of DNS servers, webservers, and browsers all violated the specs in 
different ways, but generally all towards over-accepting in various ways.
Underscores in URLs didn't work for all users, but most, so for those who it did work, 
we just had LiveJournal issue a 3xx to the hyphen-equivalent URL, so those would be 
hopefully permalinked more.  (but hyphens and underscores aren't interchangeable... 
that was just our convention.)"

2) We should fix HTML5 with whatever we conclude.  [ooh, bug comment collision, I see 
this is in comment 16 now]
Whatever we end up doing, shouldn't we send a background request for anything that 
looks remotely like a URL and pop up a "did you mean"? It was very frustrating for me 
trying to get to some internal page and not being able to.

Could we perhaps revert until we know what 
we want to do here (it doesn't sound like the Real Fix will get in today)? This is 
significantly affecting my productivity, because several internal URLs contain '_'s.
I think the change to force validation against RFC 1738 is incorrect, and I 
suspect we will never be able to validate to this level: since DNS systems don't 
validate to this level and neither do other browsers, users will not expect Chrome 
to. There are no real rules for this; people enter all kinds of random crap in domain 
names (including spaces, which we're forced to support).

That validation was added to work around a certain bug in input classification, but I 
suspect the fix is causing more problems than the bug did. I think we should change 
the hard check to a heuristic that checks the cases that are causing things to be 
misclassified, and treat the things the heuristic changes from URLs to searches 
the same way as single-word queries: i.e. allow you to arrow to the URL in 
the popup to navigate, and fire off a check of the hostname for the "did you 
mean" infobar.
@19: GURL supports domains with spaces, but the omnibox doesn't, and hasn't ever.  
Also, we don't have the ability to arrow to navigate with UNKNOWN inputs anymore, due 
to some quality changes made a while ago.  Also, getting the accidental search 
infobar for a URL like the one in this bug just appears utterly broken, so that's not 
a real solution.

I am convinced there's a wider set of characters/rules that we can use.  For example, 
I don't think any real-world URLs use '!' in the hostname, or the double-quote 
character that triggered the original fix.  All we need to do is widen our current 
rules to cover all the cases that happen in the real world.
@18: Reverting that checkin doesn't fix this issue, it only hides it when you type a 
scheme.  "" is still going to be treated as a search.
@21: But "" and "" would work, right? This is what I usually use 
if I get a search when I meant navigate.
The first would work but the second wouldn't.

Have patience.  Evan is getting data, hopefully I'll be ready to patch before the end 
of today.
Fixed in r30245.
Status: Fixed
The following revision refers to this bug: 

r30245 | | 2009-10-27 14:06:11 -0700 (Tue, 27 Oct 2009) | 7 lines
Changed paths:

Loosen RFC 1738 compliance check to allow underscores where we already allowed hyphens, to match real-world needs.

I don't believe further loosening will be required but that data will hopefully be coming soon.  In the meantime people are asking for this fix.

BUG= 25714 
TEST=Entering "" in the omnibox should default to navigate, not search
Review URL:
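
As a rough illustration of what "allow underscores where we already allowed hyphens" could mean, the sketch below widens the interior of each label to accept `_` as well as `-`, while still requiring an alphanumeric at both ends. This is an assumption about the shape of the rule, not the code from r30245; the name `is_loosely_valid_hostname` is hypothetical.

```python
import re

# Assumed rule: labels still start and end with an alphanumeric,
# but "_" is now accepted in the same interior positions as "-".
LOOSE_LABEL = r"[A-Za-z0-9](?:[A-Za-z0-9_-]*[A-Za-z0-9])?"
LOOSE_TOPLABEL = r"[A-Za-z](?:[A-Za-z0-9_-]*[A-Za-z0-9])?"
LOOSE_HOSTNAME_RE = re.compile(rf"(?:{LOOSE_LABEL}\.)*{LOOSE_TOPLABEL}")


def is_loosely_valid_hostname(host: str) -> bool:
    """Hypothetical loosened check: underscores allowed wherever hyphens are."""
    return LOOSE_HOSTNAME_RE.fullmatch(host) is not None
```

With this rule a name such as `my_host.example.com` passes, while a label that begins or ends with an underscore is still rejected, just as one beginning or ending with a hyphen would be.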

Hooray! Thanks.

Out of interest: Is there some way to force chrome to use something for navigation and 
not for search, if I ever come across a case where the heuristic is wrong?
Not for this particular heuristic, which is why I'm keenly interested in the additional 
data that I'm trying to get from Evan.

Comment 29 by, Oct 27 2009

Unfortunately, I have contacted the Right People for this and it turns out to be huge 
and complicated and involve subscribing to multiple whatwg lists, so I am unlikely to 
be able to resolve this.  I will write a summary.
I don't agree that this is fixed, there needs to be a reliable way to override the 
browser's heuristic. At minimum, anything starting with "http://" should activate the 
"did you mean to navigate to ..." alternative. It's incredibly frustrating when the 
browser completely refuses to open a valid URL, and I've recently had to start up 
Firefox multiple times just to be able to access web pages.

The comment in the file completely misses the point:

  // See if the hostname is valid.  While IE and GURL allow hostnames to contain
  // many other characters (perhaps for weird intranet machines), it's extremely
  // unlikely that a user would be trying to type those in for anything other
  // than a search query.

It's not about users "trying to type" something. I frequently paste URLs into the 
address bar, and those need to work. Heuristics are helpful, but they'll never be 
perfect, and it's insulting to the user to effectively say "you don't know what 
you're doing, let me do something completely different instead. Here are the zero 
search results that you must have been looking for."

It's really a very simple request: a web browser needs to provide a "go to URL" 
feature. A "maybe go to this URL if I feel like it" feature is not the same thing.

For example, it looks like the URL heuristics check for dotted-quad IP addresses and 
known TLDs. What about IPv6 addresses, newly added or intranet-specific TLDs, or 
internationalized domain names? I'm ok with heuristics suggesting search, or even 
doing search by default, as long as there's a way to correct the browser when it 
guesses wrong.
Klaus.Weidner: That seems like a valid concern. Can you file a new bug for that (just 
reference this bug and paste your comment)?
IPv6 and IDN both work correctly today.

It's possible I should return UNKNOWN here instead of QUERY.  I filed  bug 26341  on 
looking into that.
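
To make the UNKNOWN-versus-QUERY idea concrete, here is a small sketch of a classifier in which input that carries an explicit scheme but fails the hostname check is reported as UNKNOWN rather than QUERY. Everything here is assumed for illustration: the `InputType` enum, the `classify` function, and the hostname pattern are not Chrome's code, and the sketch ignores TLD checks, IP addresses, IDN, and single-word inputs.

```python
import re
from enum import Enum
from urllib.parse import urlsplit


class InputType(Enum):
    URL = "url"          # navigate by default
    QUERY = "query"      # search by default
    UNKNOWN = "unknown"  # ambiguous; caller may offer both


# Assumed hostname rule: underscores allowed wherever hyphens are.
_HOST_RE = re.compile(
    r"(?:[A-Za-z0-9](?:[A-Za-z0-9_-]*[A-Za-z0-9])?\.)*"
    r"[A-Za-z](?:[A-Za-z0-9_-]*[A-Za-z0-9])?"
)


def classify(text: str) -> InputType:
    """Hypothetical classifier: a typed scheme downgrades QUERY to UNKNOWN."""
    text = text.strip()
    has_scheme = text.lower().startswith(("http://", "https://"))
    candidate = text if has_scheme else "http://" + text
    try:
        host = urlsplit(candidate).hostname or ""
    except ValueError:
        host = ""  # unparseable authority; fall through to the fallback
    if _HOST_RE.fullmatch(host):
        return InputType.URL
    # The user typed a scheme, so do not silently treat the input as a search.
    return InputType.UNKNOWN if has_scheme else InputType.QUERY
```

The design choice is that a typed scheme is treated as a strong hint of intent to navigate, so the fallback becomes "ambiguous" instead of "search".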
thakis, I copied comment 30 to the new  bug 26341  that pkasting just filed.

Apologies for ranting, and I didn't mean to imply that IPv6 or IDN are not working 
right. My point was that it's unrealistic to expect a heuristic to work perfectly in 
every case for all current and future usages, and that there needs to be a mechanism 
to override it for the (hopefully rare) cases where it guessed wrong.

Comment 34 Deleted

Comment 35 by Deleted ...@, Nov 18 2009

My web site can't be typed in in Chrome :(

Comment 36 by, Nov 18 2009

Nerd42: this has been fixed.  Try using a recent version of Chrome.
Labels: -Regression bulkmove Type-Regression
it seems underscores are permitted in domains or subdomains,

requesting to reopen, a URL like doesn't seem to load (triggers a google search instead)
That doesn't load when you force a load anyway.

Don't ever post on many-years-old closed bugs.  Always open new bugs.  This bug is about allowing underscores in cases where real-world servers allow them.  This has indeed been fixed for some time.  The example you give is a case that not only is disallowed by the relevant RFCs (your linked article is incomplete in its analysis; it matters _where_ in the name the underscore appears) but also not seemingly used in the real world.  Therefore, it's WONTFIX unless you can demonstrate some real websites that require this.  (And again, if you can, do so on a different bug.)

Comment 41 by, Oct 14 2012

Labels: Restrict-AddIssueComment-Commit
This issue has been closed for some time. No one will pay attention to new comments.
If you are seeing this bug or have new data, please click New Issue to start a new bug.

Comment 42 by, Mar 9 2013

Labels: -Type-Regression Type-Bug-Regression
