New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 698660 link

Starred by 8 users

Issue metadata

Status: Fixed
Owner:
Closed: Sep 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux , Android , Windows , Chrome , Mac
Pri: 3
Type: Bug


Show other hotlists

Hotlists containing this issue:
HSTS-Preload


Sign in to add a comment

URLFetcher stop-on-redirect behavior should ignore internal redirects

Project Member Reported by mkwst@chromium.org, Mar 6 2017

Issue description

Chrome Version: (copy from chrome://version)
OS: (e.g. Win7, OSX 10.9.5, etc...)

What steps will reproduce the problem?
(1) Type "google" into the omnibox.
(2) Hit enter.

What is the expected result?

Search for "google".

What happens instead?

Search for "google", and a navigational aid to `http://google/`.

I think the new `.google` gTLD is confusing our heuristics here.

pkasting@, would you mind triaging this into the appropriate queue/priority?
 
Screen Shot 2017-03-06 at 8.31.25 AM.png
35.5 KB View Download
Labels: Needs-Feedback
We don't base this off the TLD list, but rather off the effects of actual HTTP requests.  IOW, this generally means you have a reachable host called google/ on your intranet, in which case this is WAI.

Check "curl -I http://google/".

Comment 2 by mkwst@chromium.org, Mar 6 2017

I don't have a host named `google` on my intranet.

> [09:14] $ curl -I http://google
> curl: (6) Could not resolve host: google

Likewise:

> [09:14] $ dig google
>
> ; <<>> DiG 9.8.3-P1 <<>> google
> ;; global options: +cmd
> ;; Got answer:
> ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 28307
> ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
>
> ;; QUESTION SECTION:
> ;google.				IN	A
>
> ;; Query time: 2 msec
> ;; SERVER: 192.168.1.1#53(192.168.1.1)
> ;; WHEN: Mon Mar  6 09:17:04 2017
> ;; MSG SIZE  rcvd: 24

I was assuming that the new gTLD `.google` (e.g. `https://blog.google/`) had something to do with the behavior I'm seeing. That might not be the case, though, as I'm not seeing the same behavior for other new TLDs like `yandex` or `paris`.

Is there any additional information I can give you about my network that might explain the behavior? I've heard anecdotes from two other people regarding this behavior, I'm sure I could ask them for the same information.
Oh I forgot, also make sure you don't have HTTPS Anywhere or other similar proxies/extensions installed that redirect all HTTP to HTTPS requests, or you get behaviors like this.  See  bug 521009 .

If it's not that either, I'd say capture a network trace of this.

Comment 4 by mkwst@chromium.org, Mar 6 2017

Labels: OS-Android OS-Chrome OS-Linux OS-Windows
This looks relevant: we've added `.google` as a preloaded HSTS domain, which means we generate an internal redirect for `http://google/`. Perhaps that internal redirect is being treated as "success" for the host lookup? That seems consistent with  bug 521009 .

------

1397723: URL_REQUEST
http://google/
Start Time: 2017-03-06 11:30:38.079

t=2809 [st=0] +REQUEST_ALIVE  [dt=1]
t=2809 [st=0]    URL_REQUEST_DELEGATE  [dt=0]
t=2809 [st=0]   +URL_REQUEST_START_JOB  [dt=1]
                 --> load_flags = 64 (DO_NOT_SAVE_COOKIES)
                 --> method = "HEAD"
                 --> priority = "LOWEST"
                 --> url = "http://google/"
t=2809 [st=0]      URL_REQUEST_REDIRECT_JOB
                   --> reason = "HSTS"
t=2809 [st=0]      URL_REQUEST_FAKE_RESPONSE_HEADERS_CREATED
                   --> HTTP/1.1 307 Internal Redirect
                       Location: https://google/
                       Non-Authoritative-Reason: HSTS
t=2810 [st=1]      URL_REQUEST_DELEGATE  [dt=0]
t=2810 [st=1]      CANCELLED
t=2810 [st=1] -REQUEST_ALIVE

------

Though this seems like something of a niche case ("google" is the only TLD that's HSTS preloaded), "google" seems like a word folks might unintentionally search for with some frequency. Perhaps we could teach `URLFetcher::SetStopOnRedirect` to ignore "Internal Redirect" responses? That might be a relatively small change to `URLFetcherCore::OnReceivedRedirect`.

WDYT, Ryan?


Comment 5 by mkwst@chromium.org, Mar 6 2017

Summary: Omnibox hostname heuristics misunderstand internal redirects. (was: New gTLDs are confusing omnibox navigation heuristics.)
Cc: mmenke@chromium.org rdsmith@chromium.org
I mentioned to Mike out-of-band, but this is similar in effect to Issue 669785 / Issue 479620. While those apply to either A-record bearing labels or (now fixed  Issue 670093  / 127.0.53.53 bearing resolutions), the effect was similar.

I'm ccing mmenke@ & rdsmith@ for the suggestion in Comment #4 - I think we've been hesitant to add more knobs and whistles to URLFetcher, since URLFetcher was supposed to be the "mostly knob-and-whistleless, easy-to-use version of URLRequest". 

That said, from looking at the effective usage of SetStopOnRedirect (DIAL and the Data Reduction Proxy are the only two, besides this), in terms of actual effect, it should be eminently safe and arguably reasonable to do.
Cc: jmukthavaram@chromium.org
 Issue 695209  has been merged into this issue.
Cc: -mmenke@chromium.org pkasting@chromium.org
Owner: mmenke@chromium.org
Summary: URLFetcher stop-on-redirect behavior should ignore internal redirects (was: Omnibox hostname heuristics misunderstand internal redirects.)
At least from the perspective of this consumer, such an internal redirect should not be considered a "redirect" for purposes of stopping.  I can't speak to DIAL/Data Reduction.  Retitling bug to be about the simple proposal, for purposes of argument.

Assigning to mmenke to triage/fix from the network stack side, since it seems like the fix is either to change the behavior of "stop on redirect" as my new title suggests, or to add Yet Another Switch to tweak this somewhere.

We had at least one external person report this as well (see duped-in bug) and I would imagine "google" is not the most uncommon search term.  So we should try to fix sooner rather than later.
I'm not sure we want URLFetcher to know what an internal redirect is.  I'm also not entirely sure *I* know what an internal redirect is.  Are extensions internal?  Is the first visit to http://foo.com/ that redirects us to https://foo.com, which adds an HSTS entry, really fundamentally different from subsequent requests for http://foo.com, where we use the HSTS redirects?  Are the about: to chrome:// redirects "internal redirects"?  What about redirects created by the ProtocolHandlerRegistry, whatever that is?

Is the google HSTS entry is because we expect it to be usable alone at some point?  Or is it because there's no "only match subdomains" option for HSTS?  In the former case, as soon as Google makes sense alone, we're in the same boat.
@comment 9: There should be no A-labels on bare gTLDs, so "https://google" should never make sense (per SSAC053 report - https://www.icann.org/groups/ssac/documents). "google" is included because all subdomains of this gTLD are expected to be HSTS applied. In an ideal world, we might someday hope for "com" to be an entry, but until then... :)

'internal' redirects are effectively all redirects handled by non-network entities, which includes both extensions and HSTS (aka anything using NetworkDelegate and friends). So the answer to all of your questions is 'yes'
And FWIW, ignoring extension-created redirects would fix some cases of  bug 521009 , so having all the answers here be "yes", and ignoring all these cases, still is good (again, from the perspective of this consumer).
Components: Internals>Network>DomainSecurityPolicy
For naming, SetStopOnRedirect => SetStopOnNetworkRedirect perhaps?
Don't think that name works, as it means something entirely different from HttpResponseInfo::network_accessed.  I remain very skeptical of whether we want to expose this via the URLRequest API.
@mmenke: Do you have an alternative proposal? I understand the skepticism, but I would argue the problem is legitimate, so we should find a solution.
Could I get access to issue 479620?  Without seeing it, I'm not sure what we're trying to fix with issue 669785, so don't think I have enough context to come up with an alternate solution that solves those issues as well.
(If I just wanted to solve this particular issue, the answer is obvious - don't create bogus redirects when trying to make requests to URLs that should never make sense)
@comment 17: I'm not sure I understand who you suggest the 'agency' belongs to, but these URLs may or may not make sense (it's more of a guideline than a rule), especially in the presence of DNS suffix search on the client.

As per @comment 11, the desire is clear from (what looks like) the consumers, which is "don't do redirects"

But I don't think from an API perspective we want a "Don't do HSTS" flag. Seems a giant API footgun. Maybe I'm wrong though. So "Stop on internally generated redirects, like HSTS" seems right.
That wouldn't solve this particular issue completely, because there are URLs that would plausibly not be "should never make sense" (so we might create bogus redirects to them), but they don't actually exist in the wild.

The solution in comment 17 solves the particular case of "google" because it happens to be a gTLD, but the behavior here is just as broken for non-gTLDs, and stopping on these internal redirects is still just as undesirable.
(For clarity, comment 19 was directed at comment 17, not comment 18)
My main reluctance here is API bloat - we'd need to add yet more cruft to URLRequest (Or HttpResponseInfo or RedirectInfo), and expose it to the world.  We'd also need to audit all URLRequestJob classes (Not just redirects, I guess?) and figure out which are "internal".  And of course, with the mojo switch, we suddenly need to add this to IPCs, too, which is even weirder.
Labels: -Needs-Feedback
Status: Assigned (was: Untriaged)
Just so we don't stall out on this issue: What should our next steps be?

From talking offline with Matt, I have a better understanding of the concerns expressed in Comment #21, which is that we don't surface any form of distinction about these internalized redirects - a redirect is a redirect is a redirect. We don't attach any explicit metadata (e.g. in the synthesized redirect) that can be trusted, and we don't attach any internal metadata to the code to indicate it's a request chain.

An alternative solution, with its own set of tradeoffs, no doubt, is for the omnibox code to not treat redirects as successful signals. That will change from false positives (for google) to false negatives (for intranet sites that redirect). pkasting, what's your sense of the trade-offs with that approach?
We added stop-on-redirect for a couple of reasons:
* Sites with lots of redirects, especially through slow machines (common in Google, where lots of intranet sites end up being random small servers run as one-offs by some team) took time to follow the redirects.  This delayed the appearance of the infobar, sometimes by several seconds, which felt really broken.
* Some servers behind redirects responded to HEAD with an error and GET with a valid response, in violation of the HTTP spec.  Stop-on-redirects reduced the number of cases where this error made us believe there was no server.
* IIRC, there were auth issues, where somehow the internal corp auth pages were interfering with things, and stopping on redirects allowed us to treat as successful things that would otherwise fail if you weren't authenticated yet?  I don't recall clearly.

It would not be the end of the world to go back to not having stop-on-redirects, but it would definitely be a bad UX.  If the decision at the network stack level was that adding the kind of metadata you describe, so these can be distinguished, is unacceptable, then I would probably put in some kind of hardcoded check for this specific case, as well as the solution elsewhere to detect "HTTPS Everywhere" in the IntranetRedirectDetector.  These solutions are fragile and imperfect, but they'd be better than dropping stop-on-redirects.
Can HSTS entries normally be added to non-navigable domains?  If this is just an issue with the hard-coded HSTS table, perhaps the solution is to fix that?

If we're also concerned about the offline case, of course, that doesn't work.
@Comment 25: Yes, and it's not an issue with the HSTS list, because it's a totally valid expression.
It may be simpler to gate on was_network_accessed - this does mean we could follow a couple cached redirects that didn't require revalidation, but it doesn't require a new API, and means we at most only have one slow request (Modulo ServiceWorker and the like being slow), so I think it may be good enough.
We could also gate on was_network_accessed() || was_cached(), but I wouldn't be surprised if other caches set was_cached() to true (Like SW's).
What is the proposal to solve the case where an ISP network assigns a DNS server via DHCP that forges DNS responses for invalid domains (NX domain hijacking)? Chromium attempts to detect this by using short names without a TLD, but what if ISPs change that behavior and stop spoofing short names so it is no longer detectable?

The omnibox should be able to determine when I type "foo" into the omnibox that I mean to go to my default search engine provider to query that term. If Chromium cannot reliably figure that out because internal or external detection of redirects is ambiguous, then please provide an option for users to DISABLE OMNIBOX SHORT NAME HOST ATTEMPTS. Eg. If I never run chromium on a network that requires short host names, then please allow users to turn it off via an option. Advanced users know that they don't have a hostname "foo" and they instead just want the search engine result. Adding this simple option will remove the ambiguity and allow users to have control over this ambiguity which is now showing itself as a problem at an even deeper level in chromium now. Just let us disable it. As you already mentioned, it is confusing to others using extensions that act as "internal" proxies as well.
Basically, we cross that bridge when we come to it.  We currently use several different probes with varying character sequences and varying lengths, so while it's still possible to detect these, it's not just trivially simple, and I'm not aware of any ISPs working to defeat this.
[pkasting]:  Thoughts on comment #27?
I don't know how often redirects are cached, so it's hard to judge practical impact.

From a semantic perspective, I'm uncomfortable that the code seems to be getting further away from expressing its core intent, which is to ask "is this a navigable target".  Speaking about whether this is an internally-generated redirect more clearly speaks to "is there actually a server here" than speaking about "was the network accessed", which is more indirect.  I can't tell you how to make these tradeoffs, but it would be nice to optimize for "code expresses exactly what we want" rather than "minimal API impact".
I'm not so sure...The cached entry tells you "was there a server there", rather than whether there still is (think offline, for instance).  Anyhow, I don't think we want to update all 47 URLRequestJob subclasses to indicate if a response is internal or external.
It looks like we've stalled out on the discussion about where to put the logic to fix this.  mmenke@, it sounds like you've got all the information you need to make a decision about a design.  Can you please do so?  thanks!

We'll start moving off URLFetcher in the forseeable future, and I don't want to waste time adding features to it before then.
"The foreseeable future" == next few months?  Next year?  ...

Is there a bug (moving off URLFetcher) that we should simply block this bug on?
I'm working on making the mojo interface available right now, for the system request context, with the profile's main request contexts hopefully coming in the next 2 or 3 weeks.   It's unclear if using the Mojo interface is simple enough for this use case, or if we want to wait for the URLFetcher replacement, but I think the Mojo interface may be enough, if you guys want to switch over to it.  I'm unsure if there will be a perf impact, though (Perf is currently delaying rolling out a Mojo interface to web initiated requests).
CC tommycli@, as he's been recently thinking about this area as part of bug 437999.  Maybe even his fix "Omnibox Alternate Navigation Infobar: Fix navigation notifications" https://chromium.googlesource.com/chromium/src.git/+/333ba76c06e6a5805508a492e58a3b3b28f98393
handles this case?  If not, at least he may come at this with fresh eyes.

A random idea:
Can we hack ChromeOmniboxNavigationObserver so that, if the navigation returns successfully with the original parameters (don't follow redirects), we check to see if the resulting NavigationEntry is the original requested URL, just with HTTPS replaced with HTTP.  If so, we then shoot the HTTPS request off?  Only if that request returns successfully (again with stop-on-redirect) then do we display the alternate nav ("did you mean?) infobar?

In other words, we stick with the stop-on-redirect behavior that pkasting@ claims is important (see comment #24) except, at the omnibox level, we explicitly follow at most one redirect, a redirect from http://x -> https://x.

I believe this should solve both the google HSTS issue and the "https everywhere" extension issue.

Thoughts?
I'm certainly fine with that.  When the new API is in, can even decide selectively whether to follow each redirect (Well, it's already in, but working on making it friendlier to use).
For terminology clarity, we don't have a NavigationEntry, so I think you're proposing to read GetURL() off the URLFetcher(), and check that this really is a redirect (3xx response code?  Or will that not catch internal redirects?).

I think that'd probably work.  I'd still prefer some kind of visibility of "what generated this redirect" (and then we just ignore internally-generated ones), but this would suffice.  The API to decide whether to follow or stop on the redirects would be a boon either way.
You can indeed get the URL when it stops on redirects.
Owner: mpear...@chromium.org
Since this sounds like a plan, I'll tentatively assign it to myself, likely for omnibox bug fixit time in September.
Status: Started (was: Assigned)
Project Member

Comment 45 by bugdroid1@chromium.org, Sep 15 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/16b2e01283be45df79b0b1c2e089ab21c15fcb02

commit 16b2e01283be45df79b0b1c2e089ab21c15fcb02
Author: Mark Pearson <mpearson@chromium.org>
Date: Fri Sep 15 23:14:18 2017

Accidental Search Infobar - Follow Redirect to HTTPS

Normally if the alternative navigation infobar logic (a.k.a. the
did-you-mean infobar logic) sees a redirect in response
to its request, it assumes the request will succeed if followed.
(Otherwise, why bother setting up the redirect?)  In these
cases, it displays the did-you-mean infobar immediately.

This assumption, though usually correct, is wrong in certain contexts*.
This change causes Chrome to follow a redirect if the redirect
is from http to https with no other changes (e.g., from http://example/
to https://example/, no path changes or anything).

The rest of the logic remains the same.  In particular, if that
second request fails, the infobar is not displayed.  If the request
succeeds, it is.  Also, if that request redirects again, Chrome
optimistically assumes the request will end up succeeding and
displays the infobar without following the next redirect.


*Contexts in which the initial assumption is wrong:
- Users who have the HTTPS Everywhere extension enabled with the setting
  "Block all unencrypted requests".  (All requests get redirected to
  https://.)  Note that this setting is *not* the default for the extension.
- Users who enter "google" in the omnibox (or any other preloaded HSTS
  domain name, though google is the only one I'm aware of at the moment).
  For these Chrome generates an internal redirect to https://google/.
- Users on networks that return 3xx redirects to the https version for all
  requests for local sites.


Tested interactively in the top two contexts listed above.
1. Previously the did-you-mean infobar was displayed on all single-word
   omnibox inputs.  After this change, it is generally not displayed
   (except on those where clicking usually succeed).
2. Previously "google" would always trigger the infobar.  Now it does not,
   at least on my network.  (The internal request to https://google/ does
   not succeed.)
I have also verified with LOG(INFO) logging during my experiments that
Chrome does not send additional URL fetches (following redirects)
at times I wouldn't expect.  I only saw new fetchers being created
at times that make sense with this change.

I plan to test this programmatically.  I am adding tests in separate
change because the alternate nav infobar triggering logic is not
currently tested, and it requires some work to set up a harness.
That work is tracked in  bug 764843 .  When adding the harness and unit
tests, I will be sure to unit test this new functionality.

Bug:  698660 , 521009 
Change-Id: I62265191cb409c76f4c0cfc85db474a6fbb7e197
Reviewed-on: https://chromium-review.googlesource.com/664921
Commit-Queue: Mark Pearson <mpearson@chromium.org>
Reviewed-by: Peter Kasting <pkasting@chromium.org>
Cr-Commit-Position: refs/heads/master@{#502432}
[modify] https://crrev.com/16b2e01283be45df79b0b1c2e089ab21c15fcb02/chrome/browser/ui/omnibox/chrome_omnibox_navigation_observer.cc
[modify] https://crrev.com/16b2e01283be45df79b0b1c2e089ab21c15fcb02/chrome/browser/ui/omnibox/chrome_omnibox_navigation_observer.h

Status: Fixed (was: Started)
Fixed!

Thanks for mmenke@ and rsleevi@ for your enlightening discussions.  I learned a lot about foibles of and planned changes to net/ from them.  That said, I should apologize that we took so much of your time for these discussions when it turns out it was reasonably straightforward to fix this simply on the omnibox end without a new parameter to URLFetcher.
No apologies necessary - it's certainly understandable, given how many sharp edges here (some Google-specific, some generic).  I'm really happy that your solution ended up being so simple and elegant - and really glad you came up with it :)

Sign in to add a comment