URLFetcher stop-on-redirect behavior should ignore internal redirects |
||||||||||
Issue descriptionChrome Version: (copy from chrome://version) OS: (e.g. Win7, OSX 10.9.5, etc...) What steps will reproduce the problem? (1) Type "google" into the omnibox. (2) Hit enter. What is the expected result? Search for "google". What happens instead? Search for "google", and a navigational aid to `http://google/`. I think the new `.google` gTLD is confusing our heuristics here. pkasting@, would you mind triaging this into the appropriate queue/priority?
,
Mar 6 2017
I don't have a host named `google` on my intranet. > [09:14] $ curl -I http://google > curl: (6) Could not resolve host: google Likewise: > [09:14] $ dig google > > ; <<>> DiG 9.8.3-P1 <<>> google > ;; global options: +cmd > ;; Got answer: > ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 28307 > ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0 > > ;; QUESTION SECTION: > ;google. IN A > > ;; Query time: 2 msec > ;; SERVER: 192.168.1.1#53(192.168.1.1) > ;; WHEN: Mon Mar 6 09:17:04 2017 > ;; MSG SIZE rcvd: 24 I was assuming that the new gTLD `.google` (e.g. `https://blog.google/`) had something to do with the behavior I'm seeing. That might not be the case, though, as I'm not seeing the same behavior for other new TLDs like `yandex` or `paris`. Is there any additional information I can give you about my network that might explain the behavior? I've heard anecdotes from two other people regarding this behavior, I'm sure I could ask them for the same information.
,
Mar 6 2017
Oh I forgot, also make sure you don't have HTTPS Anywhere or other similar proxies/extensions installed that redirect all HTTP to HTTPS requests, or you get behaviors like this. See bug 521009 . If it's not that either, I'd say capture a network trace of this.
,
Mar 6 2017
This looks relevant: we've added `.google` as a preloaded HSTS domain, which means we generate an internal redirect for `http://google/`. Perhaps that internal redirect is being treated as "success" for the host lookup? That seems consistent with bug 521009 . ------ 1397723: URL_REQUEST http://google/ Start Time: 2017-03-06 11:30:38.079 t=2809 [st=0] +REQUEST_ALIVE [dt=1] t=2809 [st=0] URL_REQUEST_DELEGATE [dt=0] t=2809 [st=0] +URL_REQUEST_START_JOB [dt=1] --> load_flags = 64 (DO_NOT_SAVE_COOKIES) --> method = "HEAD" --> priority = "LOWEST" --> url = "http://google/" t=2809 [st=0] URL_REQUEST_REDIRECT_JOB --> reason = "HSTS" t=2809 [st=0] URL_REQUEST_FAKE_RESPONSE_HEADERS_CREATED --> HTTP/1.1 307 Internal Redirect Location: https://google/ Non-Authoritative-Reason: HSTS t=2810 [st=1] URL_REQUEST_DELEGATE [dt=0] t=2810 [st=1] CANCELLED t=2810 [st=1] -REQUEST_ALIVE ------ Though this seems like something of a niche case ("google" is the only TLD that's HSTS preloaded), "google" seems like a word folks might unintentionally search for with some frequency. Perhaps we could teach `URLFetcher::SetStopOnRedirect` to ignore "Internal Redirect" responses? That might be a relatively small change to `URLFetcherCore::OnReceivedRedirect`. WDYT, Ryan?
,
Mar 6 2017
,
Mar 6 2017
I mentioned to Mike out-of-band, but this is similar in effect to Issue 669785 / Issue 479620. While those apply to either A-record bearing labels or (now fixed Issue 670093 / 127.0.53.53 bearing resolutions), the effect was similar. I'm ccing mmenke@ & rdsmith@ for the suggestion in Comment #4 - I think we've been hesitant to add more knobs and whistles to URLFetcher, since URLFetcher was supposed to be the "mostly knob-and-whistleless, easy-to-use version of URLRequest". That said, from looking at the effective usage of SetStopOnRedirect (DIAL and the Data Reduction Proxy are the only two, besides this), in terms of actual effect, it should be eminently safe and arguably reasonable to do.
,
Mar 6 2017
,
Mar 6 2017
At least from the perspective of this consumer, such an internal redirect should not be considered a "redirect" for purposes of stopping. I can't speak to DIAL/Data Reduction. Retitling bug to be about the simple proposal, for purposes of argument. Assigning to mmenke to triage/fix from the network stack side, since it seems like the fix is either to change the behavior of "stop on redirect" as my new title suggests, or to add Yet Another Switch to tweak this somewhere. We had at least one external person report this as well (see duped-in bug) and I would imagine "google" is not the most uncommon search term. So we should try to fix sooner rather than later.
,
Mar 6 2017
I'm not sure we want URLFetcher to know what an internal redirect is. I'm also not entirely sure *I* know what an internal redirect is. Are extensions internal? Is the first visit to http://foo.com/ that redirects us to https://foo.com, which adds an HSTS entry, really fundamentally different from subsequent requests for http://foo.com, where we use the HSTS redirects? Are the about: to chrome:// redirects "internal redirects"? What about redirects created by the ProtocolHandlerRegistry, whatever that is? Is the google HSTS entry is because we expect it to be usable alone at some point? Or is it because there's no "only match subdomains" option for HSTS? In the former case, as soon as Google makes sense alone, we're in the same boat.
,
Mar 6 2017
@comment 9: There should be no A-labels on bare gTLDs, so "https://google" should never make sense (per SSAC053 report - https://www.icann.org/groups/ssac/documents). "google" is included because all subdomains of this gTLD are expected to be HSTS applied. In an ideal world, we might someday hope for "com" to be an entry, but until then... :) 'internal' redirects are effectively all redirects handled by non-network entities, which includes both extensions and HSTS (aka anything using NetworkDelegate and friends). So the answer to all of your questions is 'yes'
,
Mar 6 2017
And FWIW, ignoring extension-created redirects would fix some cases of bug 521009 , so having all the answers here be "yes", and ignoring all these cases, still is good (again, from the perspective of this consumer).
,
Mar 7 2017
,
Mar 7 2017
For naming, SetStopOnRedirect => SetStopOnNetworkRedirect perhaps?
,
Mar 7 2017
Don't think that name works, as it means something entirely different from HttpResponseInfo::network_accessed. I remain very skeptical of whether we want to expose this via the URLRequest API.
,
Mar 7 2017
@mmenke: Do you have an alternative proposal? I understand the skepticism, but I would argue the problem is legitimate, so we should find a solution.
,
Mar 7 2017
Could I get access to issue 479620? Without seeing it, I'm not sure what we're trying to fix with issue 669785, so don't think I have enough context to come up with an alternate solution that solves those issues as well.
,
Mar 7 2017
(If I just wanted to solve this particular issue, the answer is obvious - don't create bogus redirects when trying to make requests to URLs that should never make sense)
,
Mar 7 2017
@comment 17: I'm not sure I understand who you suggest the 'agency' belongs to, but these URLs may or may not make sense (it's more of a guideline than a rule), especially in the presence of DNS suffix search on the client. As per @comment 11, the desire is clear from (what looks like) the consumers, which is "don't do redirects" But I don't think from an API perspective we want a "Don't do HSTS" flag. Seems a giant API footgun. Maybe I'm wrong though. So "Stop on internally generated redirects, like HSTS" seems right.
,
Mar 7 2017
That wouldn't solve this particular issue completely, because there are URLs that would plausibly not be "should never make sense" (so we might create bogus redirects to them), but they don't actually exist in the wild. The solution in comment 17 solves the particular case of "google" because it happens to be a gTLD, but the behavior here is just as broken for non-gTLDs, and stopping on these internal redirects is still just as undesirable.
,
Mar 7 2017
(For clarity, comment 19 was directed at comment 17, not comment 18)
,
Mar 7 2017
My main reluctance here is API bloat - we'd need to add yet more cruft to URLRequest (Or HttpResponseInfo or RedirectInfo), and expose it to the world. We'd also need to audit all URLRequestJob classes (Not just redirects, I guess?) and figure out which are "internal". And of course, with the mojo switch, we suddenly need to add this to IPCs, too, which is even weirder.
,
Mar 10 2017
,
Mar 10 2017
Just so we don't stall out on this issue: What should our next steps be? From talking offline with Matt, I have a better understanding of the concerns expressed in Comment #21, which is that we don't surface any form of distinction about these internalized redirects - a redirect is a redirect is a redirect. We don't attach any explicit metadata (e.g. in the synthesized redirect) that can be trusted, and we don't attach any internal metadata to the code to indicate it's a request chain. An alternative solution, with its own set of tradeoffs, no doubt, is for the omnibox code to not treat redirects as successful signals. That will change from false positives (for google) to false negatives (for intranet sites that redirect). pkasting, what's your sense of the trade-offs with that approach?
,
Mar 10 2017
We added stop-on-redirect for a couple of reasons: * Sites with lots of redirects, especially through slow machines (common in Google, where lots of intranet sites end up being random small servers run as one-offs by some team) took time to follow the redirects. This delayed the appearance of the infobar, sometimes by several seconds, which felt really broken. * Some servers behind redirects responded to HEAD with an error and GET with a valid response, in violation of the HTTP spec. Stop-on-redirects reduced the number of cases where this error made us believe there was no server. * IIRC, there were auth issues, where somehow the internal corp auth pages were interfering with things, and stopping on redirects allowed us to treat as successful things that would otherwise fail if you weren't authenticated yet? I don't recall clearly. It would not be the end of the world to go back to not having stop-on-redirects, but it would definitely be a bad UX. If the decision at the network stack level was that adding the kind of metadata you describe, so these can be distinguished, is unacceptable, then I would probably put in some kind of hardcoded check for this specific case, as well as the solution elsewhere to detect "HTTPS Everywhere" in the IntranetRedirectDetector. These solutions are fragile and imperfect, but they'd be better than dropping stop-on-redirects.
,
Mar 13 2017
Can HSTS entries normally be added to non-navigable domains? If this is just an issue with the hard-coded HSTS table, perhaps the solution is to fix that? If we're also concerned about the offline case, of course, that doesn't work.
,
Mar 13 2017
@Comment 25: Yes, and it's not an issue with the HSTS list, because it's a totally valid expression.
,
Mar 20 2017
It may be simpler to gate on was_network_accessed - this does mean we could follow a couple cached redirects that didn't require revalidation, but it doesn't require a new API, and means we at most only have one slow request (Modulo ServiceWorker and the like being slow), so I think it may be good enough.
,
Mar 20 2017
We could also gate on was_network_accessed() || was_cached(), but I wouldn't be surprised if other caches set was_cached() to true (Like SW's).
,
Mar 21 2017
What is the proposal to solve the case where an ISP network assigns a DNS server via DHCP that forges DNS responses for invalid domains (NX domain hijacking)? Chromium attempts to detect this by using short names without a TLD, but what if ISPs change that behavior and stop spoofing short names so it is no longer detectable? The omnibox should be able to determine when I type "foo" into the omnibox that I mean to go to my default search engine provider to query that term. If Chromium cannot reliably figure that out because internal or external detection of redirects is ambiguous, then please provide an option for users to DISABLE OMNIBOX SHORT NAME HOST ATTEMPTS. Eg. If I never run chromium on a network that requires short host names, then please allow users to turn it off via an option. Advanced users know that they don't have a hostname "foo" and they instead just want the search engine result. Adding this simple option will remove the ambiguity and allow users to have control over this ambiguity which is now showing itself as a problem at an even deeper level in chromium now. Just let us disable it. As you already mentioned, it is confusing to others using extensions that act as "internal" proxies as well.
,
Mar 21 2017
Basically, we cross that bridge when we come to it. We currently use several different probes with varying character sequences and varying lengths, so while it's still possible to detect these, it's not just trivially simple, and I'm not aware of any ISPs working to defeat this.
,
Mar 23 2017
[pkasting]: Thoughts on comment #27?
,
Mar 23 2017
I don't know how often redirects are cached, so it's hard to judge practical impact. From a semantic perspective, I'm uncomfortable that the code seems to be getting further away from expressing its core intent, which is to ask "is this a navigable target". Speaking about whether this is an internally-generated redirect more clearly speaks to "is there actually a server here" than speaking about "was the network accessed", which is more indirect. I can't tell you how to make these tradeoffs, but it would be nice to optimize for "code expresses exactly what we want" rather than "minimal API impact".
,
Mar 23 2017
I'm not so sure...The cached entry tells you "was there a server there", rather than whether there still is (think offline, for instance). Anyhow, I don't think we want to update all 47 URLRequestJob subclasses to indicate if a response is internal or external.
,
Jun 26 2017
It looks like we've stalled out on the discussion about where to put the logic to fix this. mmenke@, it sounds like you've got all the information you need to make a decision about a design. Can you please do so? thanks!
,
Jun 26 2017
We'll start moving off URLFetcher in the forseeable future, and I don't want to waste time adding features to it before then.
,
Jun 26 2017
"The foreseeable future" == next few months? Next year? ... Is there a bug (moving off URLFetcher) that we should simply block this bug on?
,
Jun 26 2017
I'm working on making the mojo interface available right now, for the system request context, with the profile's main request contexts hopefully coming in the next 2 or 3 weeks. It's unclear if using the Mojo interface is simple enough for this use case, or if we want to wait for the URLFetcher replacement, but I think the Mojo interface may be enough, if you guys want to switch over to it. I'm unsure if there will be a perf impact, though (Perf is currently delaying rolling out a Mojo interface to web initiated requests).
,
Jul 28 2017
CC tommycli@, as he's been recently thinking about this area as part of bug 437999. Maybe even his fix "Omnibox Alternate Navigation Infobar: Fix navigation notifications" https://chromium.googlesource.com/chromium/src.git/+/333ba76c06e6a5805508a492e58a3b3b28f98393 handles this case? If not, at least he may come at this with fresh eyes.
,
Aug 10 2017
A random idea:
Can we hack ChromeOmniboxNavigationObserver so that, if the navigation returns successfully with the original parameters (don't follow redirects), we check to see if the resulting NavigationEntry is the original requested URL, just with HTTPS replaced with HTTP. If so, we then shoot the HTTPS request off? Only if that request returns successfully (again with stop-on-redirect) then do we display the alternate nav ("did you mean?) infobar?
In other words, we stick with the stop-on-redirect behavior that pkasting@ claims is important (see comment #24) except, at the omnibox level, we explicitly follow at most one redirect, a redirect from http://x -> https://x.
I believe this should solve both the google HSTS issue and the "https everywhere" extension issue.
Thoughts?
,
Aug 10 2017
I'm certainly fine with that. When the new API is in, can even decide selectively whether to follow each redirect (Well, it's already in, but working on making it friendlier to use).
,
Aug 10 2017
For terminology clarity, we don't have a NavigationEntry, so I think you're proposing to read GetURL() off the URLFetcher(), and check that this really is a redirect (3xx response code? Or will that not catch internal redirects?). I think that'd probably work. I'd still prefer some kind of visibility of "what generated this redirect" (and then we just ignore internally-generated ones), but this would suffice. The API to decide whether to follow or stop on the redirects would be a boon either way.
,
Aug 10 2017
You can indeed get the URL when it stops on redirects.
,
Aug 10 2017
Since this sounds like a plan, I'll tentatively assign it to myself, likely for omnibox bug fixit time in September.
,
Aug 31 2017
,
Sep 15 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/16b2e01283be45df79b0b1c2e089ab21c15fcb02 commit 16b2e01283be45df79b0b1c2e089ab21c15fcb02 Author: Mark Pearson <mpearson@chromium.org> Date: Fri Sep 15 23:14:18 2017 Accidental Search Infobar - Follow Redirect to HTTPS Normally if the alternative navigation infobar logic (a.k.a. the did-you-mean infobar logic) sees a redirect in response to its request, it assumes the request will succeed if followed. (Otherwise, why bother setting up the redirect?) In these cases, it displays the did-you-mean infobar immediately. This assumption, though usually correct, is wrong in certain contexts*. This change causes Chrome to follow a redirect if the redirect is from http to https with no other changes (e.g., from http://example/ to https://example/, no path changes or anything). The rest of the logic remains the same. In particular, if that second request fails, the infobar is not displayed. If the request succeeds, it is. Also, if that request redirects again, Chrome optimistically assumes the request will end up succeeding and displays the infobar without following the next redirect. *Contexts in which the initial assumption is wrong: - Users who have the HTTPS Everywhere extension enabled with the setting "Block all unencrypted requests". (All requests get redirected to https://.) Note that this setting is *not* the default for the extension. - Users who enter "google" in the omnibox (or any other preloaded HSTS domain name, though google is the only one I'm aware of at the moment). For these Chrome generates an internal redirect to https://google/. - Users on networks that return 3xx redirects to the https version for all requests for local sites. Tested interactively in the top two contexts listed above. 1. Previously the did-you-mean infobar was displayed on all single-word omnibox inputs. After this change, it is generally not displayed (except on those where clicking usually succeed). 2. Previously "google" would always trigger the infobar. Now it does not, at least on my network. (The internal request to https://google/ does not succeed.) I have also verified with LOG(INFO) logging during my experiments that Chrome does not send additional URL fetches (following redirects) at times I wouldn't expect. I only saw new fetchers being created at times that make sense with this change. I plan to test this programmatically. I am adding tests in separate change because the alternate nav infobar triggering logic is not currently tested, and it requires some work to set up a harness. That work is tracked in bug 764843 . When adding the harness and unit tests, I will be sure to unit test this new functionality. Bug: 698660 , 521009 Change-Id: I62265191cb409c76f4c0cfc85db474a6fbb7e197 Reviewed-on: https://chromium-review.googlesource.com/664921 Commit-Queue: Mark Pearson <mpearson@chromium.org> Reviewed-by: Peter Kasting <pkasting@chromium.org> Cr-Commit-Position: refs/heads/master@{#502432} [modify] https://crrev.com/16b2e01283be45df79b0b1c2e089ab21c15fcb02/chrome/browser/ui/omnibox/chrome_omnibox_navigation_observer.cc [modify] https://crrev.com/16b2e01283be45df79b0b1c2e089ab21c15fcb02/chrome/browser/ui/omnibox/chrome_omnibox_navigation_observer.h
,
Sep 16 2017
Fixed! Thanks for mmenke@ and rsleevi@ for your enlightening discussions. I learned a lot about foibles of and planned changes to net/ from them. That said, I should apologize that we took so much of your time for these discussions when it turns out it was reasonably straightforward to fix this simply on the omnibox end without a new parameter to URLFetcher.
,
Sep 18 2017
No apologies necessary - it's certainly understandable, given how many sharp edges here (some Google-specific, some generic). I'm really happy that your solution ended up being so simple and elegant - and really glad you came up with it :) |
||||||||||
►
Sign in to add a comment |
||||||||||
Comment 1 by pkasting@chromium.org
, Mar 6 2017