Issue metadata
Sign in to add a comment
|
NTLM proxy auth + TLS 1.3 version interference results in failure
Reported by
bob.geis...@gmail.com,
Mar 19 2018
|
||||||||||||||||||||||
Issue descriptionUserAgent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.186 Safari/537.36 Example URL: www.fidelity.com Steps to reproduce the problem: 1. Enterprise PAC file configured to send all internet traffic over proxy for NT user tagging of internet traffic for internal auditing purposes. 2. Navigate to www.fidelity.com 3. bootstrap.js from https://nexus.ensighten.com/fidelity/hpoptasync/Bootstrap.js will remain pending for an indefinite amount of time What is the expected behavior? The website typically loads within 3 seconds, this behavior presented itself after the update from v64 to v65. What went wrong? According to net-internals https://nexus.ensighten.com/fidelity/hpoptasync/Bootstrap.js experiences an SSL handshake error during page load, the weird part is that it should timeout, but it never appears to time out. Attached is a net-export JSON file capturing the specifics of the issue as it is occuring. Did this work before? Yes 64.0.3282.186 Chrome version: 65.0.3325.146 Channel: stable OS Version: Server 2016 Standard Flash Version:
,
Mar 20 2018
Thanks for filing the issue! The issue seems to be out of scope for triaging it from our end as this is related to configuring Enterprise PAC(Proxy Auto-Configuration) file which sends all internet traffic over proxy for some internal auditing purposes, hence adding label "TE-NeedsTriageHelp" and requesting someone from "Internals>Network" / "Blink>JavaScript" team(s) to have a look into it and help in further triaging the issue. Adding component Blink>JavaScript, please remove if not applicable.
,
Mar 21 2018
Judging by the net-log it looks like a repeating cycle of:
1. PROXY_CLIENT_SOCKET_WRAPPER initiates request to proxy
2. proxy rejects with "HTTP/1.1 407 Proxy Authentication Required"
3. PROXY_CLIENT_SOCKET_WRAPPER re-initiates request to proxy with auth
4. proxy accepts with "HTTP/1.1 200 Connection established"
5. SSL_CONNECT_JOB reuses socket but then fails with:
SSL_VERSION_INTERFERENCE_PROBE
--> net_error = -101 (ERR_CONNECTION_RESET)
6. SSL_CONNECT_JOB then spawns new HTTP_PROXY_CONNECT_JOB which in turn spawns a new PROXY_CLIENT_SOCKET_WRAPPER and we go back to step #1.
,
Mar 21 2018
Oh yuck. Well, the infinite loop certainly shouldn't be happening. It seems the SSLConnectJob-level retry is interacting badly with NTLM's socket affinity. That a lower-level piece (socket auth) expects the retry to happen at HttpNetworkTransaction means a ConnectJob-level retry doesn't work well. That's awkward... perhaps we need to move the retry back up to HttpNetworkTransaction, but then there are other socket pool issues. I'll need to ponder that. (+mmenke if you have thoughts) Separately, that the connection is being reset at all is probably a sign of a buggy proxy. What kind of proxy are you running?
,
Mar 21 2018
,
Mar 21 2018
> there are other socket pool issues Actually, in this instance, I think the socket pool issues aren't that bad. We'll need to bifurcate the pools again, but the second copy is always doomed to fail, so it's not too bad? This does, however, mean that the whole idea of doing probes at the connect job to avoid bifurcating socket pools does not work.
,
Mar 21 2018
Hrm, actually I'm not positive I fully understand how this proxy auth flow works. I suspect the conclusion is right, but I'd like to understand the trigger. Matt, I may try to grab you tomorrow (or more likely Friday, given the weather) to try to understand this.
,
Mar 21 2018
Auth is weird. For tunnels, we request a socket from the proxy socket pool, it makes an auth wrapper socket, gets a socket from a lower layer pool, and then we send the request (Without credentials, unless we pre-emptively attach them). If we get a challenge, we put together a response using <magic> and try to respond. If we need a new socket (because the first was closed, or because we got a connection close response), we'll create a new one with the credentials...We'll even do this if the auth scheme in use our progress in multi-round auth means we aren't allowed to use a new socket (I don't think that latter part's the problem here). Why is SSL_CONNECT_JOB spawning a new proxy connect job in this case? We need to use the same ProxyClientSocketWrapper for auth negotiation. Changing one in the middle seems not good.
,
Mar 21 2018
SSLConnectJob, or someone in the stack, tries to connect twice sometimes. If we hit error codes correlated with defective middleboxes, we retry without TLS 1.3, just to map the error code accordingly. Previously, this sort of thing would be done in the HttpNetworkTransaction, but since it required different SSLConfigs, we'd need to then bifurcate the socket pools. At some point we chatted about this and decided to go with SSLConnectJob-level retries for new ones, this one added here: https://codereview.chromium.org/2800853008 But since proxy auth (below SSLConnectJob) appears to tear down everything and continue with a new SSLConnectJob, we lose that retry state and end up bouncing repeatedly rather than flagging version interference. :-( I'm still not fully following how this flow works... it looks like we hit HttpStreamRequest::RestartTunnelWithProxyAuth and ultimately calls ProxyClientSocket::RestartWithAuth. But I don't see how the HttpStreamFactoryImpl::Job could have gotten its hands on the ProxyClientSocket, skipping the SSLConnectJob?
,
Mar 21 2018
@David Ben The proxy in question is a Blucoat appliance. Firmware Version: SGOS 6.6.5.2 Proxy Edition It only tags named users to web traffic, no filtering or anything of the sort. Thanks!
,
Mar 21 2018
Thank you for providing more feedback. Adding the requester to the cc list. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Mar 22 2018
I believe the HttpStream for the tunnel is passed directly to the HttpStreamFactoryImplJob (HttpStreamFactoryImpl::Job::OnHttpsProxyTunnelResponseCallback), which passes it on up to the NetworkTransaction, in this specific case, so auth doesn't have to be sent through the SSL layer.
,
Mar 23 2018
(For posterity, the socket is smuggled out of ClientSocketHandle::set_pending_http_proxy_connection, which is slightly distressing, but it explains why the SSLConnectJob got lost. From chatting with mmenke, it sounds like going back to HttpNetworkTransaction with socket pool bifurcating is the way to go. :-/
,
Mar 23 2018
Issue 823757 has been merged into this issue.
,
Mar 23 2018
Looks like David is investigating.
,
Mar 30 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/83ddfb3a1acae7a6876d8dd1bf8e70b9a3991fc9 commit 83ddfb3a1acae7a6876d8dd1bf8e70b9a3991fc9 Author: David Benjamin <davidben@chromium.org> Date: Fri Mar 30 01:07:52 2018 Bound the number of RestartWith* calls to HttpNetworkTransaction. This is a simple mitigation, but not a complete fix, for crbug.com/488043 and crbug.com/823387 . It is intended to improve robustness against both those sorts of bugs, and also against servers or proxies which repeatedly request auth for whatever reason. Although RestartWith* calls are typically associated with a user prompt, but in other scenarios (remembered preferences, extensions, multi-leg authentication), they may be triggered automatically. To avoid looping forever, bound the number of restarts. This will be followed by the complete fix for crbug.com/823387 , but this is intended to be small enough to hopefully merge. Bug: 488043, 823387 Change-Id: Ieb0193ba7aad93efe6e7e1e4cf3355c1fb53f811 Reviewed-on: https://chromium-review.googlesource.com/986677 Reviewed-by: Matt Menke <mmenke@chromium.org> Commit-Queue: David Benjamin <davidben@chromium.org> Cr-Commit-Position: refs/heads/master@{#547050} [modify] https://crrev.com/83ddfb3a1acae7a6876d8dd1bf8e70b9a3991fc9/net/base/net_error_list.h [modify] https://crrev.com/83ddfb3a1acae7a6876d8dd1bf8e70b9a3991fc9/net/http/http_network_transaction.cc [modify] https://crrev.com/83ddfb3a1acae7a6876d8dd1bf8e70b9a3991fc9/net/http/http_network_transaction.h [modify] https://crrev.com/83ddfb3a1acae7a6876d8dd1bf8e70b9a3991fc9/net/http/http_network_transaction_unittest.cc [modify] https://crrev.com/83ddfb3a1acae7a6876d8dd1bf8e70b9a3991fc9/tools/metrics/histograms/enums.xml
,
Mar 30 2018
David, we also have RestartWith* in proxy_resolving_client_socket.cc (used by remoting, webrtc, gcm and a few others). Do we need to cap the retries like you've done in http_network_transaction?
,
Apr 2 2018
xunjieli: Probably. Though for this particular issue, I'll move the error-mapping retry up to HttpNetworkTransaction, so it won't affect ProxyResolvingClientSocket. TPMs: I'd like to merge this to M66, if this is okay. This issue effects very few deployments and doesn't cause things that would otherwise succeed to fail, but it causes Chrome to retry an unhealthy amount in that situation. The above CL mitigates the worst of it for M66. For the complete fix, https://chromium-review.googlesource.com/c/chromium/src/+/990812 is in flight for M67. Though, if you prefer, we can also get that landed and merge it instead. (It's less complex than it looks. Most of it is deleting dead code.)
,
Apr 2 2018
This bug requires manual review: M66 has already been promoted to the beta branch, so this requires manual review Please contact the milestone owner if you have questions. Owners: cmasso@(Android), cmasso@(iOS), josafat@(ChromeOS), abdulsyed@(Desktop) For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Apr 2 2018
proxy_resolving_client_socket is limited by the number of proxies, in terms of retrying, and doesn't accept auth credentials from the user, but rather relies on the auth cache (Hrm...so it may not work with NTLM or negotiate at all?), so I don't think we need to do anything for it here.
,
Apr 3 2018
Approving merge for M66. branch:3359 - let's just target the mitigation for now, and full fix for M67.
,
Apr 3 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/9da1059ba8fb64f341a63b6c34d6e974c2658b90 commit 9da1059ba8fb64f341a63b6c34d6e974c2658b90 Author: David Benjamin <davidben@chromium.org> Date: Tue Apr 03 20:47:05 2018 Bound the number of RestartWith* calls to HttpNetworkTransaction. This is a simple mitigation, but not a complete fix, for crbug.com/488043 and crbug.com/823387 . It is intended to improve robustness against both those sorts of bugs, and also against servers or proxies which repeatedly request auth for whatever reason. Although RestartWith* calls are typically associated with a user prompt, but in other scenarios (remembered preferences, extensions, multi-leg authentication), they may be triggered automatically. To avoid looping forever, bound the number of restarts. This will be followed by the complete fix for crbug.com/823387 , but this is intended to be small enough to hopefully merge. Bug: 488043, 823387 Change-Id: Ieb0193ba7aad93efe6e7e1e4cf3355c1fb53f811 Reviewed-on: https://chromium-review.googlesource.com/986677 Reviewed-by: Matt Menke <mmenke@chromium.org> Commit-Queue: David Benjamin <davidben@chromium.org> Cr-Original-Commit-Position: refs/heads/master@{#547050}(cherry picked from commit 83ddfb3a1acae7a6876d8dd1bf8e70b9a3991fc9) Reviewed-on: https://chromium-review.googlesource.com/993832 Reviewed-by: Steven Valdez <svaldez@chromium.org> Cr-Commit-Position: refs/branch-heads/3359@{#562} Cr-Branched-From: 66afc5e5d10127546cc4b98b9117aff588b5e66b-refs/heads/master@{#540276} [modify] https://crrev.com/9da1059ba8fb64f341a63b6c34d6e974c2658b90/net/base/net_error_list.h [modify] https://crrev.com/9da1059ba8fb64f341a63b6c34d6e974c2658b90/net/http/http_network_transaction.cc [modify] https://crrev.com/9da1059ba8fb64f341a63b6c34d6e974c2658b90/net/http/http_network_transaction.h [modify] https://crrev.com/9da1059ba8fb64f341a63b6c34d6e974c2658b90/net/http/http_network_transaction_unittest.cc [modify] https://crrev.com/9da1059ba8fb64f341a63b6c34d6e974c2658b90/tools/metrics/histograms/enums.xml
,
Apr 6 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/5cb9113e5d86fbf82e51145a26eb3e196c94ea5b commit 5cb9113e5d86fbf82e51145a26eb3e196c94ea5b Author: David Benjamin <davidben@chromium.org> Date: Fri Apr 06 05:54:49 2018 Move the TLS version interference probe to HttpNetworkTransaction. Having the logic in the SSLConnectJob, while easier on the socket pools, interacts badly with the proxy tunneling logic, which bypasses the SSLConnectJob layer and so we lose the state on whether we were retrying. The end result is, if the origin always resets the connection, we retry without bound. This doesn't cause connections that would otherwise succeeded to fail, but it's still not great. Instead, go back to our tried and true mechanism of trying at HttpNetworkTransaction, at the cost of more socket pool bifurcation. Note this comes with a slight change in metrics, due to things shifting around: Net.SSL_Connection_Error will report the original error code (e.g. ERR_CONNECTION_RESET) rather than ERR_SSL_VERSION_INTERFERENCE. Net.ErrorCodesForMainFrame and user-visible errors remain the same. Additionally, rather than do a lot of work to plumb Net.SSLVersionInterferenceDetails's information up, I've just removed that metric. It was useful when we were developing the protocol workarounds for the various non-compliant enterprise middleboxes, but now that the workarounds are applied, we don't need it anymore. On the test side of things, the existing ERR_SSL_VERSION_INTERFERENCE tests were sufficiently high-level that they cover this new implementation too. I've additionally added a regression test to cover this particular issue. Bug: 823387 Change-Id: Ia0b5b18c4cfa65fed55f7c1d937623ab37ff741d Reviewed-on: https://chromium-review.googlesource.com/990812 Reviewed-by: Ilya Sherman <isherman@chromium.org> Reviewed-by: Matt Menke <mmenke@chromium.org> Reviewed-by: Steven Valdez <svaldez@chromium.org> Commit-Queue: David Benjamin <davidben@chromium.org> Cr-Commit-Position: refs/heads/master@{#548691} [modify] https://crrev.com/5cb9113e5d86fbf82e51145a26eb3e196c94ea5b/net/http/http_network_transaction.cc [modify] https://crrev.com/5cb9113e5d86fbf82e51145a26eb3e196c94ea5b/net/http/http_network_transaction.h [modify] https://crrev.com/5cb9113e5d86fbf82e51145a26eb3e196c94ea5b/net/http/http_network_transaction_unittest.cc [modify] https://crrev.com/5cb9113e5d86fbf82e51145a26eb3e196c94ea5b/net/socket/client_socket_pool_manager.cc [modify] https://crrev.com/5cb9113e5d86fbf82e51145a26eb3e196c94ea5b/net/socket/ssl_client_socket.cc [modify] https://crrev.com/5cb9113e5d86fbf82e51145a26eb3e196c94ea5b/net/socket/ssl_client_socket.h [modify] https://crrev.com/5cb9113e5d86fbf82e51145a26eb3e196c94ea5b/net/socket/ssl_client_socket_impl.cc [modify] https://crrev.com/5cb9113e5d86fbf82e51145a26eb3e196c94ea5b/net/socket/ssl_client_socket_impl.h [modify] https://crrev.com/5cb9113e5d86fbf82e51145a26eb3e196c94ea5b/net/socket/ssl_client_socket_pool.cc [modify] https://crrev.com/5cb9113e5d86fbf82e51145a26eb3e196c94ea5b/net/socket/ssl_client_socket_pool.h [modify] https://crrev.com/5cb9113e5d86fbf82e51145a26eb3e196c94ea5b/tools/metrics/histograms/enums.xml [modify] https://crrev.com/5cb9113e5d86fbf82e51145a26eb3e196c94ea5b/tools/metrics/histograms/histograms.xml
,
Apr 6 2018
Alright, the full fix should now be in M67, and M66 will contain a mitigation to make it slightly less bad.
,
Apr 11 2018
,
Apr 17 2018
davidben@ Could you please help us with the repro steps to verify this fix from TE-End Thank You...
,
Apr 17 2018
yanglee@, can you please get it verified on Latest Stable RC#66.0.3359.117 as this requires "NTLM authenticating proxy" ?
,
Apr 18 2018
+ yining and Jyothi who will work repro. |
|||||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||||
Comment 1 by krajshree@chromium.org
, Mar 20 2018