New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 823387 link

Starred by 7 users

Issue metadata

Status: Fixed
Owner:
Closed: Apr 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Windows
Pri: 2
Type: Bug-Regression



Sign in to add a comment

NTLM proxy auth + TLS 1.3 version interference results in failure

Reported by bob.geis...@gmail.com, Mar 19 2018

Issue description

UserAgent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.186 Safari/537.36

Example URL:
www.fidelity.com

Steps to reproduce the problem:
1. Enterprise PAC file configured to send all internet traffic over proxy for NT user tagging of internet traffic for internal auditing purposes. 
2. Navigate to www.fidelity.com
3. bootstrap.js from https://nexus.ensighten.com/fidelity/hpoptasync/Bootstrap.js will remain pending for an indefinite amount of time

What is the expected behavior?
The website typically loads within 3 seconds, this behavior presented itself after the update from v64 to v65.

What went wrong?
According to net-internals https://nexus.ensighten.com/fidelity/hpoptasync/Bootstrap.js experiences an SSL handshake error during page load, the weird part is that it should timeout, but it never appears to time out. Attached is a net-export JSON file capturing the specifics of the issue as it is occuring.

Did this work before? Yes 64.0.3282.186 

Chrome version: 65.0.3325.146  Channel: stable
OS Version: Server 2016 Standard
Flash Version:
 
chrome-net-export-log.json
1.2 MB View Download
Labels: Needs-Bisect Needs-Triage-M65
Cc: vamshi.kommuri@chromium.org
Components: Blink>JavaScript
Labels: Triaged-ET TE-NeedsTriageHelp
Thanks for filing the issue!

The issue seems to be out of scope for triaging it from our end as this is related to configuring Enterprise PAC(Proxy Auto-Configuration) file which sends all internet traffic over proxy for some internal auditing purposes, hence adding label "TE-NeedsTriageHelp" and requesting someone from  "Internals>Network" / "Blink>JavaScript" team(s) to have a look into it and help in further triaging the issue.

Adding component Blink>JavaScript, please remove if not applicable.
Components: -Internals>Network Internals>Network>SSL
Judging by the net-log it looks like a repeating cycle of:
1. PROXY_CLIENT_SOCKET_WRAPPER initiates request to proxy
2. proxy rejects with "HTTP/1.1 407 Proxy Authentication Required"
3. PROXY_CLIENT_SOCKET_WRAPPER re-initiates request to proxy with auth
4. proxy accepts with "HTTP/1.1 200 Connection established"
5. SSL_CONNECT_JOB reuses socket but then fails with:
                     SSL_VERSION_INTERFERENCE_PROBE
                      --> net_error = -101 (ERR_CONNECTION_RESET)
6. SSL_CONNECT_JOB then spawns new HTTP_PROXY_CONNECT_JOB which in turn spawns a new PROXY_CLIENT_SOCKET_WRAPPER and we go back to step #1.
Cc: mmenke@chromium.org
Components: -Blink>JavaScript Internals>Network>Proxy
Oh yuck. Well, the infinite loop certainly shouldn't be happening. It seems the SSLConnectJob-level retry is interacting badly with NTLM's socket affinity. That a lower-level piece (socket auth) expects the retry to happen at HttpNetworkTransaction means a ConnectJob-level retry doesn't work well. That's awkward... perhaps we need to move the retry back up to HttpNetworkTransaction, but then there are other socket pool issues. I'll need to ponder that. (+mmenke if you have thoughts)

Separately, that the connection is being reset at all is probably a sign of a buggy proxy. What kind of proxy are you running?
Labels: -Needs-Bisect Needs-Feedback
> there are other socket pool issues

Actually, in this instance, I think the socket pool issues aren't that bad. We'll need to bifurcate the pools again, but the second copy is always doomed to fail, so it's not too bad?

This does, however, mean that the whole idea of doing probes at the connect job to avoid bifurcating socket pools does not work.
Hrm, actually I'm not positive I fully understand how this proxy auth flow works. I suspect the conclusion is right, but I'd like to understand the trigger. Matt, I may try to grab you tomorrow (or more likely Friday, given the weather) to try to understand this.

Comment 8 by mmenke@chromium.org, Mar 21 2018

Auth is weird.  For tunnels, we request a socket from the proxy socket pool, it makes an auth wrapper socket, gets a socket from a lower layer pool, and then we send the request (Without credentials, unless we pre-emptively attach them).  If we get a challenge, we put together a response using <magic> and try to respond.  If we need a new socket (because the first was closed, or because we got a connection close response), we'll create a new one with the credentials...We'll even do this if the auth scheme in use our progress in multi-round auth means we aren't allowed to use a new socket (I don't think that latter part's the problem here).

Why is SSL_CONNECT_JOB spawning a new proxy connect job in this case?  We need to use the same ProxyClientSocketWrapper for auth negotiation.  Changing one in the middle seems not good.
SSLConnectJob, or someone in the stack, tries to connect twice sometimes. If we hit error codes correlated with defective middleboxes, we retry without TLS 1.3, just to map the error code accordingly. Previously, this sort of thing would be done in the HttpNetworkTransaction, but since it required different SSLConfigs, we'd need to then bifurcate the socket pools. At some point we chatted about this and decided to go with SSLConnectJob-level retries for new ones, this one added here:
https://codereview.chromium.org/2800853008

But since proxy auth (below SSLConnectJob) appears to tear down everything and continue with a new SSLConnectJob, we lose that retry state and end up bouncing repeatedly rather than flagging version interference. :-(

I'm still not fully following how this flow works... it looks like we hit HttpStreamRequest::RestartTunnelWithProxyAuth and ultimately calls ProxyClientSocket::RestartWithAuth. But I don't see how the HttpStreamFactoryImpl::Job could have gotten its hands on the ProxyClientSocket, skipping the SSLConnectJob?
@David Ben

The proxy in question is a Blucoat appliance.

Firmware Version: SGOS 6.6.5.2 Proxy Edition

It only tags named users to web traffic, no filtering or anything of the sort.

Thanks!
Project Member

Comment 11 by sheriffbot@chromium.org, Mar 21 2018

Cc: davidben@chromium.org
Labels: -Needs-Feedback
Thank you for providing more feedback. Adding the requester to the cc list.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
I believe the HttpStream for the tunnel is passed directly to the HttpStreamFactoryImplJob (HttpStreamFactoryImpl::Job::OnHttpsProxyTunnelResponseCallback), which passes it on up to the NetworkTransaction, in this specific case, so auth doesn't have to be sent through the SSL layer.
(For posterity, the socket is smuggled out of ClientSocketHandle::set_pending_http_proxy_connection, which is slightly distressing, but it explains why the SSLConnectJob got lost.

From chatting with mmenke, it sounds like going back to HttpNetworkTransaction with socket pool bifurcating is the way to go. :-/
Issue 823757 has been merged into this issue.
Components: Enterprise
Owner: davidben@chromium.org
Status: Assigned (was: Unconfirmed)
Summary: NTLM proxy auth + TLS 1.3 version interference results in failure (was: Occasionally cross domain java script will remain pending for nearly an hour.)
Looks like David is investigating.
Project Member

Comment 16 by bugdroid1@chromium.org, Mar 30 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/83ddfb3a1acae7a6876d8dd1bf8e70b9a3991fc9

commit 83ddfb3a1acae7a6876d8dd1bf8e70b9a3991fc9
Author: David Benjamin <davidben@chromium.org>
Date: Fri Mar 30 01:07:52 2018

Bound the number of RestartWith* calls to HttpNetworkTransaction.

This is a simple mitigation, but not a complete fix, for
crbug.com/488043 and  crbug.com/823387 . It is intended to improve
robustness against both those sorts of bugs, and also against servers or
proxies which repeatedly request auth for whatever reason.

Although RestartWith* calls are typically associated with a user prompt,
but in other scenarios (remembered preferences, extensions, multi-leg
authentication), they may be triggered automatically. To avoid looping
forever, bound the number of restarts.

This will be followed by the complete fix for  crbug.com/823387 , but this
is intended to be small enough to hopefully merge.

Bug: 488043,  823387 
Change-Id: Ieb0193ba7aad93efe6e7e1e4cf3355c1fb53f811
Reviewed-on: https://chromium-review.googlesource.com/986677
Reviewed-by: Matt Menke <mmenke@chromium.org>
Commit-Queue: David Benjamin <davidben@chromium.org>
Cr-Commit-Position: refs/heads/master@{#547050}
[modify] https://crrev.com/83ddfb3a1acae7a6876d8dd1bf8e70b9a3991fc9/net/base/net_error_list.h
[modify] https://crrev.com/83ddfb3a1acae7a6876d8dd1bf8e70b9a3991fc9/net/http/http_network_transaction.cc
[modify] https://crrev.com/83ddfb3a1acae7a6876d8dd1bf8e70b9a3991fc9/net/http/http_network_transaction.h
[modify] https://crrev.com/83ddfb3a1acae7a6876d8dd1bf8e70b9a3991fc9/net/http/http_network_transaction_unittest.cc
[modify] https://crrev.com/83ddfb3a1acae7a6876d8dd1bf8e70b9a3991fc9/tools/metrics/histograms/enums.xml

David, we also have RestartWith* in proxy_resolving_client_socket.cc (used by remoting, webrtc, gcm and a few others). Do we need to cap the retries like you've done in http_network_transaction?
Cc: lassey@chromium.org
Labels: Merge-Request-66
xunjieli: Probably. Though for this particular issue, I'll move the error-mapping retry up to HttpNetworkTransaction, so it won't affect ProxyResolvingClientSocket.

TPMs: I'd like to merge this to M66, if this is okay. This issue effects very few deployments and doesn't cause things that would otherwise succeed to fail, but it causes Chrome to retry an unhealthy amount in that situation. The above CL mitigates the worst of it for M66.

For the complete fix, https://chromium-review.googlesource.com/c/chromium/src/+/990812 is in flight for M67. Though, if you prefer, we can also get that landed and merge it instead. (It's less complex than it looks. Most of it is deleting dead code.)
Project Member

Comment 19 by sheriffbot@chromium.org, Apr 2 2018

Labels: -Merge-Request-66 Merge-Review-66 Hotlist-Merge-Review
This bug requires manual review: M66 has already been promoted to the beta branch, so this requires manual review
Please contact the milestone owner if you have questions.
Owners: cmasso@(Android), cmasso@(iOS), josafat@(ChromeOS), abdulsyed@(Desktop)

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
proxy_resolving_client_socket is limited by the number of proxies, in terms of retrying, and doesn't accept auth credentials from the user, but rather relies on the auth cache (Hrm...so it may not work with NTLM or negotiate at all?), so I don't think we need to do anything for it here.
Labels: -Merge-Review-66 Merge-Approved-66
Approving merge for M66. branch:3359 - let's just target the mitigation for now, and full fix for M67. 
Project Member

Comment 22 by bugdroid1@chromium.org, Apr 3 2018

Labels: -merge-approved-66 merge-merged-3359
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/9da1059ba8fb64f341a63b6c34d6e974c2658b90

commit 9da1059ba8fb64f341a63b6c34d6e974c2658b90
Author: David Benjamin <davidben@chromium.org>
Date: Tue Apr 03 20:47:05 2018

Bound the number of RestartWith* calls to HttpNetworkTransaction.

This is a simple mitigation, but not a complete fix, for
crbug.com/488043 and  crbug.com/823387 . It is intended to improve
robustness against both those sorts of bugs, and also against servers or
proxies which repeatedly request auth for whatever reason.

Although RestartWith* calls are typically associated with a user prompt,
but in other scenarios (remembered preferences, extensions, multi-leg
authentication), they may be triggered automatically. To avoid looping
forever, bound the number of restarts.

This will be followed by the complete fix for  crbug.com/823387 , but this
is intended to be small enough to hopefully merge.

Bug: 488043,  823387 
Change-Id: Ieb0193ba7aad93efe6e7e1e4cf3355c1fb53f811
Reviewed-on: https://chromium-review.googlesource.com/986677
Reviewed-by: Matt Menke <mmenke@chromium.org>
Commit-Queue: David Benjamin <davidben@chromium.org>
Cr-Original-Commit-Position: refs/heads/master@{#547050}(cherry picked from commit 83ddfb3a1acae7a6876d8dd1bf8e70b9a3991fc9)
Reviewed-on: https://chromium-review.googlesource.com/993832
Reviewed-by: Steven Valdez <svaldez@chromium.org>
Cr-Commit-Position: refs/branch-heads/3359@{#562}
Cr-Branched-From: 66afc5e5d10127546cc4b98b9117aff588b5e66b-refs/heads/master@{#540276}
[modify] https://crrev.com/9da1059ba8fb64f341a63b6c34d6e974c2658b90/net/base/net_error_list.h
[modify] https://crrev.com/9da1059ba8fb64f341a63b6c34d6e974c2658b90/net/http/http_network_transaction.cc
[modify] https://crrev.com/9da1059ba8fb64f341a63b6c34d6e974c2658b90/net/http/http_network_transaction.h
[modify] https://crrev.com/9da1059ba8fb64f341a63b6c34d6e974c2658b90/net/http/http_network_transaction_unittest.cc
[modify] https://crrev.com/9da1059ba8fb64f341a63b6c34d6e974c2658b90/tools/metrics/histograms/enums.xml

Project Member

Comment 23 by bugdroid1@chromium.org, Apr 6 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/5cb9113e5d86fbf82e51145a26eb3e196c94ea5b

commit 5cb9113e5d86fbf82e51145a26eb3e196c94ea5b
Author: David Benjamin <davidben@chromium.org>
Date: Fri Apr 06 05:54:49 2018

Move the TLS version interference probe to HttpNetworkTransaction.

Having the logic in the SSLConnectJob, while easier on the socket pools,
interacts badly with the proxy tunneling logic, which bypasses the
SSLConnectJob layer and so we lose the state on whether we were
retrying. The end result is, if the origin always resets the connection,
we retry without bound. This doesn't cause connections that would
otherwise succeeded to fail, but it's still not great.

Instead, go back to our tried and true mechanism of trying at
HttpNetworkTransaction, at the cost of more socket pool bifurcation.

Note this comes with a slight change in metrics, due to things shifting
around: Net.SSL_Connection_Error will report the original error code
(e.g. ERR_CONNECTION_RESET) rather than ERR_SSL_VERSION_INTERFERENCE.
Net.ErrorCodesForMainFrame and user-visible errors remain the same.

Additionally, rather than do a lot of work to plumb
Net.SSLVersionInterferenceDetails's information up, I've just removed
that metric. It was useful when we were developing the protocol
workarounds for the various non-compliant enterprise middleboxes, but
now that the workarounds are applied, we don't need it anymore.

On the test side of things, the existing ERR_SSL_VERSION_INTERFERENCE
tests were sufficiently high-level that they cover this new
implementation too. I've additionally added a regression test to cover
this particular issue.

Bug:  823387 
Change-Id: Ia0b5b18c4cfa65fed55f7c1d937623ab37ff741d
Reviewed-on: https://chromium-review.googlesource.com/990812
Reviewed-by: Ilya Sherman <isherman@chromium.org>
Reviewed-by: Matt Menke <mmenke@chromium.org>
Reviewed-by: Steven Valdez <svaldez@chromium.org>
Commit-Queue: David Benjamin <davidben@chromium.org>
Cr-Commit-Position: refs/heads/master@{#548691}
[modify] https://crrev.com/5cb9113e5d86fbf82e51145a26eb3e196c94ea5b/net/http/http_network_transaction.cc
[modify] https://crrev.com/5cb9113e5d86fbf82e51145a26eb3e196c94ea5b/net/http/http_network_transaction.h
[modify] https://crrev.com/5cb9113e5d86fbf82e51145a26eb3e196c94ea5b/net/http/http_network_transaction_unittest.cc
[modify] https://crrev.com/5cb9113e5d86fbf82e51145a26eb3e196c94ea5b/net/socket/client_socket_pool_manager.cc
[modify] https://crrev.com/5cb9113e5d86fbf82e51145a26eb3e196c94ea5b/net/socket/ssl_client_socket.cc
[modify] https://crrev.com/5cb9113e5d86fbf82e51145a26eb3e196c94ea5b/net/socket/ssl_client_socket.h
[modify] https://crrev.com/5cb9113e5d86fbf82e51145a26eb3e196c94ea5b/net/socket/ssl_client_socket_impl.cc
[modify] https://crrev.com/5cb9113e5d86fbf82e51145a26eb3e196c94ea5b/net/socket/ssl_client_socket_impl.h
[modify] https://crrev.com/5cb9113e5d86fbf82e51145a26eb3e196c94ea5b/net/socket/ssl_client_socket_pool.cc
[modify] https://crrev.com/5cb9113e5d86fbf82e51145a26eb3e196c94ea5b/net/socket/ssl_client_socket_pool.h
[modify] https://crrev.com/5cb9113e5d86fbf82e51145a26eb3e196c94ea5b/tools/metrics/histograms/enums.xml
[modify] https://crrev.com/5cb9113e5d86fbf82e51145a26eb3e196c94ea5b/tools/metrics/histograms/histograms.xml

Labels: M-67
Status: Fixed (was: Assigned)
Alright, the full fix should now be in M67, and M66 will contain a mitigation to make it slightly less bad.
Cc: susan.boorgula@chromium.org
 Issue 830551  has been merged into this issue.
Labels: Needs-Feedback
davidben@ Could you please help us with the repro steps to verify this fix from TE-End

Thank You...
Cc: yanglee@chromium.org
yanglee@, can you please get it verified on Latest Stable RC#66.0.3359.117 as this requires "NTLM authenticating proxy" ?
Cc: yini...@chromium.org jmukthavaram@chromium.org
+ yining and Jyothi who will work repro. 

Sign in to add a comment