Issue 694593 BlueCoat and other proxies hang up during TLS 1.3
Starred by 41 users Project Member Reported by jayhlee@google.com, Feb 21
Status: Untriaged
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: All
Pri: 1
Type: Bug-Regression


Hotlists containing this issue:
Hotlist-1


Chrome Version: 56
OS: Chrome OS and Windows

What steps will reproduce the problem?
(1) BlueCoat 6.5 proxy.
(2) Chrome OS 56 or Chrome browser 56
(3) Attempt to connect to a Google service (youtube, accounts.google.com, etc.)

What is the expected result?
Successful connection. Client and proxy may negotiate down to TLS 1.2 instead of TLS 1.3.

What happens instead?
When Chrome attempts to connect via TLS 1.3, BlueCoat hangs up connection.

Further details:

We have at least one very large customer seeing similar issues against BlueCoat. The connection fails with SSL_HANDSHAKE_ERROR / ERR_CONNECTION_CLOSED. Customer found that restricting to TLS 1.2 via policy resolves the issue for Chrome 56 stable. Net internals logs are at:

https://drive.google.com/corp/drive/folders/0B3BtTQPWWixOMk1FNkhMekJnNEU (google.com view only)

Other large EDU customers are seeing similar issues and I'm working to gather details from them on proxy / firewall in use. Suspect many are using SSL / TLS inspection which is common among EDUs.

Marking this as ReleaseBlock-Stable and P1 as I believe this is breaking Chrome for many customers.

Bluecoat version is 6.5 for affected customer. 

 
Cc: svaldez@chromium.org
Cc: cbentzel@chromium.org
Cc: awhalley@chromium.org
Another customer using iBoss filtering solution is also seeing this issue.

This is going to be very problematic for many of our customers. We need to quickly provide a workaround for them. Can we find a way to disable Chrome's use of TLS 1.3 at scale? Setting a flag on each browser does not scale, nor does it solve the Chrome OS login screen issue, where flags can't be set.

Some customers have noticed Chrome 56 does not always use TLS 1.3. For example, if they delete the "Local State" file in the user profile and restart Chrome, it will default to TLS 1.2 at least for a while. Also, when Chrome OS devices are wiped and re-enrolled, they seem to default to 1.2 for a while but then start using 1.3.

Can we get an explanation about how/when Chrome decides to use 1.3 and how customers can prevent its use?
Hello all.

My environment is as follows:
Chromebooks: Upwards of 50,000 (out of 120,000) Chromebooks have updated to OS 56. Upwards of 30% of those 50,000 Chromebooks are stuck flickering between a login screen and a "Network not available" screen. Occasionally, you can see an SSL_HANDSHAKE_ERROR briefly at the login screen before it switches back to the "Network not available" screen.

PCs: 45,000 - 46,000 PCs have updated themselves to Chrome 56. Not all PCs are broken, but some are.

BlueCoat 6.5 (which doesn't appear to have native support for TLS 1.3)

Some solutions I've found thus far to get around the problem:
1. Force TLS 1.2 @ chrome://flags/#ssl-version-max - this works for PCs and Chromebooks, but only for the current user
2. Deleting the "Local State" file inside a Windows user profile (located at "%localappdata%\google\Chrome\User Data\Local State") - this works for PCs and seems to immediately resolve all issues
3. Bypassing the transparent proxy entirely

I have no explanation as to why deleting the "Local State" file fixes anything, but further digging/pruning of said file led me to remove the entire "variations_compressed_seed" value, which at least temporarily fixes Chrome on a PC. To aid users in fixing this quickly, this is the script that I'm making available to end-users:

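REM Close all running Chrome processes so the profile files are not locked.
tskill "chrome" /A
REM Delete the First Run marker and the Local State file (which holds the variations seed).
del "%localappdata%\google\Chrome\User Data\First Run" /f /q
del "%localappdata%\google\Chrome\User Data\Local State" /f /q
REM Relaunch Chrome from whichever install location exists.
if exist "c:\Program Files (x86)\Google\Chrome\Application\chrome.exe" start "" "c:\Program Files (x86)\Google\Chrome\Application\chrome.exe"
if exist "c:\Program Files\Google\Chrome\Application\chrome.exe" start "" "c:\Program Files\Google\Chrome\Application\chrome.exe"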
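
For anyone who would rather prune only the seed than delete the whole file, here is a minimal Python sketch. It assumes "Local State" is a JSON document and that "variations_compressed_seed" is a top-level key in it; both are assumptions based on the description above rather than anything confirmed in this thread, and the function name is hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path

# Key named in the comment above; its top-level placement is an assumption.
SEED_KEY = "variations_compressed_seed"


def drop_variations_seed(local_state_path):
    """Remove SEED_KEY from the JSON file; return True if it was present."""
    path = Path(local_state_path)
    state = json.loads(path.read_text(encoding="utf-8"))
    if SEED_KEY not in state:
        return False
    del state[SEED_KEY]
    # Write to a temporary file in the same directory, then swap it in,
    # so an interrupted run cannot leave a half-written Local State file.
    fd, tmp_name = tempfile.mkstemp(dir=str(path.parent))
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(state, handle)
    os.replace(tmp_name, path)
    return True
```

As with the batch script, Chrome would need to be closed first, since it may rewrite the file on exit. Whether pruning the key has the same effect as deleting the file is not something this sketch verifies.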

Some other info that I've discovered:
When a Chrome browser is "broken" (aka, can't access Google services), a cipher check will result in the attached "Before Fix.png" while after deleting the "Local State" file results in "After Fix.png".

Chromebooks will be very problematic to fix globally as they can't access any policy server to receive any sort of hotfix/policy change. Wiping/re-enrolling them does seem to work - even if they're on 56. That's still a very time-consuming process given how many devices may be involved.

Is there something that occurs during the upgrade from 55 to 56 that determines if the install will use 1.3 indefinitely? The "default" setting does seem to be 1.3, but what could be the explanation for the machines that are on 56 and working just fine? Those sites are pulling TLS 1.2 certs/handshakes.
Attachments: Before Fix.png (134 KB), After Fix.png (125 KB)
Labels: -Pri-1 Pri-0
(We're waiting on a response from Blue Coat. They were made aware of TLS 1.3 several months ago, but evidently did not test their software per our instructions.)
Update on Chromebooks that have been wiped as a temporary fix: Some devices will work for a few user logins, but it seems like they all eventually revert to their flickering state of being unable to connect. Auto-updates have been pinned to 55, but CBs @ 56 may need to be downgraded to 55 until this problem is resolved.
Labels: Hotlist-Enterprise
Labels: M-57
Cc: agl@chromium.org davidben@chromium.org
We've stopped Finch signaling Chrome to use TLS 1.3, which should be effective now. Thus any Chrome instance that can check in will receive instructions to disable TLS 1.3 and should stay "fixed" (for now).

Thus, if possible, just switching off interception on the BlueCoat device for a while should solve this.

URGENT - PTAL ASAP.

We're getting VERY close to M57 Stable promotion. And this issue is marked as M57 stable release blocker. Pls make sure to land the fix and get it merged into the release branch ASAP so it gets enough baking time in Beta (before Stable promotion).

Know that this issue shouldn't block the release?  Remove the ReleaseBlock-Stable label or move to M58.

Thank you.
Labels: -ReleaseBlock-Stable
Labels: ReleaseBlock-Stable
Cc: gkihumba@chromium.org
Labels: -M-57 M-56 Merge-Request-56
Hey TPMs/gkihumba,

As we chatted about out-of-band, backing TLS 1.3 out via field trials didn't work for some existing devices. May we land https://codereview.chromium.org/2711633004/ to M56? Thanks!
Labels: Merge-Approved-56
Project Member Comment 18 by bugdroid1@chromium.org, Feb 22
Labels: -merge-approved-56 merge-merged-2924
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/a0b6b5ed613b17e6e87c97d0459d21367139c343

commit a0b6b5ed613b17e6e87c97d0459d21367139c343
Author: David Benjamin <davidben@chromium.org>
Date: Wed Feb 22 22:38:40 2017

Disconnect TLS 1.3 from base::FeatureList in M56.

Finch depends on the net stack so, if it can't talk to the servers, we
can't use Finch to undo a sufficiently broken change. There are
enterprises with broken firewalls that break when both sides negotiate
TLS 1.3.

BUG=694593
R=agl@chromium.org

Review-Url: https://codereview.chromium.org/2711633004 .
Cr-Commit-Position: refs/branch-heads/2924@{#929}
Cr-Branched-From: 3a87aecc31cd1ffe751dd72c04e5a96a1fc8108a-refs/heads/master@{#433059}

[modify] https://crrev.com/a0b6b5ed613b17e6e87c97d0459d21367139c343/components/ssl_config/ssl_config_service_manager_pref.cc
[modify] https://crrev.com/a0b6b5ed613b17e6e87c97d0459d21367139c343/components/ssl_config/ssl_config_service_manager_pref_unittest.cc

We're trying to spin a new M56 build with this change. No promises that it will work out due to other unrelated constraints... will update the bug with status.
For anyone following this issue, we are working on a Chrome update that should resolve it by disabling TLS 1.3 in Chrome 56. In the meantime, there are a few other workarounds you may wish to try.

To be clear, ultimately this is an issue with proxies/firewalls that are not compatible with TLS 1.3. Please continue to work with your proxy/firewall vendor to update to a version that is compatible with TLS 1.3. A future version of Chrome will re-enable TLS 1.3.

Short-term workarounds:

1) On your internal DNS server, create a temporary A record that points clients4.google.com at 64.233.186.102. Once that's in place, restart Chrome / reboot Chrome devices a few times. It may take up to 30 minutes and a few restarts but devices should get the update to stop using TLS 1.3.  **Important** be sure to remove the DNS A record once this is fixed. Leaving the record in place WILL BREAK THINGS DOWN THE LINE.

2) Have the user visit chrome://flags/#ssl-version-max and set to TLS 1.2. This works for Chrome users but not if the problem is occurring on Chrome OS login screen. **Important** be sure users turn this setting back to Default after leaving it on for 1-2 hours. Otherwise the user will not be able to use the more secure TLS 1.3 in the future and is left with a less secure profile.

3) Allow Chrome to connect directly to the Internet for connections to clients4.google.com. This could be done by connecting the device to a tethered phone, using a home network connection, disabling the firewall/proxy that is breaking TLS 1.3 or routing connections to clients4.google.com around this firewall/proxy. Once Chrome is able to connect to clients4.google.com, it should receive the update to disable TLS 1.3 automatically in 1-2 hours time. Restarts may be required.
Does this require a merge to M57? If yes, please request a merge by applying "Merge-Request-57" label. Thank you.
govind: No, that change does not need to be merged to M57.
Comment 23 Deleted
Looks like we can push out a build with the revert to TLS 1.2 later today (9000.91.0, 56.0.2924.110)
Cc: igrigo...@chromium.org
Labels: -Pri-0 -ReleaseBlock-Stable Pri-1
Removing various high-priority labels from this bug as this has, sadly, been backed out of M56. The middleboxes are still broken, but we will resolve this asynchronously now that we have a list of buggy products and contacts with the vendors.

Note these issues are always bugs in the middlebox products. TLS version negotiation is backwards compatible, so a correctly-implemented TLS-terminating proxy should not require changes to work in a TLS-1.3-capable ecosystem. It can simply speak TLS 1.2 at both client <-> proxy and proxy <-> server TLS connections. That these products broke is an indication of defects in their TLS implementations.
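
To make the backwards-compatibility point concrete, here is a minimal Python model of how a correctly implemented peer picks a version. It is a simplified sketch, not Chrome's or any vendor's code: the function name is hypothetical, 0x0304 is the final RFC 8446 code point (the drafts deployed at the time of this bug used different values), and a "pre-1.3" peer is modeled simply as one that does not list TLS 1.3 among its versions.

```python
# Wire values for protocol versions (final code points; drafts differed).
TLS_1_2 = 0x0303
TLS_1_3 = 0x0304


def server_pick_version(legacy_version, supported_versions, server_versions):
    """Return the version a well-behaved server selects, or None if none fits.

    legacy_version: the version field of the ClientHello.
    supported_versions: the client's supported_versions extension (may be empty).
    server_versions: the set of versions the server (or proxy) implements.
    """
    if TLS_1_3 in server_versions and supported_versions:
        # A TLS 1.3-aware peer negotiates from the extension.
        common = set(supported_versions) & set(server_versions)
        return max(common) if common else None
    # A pre-1.3 peer ignores the unknown extension and picks its highest
    # version that does not exceed the ClientHello's legacy version field.
    candidates = [v for v in server_versions if v <= legacy_version]
    return max(candidates) if candidates else None
```

A TLS 1.3 client sends a legacy version of TLS 1.2 plus the extension, so `server_pick_version(TLS_1_2, [TLS_1_3, TLS_1_2], {0x0301, 0x0302, TLS_1_2})` yields TLS 1.2 rather than a failure. A middlebox that instead hangs up on such a ClientHello is deviating from this behavior.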
Cc: gbirtchnell@chromium.org
Project Member Comment 28 by bugdroid1@chromium.org, Apr 12
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/095ebb57de0053925c4900bace0458f38bf5e051

commit 095ebb57de0053925c4900bace0458f38bf5e051
Author: davidben <davidben@chromium.org>
Date: Wed Apr 12 22:23:34 2017

Add a dedicated error code for TLS 1.3 interference.

From the previous TLS 1.3 launch attempt, we learned that many
firewall, proxy, etc., products are buggy and interfere with TLS 1.3's
deployment, holding back a security and performance improvement across
the web.

To make diagnosing such issues easier, this CL implements a dedicated
error code based on a retry probe. On SSL connection failure, if TLS 1.3
was enabled and the error code is one of a handful which, in the past,
have potentially signaled version intolerance, we retry the connection
with TLS 1.3 disabled. If this connection succeeds, we still reject the
connection (otherwise a network attacker can break the security of the
version negotiation, cf. POODLE) and return
ERR_SSL_VERSION_INTERFERENCE.

This error code should hopefully give an easier target for search
metrics and others, as we otherwise cannot reliably classify
individual errors.

Unfortunately, such a probe is inherently flaky and is itself not
reliable. This error could mean one of three things:

1. This is a transient network error that will be resolved when the user
   reloads.

2. The server is buggy and does not implement TLS version negotiation
   correctly.

3. The user is behind a buggy network middlebox, firewall, or proxy which is
   interfering with TLS 1.3.

Based on server side probes, the lack of TLS 1.3 error reports until it
was enabled on the server, and a protocol change in TLS 1.3 intended to
avoid this, we do not believe (2) is common. (The difference between (2)
and (3) is whether the servers or middleboxes are at fault here.)

(1) is unavoidable. There is no way to reliably distinguish (1) and (3).
We can only make (1) less and less likely by spamming the user's network
with probes, which is undesirable.

Accordingly, though the error string is short and easily searchable, I
have left the network error page fairly nondescript, borrowing from the
ERR_CONNECTION_FAILED text, but with SUGGEST_PROXY_CONFIG and friends
enabled, to hint that users should, if their default reaction of mashing
reload (or the auto-reload feature) doesn't work, look there.

Screenshot:
https://drive.google.com/open?id=0B2ImyA6KAoPULVp3V0xPVEJHQms

BUG=694593,658863
CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.linux:closure_compilation

Review-Url: https://codereview.chromium.org/2800853008
Cr-Commit-Position: refs/heads/master@{#464173}

[modify] https://crrev.com/095ebb57de0053925c4900bace0458f38bf5e051/components/error_page/common/localized_error.cc
[modify] https://crrev.com/095ebb57de0053925c4900bace0458f38bf5e051/net/base/net_error_list.h
[modify] https://crrev.com/095ebb57de0053925c4900bace0458f38bf5e051/net/log/net_log_event_type_list.h
[modify] https://crrev.com/095ebb57de0053925c4900bace0458f38bf5e051/net/socket/ssl_client_socket_impl.cc
[modify] https://crrev.com/095ebb57de0053925c4900bace0458f38bf5e051/net/socket/ssl_client_socket_pool.cc
[modify] https://crrev.com/095ebb57de0053925c4900bace0458f38bf5e051/net/socket/ssl_client_socket_pool.h
[modify] https://crrev.com/095ebb57de0053925c4900bace0458f38bf5e051/net/socket/ssl_client_socket_unittest.cc
[modify] https://crrev.com/095ebb57de0053925c4900bace0458f38bf5e051/net/ssl/ssl_config.cc
[modify] https://crrev.com/095ebb57de0053925c4900bace0458f38bf5e051/net/ssl/ssl_config.h
[modify] https://crrev.com/095ebb57de0053925c4900bace0458f38bf5e051/net/test/spawned_test_server/base_test_server.h
[modify] https://crrev.com/095ebb57de0053925c4900bace0458f38bf5e051/net/tools/testserver/testserver.py
[modify] https://crrev.com/095ebb57de0053925c4900bace0458f38bf5e051/net/url_request/url_request_unittest.cc
[modify] https://crrev.com/095ebb57de0053925c4900bace0458f38bf5e051/third_party/tlslite/README.chromium
[add] https://crrev.com/095ebb57de0053925c4900bace0458f38bf5e051/third_party/tlslite/patches/tls13_intolerance.patch
[modify] https://crrev.com/095ebb57de0053925c4900bace0458f38bf5e051/third_party/tlslite/tlslite/constants.py
[modify] https://crrev.com/095ebb57de0053925c4900bace0458f38bf5e051/third_party/tlslite/tlslite/messages.py
[modify] https://crrev.com/095ebb57de0053925c4900bace0458f38bf5e051/third_party/tlslite/tlslite/tlsconnection.py
[modify] https://crrev.com/095ebb57de0053925c4900bace0458f38bf5e051/tools/metrics/histograms/histograms.xml
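
The retry probe described in the commit message above can be sketched roughly as follows. This is an illustrative Python model, not the code in ssl_client_socket_pool.cc: the `connect` callable, the version strings, and the contents of `PROBE_ERRORS` are all hypothetical stand-ins, and returning the original error when the probe also fails is an assumption.

```python
ERR_SSL_VERSION_INTERFERENCE = "ERR_SSL_VERSION_INTERFERENCE"

# Hypothetical stand-in for the "handful" of errors that have, in the past,
# potentially signaled version intolerance.
PROBE_ERRORS = {
    "ERR_CONNECTION_CLOSED",
    "ERR_CONNECTION_RESET",
    "ERR_SSL_PROTOCOL_ERROR",
}


def connect_with_probe(connect, tls13_enabled=True):
    """Run one connection attempt plus, if warranted, one TLS 1.2 probe.

    connect(max_version) is a caller-supplied function returning "OK" or an
    error string; max_version is "TLS1.3" or "TLS1.2".
    """
    result = connect("TLS1.3" if tls13_enabled else "TLS1.2")
    if result == "OK" or not tls13_enabled or result not in PROBE_ERRORS:
        return result
    # Retry once with TLS 1.3 disabled, purely as a diagnostic.
    probe = connect("TLS1.2")
    if probe == "OK":
        # The downgraded connection is never used: accepting it would let a
        # network attacker force a downgrade. Only the error code changes.
        return ERR_SSL_VERSION_INTERFERENCE
    # Assumption: if the probe fails too, report the original error.
    return result
```

With a simulated middlebox such as `lambda v: "OK" if v == "TLS1.2" else "ERR_CONNECTION_CLOSED"`, the function returns ERR_SSL_VERSION_INTERFERENCE. The same result would appear if the first attempt failed for a transient reason and the probe happened to succeed, which is the flakiness the commit message calls out.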

Comment 29 Deleted
Last Canary build (daf91016aa4560f3c0f72d50eee762eed996fc0e) on macOS with `095ebb57de0053925c4900bace0458f38bf5e051` broke TLS 1.3 on `gmail.com` for me. Was getting `ERR_SSL_VERSION_INTERFERENCE`.

Somehow disabling TLS 1.3, visiting `gmail.com` with TLS 1.2, and then enabling TLS 1.3 again fixed it for me. No clue why that worked.

As far as I'm aware I have no SSL/TLS proxy on this connection.

Not sure if that information is helpful at all, but figured I’d add it in case someone else runs into the same issue.
Hrm. I would not have expected that change to make TLS 1.3 start failing more...

Can you attach a net-internals log? Thanks!
https://dev.chromium.org/for-testers/providing-network-details
Oh, I missed that it fixed itself after a restart with flags on and off. That shouldn't happen on the TLS side. You might have picked up a QUIC config in the process which unfortunately masks this sort of thing. (We're looking into eliminating that confounding factor for future experiments.) Do you mind going into about:flags and testing with:

1. "Experimental QUIC protocol" set to "Disabled"
2. "Maximum TLS version enabled" set to "TLS 1.3"
3. Restart the browser.
4. Go to about:net-internals.
5. Click the dropdown in the upper-right corner and select "Flush sockets"
6. Visit gmail.com as before.
7. Export the net-internals log per the instructions in the link above.

Thanks!
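
As a rough cross-check outside the browser, the same comparison (TLS 1.3 allowed versus capped at TLS 1.2) can be sketched with Python's standard `ssl` module. This is only an approximation under stated assumptions: it needs Python 3.7+ linked against an OpenSSL build with TLS 1.3 support, it uses a different TLS stack than Chrome, and the helper names are hypothetical.

```python
import socket
import ssl


def make_context(max_version):
    """Build a default client context capped at the given ssl.TLSVersion."""
    ctx = ssl.create_default_context()
    ctx.maximum_version = max_version
    return ctx


def negotiated_version(host, max_version, port=443, timeout=10.0):
    """Connect to host and return the negotiated protocol name.

    Raises OSError (including ssl.SSLError) if the connection or the
    handshake fails.
    """
    ctx = make_context(max_version)
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version()
```

Calling `negotiated_version("gmail.com", ssl.TLSVersion.TLSv1_3)` and then the same with `ssl.TLSVersion.TLSv1_2` gives a comparable pair of results. A failure in the first call alongside a success in the second would be consistent with something on the path mishandling TLS 1.3, though it would not prove it, and a net-internals log remains the more useful artifact for this bug.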
Yep, was able to reproduce it every time with those steps.

Emailed you the log.
Do you have any anti-virus installed?
Nope.

If something is trying to MITM my TLS/SSL connections, that would be news to me.
(It seems comment #32 is an issue on the server. We'll get that sorted out.)
Comment 37 Deleted