
Issue 817191 link

Starred by 2 users

Issue metadata

Status: Archived
Owner: ----
Closed: Apr 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Mac
Pri: 2
Type: Bug




Chromium not storing TLS session tickets for connections that don't send an HTTP request

Reported by ug...@akamai.com, Feb 28 2018

Issue description

UserAgent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.167 Safari/537.36

Steps to reproduce the problem:
I have two pages that I will use to reproduce the problem. First page is https://www.utkarshgoel.in/preconnect.html 
This page has a link tag in the HTML to preconnect to the host "www.foundry.systems".

The second page is https://dev.utkarshgoel.in/preconnect_with_delayed_request.html, which also preconnects to www.foundry.systems. Additionally, this page has a blocking script in the head tag that blocks the parser for 5 seconds and afterwards sends a request on the preconnected connection to download an image from www.foundry.systems.

The steps to reproduce the problem with the first page are as follows:

1. Assuming wireshark is installed on the test machine, use the filter 

ssl.handshake.extensions_server_name=="www.foundry.systems"

to filter for clientHello messages whose server name equals www.foundry.systems.

2. Open the first page in Chrome browser. After the page is loaded, you will see that a clientHello is captured in wireshark. Since the client is connecting to www.foundry.systems for the first time, the clientHello will have an empty value in the sessionTicket extension of TLS.

3. By further filtering the wireshark captures (perhaps based on client's tcp port number), example:

tcp.port == <client's_tcp_port_number>

you would see that the server sends a NewSessionTicket that the client is expected to cache for session resumptions in the future.

4. Reload the first page after 10-15 seconds and you should see another clientHello in wireshark captures. This clientHello also has an empty value in the sessionTicket extension of TLS.
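
The empty-versus-non-empty check in the steps above can also be done programmatically on the raw bytes of a captured ClientHello record. The following is a minimal sketch, assuming the record is unfragmented and starts at the TLS record header; `session_ticket_length` and `build_client_hello` are hypothetical helper names, and the synthetic ClientHello exists only so the example runs without a capture file.

```python
import struct
from typing import Optional

SESSION_TICKET_EXT = 35  # SessionTicket extension type (RFC 5077)


def session_ticket_length(record: bytes) -> Optional[int]:
    """Return the SessionTicket extension length in a ClientHello record.

    None means the extension is absent; 0 means it is present but empty,
    i.e. the client has no cached ticket to offer.
    """
    if len(record) < 9 or record[0] != 22 or record[5] != 1:
        raise ValueError("not a TLS handshake record carrying a ClientHello")
    pos = 5 + 4                 # record header (5) + handshake header (4)
    pos += 2 + 32               # client_version + random
    pos += 1 + record[pos]      # session_id
    (suites_len,) = struct.unpack_from("!H", record, pos)
    pos += 2 + suites_len       # cipher_suites
    pos += 1 + record[pos]      # compression_methods
    if pos >= len(record):
        return None             # no extensions block at all
    (ext_total,) = struct.unpack_from("!H", record, pos)
    pos += 2
    end = pos + ext_total
    while pos + 4 <= end:
        ext_type, ext_len = struct.unpack_from("!HH", record, pos)
        if ext_type == SESSION_TICKET_EXT:
            return ext_len
        pos += 4 + ext_len
    return None


def build_client_hello(ticket: Optional[bytes]) -> bytes:
    """Build a minimal synthetic ClientHello record (illustration only)."""
    ext = b""
    if ticket is not None:
        ext = struct.pack("!HH", SESSION_TICKET_EXT, len(ticket)) + ticket
    body = (
        b"\x03\x03" + bytes(32)     # client_version, random
        + b"\x00"                   # empty session_id
        + b"\x00\x02\x13\x01"       # one cipher suite
        + b"\x01\x00"               # null compression
        + struct.pack("!H", len(ext)) + ext
    )
    handshake = b"\x01" + len(body).to_bytes(3, "big") + body
    return b"\x16\x03\x01" + struct.pack("!H", len(handshake)) + handshake


first_load = build_client_hello(b"")          # extension present, no ticket
resumed = build_client_hello(b"\xaa" * 16)    # extension carries a ticket
print(session_ticket_length(first_load), session_ticket_length(resumed))
```

A result of 0 corresponds to the "empty sessionTicket" case described in steps 2 and 4; any positive length means the client offered a cached ticket.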

The steps to reproduce the problem with the second page are as follows:

1. Load the second page in chrome browser, using a different profile on the browser (to avoid interference from connection setup with previous page). You should see a clientHello with an empty sessionTicket, which is expected.

2. Reload the second page after 30-40 seconds and you will see another clientHello in wireshark captures. Notice that the first clientHello has an empty session ticket, but the clientHello from the reload has a non-empty session ticket. This indicates that the session ticket was not stored on the browser until an HTTP request was made on the preconnected connection to www.foundry.systems.

What is the expected behavior?
On the reload of the first page, the client should advertise a clientHello with a non-empty sessionTicket extension.

Also, the client should cache the TLS session ticket regardless of whether an HTTP request was sent on the connection the last time. 

What went wrong?
For the first page, the browser did not cache the TLS session ticket and advertised an empty session ticket in the client hello when reloading the page.

For the second page, the browser did not cache the TLS session ticket until an HTTP request was sent on the connection.

Did this work before? N/A 

Chrome version: 64.0.3282.167  Channel: stable
OS Version: OS X 10.11.6
Flash Version: 

We have some data indicating that browsers establish connections that never end up serving any HTTP requests, meaning that browsers open connections but never send any HTTP requests on them. These connections are established either by the predictors in Chrome or for an HTTP request that is then cancelled or moved to a different connection, leaving the original connection idle and unused after establishment. When the browsers don't cache TLS session tickets for such unused connections and advertise an empty sessionTicket in the clientHello, they put unnecessary load on our infrastructure.
 
Components: Internals>Network>SSL

Comment 2 by ug...@akamai.com, Feb 28 2018

I have also attached wireshark captures demonstrating the problem, using the two test pages. The pcap file titled "preconnect_without_request.pcapng" corresponds to loading the first page; in it you will see that the two client hellos are of equal length and that the client does not advertise the session ticket when connecting to www.foundry.systems the second time.

The pcap titled "preconnect_with_delayed_request.pcapng" corresponds to the second page and shows that the second client hello advertises a session ticket obtained from the first connection to www.foundry.systems.
Attachments:
preconnect_without_request.pcapng (135 KB)
preconnect_with_delayed_request.pcapng (2.5 MB)
Cc: davidben@chromium.org
Status: Untriaged (was: Unconfirmed)
David: I'm 99% confident this is WontFix. I'm looking for a second set of eyes to make sure I'm not missing something from the description or the repro.

The Web Platform does not make any guarantees about when (or even if) TLS session tickets or TLS session resumption will be used. When we do make multiple connections, we take the TLS session state at the time that connection is established - which, in the case of parallel connections being established, may mean no state is yet established. 

You can enable connection coalescing for already established sessions by using HTTP/2, but please note that HTTP/2 connection establishment follows the same principle: we make no guarantees that we will not attempt multiple (potential) HTTP/2 connections in parallel, if we do not yet know at the time the connection is started whether the server supports HTTP/2 (we discard the additional connections if we determine that it does).

TLS Session Ticket and TLS Session Resumption are valuable performance improvements, but they should not be presumed as guaranteed. Further, we've explored tuning these parameters in the past (e.g. to strictly order such that we improve the resumption rate), and found that the overall user experience and TTFB was worse, and it introduced significantly more complexity.

Finally, Chrome segments its TLS session cache based on the CORS mode in use (CORS Anonymous vs CORS With Credentials), for various reasons covered in the Fetch spec, so you should also ensure that the credentials mode of your preconnect aligns with the credentials mode that will be used to make the request. This is the Fetch specification's notion of Fetch groups.
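
To illustrate the segmentation point, here is a minimal sketch of a session cache partitioned by credentials mode. It is a toy model under stated assumptions, not Chrome's implementation: the class name `PartitionedSessionCache`, its methods, and the choice of key fields are all hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical cache key: (host, port, with_credentials).
CacheKey = Tuple[str, int, bool]


class PartitionedSessionCache:
    """Toy TLS session cache with one partition per credentials mode."""

    def __init__(self) -> None:
        self._tickets: Dict[CacheKey, bytes] = {}

    def put(self, host: str, port: int, with_credentials: bool,
            ticket: bytes) -> None:
        self._tickets[(host, port, with_credentials)] = ticket

    def get(self, host: str, port: int,
            with_credentials: bool) -> Optional[bytes]:
        return self._tickets.get((host, port, with_credentials))


cache = PartitionedSessionCache()
# A credentialed connection stores its ticket in the credentialed partition.
cache.put("www.foundry.systems", 443, True, b"ticket-from-first-connection")

# A later credentialed connection finds the ticket ...
print(cache.get("www.foundry.systems", 443, True) is not None)
# ... while an anonymous-mode connection to the same host does not.
print(cache.get("www.foundry.systems", 443, False) is not None)
```

In this model a preconnect made in one mode cannot help a request made in the other, which is why the credentials mode of the preconnect needs to match the mode of the eventual request.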

Comment 4 by eveque...@gmail.com, Feb 28 2018

Your comments compare this to coalescing and resumption on parallel connections, both of which are races. You might or might not have the relevant information when another connection starts, so you just use what you have rather than attempting to impose ordering. No argument with that principle. 

However, I don't think it applies here. This is a case where the server has sent a session ticket, but it is seemingly discarded. Not a race condition, but simply never added to whatever store holds the tickets, no matter how long you wait.

Comment 5 by ug...@akamai.com, Feb 28 2018

That is correct. This bug is not about connection coalescing but about the case where the client ignores the session ticket sent by the server.
They both apply, because as noted in the bug description, Chrome has its own set of predictors as to when to make a connection. The Web Platform does not guarantee any ordering of connections.

As for the second part, I also addressed that, by highlighting that the test case seemingly relies on preconnecting under one CORS mode, while fetching the resource under another CORS mode, and thus it is expected that they would not share.

Comment 7 by y...@yoav.ws, Feb 28 2018

Cc: y...@yoav.ws

Comment 8 by y...@yoav.ws, Feb 28 2018

> As for the second part, I also addressed that, by highlighting that the test case seemingly relies on preconnecting under one CORS mode, while fetching the resource under another CORS mode, and thus it is expected that they would not share.

Looking at the test case, it doesn't seem like it's fetching the resource under a different CORS mode (it's fetched as an image, so with credentials by default, matching the preconnect, which doesn't include a `crossorigin` attribute).

More generally, I agree that TLS session resumption provides no guarantees, but it seems like, at least in this scenario, we could improve both perf and server load by opportunistically caching those TLS session tickets at an earlier phase, without worrying much about ordering guarantees. (unless earlier caching in and of itself adds complexity - I'm not familiar with that code)

I'm not sure what proposed optimizations there are - we do cache in memory the moment we've successfully verified the TLS connection. If a session ticket or ID is not used, then it's either due to non-determinism (i.e. we kick off a connection *before* the preconnect request - which we do in some predictive cases) or because it's being dispatched to a different session pool (i.e. CORS/non-CORS)

The only further "optimization" is to delay making a connection if there's >=1 TLS handshake in flight, to wait for that connection to finish (thus establishing the session), and then allowing subsequent TLS handshakes to complete, picking up the resumption. We experimented with that, and found that it was a substantial negative to latency for the general user (ISTR adding somewhere around 11ms of latency on Desktop compared to simply performing the handshakes themselves)

Comment 10 by y...@yoav.ws, Feb 28 2018

According to the test case description, it is suggested that a preconnected connection doesn't cache the TLS session ticket until it is used, in a deterministic way. If I read your comment correctly, that is not by design.
Is that correct?
If so, is it possible that there's a related bug? Or is the test case somehow flawed?
We don't guarantee any behaviour that the test is exercising, so it's not necessarily a bug, more like a feature request. =)

Further, for the reasons I described, "This indicates that the session ticket was not stored on the browser until an HTTP request was made on the preconnected connection to www.foundry.systems" is a false conclusion - because it does not provide any guarantees, that's not a reasonable conclusion to make.

As far as prioritizing investigations/changes, the motivation for this "When the browsers don't cache TLS session tickets for such unused connections and advertise an empty sessionTicket in the clientHello, they put unnecessary load on our infrastructure." is one that is problematic, if only because browsers don't guarantee this behaviour in the first place, so one should not build services assuming these properties will hold (there have been cases in the past where resumption has had to be disabled)
Alternately stated, because clients can't guarantee that they'll do something to reduce server load all the time, it's not worth trying to reduce it in the common case?  That seems analogous to the "leaky boat" fallacy, if not quite the same.

It shouldn't be a CORS issue.  The easiest way to see that is to ignore the image load altogether and just reload the first page again.  The first time you load the page, you connect without a session ticket and receive one at the end of the handshake.  Wait a bit and reload, and you again preconnect... without using the session ticket.  Again, no one has suggested inducing a delay making the connection.  These are separate connections, visible minutes apart in Wireshark; no delay is necessary.

You describe the ticket as being stored "the moment we've successfully verified the TLS connection."  The report seems to indicate that's not happening, or perhaps that the definition of "verified" isn't what we would assume it is.
Another way to tell that CORS is a red herring:  Visit the second page (which sends a request).  Flush the idle connections from net-internals, then visit the first page.  You'll see the preconnect happen using a session ticket, so clearly they are sharing.
> it's not worth trying to reduce it in the common case?

No, that's not a correct framing. It's a feature request, not a bug, and one for something that's not guaranteed.

> perhaps that the definition of "verified" isn't what we would assume it is.

This is also possible. Again, this is not something that the Web Platform guarantees with respect to timing, and so your observations themselves do not represent bugs (nor should it be presumed as platform guarantees). For example, when TLS false start is used, it's possible for the connection to be reported as 'preconnected' before the full handshake has completed. That state machine is resolved once data is written-to or read-from the underlying socket, thus persisting the ticket. Unless/until that data is exchanged, however, the session is not persisted, because the handshake has not truly completed.

This is not about the server not having sent the data, but rather, until Chrome consumes that data (and it does not try to consume it until there is further activity on the socket from a higher layer), it's possible that the remaining data is in an unconsumed kernel buffer, and Chrome's view is the handshake has not yet completed (which is correct).
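
The behaviour described in the two paragraphs above can be mimicked with a small toy model. This is a sketch under stated assumptions, not Chromium's socket code: `ToyFalseStartSocket` and `load_page` are hypothetical names, and the "kernel buffer" is just a Python list standing in for unread handshake data.

```python
from typing import Dict, List, Optional


class ToyFalseStartSocket:
    """Toy model of a preconnected TLS socket; not Chromium's actual code."""

    def __init__(self, host: str, cache: Dict[str, bytes]) -> None:
        self.host = host
        self.cache = cache
        self.unread: List[bytes] = []    # stands in for the kernel buffer
        self.offered: Optional[bytes] = None

    def preconnect(self, server_ticket: bytes) -> None:
        # The ClientHello offers whatever ticket is cached right now.
        self.offered = self.cache.get(self.host)
        # With False Start the socket is reported as connected here, while
        # the server's NewSessionTicket may still be sitting unread.
        self.unread.append(server_ticket)

    def send_request(self) -> None:
        # Activity from a higher layer drives a read, which consumes the
        # buffered handshake data and persists the ticket.
        while self.unread:
            self.cache[self.host] = self.unread.pop(0)


def load_page(cache: Dict[str, bytes], host: str,
              sends_request: bool) -> Optional[bytes]:
    """Simulate one page load; return the ticket offered in the ClientHello."""
    sock = ToyFalseStartSocket(host, cache)
    sock.preconnect(server_ticket=b"ticket")
    if sends_request:
        sock.send_request()
    return sock.offered


host = "www.foundry.systems"

profile_a: Dict[str, bytes] = {}         # first page: preconnect only
print(load_page(profile_a, host, False), load_page(profile_a, host, False))

profile_b: Dict[str, bytes] = {}         # second page: preconnect + request
print(load_page(profile_b, host, True), load_page(profile_b, host, True))
```

In this model the preconnect-only page never offers a ticket on reload, while the page that sends a request offers one on its second load, matching the two captures in the report.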

This is, again, about what the platform guarantees provide/don't provide.

1) There's no guarantee that a preconnect will result in a modification of the TLS session ticket/cache
2) There's no guarantee on the connection ordering (and whether or not the browser will introduce other connections)
3) There's no guarantee that a preconnect will have any impact on false start

In short, you cannot and should not presume that preconnect would guarantee any material effect on the TLS state machine.

Labels: Needs-Feedback
Please always attach a NetLog per https://dev.chromium.org/for-testers/providing-network-details, not just a PCAP.

But yeah, this is almost certainly an artifact of False Start (confirmable by NetLog) and is a WontFix. While in theory we could pump the read half in SSLClientSocket, TLS 1.3 has an even stronger version of the same property (tickets are optional and post-handshake) where we especially must not pump it eagerly, so that's not going to happen.

For HTTP/2 and only HTTP/2 (where the TLS socket consumer itself drives a read loop), we probably could eagerly attach a SpdySession to preconnected sockets, but in the case of HTTP/2, there isn't much point in doing that, because we'll just use the preconnected socket, not establish a new one.

Note that it does NOT follow that processing session tickets on unused preconnect connections will reduce server load. That is only true if *no* sockets to that host were actually used. Assuming one was, we'll get a ticket from there. Notably, what you may be observing from Chrome is our tendency to preconnect sockets for subresources before we learn your server is HTTP/2. Processing tickets from the discarded preconnects won't help you because we already have a ticket from the used connection.
friendly ping.  ugoel@, can you provide the info requested in comment #15?
@reporter: friendly ping, can you provide the info requested in comment #15?
Status: Archived (was: Untriaged)
Archiving due to lack of response. Please file another issue with the requested info if it's still affecting you.
