New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 615413 link

Starred by 7 users

Issue metadata

Status: Started
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux
Pri: 2
Type: Bug

Blocked on:
issue 475060
issue 706974
issue 754395

Blocking:
issue 392575
issue 621621



Sign in to add a comment

Implement "remote" (alternative service and origin have different hostnames) HTTP/2 Alternative Services

Project Member Reported by b...@chromium.org, May 27 2016

Issue description

When a new TLS connection is opened to an Alternative Service for a given origin, RFC7838 [1] Section 2.3. requires that the SNI contains the hostname of the origin, not that of the Alternative Service.

[1] https://tools.ietf.org/html/rfc7838
 
Cc: davidben@chromium.org
This should probably link to  issue #585876  which is the version of this for QUIC. Notably there is important context there for why "send the correct SNI" is not a good framing here. The SNI must not be changed on its own. I think a more accurate description is that Alt-Svcs should be treated as a funny DNS CNAME and (at least when I last saw the implementation) was not.
Also, an important detail that, I think got mentioned in the other discussion, but is probably worth recasting in terms of HTTP/2 and HTTP/1.1 because QUIC cares less about this layering detail:

Say I connect to (1) {origin=origin.com, alternate=alternate.com} and later to (2) {origin=alternate.com, alternate=alternate.com}. The two sockets will be authenticated under their respective origins and will both end up in the socket pool, but they must do so under *different* groups. The socket pool must lookup by origin and not the alternate. At the socket pool level, if a request comes in for alternate.com, socket (1) must never be considered an option.

Only if (1) negotiated HTTP/2, then the HTTP/2 session pooling logic may check (1)'s certificate against alternate.com and then use it for alternate.com's request, as an extension of how cross-host pooling currently works. (It's basically an extension of the IP address check in today's pooling logic.) But this must happen in the HTTP/2 layer, not the socket pools which service both HTTP/1.1 and HTTP/2.

The reason for this is that cross-host pooling of sockets is only legal in HTTP/2, not HTTP/1.1. In HTTP/1.1, sites will break if we start mismatching the SNI and Host header.

Comment 3 by b...@chromium.org, Jun 20 2016

Blocking: 621621
Project Member

Comment 4 by bugdroid1@chromium.org, Jun 22 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/aa60ff405b46620f283640f1f6571670b65d8fe8

commit aa60ff405b46620f283640f1f6571670b65d8fe8
Author: bnc <bnc@chromium.org>
Date: Wed Jun 22 19:12:42 2016

Change GroupName test to https.

Alternative Services for insecure origins have been disabled in
 https://crbug.com/615497 , making HttpNetworkTransactionTest.GroupNameFor* tests
unrealistic.  This CL changes these tests to use a secure origin instead.  The
alternative service port is changed to ensure that group name has port of
origin, not of alternative service.

BUG=615413

Review-Url: https://codereview.chromium.org/2082403002
Cr-Commit-Position: refs/heads/master@{#401380}

[modify] https://crrev.com/aa60ff405b46620f283640f1f6571670b65d8fe8/net/http/http_network_transaction_unittest.cc

Cc: mmenke@chromium.org
By the way, +mmenke and I chatted about this yesterday. Matt, correct me if I'm summarizing wrong:

So, the current scheme where we make two HttpStreamJobs for Alt-Svc works fine for QUIC, but it's going to require bifurcating socket pool groups at various layers which would wreak havoc on the socket pool's per-group limits. Depending on exactly how this is done, we were able to come up with situations where per-group limits become meaningless and situations where we deadlock due to interactions of limits at each stage. Both of these are, of course, pretty bad.

Instead, we propose doing something like the following:

1. In the HttpStream logic, continue to race two jobs at once, but *only* for "QUIC" and "not-QUIC". Decide whether you do the QUIC on based on whether any alternates have QUIC labelled.

2. On the not-QUIC side, all alternates go in *one* HttpStream job. This is so we *don't* have to bifurcate the socket pools. Instead, you pass the *entire* set of alternates down into the socket pool parameters.

3. Each socket pool handles the parameter accordingly. Specifically, you need that SSL over TCP straight will pass the list along. SSL over HTTP CONNECT proxy or SOCKS proxy or whatever else drops the list. (If we need to, I could imagine implementing Alt-Svc for HTTP CONNECT too, but let's not bother.)

4. At the leaf, TransportConnectJob evaluates the non-QUIC Alt-Svcs list. Right now it does a happy eyeballs and races IPv4 and IPv6 connections where the Socket loops over IP addresses. Add to the mix the Alt-Svc IP addresses in some way or another. Note, however, that TransportConnectJob shouldn't race more than a small constant (probably 2) number of connections at once because otherwise we, again, secretly bump the per-group socket limits and we don't want to do that.

5. Whichever one works, we route up the stack and that socket is used. But since only TransportConnectJob acts on the alternate name, SNI and certificate checks still use the true name. Alt-Svc is *purely* a funny DNS.

We explicitly do *not* enforce that alternate connections use HTTP/2. I believe this was simply a mistake in how the header works. QUIC vs non-QUIC have incompatible handshakes, but everything over TLS/TCP upgrades.

This should solve the "funny kind of CNAME" part of Alt-Svc. To solve the "get rid of DNS lookup in CanPool" half, we have two proposals:

Proposal 1: Just get rid of the DNS lookup in SPDY pooling. (Can we do this?) That would be simplest.

Proposal 2: Okay, if we need to keep that and want an Alt-Svc aware one, replace it with the following:

1. StreamSocket or so has a method GetAlternateUsed or something. It lets you query the name we actually resolved when connecting.

2. The TCP sockets will just pass the Alt-Svc socket along. Funny sockets like proxies will never return anything (the alternate used to connect to the proxy is *NOT* checked for pooling). SSL sockets will pass along the transport's one.

3. SpdySession forwards that value out from its underlying socket.

4. The CanPool logic checks the session key and the alternate used. If either of those intersects with {request hostname, alternates acceptable for request hostname}, we can skip the DNS check as this suffices for the IP address check.

(This is kind of a mess. Alt-Svc is trying to solve too many problems at once and is not a very well-designed header.)

Comment 6 by mmenke@chromium.org, Jun 22 2016

We didn't discuss it, but I assume we'd have to do two DNS lookups in TransportConnectJob, too?  I guess we'd want to do them sequentially rather than in parallel, since we have a fairly tight limit on simultaneous DNS lookups.  Maybe checking the cache for the one we'd normally do second, first.  Once we have one DNS result, we can start trying to connect.

Comment 7 by mmenke@chromium.org, Jun 22 2016

Actually, we do the DNS lookups in parallel in the racing QUIC case, so maybe doing them in parallel makes sense...

Comment 8 by rch@chromium.org, Jun 23 2016

davidben: re comment 5, can you back of up and give a bit of context? You mention, for example, bifurcating socket pool groups but I don't understand what leads to that situation? Are you talking about the case where there is more than one alternative service?

Comment 9 by mmenke@chromium.org, Jun 23 2016

So if we one SSL group map to multiple transport client socket groups, with transport client socket groups not using alt service as part of their group names that bifurcates the pools, and you get weird situations (6 alt service requests going to the same lower level group can blockhole any requests actually targetted at that group).

If you *do* key on alt service, suddenly the group limits break down, and can have 256 connections to the same server.
mmenke: Actually, can the deadlock situation we were worried about happen? Isn't that what the HigherLayeredPool business is intended to solve? (Or maybe it doesn't? CloseOneIdleConnection() without any group information seems a fairly blunt instrument. You'd really want it targeted at a specific group...) Group limits definite break down with alt-svc bifurcation, but it seems the deadlock is something we'd have to solve regardless if any kind of complex high/low-level pool interactions is to be supported (and it already does happen).

rch: For context, the deadlock situation is that we have 6 idle connections to (a.com, alternate.com). The transport socket pool will do group limits based on alternate.com. While the SSL socket pool will do it based on a.com. But that means that when the SSL socket pool is asked to connect to alternate.com directly, the transport socket pool can't service the request until one of a.com's idle sockets are closed. But the SSL socket pool doesn't know its alternate.com request is tied to its idle a.com sockets.
The higher layered pool logic is only done if there are available socket slots in the group.

The problem is if a.com, b.com, etc, use alt.com, we'll see alt.com as having 6 active connections.  So any new connection attempt to use alt.com will just wait for one of those to free up.  The idle logic only triggers if the group has free socket slots, but the pool itself does not.

Comment 12 by b...@chromium.org, Jun 23 2016

Re deadlock:  Why is there a limit on the number of sockets at two places: transport socket pool and SSL socket pool?  Are they not redundant?  I understand that the deadlock issue is that these two limits use different groupnames (keys).  Can we not just generate the groupname (based on origin, not alternative) in the SSL socket pool, and pass it down to the transport socket pool, so that identical keys would be used for the two limits, solving the deadlock issue?

Re pooling logic:  I'm not particularily worried about avoiding DNS lookup.  IP-based pooling happens in SpdySessionPool::FindAvailableSession(), and SpdySessionPool will need to be aware of both origin and destination for each SpdySession anyway.  How that information gets there is indeed more complex.
That dual limit isn't relevant to the deadlock situation, and actually shouldn't affect much of anything.  It does mean we can have high priority requests waiting on the SSL limit, so the lower layer pool hands things out to lower priority requests first, I suppose, but that should be pretty uncommon.

The dual limit is a result of multiple SSL sockets feeding in to a single transport socket group, and the transport socket group having a limit.  As long as those two things are the case, we'll have a problem.
Project Member

Comment 14 by bugdroid1@chromium.org, Jun 27 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/a8681534b79e6352c68fe106e2433fa92d0f5671

commit a8681534b79e6352c68fe106e2433fa92d0f5671
Author: bnc <bnc@chromium.org>
Date: Mon Jun 27 12:27:48 2016

Split enable_alternative_service_with_different_host flag.

Split enable_alternative_service_with_different_host flag into separate HTTP/2
and QUIC flags.  Also remove
* unnecessary assigments from tests,
* test in DataReductionProxyIODataTest.TestConstruction: it is moot because both
  HTTP/2 and QUIC are disabled anyway,
* assignment from UrlRequestContextBuilder, see justification below.

HTTP/2 AltSvc for alternative service host different from that of the origin is
currently broken, see linked bug.  Therefore
enable_http2_alternative_service_with_different_host is disabled in production
and for all tests for the time being.  It will be used in writing tests when
fixing the linked bug.  Also, this flag will make it easier to remove flag
enable_alternative_services_for_insecure_origins from tests.  Also, no clients
of the network stack should be able to flip this flag directly because of the
linked bug, that is why it is not added to UrlRequestContextBuilder.

On the other hand, QUIC AltSvc works correctly, thus
enable_quic_alternative_service_with_different_host is enabled by default both
in production and tests.  It is only there temporarily as a kill switch in case
something goes wrong, and will eventually be removed.  Therefore no clients of
the network stack should have the need to flip this flag directly.  Furthermore,
this flag is moot if QUIC is disabled.  That is why it is not added to
UrlRequestContextBuilder.

BUG=615413

Review-Url: https://codereview.chromium.org/2089243002
Cr-Commit-Position: refs/heads/master@{#402154}

[modify] https://crrev.com/a8681534b79e6352c68fe106e2433fa92d0f5671/components/data_reduction_proxy/core/browser/data_reduction_proxy_io_data_unittest.cc
[modify] https://crrev.com/a8681534b79e6352c68fe106e2433fa92d0f5671/components/network_session_configurator/network_session_configurator.cc
[modify] https://crrev.com/a8681534b79e6352c68fe106e2433fa92d0f5671/components/network_session_configurator/network_session_configurator_unittest.cc
[modify] https://crrev.com/a8681534b79e6352c68fe106e2433fa92d0f5671/jingle/glue/proxy_resolving_client_socket.cc
[modify] https://crrev.com/a8681534b79e6352c68fe106e2433fa92d0f5671/net/http/http_network_session.cc
[modify] https://crrev.com/a8681534b79e6352c68fe106e2433fa92d0f5671/net/http/http_network_session.h
[modify] https://crrev.com/a8681534b79e6352c68fe106e2433fa92d0f5671/net/http/http_network_transaction_unittest.cc
[modify] https://crrev.com/a8681534b79e6352c68fe106e2433fa92d0f5671/net/http/http_stream_factory_impl_job_controller.cc
[modify] https://crrev.com/a8681534b79e6352c68fe106e2433fa92d0f5671/net/http/http_stream_factory_impl_unittest.cc
[modify] https://crrev.com/a8681534b79e6352c68fe106e2433fa92d0f5671/net/quic/quic_network_transaction_unittest.cc
[modify] https://crrev.com/a8681534b79e6352c68fe106e2433fa92d0f5671/net/spdy/spdy_network_transaction_unittest.cc
[modify] https://crrev.com/a8681534b79e6352c68fe106e2433fa92d0f5671/net/spdy/spdy_test_util_common.cc
[modify] https://crrev.com/a8681534b79e6352c68fe106e2433fa92d0f5671/net/spdy/spdy_test_util_common.h
[modify] https://crrev.com/a8681534b79e6352c68fe106e2433fa92d0f5671/net/url_request/url_request_context_builder.cc
[modify] https://crrev.com/a8681534b79e6352c68fe106e2433fa92d0f5671/net/url_request/url_request_context_builder.h

Project Member

Comment 15 by sheriffbot@chromium.org, Jul 8 2016

Labels: -M-53 M-54 MovedFrom-53
Moving this nonessential bug to the next milestone.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Project Member

Comment 16 by bugdroid1@chromium.org, Jul 13 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/9989bd51828a155d65cd9131cee6700d2bb3164e

commit 9989bd51828a155d65cd9131cee6700d2bb3164e
Author: bnc <bnc@chromium.org>
Date: Wed Jul 13 23:09:40 2016

Remove ALTERNATIVE_CERT_NOT_VALID_FOR_ORIGIN error code.

Remove ERR_ALTERNATIVE_CERT_NOT_VALID_FOR_ORIGIN error code, because (1) right
now, remote TCP alternative services (where the origin and the alternative
service have different hostnames) are disabled, and (2) in the future, if an
existing connection to the destination of the alternative service with an origin
different from the origin of the request does not have a valid certificate for
the origin of the request, then a new connection to the destination of the
alternative service will be opened with the origin of the request, and if that
connection does not have a valid certificate for the origin of the request, then
that's simply ERR_CERT_COMMON_NAME_INVALID.  So there is no need for
ERR_ALTERNATIVE_CERT_NOT_VALID_FOR_ORIGIN.

Also remove ValidSpdySessionPool class, because (1) right now there is no need
for the extra cert verification, and (2) in the future,
SpdySessionPool::FindAvailableSession() and
SpdySessionPool::CreateAvailableSessionFromSocket() will take both an origin and
a destination argument, and will make sure that the SpdySession they return has
a certificate that is valid for the origin.

BUG=615413

Review-Url: https://codereview.chromium.org/2152453002
Cr-Commit-Position: refs/heads/master@{#405342}

[modify] https://crrev.com/9989bd51828a155d65cd9131cee6700d2bb3164e/net/base/net_error_list.h
[modify] https://crrev.com/9989bd51828a155d65cd9131cee6700d2bb3164e/net/http/http_network_transaction_unittest.cc
[modify] https://crrev.com/9989bd51828a155d65cd9131cee6700d2bb3164e/net/http/http_stream_factory_impl_job.cc
[modify] https://crrev.com/9989bd51828a155d65cd9131cee6700d2bb3164e/net/http/http_stream_factory_impl_job.h

> We explicitly do *not* enforce that alternate connections use HTTP/2. I believe this was simply a mistake in how the header works. QUIC vs non-QUIC have incompatible handshakes, but everything over TLS/TCP upgrades.

I bumped into mnot today at IETF. He said this was the intention. We are not supposed to think of these as ALPN labels and "h2" refers to the whole HTTP/2, HTTP/1.1, TLS, TCP stack of specs. That is, it is exactly this "entry-point" notion. So I believe Matt's and my proposed rationalization is the correct one.

Comment 18 by b...@chromium.org, Aug 8 2016

https://tools.ietf.org/rfc/rfc7838.txt Section 2.4 last paragraph:  "If the connection to the alternative service does not negotiate the expected protocol (for example, ALPN fails to negotiate h2, or an Upgrade request to h2c is not accepted), the connection to the alternative service MUST be considered to have failed."
Interesting. That does not match what mnot told me.

Anyway, we should not honor that text since it's nuts.

Comment 20 by rch@chromium.org, Aug 9 2016

> Anyway, we should not honor that text since it's nuts.

O_o That doesn't sound like the best approach :> If we think this is nuts, then let's start a conversation on httpbis about fixing the language.
Yeah, I'm not a thread with mnot to try to figure out what the intent was here.
s/I'm not a thread/I'm on a thread/

Comment 23 by b...@chromium.org, Aug 16 2016

Summary: Implement "remote" (alternative service and origin have different hostnames) HTTP/2 Alternative Services (was: HTTP/2 Alternative Services should send the correct SNI)
Changing title per comment #1, to avoid giving the impression that the SNI should be manipulated by itself.

Comment 24 by b...@chromium.org, Sep 20 2016

Blocking: 475060
Random question:  What if we learn about a "remote" alternate service through a connection where the server sends us a self-signed cert?  Seems like trusting that information could be very bad.

Comment 26 by b...@chromium.org, Sep 20 2016

Re #25: unfortunately we currently only check for the scheme when deciding whether to honor an AltSvc response header, at https://cs.chromium.org/chromium/src/net/http/http_network_transaction.cc?l=1219.  (This is for QUIC; remote HTTP/2 alternative service is currently disabled.)  I agree that remote alt-svc should not be allowed for connections with only self-signed certs.  How do I check for that?
It's somewhere in SSLInfo, I believe.

Comment 28 by rch@chromium.org, Sep 20 2016

Do we cache content if the connection uses a self-signed cert?
No, we don't.
Hrm...More generally, how long are we going to cache alt-service info?  An attacker managing to get stuff into the alt-service table seems like it's a lot worse than getting stuff into the cache, since even if the attacker is caught and the cert revoked, the alt-service entry will still be around, potentially talking to another attacker-controller website.

Comment 31 by b...@chromium.org, Sep 20 2016

The Alt-Svc header itself specifies its expiration in seconds, which we currently honor, without any upper limit.  (In practice, the upper limit is something like 2^31 seconds = 68 years.)  It sounds like capping expiration at a reasonable value would help alliviate this security concern.  Maybe a month, or a week?

Comment 32 by rch@chromium.org, Sep 20 2016

Keep in mind that if www.example.com (via hacking) advertises alt.example.com as an alternative, then when chrome uses alt.example.com, it will still need to present a valid certificate for www.example.com. So in the scenario you present, if the cert is revoked, then the connection to alt.example.com will fail.
Ahh, perhaps I misunderstood "Only if (1) negotiated HTTP/2, then the HTTP/2 session pooling logic may check (1)'s certificate against alternate.com"...That's describing for alternate requests, not requests to the original origin.

Comment 34 by rch@chromium.org, Sep 20 2016

Correct. Essentially, alt-svc acts like a CNAME. At the TLS layer, we're still "connecting to the origin" (just at a different IP address). The TLS handshakes sends www.example.com in the SNI and verifies that the cert is valid for www.example.com. If later, we have a request for alt.example.com we'll notice that we have a connection to the IP address for alt.example.com and will engage the "pooling logic". That means we'll look at the certificate on the connection and if it's *also* valid for alt.example.com then we can use the existing connection for this origin.

Comment 35 by bmau...@fb.com, Nov 7 2016

Labels: DevRel-Facebook

Comment 36 by y...@yoav.ws, Nov 29 2016

Cc: y...@yoav.ws

Comment 37 by b...@chromium.org, Apr 12 2017

Blocking: -475060

Comment 38 by b...@chromium.org, Apr 12 2017

Blockedon: 475060

Comment 39 by b...@chromium.org, Apr 13 2017

Blockedon: 706974
Project Member

Comment 40 by bugdroid1@chromium.org, May 18 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/1fc76912bc055188dc76c09f6032096c079d4d86

commit 1fc76912bc055188dc76c09f6032096c079d4d86
Author: bnc <bnc@chromium.org>
Date: Thu May 18 20:43:24 2017

Fix SpdySessionKey for HTTP/2 alternative Jobs.

Change HttpStreamFactoryImpl::Job::GetSpdySessionKey() to construct a
SpdySessionKey from the request URL, not the destination.  This is the
correct behavior, because the only layers that should worry about the
destination are the socket pool and below.

This does not change behavior in production, because
HostPortPair::FromURL(origin_url_) can only differ from |destination_|
for alternative jobs, and HTTP/2 alternative jobs are disabled in
https://crrev.com/2821463002.

BUG=615413,  706974 

Review-Url: https://codereview.chromium.org/2887773006
Cr-Commit-Position: refs/heads/master@{#472919}

[modify] https://crrev.com/1fc76912bc055188dc76c09f6032096c079d4d86/net/http/http_stream_factory_impl_job.cc
[modify] https://crrev.com/1fc76912bc055188dc76c09f6032096c079d4d86/net/http/http_stream_factory_impl_job_controller_unittest.cc

Comment 41 by b...@chromium.org, Aug 10 2017

Blockedon: 754395
Cc: lassey@chromium.org
Brad and I were talking about this a bit this week. A note on the plan in comment #5:

Doing the retry purely at TransportConnectJob has a slight downside because presumably we want a certificate error at the Alt-Svc name to retry connecting to the origin directly, rather than be fatal. That implies we want to loop over Alt-Svc names at the SSLConnectJob, not the TransportConnectJob.

Although, as I write this, that means bifurcating the transport half of pools... or maybe just keying them on the Alt-Svc name? I suspect the latter will result in deadlock around the limits. Maybe we do actually need to de-layer the SSL and TCP pools before we can implement Alt-Svc...

Anyway, supposing that issue isn't fatal, we would then run into  issue #823387 , because we can't have SSLConnectJob-level retries right now. They cause infinite loops with HTTP proxy auth. (See first half of that bug for discussion of what goes wrong.) The solution for ERR_SSL_VERSION_INTERFERENCE handling (bifurcate upper-level pools and retry at HttpNetworkTransaction) is unlikely to work because we actually want pooling behavior.

Thus we probably need to make SSLConnectJob retries actually possible, by fixing the proxy auth implementation. Half-baked thought: rather than bypassing the SSLConnectJob, somehow hand back some wrapper over the old SSLConnectJob instead of dropping it on the floor. When the auth information comes in, resume both SSLConnectJob- and TransportConnectJob-level operations. I have no idea if this is actually feasible. We'd need to stare at that.

But, again, this is supposing the first issue isn't fatal.

Sign in to add a comment