QUIC shouldn't report ERR_CERT_DATABASE_CHANGED for SSL Config changes |
||||||||
Issue description
QUIC Code:
void QuicStreamFactory::OnSSLConfigChanged() {
CloseAllSessions(ERR_CERT_DATABASE_CHANGED, QUIC_CONNECTION_CANCELLED);
}
SPDY code:
void SpdySessionPool::OnSSLConfigChanged() {
CloseCurrentSessions(ERR_NETWORK_CHANGED);
}
Would be good to ensure these two are in harmony.
(Noticed while investigating a large, cross-platform spike in these errors)
,
Sep 27 2016
Interesting! Keeping them in sync makes sense. Looking at both SPDY and QUIC, it's not clear to me that either are doing the right thing here. Should we create a new net error for this case? What are the implications for an SSL config change? I wonder if SPDY and QUIC are both being overly aggressive by closing connections in this case, but perhaps there's a good reason? zhongyi: can you take a look a this?
,
Sep 28 2016
rch: If you examine the graphs linked, and expand the date range, you'll see this seemed to have a hockey-stick like growth, right around the time that QUIC was more widely rolled out. I suspect that this is causing more reports for main frame errors due to how QUIC's stream connection code seems to diverge (I will try to get you logs for Issue 594226 which I still see; somehow missed my radar) The question is: Should QUIC even be looking at the SSLConfig? None of what's alerted on in https://cs.chromium.org/chromium/src/net/ssl/ssl_config_service.cc?rcl=0&l=83 seems to make sense for QUIC.
,
Sep 28 2016
(Huh, I didn't realize that equality check was even there! I suspect we've added tons of things to SSLConfig without updating that. Though that means the event fire too little, not too much.)
,
Sep 28 2016
> The question is: Should QUIC even be looking at the SSLConfig? Yes that was exactly the question I was looking to you for guidance on. (Well, that and SPDY's behavior too). It sounds like we both think we QUIC and SPDY should no longer be SSLConfigService::Observers. Am I understanding you correctly? > rch: If you examine the graphs linked, and expand the date range, you'll see this seemed to have a hockey-stick like growth, right around the time that QUIC was more widely rolled out. If I'm looking at the same graphs you are, the rate of this error code starts at the end of May. QUIC has been on by default for a lot longer than that. I think that what happened in May was that m51 became stable, and the linked graphs restrict to just m51 and later. If you remove that restriction, there is no such hockey stick. There is, however, a huge increase in this error from Feb - Mar 2015. In any case, I believe the current spike is a result of: https://codereview.chromium.org/2109303002 Which actually made the QuicStreamFactory register itself as an observer. Until then it never actually called AddObserver.
,
Sep 29 2016
> we both think we QUIC and SPDY should no longer be SSLConfigService::Observers I only spoke for QUIC :) I believe it's relevant to SPDY, in as much as any underlying TLS sockets being used by SPDY or HTTP/2 should be gracefully closed down and re-established if the configuration changes (just as it does for the cert database, for client certs and trust changes) Those SSL config changes don't affect QUIC though; but if we added QUIC-related config to SSLConfig, I would think it would. Thanks for looking more closely at the graph, and that suggests it is at least coincident with QUIC. David: Do you have any thoughts re: QUIC vs SPDY for observing?
,
Sep 29 2016
So that I understand, can you explain why SSL config changes would require TLS connections to be closed? It seems vaguely plausible but I don't really understand why it should invalidate existing connections. It doesn't sound like a *bad* idea to me either, but I just don't quite understand the issue. (BTW, what does a user do to cause the config to change? Install an enterprise policy, or something) Presumably, when QUIC uses the TLS 1.3 handshake, we'll need to start closing the connection again? For SPDY should we use a more specific error code than ERR_NETWORK_CHANGED? Should we add ERR_SSL_CONFIG_CHANGED?
,
Sep 29 2016
If you disable TLS1.2, we should close existing TLS1.2 sockets. If you disable a ciphersuite, we should close existing sockets that negotiated that ciphersuite. If you disable a fallback, we should close existing sockets that negotiated with that fallback. Yes, I imagine with TLS1.3, if the SSL config changes in a way that affects the TLS1.3 connections, we should close those. Realistically, we shouldn't be seeing the SSL configuration change that often; as David indicated, there may be a bug there. But we also want to make sure we rule out the cause as being due to *SSL* config changes (which is currently flapping as a CERT_DATABASE_CHANGED) and actual flaps of cert database changes (e.g. AliPay's software interacting badly with the Keychain, which caused constant flushes) Regarding more-specific error code, I suppose I have no strong feelings. It may be useful to avoid any cross-talk from any other sources of that message (e.g. if the network actually changed, which we can't control/affect, versus SSL config changes, which we generally can). So I don't see anything fundamentally wrong with introducing a new net error code, but because it's potentially user visible, even in it's error code form, I believe that means using the process sidv@ described of getting UI approval.
,
Sep 29 2016
What codepaths do we still have that changes these at runtime? We don't expose settings checkboxes anymore or even enterprise policy for disabling TLS 1.2. (Just a random command-line flag.) I think we can figure out the QUIC and TLS 1.3 stuff when we get to it. I would not expect QUIC to use SSLClientSocket. I doubt it makes sense for it to use SSLConfig (what does it even mean to set max_version TLS 1.2?), but maybe there'll be something useful in there.
,
Oct 4 2016
David: TBH, I'm not sure. I think the next steps would be to check out the SSLConfigService (and prefs) and see if any policies are wired up to those prefs. If no policies are, then I agree, the SSLConfig (with respect to these specific elements) can't change at runtime, and we could remove the observer (and, arguably, the notification system, since we don't want to accidentally assume the notification system has observers if there are none)
,
Oct 12 2016
,
Nov 20 2017
This issue has been Available for over a year. If it's no longer important or seems unlikely to be fixed, please consider closing it out. If it is important, please re-triage the issue. Sorry for the inconvenience if the bug really should have been left as Available. If you change it back, also remove the "Hotlist-Recharge-Cold" label. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Dec 18 2017
Old bug ping? :)
,
Jan 20 2018
Argh, sorry about the delay. I was on vacation and didn't catch the email. It's on my list, I will work on this asap, planned to fix it next week.
,
Jan 20 2018
,
Feb 3 2018
I'm just now reading through this bug, trying to understand what's going on. Re #5 and the question of should QUIC even be looking at the SSLConfig: The change https://codereview.chromium.org/2109303002 that was mentioned was to ensure that QUIC sessions using an old channel ID are closed when a channel ID gets deleted. In that context, QUIC should be listening for that OnSSLConfigChanged signal (since that gets notified when a channel ID is deleted), but perhaps that is the wrong mechanism to use for signalling that. I generally agree that QUIC doesn't care about the contents of an SSLConfig. From looking at the contents of the struct, most of it appears to be related to certificate policy (which I assume is plumbed through the ProofVerifier somehow) or TLS-specific (e.g. versions, cipher suites, renego, ALPN) which don't apply to QUIC (and when QUIC uses TLS 1.3, they still won't apply). channel_id_enabled and token_binding_params are two fields in that struct that don't fit those categories, but they don't change at runtime (I forget how those settings get plumbed to QUIC, but that doesn't matter for this bug). I think the only reason why QUIC needs to be an SSLConfigService::Observer is to get the signal that a Channel ID was deleted. Once Channel ID is deprecated we could probably remove it from being an observer.
,
Feb 3 2018
https://chromium-review.googlesource.com/c/chromium/src/+/898502 is working on the fix for this issue.
,
Feb 5 2018
Nick: SSLConfigChanged is a more general signal, and shouldn't always affect QUIC connections if I read you correctly. What we really want is something like OnChannelIdDeleted() rather than OnSSLConfigChanged. Do we have an ETA when Channel ID will be deprecated? If that's in a near future, I think we can just mark the session as going away for now. Otherwise, it might be a better idea to change the API and only notify QUIC when channel ID is deleted so that we don't close the connection/(mark session as going away) too aggressively. WDYT?
,
Feb 5 2018
I'm aiming for Q1 deprecation of Channel ID, with removal hopefully in Q2.
,
Feb 5 2018
SG, I will mark session as going away now and add a TODO in the CL so that we will unregister QUIC as an observer once channel ID goes away.
,
Feb 6 2018
> From looking at the contents of the struct, most of it appears to be related to certificate policy (which I assume is plumbed through the ProofVerifier somehow) or TLS-specific (e.g. versions, cipher suites, renego, ALPN) which don't apply to QUIC (and when QUIC uses TLS 1.3, they still won't apply). Nick: I might miss something when re-reading this. Why would they still not apply when QUIC uses TLS 1.3 if it's related to TLS-specific? Besides, if QUIC really shouldn't care about this, should SPDY care about the SSLConfigChange?
,
Feb 6 2018
For things that aren't TLS-specific (e.g. certificate policy fields like rev_checking_enabled, symantec_enforcement_disabled, or token_binding_params), they're already plumbed through to work with QUIC and SPDY connections as appropriate. For TLS-specific fields, we have version_min, version_max, tls13_variant, disabled_cipher_suites, version_interference_probe, false_start_enabled, require_ecdhe, alpn_protos, renego_allowed_default, and renego_allowed_for_protos. For TLS 1.3 in QUIC, we specifically don't care about the version_min, version_max, and tls13_variant fields because we will be setting them explicitly to do TLS 1.3. The fields false_start_enabled, require_ecdhe, renego_allowed_default, and renego_allowed_for_protos only apply for TLS versions less than 1.3, so they're also irrelevant. QUIC uses different ALPN protocols, so what's in the SSLConfig for alpn_protos is irrelevant for QUIC+TLS. The version_interference_probe is used for problems with TLS 1.3 over TCP - that probe won't exist with QUIC. This leaves just disabled_cipher_suites. It might be possible that we care about this field for QUIC, though TLS 1.3 doesn't have problems with cipher suites like previous versions do. SPDY might care about OnSSLConfigChanged, if it means that the underlying TLS connection should no longer be used.
,
Feb 6 2018
Note: DisabledCipherSuites is a command-line param for enterprises, so I think we would care about this as we transition QUIC to TLS1.3. Regarding the plumbing of policy fields - one of the expected behaviours is that if those fields change, we terminate/renegotiate connections. Right now, that behaviour is inherited by QUIC observing the SSLConfig. I think if we removed the observer, we'd propagate the new values to the HttpNetworkSession, which "should" (is QUIC testing this? I don't know) cause a new connection to be negotiated because the verification flags don't match - but we'd also keep around the old one. We'd probably want to gracefully shut it down, at a minimum.
,
Feb 7 2018
> Note: DisabledCipherSuites is a command-line param for enterprises, so I think we would care about this as we transition QUIC to TLS1.3. TLS 1.3 cipher suites are not the same animal as TLS 1.2 ones. Unlike TLS 1.2, we do not have the myriad of options and historical mistakes that we have in TLS 1.2. That command-line flag does not work for TLS 1.3 and, in fact, it is not even possible to configure things in BoringSSL. The behavior is implemented in code.
,
Feb 10 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/505dc9c876d8feed6393d25008e0b4270cc666f7 commit 505dc9c876d8feed6393d25008e0b4270cc666f7 Author: Zhongyi Shi <zhongyi@chromium.org> Date: Sat Feb 10 02:12:34 2018 Mark quic sessions as going away instead of close connections on SSL config change. Bug: 650509 Change-Id: Ie5bb461a6094a33c50605a4735675b6552738b65 Reviewed-on: https://chromium-review.googlesource.com/898502 Reviewed-by: Ryan Hamilton <rch@chromium.org> Reviewed-by: Nick Harper <nharper@chromium.org> Commit-Queue: Zhongyi Shi <zhongyi@chromium.org> Cr-Commit-Position: refs/heads/master@{#535932} [modify] https://crrev.com/505dc9c876d8feed6393d25008e0b4270cc666f7/net/quic/chromium/quic_stream_factory.cc [modify] https://crrev.com/505dc9c876d8feed6393d25008e0b4270cc666f7/net/quic/chromium/quic_stream_factory_test.cc
,
Feb 12 2018
zhongyi: Now that this in on in Canary, can we merge this to M65?
,
Feb 14 2018
Re #28: was that for fixing the spike of CONNECTION_CANCELLED in Beta channel? If that's the motivation, I think this won't work. Because the spike is caused by close the connection when we receive notifications via QSF::OnCertDataBaseChanged(). The fix in #26 was to mark session going away when we receive notifications on QSF::OnSSLConfigChanged. While this issue seemed to be related to CERT_DATA_BASED_CHANGED, it actually is a SSLConfigChange(the title might be a little confusing).
,
Feb 28 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/e615a0843dd1e40f6f9056bad25c99e4b338c5d9 commit e615a0843dd1e40f6f9056bad25c99e4b338c5d9 Author: Cherie Shi <zhongyi@chromium.org> Date: Wed Feb 28 02:01:13 2018 Revert "Mark quic sessions as going away instead of close connections on SSL config change." This reverts commit 505dc9c876d8feed6393d25008e0b4270cc666f7. Reason for revert: reverted as this is suspected to be a potential cause of Quic Internal Error spike. Will reland if this is not the cause. Original change's description: > Mark quic sessions as going away instead of close connections on SSL config change. > > Bug: 650509 > Change-Id: Ie5bb461a6094a33c50605a4735675b6552738b65 > Reviewed-on: https://chromium-review.googlesource.com/898502 > Reviewed-by: Ryan Hamilton <rch@chromium.org> > Reviewed-by: Nick Harper <nharper@chromium.org> > Commit-Queue: Zhongyi Shi <zhongyi@chromium.org> > Cr-Commit-Position: refs/heads/master@{#535932} TBR=rch@chromium.org,nharper@chromium.org,zhongyi@chromium.org Bug: 650509 Change-Id: I3dd3c2ac52aa8bb244660f6b2e9ac9de971e1a23 Reviewed-on: https://chromium-review.googlesource.com/939654 Commit-Queue: Zhongyi Shi <zhongyi@chromium.org> Reviewed-by: Nick Harper <nharper@chromium.org> Reviewed-by: Zhongyi Shi <zhongyi@chromium.org> Cr-Commit-Position: refs/heads/master@{#539650} [modify] https://crrev.com/e615a0843dd1e40f6f9056bad25c99e4b338c5d9/net/quic/chromium/quic_stream_factory.cc [modify] https://crrev.com/e615a0843dd1e40f6f9056bad25c99e4b338c5d9/net/quic/chromium/quic_stream_factory_test.cc |
||||||||
►
Sign in to add a comment |
||||||||
Comment 1 by rsleevi@chromium.org
, Sep 27 2016