SSL connection with client certificate hangs, new SSL connections fail
Reported by
jmatth...@duosecurity.com,
Apr 5 2018
Issue description

UserAgent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36

Steps to reproduce the problem:
1. Visit a site that requires a client certificate stored in a local keychain on MacOS 10.13.4
2. That request will hang
3. Subsequent SSL requests will fail

What is the expected behavior?
All requests with the client certificate should succeed along with all other SSL requests.

What went wrong?
Attached is the log file containing both successful and unsuccessful requests. The fact that this works on older versions of MacOS leads me to believe that MacOS 10.13.4 caused this behavior and we've filed a bug with Apple. We are looking for further insight however. I have a large dump file which is over the 10 MB attachment limit.

Did this work before? Yes MacOS 10.13.3 - Chrome 65.0.3325.181

Chrome version: 65.0.3325.181 Channel: stable
OS Version: OS X 10.13.4
Flash Version:

This is intermittent on 10.13.4. We have users who are running Chrome 65.0.3325.181 and MacOS 10.13.4 without issue. We also have several users who can reliably reproduce this issue. Rebooting or sleeping the Mac temporarily resolves the issue. Making the request that needs the client certificate will reliably re-trigger the bug. Users on 10.13.3 and any version of Chrome are fine.
,
Apr 6 2018
I'm happy to email our net trace - but it's 19 MB zipped.
,
Apr 6 2018
Thanks for filing the issue! This seems to be out of scope for triaging on our end, as it concerns SSL connections with client certificates, hence adding the label "TE-NeedsTriageHelp" and requesting someone from the Internals>Network>Certificate team to take a look and help triage it further.
,
Apr 6 2018
,
Apr 6 2018
Adding a pcap which captures this issue.
,
Apr 6 2018
Net export attached
,
Apr 6 2018
I'm not seeing any client certificate requests in that log. I'm guessing that happened before you started capturing? I do, however, see a bunch of stalled CERT_VERIFIER_JOB events. Likely something about the client certificate path is hanging the certificate verifier.

We don't put certificate verification and the client certificate signing on the same threads, but there is a bit of locking, both in our code and Apple's. On our side, we externally lock a number of accesses to Security.framework, to work around historical threading bugs in Apple's code.
https://cs.chromium.org/search/?q=GetMacSecurityServicesLock&type=cs

On Apple's side, I've seen Keychain lock up before. That said, whether the lock is on our end or Apple's, the root cause is that the client certificate path is hanging, which is probably an Apple bug. (Unless we have some truly amazing macOS-version-specific bug where we take that lock without releasing it. That seems unlikely.)

Some things that might help narrow it down, if you can get them. (I understand it's intermittent.)

1. You mentioned that rebooting or putting the machine to sleep solves it. Does that mean that restarting Chrome is not sufficient? (If so, that definitely sounds like the global Keychain service locked up.)
2. When this problem occurs, does Safari also hang up, or is it just Chrome? (More evidence for whether it's global or process-local.)
3. When you say "Visit a site that requires a client certificate stored in a local keychain on MacOS 10.13.4 / That request will hang", how far does that request get before it hangs? (I.e. does the hang happen when we look for matching client certificates or when we go and sign with the key?)
4. If you could attach a net-internals that includes the client certificate request, that might be useful.
5. https://www.chromium.org/for-testers/bug-reporting-guidelines/hanging-tabs contains instructions to forcibly crash a hanging process in a way that should generate a crash report. That will give us stack traces of what various threads are up to, which will help figure out exactly what operation(s) are stuck. When it says to open Chrome's task manager to find the process ID, use the Browser process. Or, if that's not responding, the macOS task manager will work fine too.

Thanks!
,
Apr 6 2018
Thanks for the response!

1. Restarting Chrome will allow new SSL connections successfully, as will sleeping. The client certificate requests will not succeed until a full restart of the machine.
2. Safari seems completely unaffected.
3. It seems to eventually get reset - we'll try and get some deeper traces with Burp or Charles on this.
4 and 5. Attached. I apologize if these don't have relevant traffic, I'm playing a bit of telephone with users that are affected.
,
Apr 6 2018
Thank you for providing more feedback. Adding the requester to the cc list. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Apr 9 2018
Once it does go into this hang, it hangs all other SSL connections, not just client certificate connections.
,
Apr 9 2018
With some further research we have determined internally that the resource contention is resolved (when the bug occurs) by removing the Yubikey U2F token from the machine. For any hung connections removing the Yubikey instantly causes those connections to succeed, likewise the client certificate negotiation immediately completes once the Yubikey is removed.
,
Apr 9 2018
Which Yubikey device are you using? And do you have any Yubikey middleware involved (namely, CCID support)?
,
Apr 9 2018
Several - Yubikey 4 nano, Yubikey 4c, Yubikey 4c nano. CCID is disabled (We only enable U2F)
,
Apr 9 2018
I'm going to tag this as WebAuthN related, for visibility. It's unclear if Security.framework is getting jammed up (more aptly, securityd), if Chrome is doing something on the IO thread related to WebAuthN, or something else. If you can reproduce it reliably, can you include an Activity Monitor sample (click the Gear -> Sample Process) for Chrome? And see if securityd / securityd_service / trustd is doing anything exciting?
,
Apr 9 2018
WebAuthN is behind a flag in 65; do you have it enabled?
,
Apr 9 2018
We do not have WebAuthN enabled on machines that are exhibiting this issue.
,
Apr 9 2018
Resetting NF for the Activity Monitor traces
,
Apr 9 2018
Do you have a crash ID for those crashes you attached in comment #8? Alternatively, +eroman, do you know how one would correlate/upload those to our crash reporting stack? They're macOS, so WinDbg will presumably not like them.
,
Apr 9 2018
Unfortunately I don't believe we have the crash IDs anymore. Attached are two activity monitor samples. One is from the client cert negotiation and the other is from a new SSL connection after the client cert notification.
,
Apr 9 2018
Thanks. From the sample of hanging load-new-page, we see the TaskSchedulerForegroundBlockingWorker threads in the CertVerifyProc blocking on the GetMacSecurityServicesLock(). The Platform Key thread is in GetClientCertsOnBackgroundThread, which holds the lock. That thread is calling into Keychain via SecItemCopyMatching, which is dispatching into SecTokenCreate as part of SecItemResultCopyPrepared, which is waiting for TKTokenCreateWithConnection to synchronously respond.

This supports the theory that there's a CCID interaction going on (or other form of token), as macOS would not otherwise be calling into TokenKit directly. This also explains why removing the Yubikey resolves this - this would cause TKTokenCreateWithConnection to fail, thus releasing the lock, thus allowing certificate verification to resume.

We have the lock in place because on various versions of macOS, the thread-safe invariants were violated within Apple code, causing UAFs, crashes, or hangs. One possible item is to revisit whether that's still necessary in a 10.9+ world, but that'd be for a future Chrome release.

So we need to figure out why TKTokenCreateWithConnection() is hanging (is it waiting for Chrome to do something, or for the token?) in the near-term, and it's definitely related to (some) WebAuthN devices.

Is there any chance that you've enabled the PIV mode for native login - as per https://www.yubico.com/why-yubico/for-business/computer-login/mac-os-login/ ?
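The chain described in this comment can be modelled with a small Python sketch. It is an illustration under stated assumptions, not a reproduction of the Chrome or Apple code: `token_responded` is a hypothetical event standing in for the synchronous TKTokenCreateWithConnection reply, `get_client_certs` stands in for the Platform Key thread, and `verify_cert` stands in for the certificate verification workers.

```python
import threading

services_lock = threading.Lock()       # stand-in for the shared services lock
token_responded = threading.Event()    # hypothetical: stays unset while the token is wedged
holder_has_lock = threading.Event()
results = []


def get_client_certs():
    # Holds the shared lock across a synchronous wait on the token.
    with services_lock:
        holder_has_lock.set()
        token_responded.wait()         # blocks until the call returns or fails
    results.append("client-certs-done")


def verify_cert(name):
    # Verification needs the same lock, so it queues behind the holder.
    with services_lock:
        results.append(name)


holder = threading.Thread(target=get_client_certs)
holder.start()
holder_has_lock.wait()

verifiers = [
    threading.Thread(target=verify_cert, args=(f"verify-{i}",)) for i in range(3)
]
for v in verifiers:
    v.start()

# While the token call is stuck, no verification can complete.
for v in verifiers:
    v.join(timeout=0.2)
stalled = [v.is_alive() for v in verifiers]
stalled_results = list(results)

# Modelling removal of the token: the blocked call returns, the lock is
# released, and the queued verifications proceed.
token_responded.set()
holder.join()
for v in verifiers:
    v.join()

print("stalled while token wedged:", stalled)
print("completed after release:", sorted(results))
```

In this model nothing is wrong with the verification workers themselves; they are stuck only because the lock holder is waiting on something outside the process.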
,
Apr 9 2018
Oh, and activity traces of any of the following daemons:
- trustd
- secd
- identityservicesd
- keychaind

I forget which one hosts the TKTokenKit XPC server - and it may be none of them - but that may help trace through further.
,
Apr 9 2018
Some of the affected users do in fact have either PIV or CCID enabled (some have both). I'm now testing with a Yubikey that I know for a fact has all of those functions disabled to see if I can get the behavior to trigger again. Will report back once I've successfully repro'd. I appreciate all of the eyes on this!
,
Apr 9 2018
Oh, and sleeping would resolve this because it'd send a USB power change notification, which would cause the device to register as unplugged ('effectively'... USB power states and all), hence causing the call to fail - until the next time.
Comment 1 by krajshree@chromium.org, Apr 6 2018
Labels: Needs-Bisect Needs-Triage-M65