chapsd causes CERT_NewTempCertificate to take 10+ seconds. |
||||||||||||||||||||||||
Issue descriptionSome blocking NSS functions are expensive for several reasons, such as: a. Need to communicate with TPM. b. Need to acquire a lock which is also acquired by (a). Such functions must not be called on the IO thread to avoid freezing Chrome IPC communications. Actually we have observed UI/IO thread lockups on Chrome OS when TPM operations are slow (b/36000852). There are at least two places where such calls are made: 1. LoadPrivateKeyByPublicKey() #14 PK11_FindKeyByKeyID (slot=0xc4a7d6bbdc0, keyID=<optimized out>, wincx=0x0) at pk11akey.c:2329 #15 0x000057090cdfc0fc in crypto::FindNSSKeyFromPublicKeyInfoInSlot (input=..., slot=0xc4a7d6bbdc0) at ../../crypto/nss_key_util.cc:151 #16 0x000057090ef52671 in ownership::OwnerKeyUtilImpl::FindPrivateKeyInSlot (this=<optimized out>, key=..., slot=<optimized out>) at ../../components/ownership/owner_key_util_impl.cc:63 #17 0x000057090c2376d1 in chromeos::(anonymous namespace)::LoadPrivateKeyByPublicKey(const scoped_refptr<ownership::OwnerKeyUtil> &, scoped_refptr<ownership::PublicKey>, const std::string &, const base::Callback<void(const scoped_refptr<ownership::PublicKey>&, const scoped_refptr<ownership::PrivateKey>&), (base::internal::CopyMode)1, (base::internal::RepeatMode)1> &) (owner_key_util=..., public_key=..., username_hash="6df3f648134aa95085a571cda4979f80ca106c56", callback=...) at ../../chrome/browser/chromeos/ownership/owner_settings_service_chromeos.cc:91 2. SSLClientSocketImpl::DoVerifyCert() #7 0x00007b5da0fa6b23 in CERT_NewTempCertificate (handle=0x2da60b066030, derCert=0x7b5d937be560, nickname=0x0, isperm=<optimized out>, copyDER=1) at stanpcertdb.c:346 #8 0x000057c1dbe9f883 in CreateOSCertHandleFromBytesWithNickname (nickname=0x0, length=1249, data=0x2da60d4f3500 "0\202\004\335\060\202\003\305\240\003\002\001\002\002\b^L\272@\370|l\223\060\r\006\t*\206H\206\367\r\001\001\v\005") at ../../net/cert/x509_certificate_nss.cc:161 #9 net::X509Certificate::CreateOSCertHandleFromBytes (data=0x2da60d4f3500 "0\202\004\335\060\202\003\305\240\003\002\001\002\002\b^L\272@\370|l\223\060\r\006\t*\206H\206\367\r\001\001\v\005", length=1249) at ../../net/cert/x509_certificate_nss.cc:140 #10 0x000057c1dbe4243d in net::X509Certificate::CreateFromDERCertChain (der_certs=std::vector of length 3, capacity 3 = {...}) at ../../net/cert/x509_certificate.cc:262 #11 0x000057c1dbe62d2b in OSChainFromBuffers (openssl_chain=0x2da60d40c5a0) at ../../net/socket/ssl_client_socket_impl.cc:254 #12 net::SSLClientSocketImpl::DoVerifyCert (this=this@entry=0x2da60c9ad500, result=result@entry=0) at ../../net/socket/ssl_client_socket_impl.cc:1175
,
Mar 22 2017
,
Mar 23 2017
+rsleevi
,
Mar 23 2017
Please get a taller call stack before doing anything with #2. What is CERT_NewTempCertificate calling here that blocks?
,
Mar 23 2017
Please also share the internal bug.
,
Mar 23 2017
Oh, it's this, maybe? Line number matches. https://dxr.mozilla.org/mozilla-central/source/security/nss/lib/certdb/stanpcertdb.c?q=stanpcertdb.c&redirect_type=direct#346 Tracing through the function, I eventually get here, which looks like it starts digging into slots? https://dxr.mozilla.org/mozilla-central/source/security/nss/lib/pki/trustdomain.c?q=%2Bfunction%3A%22nssTrustDomain_FindCertificateByIssuerAndSerialNumber%28NSSTrustDomain+%2A%2C+NSSDER+%2A%2C+NSSDER+%2A%29%22&redirect_type=single#727 Ryan, do you know what's up there? (Regardless, the proposed SSLClientSocketImpl CL isn't going to do the trick. If CERT_NewTempCertificate is problematic, moving call sites to the blocking pool one by one is infeasible and will cause unacceptable performance regressions. mattm's work in issue #671420 is the real fix.)
,
Mar 23 2017
Agreed on "real fix". And yeah, that sounds like NSS, and there may not be a quick fix. I also agree the overhead is likely unacceptably high.
ChromeOS ("used to") initialize the slot beforehand. libchaps can potentially do the initialization w/o blocking and simply defer the blocking operations, which should always be done on a worker?
,
Mar 24 2017
davidben, rsleevi, thanks for your help! > Please get a taller call stack before doing anything with #2. What is CERT_NewTempCertificate calling here that blocks? Attached chrome.bt.txt is a full stack trace of Chrome when the UI thread got frozen for 10+ seconds on logging in. Note that I applied https://codereview.chromium.org/2761353003, so LoadPrivateKeyByPublicKey() is run on the blocking pool instead of the IO thread. As you see in the stack trace: - The UI thread is blocked by a synchronous IPC call to the GPU process. - The IO thread is blocked on acquiring a lock in nssTrustDomain_GetCertForIssuerAndSNFromCache(). - The lock is actually held by the thread running net::NSSCertDatabaseChromeOS::ListCertsImpl(). - net::NSSCertDatabaseChromeOS::ListCertsImpl() is blocked by a synchronous D-Bus call of FindObjectsInit(). And chapsd does not respond to FindObjectInit() for 10+ seconds, resulting UI/IO thread to freeze. > Please also share the internal bug. The internal bug b/36000852 is about UI/IO thread freeze on running Chrome OS autotests. It's currently reproducible on some specific Chromebook devices, so we suspect TPM operations are slow on those devices, though it's still under investigation. > Regardless, the proposed SSLClientSocketImpl CL isn't going to do the trick. If CERT_NewTempCertificate is problematic, moving call sites to the blocking pool one by one is infeasible and will cause unacceptable performance regressions. mattm's work in issue #671420 is the real fix. Makes sense. What is ETA of Issue 671420 ? > ChromeOS ("used to") initialize the slot beforehand. libchaps can potentially do the initialization w/o blocking and simply defer the blocking operations, which should always be done on a worker? Attached chapsd.bt.txt is a full stack trace of chapsd at the same time of Chrome UI freeze above. Its initialization TokenInitThread::InitializeKeyHierarchy() actually runs on a worker thread, but ChapsAdaptor::FindObjectsInit() blocks until initialization finishes (chaps::ObjectPoolImpl::WaitForPrivateObjects()). I heard it is possible that the initialization takes 10+ seconds because it involves RSA key pair generation for example. I'm not sure we can process FindObjectsInit() without blocked by initialization.
,
Mar 24 2017
,
Mar 24 2017
Okay. Looks like it is indeed that function I was looking at. If the initialization is taking 10+ seconds, you need to address that problem. With delays like those, neither your CL nor issue #671420 is going to paper over the problem. Background threads are for keeping the UI smooth against a 0.1s operation, not 10s. Fundamentally, CERT_NewTempCertificate is an operation that needs to happen to make any SSL connection, with or without issue #671420 . That means network requests block on it. We are a browser. We live and breathe network requests. If network requests get stuck on the TPM for 10 seconds, it doesn't matter how buttery smooth the rest of the UI is, the user is still going to have a bad time.
,
Mar 24 2017
To clarify, I do agree that we want to put it on a worker thread since CERT_NewTempCertificate appears to, obnoxiously, hit various stores. That will be done by issue #671420 . But that won't make this scenario okay. 10+ seconds is a more pressing orthogonal problem.
,
Mar 24 2017
Yes, definitely we need to fix chapsd taking 10+ seconds to initialize, and we are looking into it. But to clarify, Chrome is not the only process that needs to talk to TPM; cryptohomed, tpm_managerd and attestationd also talk to TPM in parallel, and requests from them are serialized by chapsd. So, even if a request from Chrome is a light-weight one that should finish in 1ms, if another daemon is performing a heavy operations like RSA key generation, Chrome's request will be blocked until the heavy operation finishes. So we should be prepared that any D-Bus call to chapsd can take more than a second anyway. > To clarify, I do agree that we want to put it on a worker thread since CERT_NewTempCertificate appears to, obnoxiously, hit various stores. That will be done by issue #671420 . Great! How long do you think it will take to move CERT_NewTempCertificate() call to a worker thread?
,
Mar 24 2017
David: I disagree that this should be moved to a worker thread. This is an issue with the TPM<->Chrome interaction in which the TPM module is delaying its calls. This is where the root issue is. The TPM library (libchaps) should initialize quickly and immediately. It should initially register itself with no slots present (nothing inserted). As the TPM initializes (as part of chaps), then on completion of that initialization, the public and private slots should register themselves with insertion events, but only once they're fully initialized. Any interaction with the TPM itself (e.g. for keys) should and will be done on a worker thread, but that's already true. That's the API contract the library (loaded in to Chrome) should be doing. NSS attempts to cache as much as possible in memory already, precisely because it expects there to be tokens that take 10s to complete *some* operations, but having this path on the slow path is no good.
,
Mar 24 2017
Is libchaps a Chrome OS library or is it also used to init TPM on other platforms? Bug marked as OS=All so trying to understand whether we need to pull in CrOS folks here.
,
Mar 24 2017
rsleevi: Oh, perhaps I'm misunderstanding what the call stack was doing? Is the expectation that the operation occur synchronously without I/O or merely with O(disk access) I/O time. Certainly 10 seconds is well out of scope, but my assumption was the latter. If the former then, yeah, it shouldn't be on a worker thread. dskaram: Marking CrOS as this sounds like a CrOS-specific issue with chapsd. It's certainly NSS-specific.
,
Mar 24 2017
Adding folks who would know much more than I do about the topic. I un-cc'ed myself but please bring me back in if there's any product discussions where my help would be needed.
,
Mar 24 2017
davidben: I may have misread the callstack. NSS's access to tokens has two modes: public and private sessions. It expects public sessions to be fast (effectively) and private sessions to be slow and authenticated. It can cache the data in the public session (modulo insertion/removal events) in a local fast path. What I missed is the DBus serialization being the issue, thinking it was similar to the previous issue of the public slot initialization being deferred/blocked on with the private slot initialization. I think you're right that it may require moving it off to a dedicated thread, but it also means potentially any of the NSS accesses would also need to be on a dedicated thread. CC'ing Matt for his thoughts on what the most expedient route would be. What milestones are we targetting, I would say, determines what sort of next step we take.
,
Mar 24 2017
First I just wanted to expand on the reason issue #671420 would help here, since it may not be immediately obvious. We'll still need to create NSS temp certificates in order to do the cert verification, but 671420 will move that into the CertVerifyProcNSS. And thanks to MultiThreadedCertVerifier, that already runs on a worker thread. From Chromium side I'm not sure if there is a better solution than just doing 671420. On that I was planning to do other platforms before linux/chromeos, but that could be adjusted if priorities call for it.
,
Mar 29 2017
Thanks for discussions! > What milestones are we targetting, I would say, determines what sort of next step we take. This issue affects stability and time cost of autotests. As a temporary workaround, I'm increasing websocket timeouts from 10 seconds to 30 seconds. Not great solution, but it will work. OTOH this issue might be not very visible to users. IIUC the issue affects only the first login, and on the first login UI just freezes for some moment. That's all. So this issue might not be very urgent. I would say this issue is P2 (wanted for current milestore). However, since the workaround is hacky, I really want a proper fix in reasonably near future, e.g. by next milestone (M60).
,
Apr 3 2017
Reassigning to mattm@ as it is likely his upcoming change will be a proper fix.
,
Apr 3 2017
nya: mattm's fix will *not* paper over a 10+ second fix. The UX will still be problematic. It will be better, but not good. The browser will still be non-functional for those 10+ seconds. Moving things off-thread is suitable for delays on the order of disk access delays, not for a 10+ second delay. Do you have a plan to address the 10+ second delay? This needs to be addressed on the CrOS side.
,
Apr 3 2017
Reassigning to nya to work on the root cause in the TPM. The threading work is already being tracked by issue #671420 , but, again, this is not a fix.
,
Apr 4 2017
Issue 697094 has been merged into this issue.
,
Apr 4 2017
+apronin working on Chrome OS TPM OK, then let's use this public bug to track chapsd slowness issue. rsleevi> NSS's access to tokens has two modes: public and private sessions. It expects public sessions to be fast (effectively) and private sessions to be slow and authenticated. It can cache the data in the public session (modulo insertion/removal events) in a local fast path. Then it sounds to me that we should not serialize D-Bus requests? We have Chrome-side serialization in chaps_proxy.cc. But even if we remove Chrome-side synchronization, can chapsd/TPM process such concurrent requests?
,
Apr 4 2017
Oh BTW, to clarify, I'm not familiar with this area at all; I was just investigating Chrome OS autotest failures and stumbled upon this issue. You experts' help is definitely needed to resolve this issue.
,
Apr 4 2017
From a PKCS#11 model, the only serialization requirements are with respect to serialization within a given session handle, and NSS can and does use multiple session handles.
,
Apr 5 2017
rsleevi: I see. Then it seems we should remove the serialization lock in chaps_proxy, but IIUC the lock was introduced to avoid flooding chapsd and dropping requests, so I guess we need another way to avoid those issues before removing the lock. apronin: I'd like to hear your opinion.
,
Apr 5 2017
I believe it should be ok to remove the lock on the client side. But note that it won't necessarily help in the current situation: 1) In the current implementation, even if you remove serialization on the client side, processing all incoming dbus messages is serialized on the service (chapsd) side. So, if some previously sent message is stuck into processing for 10+ seconds, even if you send the next message not waiting for a response, it will wait for completing the previous message processing. 2) If the original analysis of what's taking long is correct, it's creating the new master key pair for the token initiated by LoadToken. chapsd already has the mechanism to prevent blocking all callers on this potentially long operation. LoadToken starts a TokenInitThread, where this long operation is performed, and returns w/o waiting for this thread to complete. Note this comment that sums it up at https://chromium.googlesource.com/chromiumos/platform/chaps/+/master/slot_manager_impl.cc#564: Decrypting (or creating) the master key requires the TPM so we'll put this on a worker thread. This has the effect that queries for public objects are responsive but queries for private objects will be waiting for the master key to be ready. 3) So, yes, if you only access public objects until the master key initialization is done, those should be fast, with or without removing the client-side global lock. However, the moment you access a private object in that user's slot, that dbus message will block waiting for the master key, and all subsequent accesses to public objects will wait for it. Again, with or without the client-side lock. 4) The only way to address it is to organize concurrent processing of dbus messages in chapsd. And it may be more than just implementing a more sophisticated locking scheme in chapsd code. The standard libraries it uses also play a role: I'm not sure what the currently used glib dbus library does - even if we remove the lock, it may already serialize all incoming dbus messages anyways. I will need to experiment to see how it works. 5) And this improvement from #4 only helps with public objects access. Any accesses to user's private objects will still block until the master key is ready. 6) Again, if I understand the original use case correctly, this 10+ seconds delay happens only on the first boot/first login, when the master key pair for the token has not been created yet. In addition to it being a lengthier-than-usual initialization of the token, it is also done in parallel with other first-boot initializations of tpm-related daemons (taking tpm ownership, creating cryptohome key, preparing for enrollment, etc). All accesses to tpm are serialized at the tpm-facing daemon (trunksd in this case), which is unavoidable: even if we don't do it there, then it's done on the tpm side. So, this TokenInitThread will wait on other threads multiple times while sending multiple tpm commands. I'm saying this to indicate that although we can possibly improve the initialization performance by optimizing the set of commands sent to the tpm, it will still likely be a multi-seconds operation during the first boot. And until it's complete, accesses to the user's objects won't be possible. As I understand, subsequent boots/logins are fine (b/36000852#comment37).
,
Apr 5 2017
We already restricted private object access to worker threads in Chrome, but public object accesses may happen on the IO thread (and NSS itself maintains a 10 second cache independent of that). So it sounds like #4 would be useful?
,
Apr 5 2017
Re #29: Yes, then #4 would be useful for the first login. BTW, one alternative implementation, if (as I suspect) glib dbus lib doesn't support concurrent messages, would be returning a special "would block" error code from chapsd if an operation on the private object would have to wait for the master key. And retrying private object accesses in the client-side chaps library code (with releasing and re-acquiring the client side lock).
,
Apr 5 2017
I'll work on the changes in #29/#30. Created a separate, more focused, issue 708716 for that.
,
Apr 5 2017
I should clarify - My comment #29 was with respect to "we" being the folks in //net doing private key access. None of us know the ChromeOS initialization path to know if it's accessing or loading private keys as part of the ChromeOS-in-Chrome startup side.
,
Apr 6 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/b3d6e89884650aac1ad2bf9a51e8dc51f06d1f42 commit b3d6e89884650aac1ad2bf9a51e8dc51f06d1f42 Author: nya <nya@chromium.org> Date: Thu Apr 06 00:46:46 2017 Call OwnerKeyUtil::FindPrivateKeyInSlot() in the blocking pool. The function can block for long time and should not be called on the IO thread. In fact, on Chrome OS, it might not return until TPM-related initialization is finished. BUG= chromium:704024 TEST=unit_tests Review-Url: https://codereview.chromium.org/2761353003 Cr-Commit-Position: refs/heads/master@{#462299} [modify] https://crrev.com/b3d6e89884650aac1ad2bf9a51e8dc51f06d1f42/chrome/browser/chromeos/ownership/owner_settings_service_chromeos.cc
,
Apr 6 2017
apronin: Thanks for working on chapsd fix! > As I understand, subsequent boots/logins are fine (b/36000852#comment37). Hmm, but I could reproduce this issue by the following autotest which just repeats login without rebooting the device. https://chromium-review.googlesource.com/c/458138/1/client/site_tests/tmp_LoginStressTest/tmp_LoginStressTest.py So I guess this issue also hits non-first boot cases. Please see comment 8 for the stack traces when a freeze happens.
,
Apr 10 2017
,
Apr 20 2017
Doesn't seem like this is waiting on feedback.
,
Sep 12 2017
In the discussion above, I believe this bug was effectively split into following two bugs: - Issue 708716 : chaps: don't block accesses to public objects while waiting on master key initialization - Issue 671420 : Represent certificates using CRYPTO_BUFFER. Therefore I've been not working actively on this issue. FYI to Googlers: another internal bug was filed related to this issue, b/63900068.
,
Sep 12 2017
,
Sep 12 2017
Again, per comment #21, the real bug is that chaps is slow. Moving operations off-thread, one of the consequences of CRYPTO_BUFFER, will *not* paper over 10+ second slowdown. Nor will it move all of NSS access off-thread, just remove the first blocker. The CrOS code accesses NSS on the UI and IO threads all over the place. I would like to see the ScopedAllowIO removed from EnsureNSSInit someday, but it'll take a lot of work on the CrOS side.
,
Sep 12 2017
@Comment 39: We won't be able to removed ScopedAllowIO, because it's still fundamentally IO (disk/file) access. That said, Issue 708716 should still be treated as high priority.
,
Sep 12 2017
Well, we would if *all* NSS access could be moved off UI and IO thread, but that's quite unrealistic, yeah. I shouldn't have mentioned it. It's just a distraction from the real bug: chapsd should not cause CERT_NewTempCertificate to take 10+ seconds. This bug should remain assigned to someone working on chapsd. Issue #671420 is a red herring.
,
Sep 12 2017
Reassigning to khmel as he is actively working on b/63900068.
,
Sep 12 2017
Re #41: In fact, we seem to already have a bug for that: issue 708716 Does CERT_NewTempCertificate use private objects? chapsd doesn't know anything about CERT_NewTempCertificate. It provides PKCS#11 API, so deals at C_FindObject, C_CreateObject level. If the requested objects are not private (don't require TPM operations), it can be done faster. If the objects are private, chapsd time is defined by the TPM performance. Initiating the keys for the user keystore or creating a new keypair does take seconds. That can be addressed only on the tpm side. What chapsd can do is to not make public objects wait for private objects. And that's in issue 708716 .
,
Sep 12 2017
Right, Issue 708716 seems to address the root cause here - which is that CERT_NewTempCertificate (and the other NSS non-privkey APIs) will all try to use public sessions unless the token forces authentication, and it expects public sessions to be relatively fast
,
Sep 12 2017
Just to be clear: chapsd doesn't require PKCS-level authentication to access objects in user tokens. As soon as the user signs-in, its token is available w/o additional C_Login(). So, the question is about using objects with CKA_PRIVATE attribute (or which belong to CKO_PRIVATE_KEY class). Those require full token initialization (which can take extra 10+ seconds, as we see) before they can be accessed.
,
Sep 12 2017
My understanding is that something !CERT_NewTempCertificate is doing the CKO_PRIVATE_KEY operation, and as a result, those public operations are being serialized behind it. That is, even though chapsd allows public and private sessions as distinct sessions, it coalesces all activity. In the case of start-up, this is particularly bad - as otherwise, NSS maintains an internal cache of the public objects precisely to avoid hitting the card often.
,
Sep 12 2017
,
Sep 18 2017
provided workaround in b/63900068. This prevents appearing this race condition.
,
Dec 20 2017
,
Jan 22 2018
,
Jan 23 2018
|
||||||||||||||||||||||||
►
Sign in to add a comment |
||||||||||||||||||||||||
Comment 1 by nya@chromium.org
, Mar 22 2017