New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 704024 link

Starred by 3 users

Issue metadata

Status: Fixed
Owner:
Closed: Sep 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug

Blocked on:
issue 671420
issue 708716

Blocking:
issue 706336
issue 796437



Sign in to add a comment

chapsd causes CERT_NewTempCertificate to take 10+ seconds.

Project Member Reported by nya@chromium.org, Mar 22 2017

Issue description

Some blocking NSS functions are expensive for several reasons, such as:
a. Need to communicate with TPM.
b. Need to acquire a lock which is also acquired by (a).

Such functions must not be called on the IO thread to avoid freezing Chrome IPC communications. Actually we have observed UI/IO thread lockups on Chrome OS when TPM operations are slow (b/36000852).


There are at least two places where such calls are made:

1. LoadPrivateKeyByPublicKey()

#14 PK11_FindKeyByKeyID (slot=0xc4a7d6bbdc0, keyID=<optimized out>, wincx=0x0) at pk11akey.c:2329
#15 0x000057090cdfc0fc in crypto::FindNSSKeyFromPublicKeyInfoInSlot (input=..., slot=0xc4a7d6bbdc0) at ../../crypto/nss_key_util.cc:151
#16 0x000057090ef52671 in ownership::OwnerKeyUtilImpl::FindPrivateKeyInSlot (this=<optimized out>, key=..., slot=<optimized out>) at ../../components/ownership/owner_key_util_impl.cc:63
#17 0x000057090c2376d1 in chromeos::(anonymous namespace)::LoadPrivateKeyByPublicKey(const scoped_refptr<ownership::OwnerKeyUtil> &, scoped_refptr<ownership::PublicKey>, const std::string &, const base::Callback<void(const scoped_refptr<ownership::PublicKey>&, const scoped_refptr<ownership::PrivateKey>&), (base::internal::CopyMode)1, (base::internal::RepeatMode)1> &) (owner_key_util=..., public_key=..., username_hash="6df3f648134aa95085a571cda4979f80ca106c56", callback=...) at ../../chrome/browser/chromeos/ownership/owner_settings_service_chromeos.cc:91

2. SSLClientSocketImpl::DoVerifyCert()

#7  0x00007b5da0fa6b23 in CERT_NewTempCertificate (handle=0x2da60b066030, derCert=0x7b5d937be560, nickname=0x0, isperm=<optimized out>, copyDER=1) at stanpcertdb.c:346
#8  0x000057c1dbe9f883 in CreateOSCertHandleFromBytesWithNickname (nickname=0x0, length=1249, data=0x2da60d4f3500 "0\202\004\335\060\202\003\305\240\003\002\001\002\002\b^L\272@\370|l\223\060\r\006\t*\206H\206\367\r\001\001\v\005") at ../../net/cert/x509_certificate_nss.cc:161
#9  net::X509Certificate::CreateOSCertHandleFromBytes (data=0x2da60d4f3500 "0\202\004\335\060\202\003\305\240\003\002\001\002\002\b^L\272@\370|l\223\060\r\006\t*\206H\206\367\r\001\001\v\005", length=1249) at ../../net/cert/x509_certificate_nss.cc:140
#10 0x000057c1dbe4243d in net::X509Certificate::CreateFromDERCertChain (der_certs=std::vector of length 3, capacity 3 = {...}) at ../../net/cert/x509_certificate.cc:262
#11 0x000057c1dbe62d2b in OSChainFromBuffers (openssl_chain=0x2da60d40c5a0) at ../../net/socket/ssl_client_socket_impl.cc:254
#12 net::SSLClientSocketImpl::DoVerifyCert (this=this@entry=0x2da60c9ad500, result=result@entry=0) at ../../net/socket/ssl_client_socket_impl.cc:1175

 

Comment 1 by nya@chromium.org, Mar 22 2017

Labels: -Pri-2 Pri-1
Cc: rsleevi@chromium.org
Components: Internals>Network>SSL
+rsleevi
Labels: Needs-Feedback
Please get a taller call stack before doing anything with #2. What is CERT_NewTempCertificate calling here that blocks?
Please also share the internal bug.
Oh, it's this, maybe? Line number matches.
https://dxr.mozilla.org/mozilla-central/source/security/nss/lib/certdb/stanpcertdb.c?q=stanpcertdb.c&redirect_type=direct#346
Tracing through the function, I eventually get here, which looks like it starts digging into slots?
https://dxr.mozilla.org/mozilla-central/source/security/nss/lib/pki/trustdomain.c?q=%2Bfunction%3A%22nssTrustDomain_FindCertificateByIssuerAndSerialNumber%28NSSTrustDomain+%2A%2C+NSSDER+%2A%2C+NSSDER+%2A%29%22&redirect_type=single#727

Ryan, do you know what's up there?

(Regardless, the proposed SSLClientSocketImpl CL isn't going to do the trick. If CERT_NewTempCertificate is problematic, moving call sites to the blocking pool one by one is infeasible and will cause unacceptable performance regressions. mattm's work in  issue #671420  is the real fix.)
Agreed on "real fix". And yeah, that sounds like NSS, and there may not be a quick fix. I also agree the overhead is likely unacceptably high.

ChromeOS ("used to") initialize the slot beforehand. libchaps can potentially do the initialization w/o blocking and simply defer the blocking operations, which should always be done on a worker?

Comment 8 by nya@chromium.org, Mar 24 2017

davidben, rsleevi, thanks for your help!


> Please get a taller call stack before doing anything with #2. What is CERT_NewTempCertificate calling here that blocks?

Attached chrome.bt.txt is a full stack trace of Chrome when the UI thread got frozen for 10+ seconds on logging in. Note that I applied https://codereview.chromium.org/2761353003, so LoadPrivateKeyByPublicKey() is run on the blocking pool instead of the IO thread.

As you see in the stack trace:
- The UI thread is blocked by a synchronous IPC call to the GPU process.
- The IO thread is blocked on acquiring a lock in nssTrustDomain_GetCertForIssuerAndSNFromCache().
- The lock is actually held by the thread running net::NSSCertDatabaseChromeOS::ListCertsImpl().
- net::NSSCertDatabaseChromeOS::ListCertsImpl() is blocked by a synchronous D-Bus call of FindObjectsInit().

And chapsd does not respond to FindObjectInit() for 10+ seconds, resulting UI/IO thread to freeze.


> Please also share the internal bug.

The internal bug b/36000852 is about UI/IO thread freeze on running Chrome OS autotests. It's currently reproducible on some specific Chromebook devices, so we suspect TPM operations are slow on those devices, though it's still under investigation.


> Regardless, the proposed SSLClientSocketImpl CL isn't going to do the trick. If CERT_NewTempCertificate is problematic, moving call sites to the blocking pool one by one is infeasible and will cause unacceptable performance regressions. mattm's work in  issue #671420  is the real fix.

Makes sense. What is ETA of  Issue 671420 ?


> ChromeOS ("used to") initialize the slot beforehand. libchaps can potentially do the initialization w/o blocking and simply defer the blocking operations, which should always be done on a worker?

Attached chapsd.bt.txt is a full stack trace of chapsd at the same time of Chrome UI freeze above. Its initialization TokenInitThread::InitializeKeyHierarchy() actually runs on a worker thread, but ChapsAdaptor::FindObjectsInit() blocks until initialization finishes (chaps::ObjectPoolImpl::WaitForPrivateObjects()).

I heard it is possible that the initialization takes 10+ seconds because it involves RSA key pair generation for example. I'm not sure we can process FindObjectsInit() without blocked by initialization.

chapsd.bt.txt
8.1 KB View Download
chrome.bt.txt
71.4 KB View Download

Comment 9 by nya@chromium.org, Mar 24 2017

Cc: davidben@chromium.org
Okay. Looks like it is indeed that function I was looking at.

If the initialization is taking 10+ seconds, you need to address that problem. With delays like those, neither your CL nor  issue #671420  is going to paper over the problem. Background threads are for keeping the UI smooth against a 0.1s operation, not 10s.

Fundamentally, CERT_NewTempCertificate is an operation that needs to happen to make any SSL connection, with or without  issue #671420 . That means network requests block on it. We are a browser. We live and breathe network requests. If network requests get stuck on the TPM for 10 seconds, it doesn't matter how buttery smooth the rest of the UI is, the user is still going to have a bad time.
To clarify, I do agree that we want to put it on a worker thread since CERT_NewTempCertificate appears to, obnoxiously, hit various stores. That will be done by  issue #671420 . But that won't make this scenario okay. 10+ seconds is a more pressing orthogonal problem.

Comment 12 by nya@chromium.org, Mar 24 2017

Yes, definitely we need to fix chapsd taking 10+ seconds to initialize, and we are looking into it.

But to clarify, Chrome is not the only process that needs to talk to TPM; cryptohomed, tpm_managerd and attestationd also talk to TPM in parallel, and requests from them are serialized by chapsd. So, even if a request from Chrome is a light-weight one that should finish in 1ms, if another daemon is performing a heavy operations like RSA key generation, Chrome's request will be blocked until the heavy operation finishes. So we should be prepared that any D-Bus call to chapsd can take more than a second anyway.


> To clarify, I do agree that we want to put it on a worker thread since CERT_NewTempCertificate appears to, obnoxiously, hit various stores. That will be done by  issue #671420 .

Great! How long do you think it will take to move CERT_NewTempCertificate() call to a worker thread?

Cc: dskaram@chromium.org
David: I disagree that this should be moved to a worker thread. This is an issue with the TPM<->Chrome interaction in which the TPM module is delaying its calls. This is where the root issue is.

The TPM library (libchaps) should initialize quickly and immediately. It should initially register itself with no slots present (nothing inserted). As the TPM initializes (as part of chaps), then on completion of that initialization, the public and private slots should register themselves with insertion events, but only once they're fully initialized.

Any interaction with the TPM itself (e.g. for keys) should and will be done on a worker thread, but that's already true.

That's the API contract the library (loaded in to Chrome) should be doing. NSS attempts to cache as much as possible in memory already, precisely because it expects there to be tokens that take 10s to complete *some* operations, but having this path on the slow path is no good.
Is libchaps a Chrome OS library or is it also used to init TPM on other platforms? Bug marked as OS=All so trying to understand whether we need to pull in CrOS folks here.
Labels: -OS-All OS-Chrome
Summary: chapsd causes CERT_NewTempCertificate to take 10+ seconds. (was: Do not call blocking NSS functions on the IO thread)
rsleevi: Oh, perhaps I'm misunderstanding what the call stack was doing? Is the expectation that the operation occur synchronously without I/O or merely with O(disk access) I/O time. Certainly 10 seconds is well out of scope, but my assumption was the latter. If the former then, yeah, it shouldn't be on a worker thread.

dskaram: Marking CrOS as this sounds like a CrOS-specific issue with chapsd. It's certainly NSS-specific.
Cc: -dskaram@chromium.org drcrash@chromium.org apronin@chromium.org
Adding folks who would know much more than I do about the topic. I un-cc'ed myself but please bring me back in if there's any product discussions where my help would be needed.
Cc: mattm@chromium.org
davidben: I may have misread the callstack.

NSS's access to tokens has two modes: public and private sessions. It expects public sessions to be fast (effectively) and private sessions to be slow and authenticated. It can cache the data in the public session (modulo insertion/removal events) in a local fast path.

What I missed is the DBus serialization being the issue, thinking it was similar to the previous issue of the public slot initialization being deferred/blocked on with the private slot initialization. I think you're right that it may require moving it off to a dedicated thread, but it also means potentially any of the NSS accesses would also need to be on a dedicated thread.

CC'ing Matt for his thoughts on what the most expedient route would be. What milestones are we targetting, I would say, determines what sort of next step we take.

Comment 18 by mattm@chromium.org, Mar 24 2017

First I just wanted to expand on the reason  issue #671420  would help here, since it may not be immediately obvious. We'll still need to create NSS temp certificates in order to do the cert verification, but 671420 will move that into the CertVerifyProcNSS. And thanks to MultiThreadedCertVerifier, that already runs on a worker thread.

From Chromium side I'm not sure if there is a better solution than just doing 671420. On that I was planning to do other platforms before linux/chromeos, but that could be adjusted if priorities call for it.

Comment 19 by nya@chromium.org, Mar 29 2017

Labels: -Pri-1 Pri-2
Thanks for discussions!

> What milestones are we targetting, I would say, determines what sort of next step we take.

This issue affects stability and time cost of autotests. As a temporary workaround, I'm increasing websocket timeouts from 10 seconds to 30 seconds. Not great solution, but it will work.

OTOH this issue might be not very visible to users. IIUC the issue affects only the first login, and on the first login UI just freezes for some moment. That's all.

So this issue might not be very urgent. I would say this issue is P2 (wanted for current milestore).

However, since the workaround is hacky, I really want a proper fix in reasonably near future, e.g. by next milestone (M60).

Comment 20 by nya@chromium.org, Apr 3 2017

Cc: nya@chromium.org
Owner: mattm@chromium.org
Reassigning to mattm@ as it is likely his upcoming change will be a proper fix.

nya: mattm's fix will *not* paper over a 10+ second fix. The UX will still be problematic. It will be better, but not good. The browser will still be non-functional for those 10+ seconds. Moving things off-thread is suitable for delays on the order of disk access delays, not for a 10+ second delay.

Do you have a plan to address the 10+ second delay? This needs to be addressed on the CrOS side.
Owner: nya@chromium.org
Reassigning to nya to work on the root cause in the TPM. The threading work is already being tracked by  issue #671420 , but, again, this is not a fix.
Cc: xixuan@chromium.org oka@chromium.org itspeter@chromium.org akes...@chromium.org philipchen@chromium.org shchen@chromium.org marc...@chromium.org
 Issue 697094  has been merged into this issue.

Comment 24 by nya@chromium.org, Apr 4 2017

+apronin working on Chrome OS TPM

OK, then let's use this public bug to track chapsd slowness issue.

rsleevi> NSS's access to tokens has two modes: public and private sessions. It expects public sessions to be fast (effectively) and private sessions to be slow and authenticated. It can cache the data in the public session (modulo insertion/removal events) in a local fast path.

Then it sounds to me that we should not serialize D-Bus requests? We have Chrome-side serialization in chaps_proxy.cc.

But even if we remove Chrome-side synchronization, can chapsd/TPM process such concurrent requests?

Comment 25 by nya@chromium.org, Apr 4 2017

Oh BTW, to clarify, I'm not familiar with this area at all; I was just investigating Chrome OS autotest failures and stumbled upon this issue. You experts' help is definitely needed to resolve this issue.

From a PKCS#11 model, the only serialization requirements are with respect to serialization within a given session handle, and NSS can and does use multiple session handles.

Comment 27 by nya@chromium.org, Apr 5 2017

rsleevi: I see. Then it seems we should remove the serialization lock in chaps_proxy, but IIUC the lock was introduced to avoid flooding chapsd and dropping requests, so I guess we need another way to avoid those issues before removing the lock.

apronin: I'd like to hear your opinion.

I believe it should be ok to remove the lock on the client side. But note that it won't necessarily help in the current situation:

1) In the current implementation, even if you remove serialization on the client side, processing all incoming dbus messages is serialized on the service (chapsd) side. So, if some previously sent message is stuck into processing for 10+ seconds, even if you send the next message not waiting for a response, it will wait for completing the previous message processing.

2) If the original analysis of what's taking long is correct, it's creating the new master key pair for the token initiated by LoadToken. chapsd already has the mechanism to prevent blocking all callers on this potentially long operation. LoadToken starts a TokenInitThread, where this long operation is performed, and returns w/o waiting for this thread to complete. Note this comment that sums it up at https://chromium.googlesource.com/chromiumos/platform/chaps/+/master/slot_manager_impl.cc#564:
Decrypting (or creating) the master key requires the TPM so we'll put this on a worker thread. This has the effect that queries for public objects are responsive but queries for private objects will be waiting for the master key to be ready.

3) So, yes, if you only access public objects until the master key initialization is done, those should be fast, with or without removing the client-side global lock. However, the moment you access a private object in that user's slot, that dbus message will block waiting for the master key, and all subsequent accesses to public objects will wait for it. Again, with or without the client-side lock.

4) The only way to address it is to organize concurrent processing of dbus messages in chapsd. And it may be more than just implementing a more sophisticated locking scheme in chapsd code. The standard libraries it uses also play a role: I'm not sure what the currently used glib dbus library does - even if we remove the lock, it may already serialize all incoming dbus messages anyways. I will need to  experiment to see how it works.

5) And this improvement from #4 only helps with public objects access. Any accesses to user's private objects will still block until the master key is ready. 

6) Again, if I understand the original use case correctly, this 10+ seconds delay happens only on the first boot/first login, when the master key pair for the token has not been created yet. In addition to it being a lengthier-than-usual initialization of the token, it is also done in parallel with other first-boot initializations of tpm-related daemons (taking tpm ownership, creating cryptohome key, preparing for enrollment, etc). All accesses to tpm are serialized at the tpm-facing daemon (trunksd in this case), which is unavoidable: even if we don't do it there, then it's done on the tpm side. So, this TokenInitThread will wait on other threads multiple times while sending multiple tpm commands.

I'm saying this to indicate that although we can possibly improve the initialization performance by optimizing the set of commands sent to the tpm, it will still likely be a multi-seconds operation during the first boot. And until it's complete, accesses to the user's objects won't be possible.

As I understand, subsequent boots/logins are fine (b/36000852#comment37).
We already restricted private object access to worker threads in Chrome, but public object accesses may happen on the IO thread (and NSS itself maintains a 10 second cache independent of that).

So it sounds like #4 would be useful?
Re #29: Yes, then #4 would be useful for the first login.

BTW, one alternative implementation, if (as I suspect) glib dbus lib doesn't support concurrent messages, would be returning a special "would block" error code from chapsd if an operation on the private object would have to wait for the master key. And retrying private object accesses in the client-side chaps library code (with releasing and re-acquiring the client side lock).
Blockedon: 708716
I'll work on the changes in #29/#30. Created a separate, more focused,  issue 708716  for that. 
I should clarify - My comment #29 was with respect to "we" being the folks in //net doing private key access. None of us know the ChromeOS initialization path to know if it's accessing or loading private keys as part of the ChromeOS-in-Chrome startup side.
Project Member

Comment 33 by bugdroid1@chromium.org, Apr 6 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/b3d6e89884650aac1ad2bf9a51e8dc51f06d1f42

commit b3d6e89884650aac1ad2bf9a51e8dc51f06d1f42
Author: nya <nya@chromium.org>
Date: Thu Apr 06 00:46:46 2017

Call OwnerKeyUtil::FindPrivateKeyInSlot() in the blocking pool.

The function can block for long time and should not be called on the IO
thread. In fact, on Chrome OS, it might not return until TPM-related
initialization is finished.

BUG= chromium:704024 
TEST=unit_tests

Review-Url: https://codereview.chromium.org/2761353003
Cr-Commit-Position: refs/heads/master@{#462299}

[modify] https://crrev.com/b3d6e89884650aac1ad2bf9a51e8dc51f06d1f42/chrome/browser/chromeos/ownership/owner_settings_service_chromeos.cc

Comment 34 by nya@chromium.org, Apr 6 2017

apronin:

Thanks for working on chapsd fix!

> As I understand, subsequent boots/logins are fine (b/36000852#comment37).

Hmm, but I could reproduce this issue by the following autotest which just repeats login without rebooting the device.
https://chromium-review.googlesource.com/c/458138/1/client/site_tests/tmp_LoginStressTest/tmp_LoginStressTest.py

So I guess this issue also hits non-first boot cases. Please see comment 8 for the stack traces when a freeze happens.

Comment 35 by nya@chromium.org, Apr 10 2017

Blocking: 706336
Labels: -Needs-Feedback
Doesn't seem like this is waiting on feedback.

Comment 37 by nya@chromium.org, Sep 12 2017

Owner: ----
Status: Available (was: Started)
In the discussion above, I believe this bug was effectively split into following two bugs:

-  Issue 708716 : chaps: don't block accesses to public objects while waiting on master key initialization
-  Issue 671420 : Represent certificates using CRYPTO_BUFFER.	

Therefore I've been not working actively on this issue.

FYI to Googlers: another internal bug was filed related to this issue, b/63900068.

Comment 38 by nya@chromium.org, Sep 12 2017

Blockedon: 671420
Again, per comment #21, the real bug is that chaps is slow. Moving operations off-thread, one of the consequences of CRYPTO_BUFFER, will *not* paper over 10+ second slowdown. Nor will it move all of NSS access off-thread, just remove the first blocker. The CrOS code accesses NSS on the UI and IO threads all over the place. I would like to see the ScopedAllowIO removed from EnsureNSSInit someday, but it'll take a lot of work on the CrOS side.
@Comment 39: We won't be able to removed ScopedAllowIO, because it's still fundamentally IO (disk/file) access. That said,  Issue 708716  should still be treated as high priority.
Owner: nya@chromium.org
Status: Assigned (was: Available)
Well, we would if *all* NSS access could be moved off UI and IO thread, but that's quite unrealistic, yeah. I shouldn't have mentioned it. It's just a distraction from the real bug: chapsd should not cause CERT_NewTempCertificate to take 10+ seconds.

This bug should remain assigned to someone working on chapsd.  Issue #671420  is a red herring.

Comment 42 by nya@chromium.org, Sep 12 2017

Cc: khmel@chromium.org
Owner: khmel@chromium.org
Reassigning to khmel as he is actively working on b/63900068.

Re #41: In fact, we seem to already have a bug for that:  issue 708716 

Does CERT_NewTempCertificate use private objects?

chapsd doesn't know anything about CERT_NewTempCertificate. It provides PKCS#11 API, so deals at C_FindObject, C_CreateObject level. If the requested objects are not private (don't require TPM operations), it can be done faster. If the objects are private, chapsd time is defined by the TPM performance. Initiating the keys for the user keystore or creating a new keypair does take seconds. That can be addressed only on the tpm side. What chapsd can do is to not make public objects wait for private objects. And that's in  issue 708716 .

Right,  Issue 708716  seems to address the root cause here - which is that CERT_NewTempCertificate (and the other NSS non-privkey APIs) will all try to use public sessions unless the token forces authentication, and it expects public sessions to be relatively fast
Just to be clear: chapsd doesn't require PKCS-level authentication to access objects in user tokens. As soon as the user signs-in, its token is available w/o additional C_Login().

So, the question is about using objects with CKA_PRIVATE attribute (or which belong to CKO_PRIVATE_KEY class). Those require full token initialization (which can take extra 10+ seconds, as we see) before they can be accessed.
My understanding is that something !CERT_NewTempCertificate is doing the CKO_PRIVATE_KEY operation, and as a result, those public operations are being serialized behind it.

That is, even though chapsd allows public and private sessions as distinct sessions, it coalesces all activity. In the case of start-up, this is particularly bad - as otherwise, NSS maintains an internal cache of the public objects precisely to avoid hitting the card often.
Cc: -marc...@chromium.org

Comment 48 by khmel@chromium.org, Sep 18 2017

Status: Fixed (was: Assigned)
provided workaround in b/63900068. This prevents appearing this race condition.

Comment 49 by kbr@chromium.org, Dec 20 2017

Blocking: 796437

Comment 50 by dchan@chromium.org, Jan 22 2018

Status: Archived (was: Fixed)

Comment 51 by dchan@chromium.org, Jan 23 2018

Status: Fixed (was: Archived)

Sign in to add a comment