New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 708716 link

Starred by 3 users

Issue metadata

Status: Fixed
Owner:
Closed: Sep 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug

Blocking:
issue 704024



Sign in to add a comment

chaps: don't block accesses to public objects while waiting on master key initialization

Project Member Reported by apronin@chromium.org, Apr 5 2017

Issue description

Currently, LoadToken puts master key initialization on a separate worker thread. And only accesses to private objects block on master key readiness.

However, if a private object operation is initiated, all other dbus messages sent to chapsd will wait on that operation completion, since they are all serialized at ChapsAdaptor (server-side) and ChapsProxy (client-side) levels.

Need to avoid blocking accesses that don't need to wait for the master key (e.g. for public objects) on the private object operation completion in such cases.

(Branched from  issue 704024 )
 
Description: Show this description
One possible way to achieve that is to avoid blocking and return a special error code from chapsd in such cases. And handle it in libchaps by releasing and re-acquiring the lock, and repeating the operation if such "would block" error is received.
Blocking: 704024

Comment 4 by nya@chromium.org, Apr 11 2017

Cc: nya@chromium.org
Cc: ejcaruso@chromium.org
Re #2: One simplistic relatively quick workaround is, even w/o any special processing in libchaps, to return CKR_USER_NOT_LOGGED_IN instead of waiting for private objects from all ops that deal with them. That's even compliant to PKCS#11 API spec.

The only outlier is C_Login - and the workaround is to never use it: the "login" will happen automatically when the token is initialized. Until then, retry on CKR_USER_NOT_LOGGED_IN in other ops.
Labels: -Pri-2 Pri-1
Status: Started (was: Assigned)
With CL:614787 in place (removes the global lock on the client side), it may be possible to do a limited-scope fix in ChapsAdaptor - repeat calling ChapsService methods while they return a magic error value, but release the lock between the calls. Or, find a way to queue these calls until the private objects are loaded.

Also need to check if this ChapsAdaptro-level global lock can be removed completely.
As long as the D-Bus call blocks we'll still only have that call in flight. I don't think the lock removal really changed much here; I removed it because it was superfluous now that we were serializing all the calls on one thread.

You might be able to do something in the ChapsProxy where the task gets re-posted to the D-Bus thread (probably with a delay to prevent flooding the daemon with requests) when it gets this magic error value.

If I remember correctly the only use for the adaptor global lock is to prevent D-Bus calls from being handled until the async initialization gets to a certain point, so we should be able to replace that with a real asynchronous startup sequence. If we want to minimize the structural changes it can be replaced with a WaitableEvent that is manually reset. Either way the adaptor handles requests on one thread so the lock doesn't really matter that much from the D-Bus handling point of view.
Re #9: You right, still serialized on the client side. So, as of now still need to propagate the error to the client and retry there.

However, it seems to be possible to replace CallMethodAndBlockWithTimeout() in DBusProxyWrapper::CallMethodOnTaskRunner() with its async cousin CallMethodWithTimeout() and keep the external DBusProxyWrapper interface unchanged. Then we'll remove the per-process serialization. And then dropping ChapsProxy global lock starts making sense for the multi-thread-client scenario.

As a long-term solution on ToT, I like this de-serialization approach more than re-posting tasks in ChapsProxy on magic error. However, for a quick-fix for old branches (b/63900068), it'd require picking too many changes from ToT. So, for that case will have to rely on magic errors.
That sounds fine. I was scared of doing something like that because I was wondering if the locking mechanism meant there was an assumption that only one operation would be in progress at a time, and if we move to a more asynchronous model then we might have several things in flight at once. In that case it might be useful to audit the code and make sure that assumption isn't affecting anything, especially because we have several threads running around on the adaptor side.
The short-term plan is doing the magic-error thing, anyways.

BTW, the first shot at chapsd side changes applied cleanly from ToT to 61 (pre-testing WIP CLs 666549..66552), which is better than expected. The client side code in ChapsProxy had more changes, so is yet to be seen if will need separate CLs.
Labels: M-62
Project Member

Comment 14 by bugdroid1@chromium.org, Sep 19 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform2/+/72c8738a05ac4fab8f33188a27c18124397f9699

commit 72c8738a05ac4fab8f33188a27c18124397f9699
Author: Andrey Pronin <apronin@chromium.org>
Date: Tue Sep 19 03:33:06 2017

chaps: return error code from Session::FlushModifiableObject

Small refactoring for session in preparation to switching to
non-blocking implementation for waiting for private objects:
return CK_RV instead of bool from Session::FlushModifiableObject.

BUG= chromium:708716 
TEST=unit tests

Change-Id: I5775de2d060a09599beb456009e83f5d2a97bafb
Reviewed-on: https://chromium-review.googlesource.com/666549
Commit-Ready: Andrey Pronin <apronin@chromium.org>
Tested-by: Andrey Pronin <apronin@chromium.org>
Reviewed-by: Eric Caruso <ejcaruso@chromium.org>

[modify] https://crrev.com/72c8738a05ac4fab8f33188a27c18124397f9699/chaps/chaps_service_test.cc
[modify] https://crrev.com/72c8738a05ac4fab8f33188a27c18124397f9699/chaps/session_test.cc
[modify] https://crrev.com/72c8738a05ac4fab8f33188a27c18124397f9699/chaps/session_mock.h
[modify] https://crrev.com/72c8738a05ac4fab8f33188a27c18124397f9699/chaps/session.h
[modify] https://crrev.com/72c8738a05ac4fab8f33188a27c18124397f9699/chaps/session_impl.cc
[modify] https://crrev.com/72c8738a05ac4fab8f33188a27c18124397f9699/chaps/chaps_service.cc
[modify] https://crrev.com/72c8738a05ac4fab8f33188a27c18124397f9699/chaps/session_impl.h

Project Member

Comment 15 by bugdroid1@chromium.org, Sep 19 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform2/+/c037a75ad13541ada6e8b29cbb7770c0f7feadf0

commit c037a75ad13541ada6e8b29cbb7770c0f7feadf0
Author: Andrey Pronin <apronin@chromium.org>
Date: Tue Sep 19 03:33:07 2017

chaps: object pool operations return Result

In preparation to switching to non-blocking implementation for
waiting for private objects: return Result instead of bool from
ObjectPool methods that deal with objects.

BUG= chromium:708716 
TEST=unit tests

Change-Id: I56b3a17ac6d79bfe67929c186f643295cf0c6411
Reviewed-on: https://chromium-review.googlesource.com/667746
Commit-Ready: Andrey Pronin <apronin@chromium.org>
Tested-by: Andrey Pronin <apronin@chromium.org>
Reviewed-by: Eric Caruso <ejcaruso@chromium.org>

[modify] https://crrev.com/c037a75ad13541ada6e8b29cbb7770c0f7feadf0/chaps/object_pool_impl.cc
[modify] https://crrev.com/c037a75ad13541ada6e8b29cbb7770c0f7feadf0/chaps/slot_manager_test.cc
[modify] https://crrev.com/c037a75ad13541ada6e8b29cbb7770c0f7feadf0/chaps/session_test.cc
[modify] https://crrev.com/c037a75ad13541ada6e8b29cbb7770c0f7feadf0/chaps/opencryptoki_importer.cc
[modify] https://crrev.com/c037a75ad13541ada6e8b29cbb7770c0f7feadf0/chaps/object_pool_impl.h
[modify] https://crrev.com/c037a75ad13541ada6e8b29cbb7770c0f7feadf0/chaps/object_pool_test.cc
[modify] https://crrev.com/c037a75ad13541ada6e8b29cbb7770c0f7feadf0/chaps/slot_manager_impl.cc
[modify] https://crrev.com/c037a75ad13541ada6e8b29cbb7770c0f7feadf0/chaps/object_pool_mock.h
[modify] https://crrev.com/c037a75ad13541ada6e8b29cbb7770c0f7feadf0/chaps/session_impl.cc
[modify] https://crrev.com/c037a75ad13541ada6e8b29cbb7770c0f7feadf0/chaps/object_pool.h

Project Member

Comment 16 by bugdroid1@chromium.org, Sep 19 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform2/+/906bf3a73fb2149379b0c1359142373a113cc749

commit 906bf3a73fb2149379b0c1359142373a113cc749
Author: Andrey Pronin <apronin@chromium.org>
Date: Tue Sep 19 03:33:07 2017

chaps: implement IsPrivate in ObjectMock

In preparation to switching to non-blocking implementation for
waiting for private objects: implement IsPrivate in ObjectMock.

BUG= chromium:708716 
TEST=unit tests

Change-Id: I5d1eb9272a3ff71beea3e8191de952c8329ae2bf
Reviewed-on: https://chromium-review.googlesource.com/667747
Commit-Ready: Andrey Pronin <apronin@chromium.org>
Tested-by: Andrey Pronin <apronin@chromium.org>
Reviewed-by: Eric Caruso <ejcaruso@chromium.org>

[modify] https://crrev.com/906bf3a73fb2149379b0c1359142373a113cc749/chaps/object_pool_test.cc
[modify] https://crrev.com/906bf3a73fb2149379b0c1359142373a113cc749/chaps/object_mock.h

Project Member

Comment 17 by bugdroid1@chromium.org, Sep 20 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform2/+/0d388e9de7c52ed7309fc552f1873a9f85249984

commit 0d388e9de7c52ed7309fc552f1873a9f85249984
Author: Andrey Pronin <apronin@chromium.org>
Date: Wed Sep 20 02:43:36 2017

chaps: stop waiting for private objects in chapsd

Due to chapsd global serialization of requests from clients,
waiting on anything in one requests blocks all other requests.

Instead of waiting for private objects loading when an operation
on a private object is requested, return an error immediately and
let the client-side retry as needed.

BUG= chromium:708716 
TEST=unit tests
CQ-DEPEND=CL:669945

Change-Id: I93d9c8a9fb71aea837fcdf8f0e40a13290605fa9
Reviewed-on: https://chromium-review.googlesource.com/667748
Commit-Ready: Andrey Pronin <apronin@chromium.org>
Tested-by: Andrey Pronin <apronin@chromium.org>
Reviewed-by: Eric Caruso <ejcaruso@chromium.org>

[modify] https://crrev.com/0d388e9de7c52ed7309fc552f1873a9f85249984/chaps/object_pool_impl.cc
[modify] https://crrev.com/0d388e9de7c52ed7309fc552f1873a9f85249984/chaps/chaps_service_test.cc
[modify] https://crrev.com/0d388e9de7c52ed7309fc552f1873a9f85249984/chaps/session_impl.cc
[modify] https://crrev.com/0d388e9de7c52ed7309fc552f1873a9f85249984/chaps/chaps_utility.cc
[modify] https://crrev.com/0d388e9de7c52ed7309fc552f1873a9f85249984/chaps/session_test.cc
[modify] https://crrev.com/0d388e9de7c52ed7309fc552f1873a9f85249984/chaps/session_mock.h
[modify] https://crrev.com/0d388e9de7c52ed7309fc552f1873a9f85249984/chaps/object_pool_impl.h
[modify] https://crrev.com/0d388e9de7c52ed7309fc552f1873a9f85249984/chaps/object_pool_test.cc
[modify] https://crrev.com/0d388e9de7c52ed7309fc552f1873a9f85249984/chaps/session.h
[modify] https://crrev.com/0d388e9de7c52ed7309fc552f1873a9f85249984/chaps/object_pool_mock.h
[modify] https://crrev.com/0d388e9de7c52ed7309fc552f1873a9f85249984/chaps/chaps.h
[modify] https://crrev.com/0d388e9de7c52ed7309fc552f1873a9f85249984/chaps/object_pool.h
[modify] https://crrev.com/0d388e9de7c52ed7309fc552f1873a9f85249984/chaps/chaps_service.cc
[modify] https://crrev.com/0d388e9de7c52ed7309fc552f1873a9f85249984/chaps/session_impl.h

Project Member

Comment 18 by bugdroid1@chromium.org, Sep 20 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform2/+/3298a0f278cd666fc5fcf412e26148195b09416b

commit 3298a0f278cd666fc5fcf412e26148195b09416b
Author: Andrey Pronin <apronin@chromium.org>
Date: Wed Sep 20 02:43:36 2017

chaps: retry requests to chapsd on getting would-block error

On the client-side libchaps, if chapsd returns an error indicating
that it would block waiting for private objects, retry the operation
until a different outcome is recieved or a deadline is reached.

BUG= chromium:708716 
TEST=unit tests

Change-Id: Ia9b83e30ccfa86052a0810d8c4230bf8b7e5f911
Reviewed-on: https://chromium-review.googlesource.com/669945
Commit-Ready: Andrey Pronin <apronin@chromium.org>
Tested-by: Andrey Pronin <apronin@chromium.org>
Reviewed-by: Andrey Pronin <apronin@chromium.org>

[modify] https://crrev.com/3298a0f278cd666fc5fcf412e26148195b09416b/chaps/chaps_test.cc
[modify] https://crrev.com/3298a0f278cd666fc5fcf412e26148195b09416b/chaps/chaps.cc
[modify] https://crrev.com/3298a0f278cd666fc5fcf412e26148195b09416b/chaps/chaps_proxy_mock.h

Status: Fixed (was: Started)
Thanks for fixing this! By the way, I noticed that the DBus implementation this thing uses under the hood actually has support for the service to implement methods asynchronously. So rather than polling, you could probably use AddMethodHandler in place of AddSimpleMethodHandler.

https://chromium.googlesource.com/aosp/platform/external/libbrillo/+/master/brillo/dbus/dbus_object.h
That was a much bigger refactor than I wanted to do at the time I moved the D-Bus bindings, but I can work on that if we think it's worth having. We'd still need to export a synchronous API for the pkcs#11 shared library, in any event.
Fair enough. Feel free to ignore me. I'm just a bystander as far as the implementation details go. (I went sniffing around because I was idly curious.)
Re #20: yes, fair point, and will look into making chapsd more async in the future. However, short term, see comments #8-10. There is also a global lock that server-side chapsd holds for all its methods, and on the client side we also need to deserialize the calls (all threads of the same process will post chapsd requests to the same message loop and do sync calls there). Seems doable, but as ejcaruso@ said, is a bigger & riskier change.

Sign in to add a comment