chaps: don't block accesses to public objects while waiting on master key initialization |
|||||||
Issue descriptionCurrently, LoadToken puts master key initialization on a separate worker thread. And only accesses to private objects block on master key readiness. However, if a private object operation is initiated, all other dbus messages sent to chapsd will wait on that operation completion, since they are all serialized at ChapsAdaptor (server-side) and ChapsProxy (client-side) levels. Need to avoid blocking accesses that don't need to wait for the master key (e.g. for public objects) on the private object operation completion in such cases. (Branched from issue 704024 )
,
Apr 5 2017
One possible way to achieve that is to avoid blocking and return a special error code from chapsd in such cases. And handle it in libchaps by releasing and re-acquiring the lock, and repeating the operation if such "would block" error is received.
,
Apr 5 2017
,
Apr 11 2017
,
May 18 2017
,
Sep 12 2017
Re #2: One simplistic relatively quick workaround is, even w/o any special processing in libchaps, to return CKR_USER_NOT_LOGGED_IN instead of waiting for private objects from all ops that deal with them. That's even compliant to PKCS#11 API spec. The only outlier is C_Login - and the workaround is to never use it: the "login" will happen automatically when the token is initialized. Until then, retry on CKR_USER_NOT_LOGGED_IN in other ops.
,
Sep 13 2017
,
Sep 13 2017
With CL:614787 in place (removes the global lock on the client side), it may be possible to do a limited-scope fix in ChapsAdaptor - repeat calling ChapsService methods while they return a magic error value, but release the lock between the calls. Or, find a way to queue these calls until the private objects are loaded. Also need to check if this ChapsAdaptro-level global lock can be removed completely.
,
Sep 13 2017
As long as the D-Bus call blocks we'll still only have that call in flight. I don't think the lock removal really changed much here; I removed it because it was superfluous now that we were serializing all the calls on one thread. You might be able to do something in the ChapsProxy where the task gets re-posted to the D-Bus thread (probably with a delay to prevent flooding the daemon with requests) when it gets this magic error value. If I remember correctly the only use for the adaptor global lock is to prevent D-Bus calls from being handled until the async initialization gets to a certain point, so we should be able to replace that with a real asynchronous startup sequence. If we want to minimize the structural changes it can be replaced with a WaitableEvent that is manually reset. Either way the adaptor handles requests on one thread so the lock doesn't really matter that much from the D-Bus handling point of view.
,
Sep 13 2017
Re #9: You right, still serialized on the client side. So, as of now still need to propagate the error to the client and retry there. However, it seems to be possible to replace CallMethodAndBlockWithTimeout() in DBusProxyWrapper::CallMethodOnTaskRunner() with its async cousin CallMethodWithTimeout() and keep the external DBusProxyWrapper interface unchanged. Then we'll remove the per-process serialization. And then dropping ChapsProxy global lock starts making sense for the multi-thread-client scenario. As a long-term solution on ToT, I like this de-serialization approach more than re-posting tasks in ChapsProxy on magic error. However, for a quick-fix for old branches (b/63900068), it'd require picking too many changes from ToT. So, for that case will have to rely on magic errors.
,
Sep 14 2017
That sounds fine. I was scared of doing something like that because I was wondering if the locking mechanism meant there was an assumption that only one operation would be in progress at a time, and if we move to a more asynchronous model then we might have several things in flight at once. In that case it might be useful to audit the code and make sure that assumption isn't affecting anything, especially because we have several threads running around on the adaptor side.
,
Sep 14 2017
The short-term plan is doing the magic-error thing, anyways. BTW, the first shot at chapsd side changes applied cleanly from ToT to 61 (pre-testing WIP CLs 666549..66552), which is better than expected. The client side code in ChapsProxy had more changes, so is yet to be seen if will need separate CLs.
,
Sep 15 2017
,
Sep 19 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform2/+/72c8738a05ac4fab8f33188a27c18124397f9699 commit 72c8738a05ac4fab8f33188a27c18124397f9699 Author: Andrey Pronin <apronin@chromium.org> Date: Tue Sep 19 03:33:06 2017 chaps: return error code from Session::FlushModifiableObject Small refactoring for session in preparation to switching to non-blocking implementation for waiting for private objects: return CK_RV instead of bool from Session::FlushModifiableObject. BUG= chromium:708716 TEST=unit tests Change-Id: I5775de2d060a09599beb456009e83f5d2a97bafb Reviewed-on: https://chromium-review.googlesource.com/666549 Commit-Ready: Andrey Pronin <apronin@chromium.org> Tested-by: Andrey Pronin <apronin@chromium.org> Reviewed-by: Eric Caruso <ejcaruso@chromium.org> [modify] https://crrev.com/72c8738a05ac4fab8f33188a27c18124397f9699/chaps/chaps_service_test.cc [modify] https://crrev.com/72c8738a05ac4fab8f33188a27c18124397f9699/chaps/session_test.cc [modify] https://crrev.com/72c8738a05ac4fab8f33188a27c18124397f9699/chaps/session_mock.h [modify] https://crrev.com/72c8738a05ac4fab8f33188a27c18124397f9699/chaps/session.h [modify] https://crrev.com/72c8738a05ac4fab8f33188a27c18124397f9699/chaps/session_impl.cc [modify] https://crrev.com/72c8738a05ac4fab8f33188a27c18124397f9699/chaps/chaps_service.cc [modify] https://crrev.com/72c8738a05ac4fab8f33188a27c18124397f9699/chaps/session_impl.h
,
Sep 19 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform2/+/c037a75ad13541ada6e8b29cbb7770c0f7feadf0 commit c037a75ad13541ada6e8b29cbb7770c0f7feadf0 Author: Andrey Pronin <apronin@chromium.org> Date: Tue Sep 19 03:33:07 2017 chaps: object pool operations return Result In preparation to switching to non-blocking implementation for waiting for private objects: return Result instead of bool from ObjectPool methods that deal with objects. BUG= chromium:708716 TEST=unit tests Change-Id: I56b3a17ac6d79bfe67929c186f643295cf0c6411 Reviewed-on: https://chromium-review.googlesource.com/667746 Commit-Ready: Andrey Pronin <apronin@chromium.org> Tested-by: Andrey Pronin <apronin@chromium.org> Reviewed-by: Eric Caruso <ejcaruso@chromium.org> [modify] https://crrev.com/c037a75ad13541ada6e8b29cbb7770c0f7feadf0/chaps/object_pool_impl.cc [modify] https://crrev.com/c037a75ad13541ada6e8b29cbb7770c0f7feadf0/chaps/slot_manager_test.cc [modify] https://crrev.com/c037a75ad13541ada6e8b29cbb7770c0f7feadf0/chaps/session_test.cc [modify] https://crrev.com/c037a75ad13541ada6e8b29cbb7770c0f7feadf0/chaps/opencryptoki_importer.cc [modify] https://crrev.com/c037a75ad13541ada6e8b29cbb7770c0f7feadf0/chaps/object_pool_impl.h [modify] https://crrev.com/c037a75ad13541ada6e8b29cbb7770c0f7feadf0/chaps/object_pool_test.cc [modify] https://crrev.com/c037a75ad13541ada6e8b29cbb7770c0f7feadf0/chaps/slot_manager_impl.cc [modify] https://crrev.com/c037a75ad13541ada6e8b29cbb7770c0f7feadf0/chaps/object_pool_mock.h [modify] https://crrev.com/c037a75ad13541ada6e8b29cbb7770c0f7feadf0/chaps/session_impl.cc [modify] https://crrev.com/c037a75ad13541ada6e8b29cbb7770c0f7feadf0/chaps/object_pool.h
,
Sep 19 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform2/+/906bf3a73fb2149379b0c1359142373a113cc749 commit 906bf3a73fb2149379b0c1359142373a113cc749 Author: Andrey Pronin <apronin@chromium.org> Date: Tue Sep 19 03:33:07 2017 chaps: implement IsPrivate in ObjectMock In preparation to switching to non-blocking implementation for waiting for private objects: implement IsPrivate in ObjectMock. BUG= chromium:708716 TEST=unit tests Change-Id: I5d1eb9272a3ff71beea3e8191de952c8329ae2bf Reviewed-on: https://chromium-review.googlesource.com/667747 Commit-Ready: Andrey Pronin <apronin@chromium.org> Tested-by: Andrey Pronin <apronin@chromium.org> Reviewed-by: Eric Caruso <ejcaruso@chromium.org> [modify] https://crrev.com/906bf3a73fb2149379b0c1359142373a113cc749/chaps/object_pool_test.cc [modify] https://crrev.com/906bf3a73fb2149379b0c1359142373a113cc749/chaps/object_mock.h
,
Sep 20 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform2/+/0d388e9de7c52ed7309fc552f1873a9f85249984 commit 0d388e9de7c52ed7309fc552f1873a9f85249984 Author: Andrey Pronin <apronin@chromium.org> Date: Wed Sep 20 02:43:36 2017 chaps: stop waiting for private objects in chapsd Due to chapsd global serialization of requests from clients, waiting on anything in one requests blocks all other requests. Instead of waiting for private objects loading when an operation on a private object is requested, return an error immediately and let the client-side retry as needed. BUG= chromium:708716 TEST=unit tests CQ-DEPEND=CL:669945 Change-Id: I93d9c8a9fb71aea837fcdf8f0e40a13290605fa9 Reviewed-on: https://chromium-review.googlesource.com/667748 Commit-Ready: Andrey Pronin <apronin@chromium.org> Tested-by: Andrey Pronin <apronin@chromium.org> Reviewed-by: Eric Caruso <ejcaruso@chromium.org> [modify] https://crrev.com/0d388e9de7c52ed7309fc552f1873a9f85249984/chaps/object_pool_impl.cc [modify] https://crrev.com/0d388e9de7c52ed7309fc552f1873a9f85249984/chaps/chaps_service_test.cc [modify] https://crrev.com/0d388e9de7c52ed7309fc552f1873a9f85249984/chaps/session_impl.cc [modify] https://crrev.com/0d388e9de7c52ed7309fc552f1873a9f85249984/chaps/chaps_utility.cc [modify] https://crrev.com/0d388e9de7c52ed7309fc552f1873a9f85249984/chaps/session_test.cc [modify] https://crrev.com/0d388e9de7c52ed7309fc552f1873a9f85249984/chaps/session_mock.h [modify] https://crrev.com/0d388e9de7c52ed7309fc552f1873a9f85249984/chaps/object_pool_impl.h [modify] https://crrev.com/0d388e9de7c52ed7309fc552f1873a9f85249984/chaps/object_pool_test.cc [modify] https://crrev.com/0d388e9de7c52ed7309fc552f1873a9f85249984/chaps/session.h [modify] https://crrev.com/0d388e9de7c52ed7309fc552f1873a9f85249984/chaps/object_pool_mock.h [modify] https://crrev.com/0d388e9de7c52ed7309fc552f1873a9f85249984/chaps/chaps.h [modify] https://crrev.com/0d388e9de7c52ed7309fc552f1873a9f85249984/chaps/object_pool.h [modify] https://crrev.com/0d388e9de7c52ed7309fc552f1873a9f85249984/chaps/chaps_service.cc [modify] https://crrev.com/0d388e9de7c52ed7309fc552f1873a9f85249984/chaps/session_impl.h
,
Sep 20 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform2/+/3298a0f278cd666fc5fcf412e26148195b09416b commit 3298a0f278cd666fc5fcf412e26148195b09416b Author: Andrey Pronin <apronin@chromium.org> Date: Wed Sep 20 02:43:36 2017 chaps: retry requests to chapsd on getting would-block error On the client-side libchaps, if chapsd returns an error indicating that it would block waiting for private objects, retry the operation until a different outcome is recieved or a deadline is reached. BUG= chromium:708716 TEST=unit tests Change-Id: Ia9b83e30ccfa86052a0810d8c4230bf8b7e5f911 Reviewed-on: https://chromium-review.googlesource.com/669945 Commit-Ready: Andrey Pronin <apronin@chromium.org> Tested-by: Andrey Pronin <apronin@chromium.org> Reviewed-by: Andrey Pronin <apronin@chromium.org> [modify] https://crrev.com/3298a0f278cd666fc5fcf412e26148195b09416b/chaps/chaps_test.cc [modify] https://crrev.com/3298a0f278cd666fc5fcf412e26148195b09416b/chaps/chaps.cc [modify] https://crrev.com/3298a0f278cd666fc5fcf412e26148195b09416b/chaps/chaps_proxy_mock.h
,
Sep 20 2017
,
Sep 20 2017
Thanks for fixing this! By the way, I noticed that the DBus implementation this thing uses under the hood actually has support for the service to implement methods asynchronously. So rather than polling, you could probably use AddMethodHandler in place of AddSimpleMethodHandler. https://chromium.googlesource.com/aosp/platform/external/libbrillo/+/master/brillo/dbus/dbus_object.h
,
Sep 20 2017
That was a much bigger refactor than I wanted to do at the time I moved the D-Bus bindings, but I can work on that if we think it's worth having. We'd still need to export a synchronous API for the pkcs#11 shared library, in any event.
,
Sep 20 2017
Fair enough. Feel free to ignore me. I'm just a bystander as far as the implementation details go. (I went sniffing around because I was idly curious.)
,
Sep 20 2017
Re #20: yes, fair point, and will look into making chapsd more async in the future. However, short term, see comments #8-10. There is also a global lock that server-side chapsd holds for all its methods, and on the client side we also need to deserialize the calls (all threads of the same process will post chapsd requests to the same message loop and do sync calls there). Seems doable, but as ejcaruso@ said, is a bigger & riskier change. |
|||||||
►
Sign in to add a comment |
|||||||
Comment 1 by apronin@chromium.org
, Apr 5 2017