chaps: System slot gets wiped when trunksd is unavailable |
|||||||||||
Issue descriptionChrome Version: ToT OS: Chrome OS What steps will reproduce the problem? (1) Put some data into the system slot (e.g., a device-wide certificate). (2) Start chapsd and, quickly after that, ask the system to shut down. For example, via: $ stop chapsd $ start chapsd; reboot What is the expected result? The system boots with the system slot intact. What happens instead? (if the timing of operations in step (2) was bad) System slot gets empty after the reboot. The implication is that device-wide certificate disappear. This is filed to track the issue discovered at crbug.com/844537#c93. Copying the key observations from there: > [trunks_dbus_proxy.cc(96)] Error TrunksDBusProxy cannot connect to trunksd. > <...> > [slot_manager_impl.cc(203)] Authentication failed for token at /var/lib/chaps, reinitializing token. > > So what's happening here is that trunksd got shut down *before* the token initialization in chapsd gets done. In that case chapsd decides that the token's database is in a bad state and intentionally wipes the database out. > > This sounds like a concrete bug that needs to be fixed: transient errors, like unavailability of the trunksd daemon, should definitely not result in database being removed.
,
Oct 4
So, roughly speaking, this happens in either or two scenarios: * trunksd fails to start at all since the system booted - due to some bug in trunksd or some other unexpected situation. In that case chapsd will wait for 30 seconds and after that wipe the system slot out. * trunksd was responsive in the beginning of the chapsd initialization, but was killed later before chapsd completed authentication of the system slot data. (The second case is more relevant to bug 844537 than the first one.)
,
Oct 4
Is this going to be able to make M70? We're nearing out stable checkpoints. Thanks.
,
Oct 5
CL published for review: https://chromium-review.googlesource.com/c/chromiumos/platform2/+/1264759 . It attempts to address the second scenario from comment 2 (presumably, scenario 1 is unrelated to the issue reported by the users). Re comment 3: We'd hope to make it into M70 since we hope it should help the bug 844537 that is seriously affecting some customers.
,
Oct 9
How are we looking? The CL in #4 is showing ready to submit.
,
Oct 9
The CL is ready and passed through some extensive local testing, but couldn't land so far due to Chrome OS Commit Queue flakes.
,
Oct 9
,
Oct 9
This is a urgent/important fix which needs to make in as soon as possible. If there are any risks please escalate and lets figure out what resources/meetings/discussions we need to make this happen.
,
Oct 9
I've asked current Chrome OS sheriffs and they're asking to wait a few hours for the completion of the next CQ build - they suppressed one of failing builders, so unless something else fails the next one is expected to complete normally. royans@/geohsu@: I was asked - what are our deadline for getting into M-70?
,
Oct 9
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform2/+/f404372e604134ef3e10347bc933758353b5401c commit f404372e604134ef3e10347bc933758353b5401c Author: Maksim Ivanov <emaxx@chromium.org> Date: Tue Oct 09 23:00:37 2018 chaps: Avoid wiping slots when TPM daemon dies Stop treating "TPM daemon unavailable" errors as a signal to treat the slot's database corrupted. This should avoid the problem of wiping out legitimate slot contents because trunksd died during the slot's initialization in chapsd. BUG= chromium:892149 TEST=manual: Put some cert into system-wide slot, restart chapsd and somewhere in the middle of its restart kill trunksd Change-Id: I73955ea23e1808627d8ee10e9ea5d98ec9cc66ca Reviewed-on: https://chromium-review.googlesource.com/c/1264759 Tested-by: Maksim Ivanov <emaxx@chromium.org> Reviewed-by: Andrey Pronin <apronin@chromium.org> Reviewed-by: Pavol Marko <pmarko@chromium.org> Commit-Queue: ChromeOS CL Exonerator Bot <chromiumos-cl-exonerator@appspot.gserviceaccount.com> [modify] https://crrev.com/f404372e604134ef3e10347bc933758353b5401c/chaps/tpm2_utility_impl.cc [modify] https://crrev.com/f404372e604134ef3e10347bc933758353b5401c/chaps/tpm2_utility_impl.h
,
Oct 9
,
Oct 9
This bug requires manual review: We are only 6 days from stable. Please contact the milestone owner if you have questions. Owners: benmason@(Android), kariahda@(iOS), geohsu@(ChromeOS), abdulsyed@(Desktop) For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Oct 9
,
Oct 10
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform2/+/3f53d233fcbc42d616121a56f1124c15645383cd commit 3f53d233fcbc42d616121a56f1124c15645383cd Author: Maksim Ivanov <emaxx@chromium.org> Date: Wed Oct 10 20:35:36 2018 [M70] chaps: Avoid wiping slots when TPM daemon dies Stop treating "TPM daemon unavailable" errors as a signal to treat the slot's database corrupted. This should avoid the problem of wiping out legitimate slot contents because trunksd died during the slot's initialization in chapsd. BUG= chromium:892149 TEST=manual: Put some cert into system-wide slot, restart chapsd and somewhere in the middle of its restart kill trunksd Change-Id: I73955ea23e1808627d8ee10e9ea5d98ec9cc66ca Reviewed-on: https://chromium-review.googlesource.com/c/1264759 Tested-by: Maksim Ivanov <emaxx@chromium.org> Reviewed-by: Andrey Pronin <apronin@chromium.org> Reviewed-by: Pavol Marko <pmarko@chromium.org> Commit-Queue: ChromeOS CL Exonerator Bot <chromiumos-cl-exonerator@appspot.gserviceaccount.com> (cherry picked from commit f404372e604134ef3e10347bc933758353b5401c) Reviewed-on: https://chromium-review.googlesource.com/c/1274131 Reviewed-by: Maksim Ivanov <emaxx@chromium.org> Commit-Queue: Maksim Ivanov <emaxx@chromium.org> [modify] https://crrev.com/3f53d233fcbc42d616121a56f1124c15645383cd/chaps/tpm2_utility_impl.cc [modify] https://crrev.com/3f53d233fcbc42d616121a56f1124c15645383cd/chaps/tpm2_utility_impl.h
,
Oct 10
,
Oct 15
This issue has been approved for a merge. Please merge the fix to any appropriate branches as soon as possible! If all merges have been completed, please remove any remaining Merge-Approved labels from this issue. Thanks for your time! To disable nags, add the Disable-Nags label. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Oct 15
,
Jan 11
Marking as Verified based on the blocking bug's verification (https://bugs.chromium.org/p/chromium/issues/detail?id=844537#c123) and no customer reports of issues in the meantime (https://bugs.chromium.org/p/chromium/issues/detail?id=844537#c134). Please feel free to re-open if needed. |
|||||||||||
►
Sign in to add a comment |
|||||||||||
Comment 1 by emaxx@chromium.org
, Oct 4