New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 892149 link

Starred by 2 users

Issue metadata

Status: Verified
Owner:
Closed: Oct 10
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug

Blocking:
issue 844537



Sign in to add a comment

chaps: System slot gets wiped when trunksd is unavailable

Project Member Reported by emaxx@chromium.org, Oct 4

Issue description

Chrome Version: ToT
OS: Chrome OS

What steps will reproduce the problem?
(1) Put some data into the system slot (e.g., a device-wide certificate).
(2) Start chapsd and, quickly after that, ask the system to shut down. For example, via:
  $ stop chapsd
  $ start chapsd; reboot

What is the expected result?
The system boots with the system slot intact.

What happens instead?
(if the timing of operations in step (2) was bad)
System slot gets empty after the reboot. The implication is that device-wide certificate disappear.


This is filed to track the issue discovered at crbug.com/844537#c93.

Copying the key observations from there:

  >  [trunks_dbus_proxy.cc(96)] Error TrunksDBusProxy cannot connect to trunksd.
  >  <...>
  >  [slot_manager_impl.cc(203)] Authentication failed for token at /var/lib/chaps, reinitializing token.
  >
  > So what's happening here is that trunksd got shut down *before* the token initialization in chapsd gets done. In that case chapsd decides that the token's database is in a bad state and intentionally wipes the database out.
  >
  > This sounds like a concrete bug that needs to be fixed: transient errors, like unavailability of the trunksd daemon, should definitely not result in database being removed.
 
Blocking: 844537
Status: Started (was: Assigned)
So, roughly speaking, this happens in either or two scenarios:

* trunksd fails to start at all since the system booted - due to some bug in trunksd or some other unexpected situation.
In that case chapsd will wait for 30 seconds and after that wipe the system slot out.

* trunksd was responsive in the beginning of the chapsd initialization, but was killed later before chapsd completed authentication of the system slot data.

(The second case is more relevant to bug 844537 than the first one.)
Is this going to be able to make M70? We're nearing out stable checkpoints. Thanks.
CL published for review: https://chromium-review.googlesource.com/c/chromiumos/platform2/+/1264759 . It attempts to address the second scenario from comment 2 (presumably, scenario 1 is unrelated to the issue reported by the users).

Re comment 3: We'd hope to make it into M70 since we hope it should help the bug 844537 that is seriously affecting some customers.
How are we looking? The CL in #4 is showing ready to submit.
The CL is ready and passed through some extensive local testing, but couldn't land so far due to Chrome OS Commit Queue flakes.
Labels: ReleaseBlock-Beta
This is a urgent/important fix which needs to make in as soon as possible. If there are any risks please escalate and lets figure out what resources/meetings/discussions we need to make this happen.
I've asked current Chrome OS sheriffs and they're asking to wait a few hours for the completion of the next CQ build - they suppressed one of failing builders, so unless something else fails the next one is expected to complete normally.

royans@/geohsu@: I was asked - what are our deadline for getting into M-70?
Project Member

Comment 10 by bugdroid1@chromium.org, Oct 9

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform2/+/f404372e604134ef3e10347bc933758353b5401c

commit f404372e604134ef3e10347bc933758353b5401c
Author: Maksim Ivanov <emaxx@chromium.org>
Date: Tue Oct 09 23:00:37 2018

chaps: Avoid wiping slots when TPM daemon dies

Stop treating "TPM daemon unavailable" errors as a signal to
treat the slot's database corrupted. This should avoid the
problem of wiping out legitimate slot contents because trunksd
died during the slot's initialization in chapsd.

BUG= chromium:892149 
TEST=manual: Put some cert into system-wide slot, restart
     chapsd and somewhere in the middle of its restart kill
     trunksd

Change-Id: I73955ea23e1808627d8ee10e9ea5d98ec9cc66ca
Reviewed-on: https://chromium-review.googlesource.com/c/1264759
Tested-by: Maksim Ivanov <emaxx@chromium.org>
Reviewed-by: Andrey Pronin <apronin@chromium.org>
Reviewed-by: Pavol Marko <pmarko@chromium.org>
Commit-Queue: ChromeOS CL Exonerator Bot <chromiumos-cl-exonerator@appspot.gserviceaccount.com>

[modify] https://crrev.com/f404372e604134ef3e10347bc933758353b5401c/chaps/tpm2_utility_impl.cc
[modify] https://crrev.com/f404372e604134ef3e10347bc933758353b5401c/chaps/tpm2_utility_impl.h

Labels: Merge-Request-70
Project Member

Comment 12 by sheriffbot@chromium.org, Oct 9

Labels: -Merge-Request-70 Merge-Review-70 Hotlist-Merge-Review
This bug requires manual review: We are only 6 days from stable.
Please contact the milestone owner if you have questions.
Owners: benmason@(Android), kariahda@(iOS), geohsu@(ChromeOS), abdulsyed@(Desktop)

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Labels: -Merge-Review-70 Merge-Approved-70
Project Member

Comment 14 by bugdroid1@chromium.org, Oct 10

Labels: merge-merged-release-R70-11021.B
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform2/+/3f53d233fcbc42d616121a56f1124c15645383cd

commit 3f53d233fcbc42d616121a56f1124c15645383cd
Author: Maksim Ivanov <emaxx@chromium.org>
Date: Wed Oct 10 20:35:36 2018

[M70] chaps: Avoid wiping slots when TPM daemon dies

Stop treating "TPM daemon unavailable" errors as a signal to
treat the slot's database corrupted. This should avoid the
problem of wiping out legitimate slot contents because trunksd
died during the slot's initialization in chapsd.

BUG= chromium:892149 
TEST=manual: Put some cert into system-wide slot, restart
     chapsd and somewhere in the middle of its restart kill
     trunksd

Change-Id: I73955ea23e1808627d8ee10e9ea5d98ec9cc66ca
Reviewed-on: https://chromium-review.googlesource.com/c/1264759
Tested-by: Maksim Ivanov <emaxx@chromium.org>
Reviewed-by: Andrey Pronin <apronin@chromium.org>
Reviewed-by: Pavol Marko <pmarko@chromium.org>
Commit-Queue: ChromeOS CL Exonerator Bot <chromiumos-cl-exonerator@appspot.gserviceaccount.com>
(cherry picked from commit f404372e604134ef3e10347bc933758353b5401c)
Reviewed-on: https://chromium-review.googlesource.com/c/1274131
Reviewed-by: Maksim Ivanov <emaxx@chromium.org>
Commit-Queue: Maksim Ivanov <emaxx@chromium.org>

[modify] https://crrev.com/3f53d233fcbc42d616121a56f1124c15645383cd/chaps/tpm2_utility_impl.cc
[modify] https://crrev.com/3f53d233fcbc42d616121a56f1124c15645383cd/chaps/tpm2_utility_impl.h

Status: Fixed (was: Started)
Project Member

Comment 16 by sheriffbot@chromium.org, Oct 15

Cc: geohsu@chromium.org
This issue has been approved for a merge. Please merge the fix to any appropriate branches as soon as possible!

If all merges have been completed, please remove any remaining Merge-Approved labels from this issue.

Thanks for your time! To disable nags, add the Disable-Nags label.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Labels: -Merge-Approved-70
Status: Verified (was: Fixed)
Marking as Verified based on the blocking bug's verification (https://bugs.chromium.org/p/chromium/issues/detail?id=844537#c123) and no customer reports of issues in the meantime (https://bugs.chromium.org/p/chromium/issues/detail?id=844537#c134). Please feel free to re-open if needed.

Sign in to add a comment