New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 892539 link

Starred by 1 user

Issue metadata

Status: Verified
Owner:
Closed: Oct 10
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug

Blocking:
issue 844537



Sign in to add a comment

chaps: Inhibit power_manager shutdown while TokenInitThread is running

Project Member Reported by pmarko@chromium.org, Oct 5

Issue description

We're speculating that a trigger for bug 844537 is system shutdown being triggered early in chapsd initialization, specifically while TokenInitThread is running.

A workaround would be to inhibit system shutdown during the runtime of TokenInitThread.

We can do this at the moment by using lockfiles: see https://bugs.chromium.org/p/chromium/issues/detail?id=892539#c9 and https://bugs.chromium.org/p/chromium/issues/detail?id=892539#c12 .

A clean solution for later will be to:
- either change exit sequence in upstart scripts to ensure that chapsd's dependencies exit after chapsd, and that chapsd has enough time to shut down cleanly
- or investigate changing TokenInitThread to an multi-step async process which can be aborted
- or both.

[[OBSOLETE -- the original description talked about using RegisterSuspendDelay / the SuspendImminent signal as described here: 
https://chromium.googlesource.com/chromiumos/platform/system_api/+/master/dbus/power_manager/suspend.proto#15 .
But, as Dan mentioned in Comment #5, this would only prevent suspend, not shutdown ]]

 
Cc: derat@chromium.org
+Dan for power_manager expertise

Dan, if we'd like to inhibit suspend during the runtime of a (~2s running) PlatformThread, is it enough to:
- RegisterSuspendDelay with a timeout of e.g. 4s before the PlatformThread is started
- UnregisterSuspendDelay after the PlatformThread has ended
and not care about SuspendImminent, or do we also need to handle SuspendImminent?

In other words, if the sequence of events is:
- our client calls RegisterSuspendDelay (-> suspend delay id X)
- Suspend event comes in, power_manager sends out SuspendImminent but our client doesn't care
- our client calls UnregisterSuspendDelay with suspend delay id X

Will power_manager implicitly assume that our client is now ready for suspending due to the UnregisterSuspendDelay call, or do we need to also call HandleSuspendReadiness?

Thanks! 
Blocking: 844537
Cc: hendrich@chromium.org atwilson@chromium.org louiscollard@chromium.org
BTW if someone has time to try to implement this idea, feel free to take it :-)
Otherwise I'll try to have something working by Mon.
#1: The timeout was just intended as an escape hatch for buggy clients that fail to report readiness -- please don't depend on it. We try to minimize the time spent in the weird state where we would otherwise already have suspended but are waiting for other daemons, since it's easy to have races there. (Also, what would happen when token initialization takes longer than 4 seconds on some systems?)

Please instead watch for SuspendImminent, and make your helper thread signal back to your main thread so you can report readiness when initialization is complete.

But reading the initial description... is the problem suspend, or shutdown?
Hrm, the logs that were provided when the issue happened contain:
[0925/062859:INFO:daemon.cc(307)] Lid closed at startup
..
[0925/062901:INFO:state_controller.cc(1078)] Ready to perform lid-closed action (shutdown) 
so yes, the issue is shutdown (early after start, while chapsd is still doing some initialization).

I assumed that the SuspendImminent signal could also be used to prevent shutdown (as shutdown > suspend in some sense), but that's wrong then I guess? :-)


BTW, the plan was not to rely on the timeout - the chapsd initialization task (during which we'd like the system not to start shutting down) takes ~2s, that's why I suggested 4s as the "escape hatch" suspend delay timeout.
But understood that the correct way to do this, if the issue was suspend, would be to listend for SuspendImminent.

Now that suspend is not the issue, is there anything we could do to prevent shutdown during that init task?
BTW - unrelated, but slightly weird: the logs actually contain this:

[0925/062859:INFO:daemon.cc(307)] Lid closed at startup
(..)
[0925/062901:INFO:daemon.cc(1095)] Received updated external policy: ac_dim=7m ac_screen_off=7m30s ac_lock=0s ac_idle_warn=0s ac_idle=30m battery_dim=5m battery_screen_off=5m30s battery_lock=0s battery_idle_warn=0s battery_idle=6m30s ac_idle=suspend battery_idle=shutdown lid_closed=shutdown use_audio=1 use_video=1 presentation_factor=2.0 user_activity_factor=2.0 wait_for_initial_user_activity=0 force_nonzero_brightness_for_user_activity=1 (Prefs)
[0925/062901:INFO:state_controller.cc(916)] Updated settings: dim=5m screen_off=5m30s lock=0s idle_warn=0s idle=6m30s (shutdown) lid_closed=shutdown use_audio=1 use_video=1 wake_locks=
[0925/062901:INFO:state_controller.cc(1078)] Ready to perform lid-closed action (shutdown)
(..)
[0925/062904:INFO:daemon.cc(1095)] Received updated external policy: ac_dim=7m ac_screen_off=7m30s ac_lock=0s ac_idle_warn=0s ac_idle=30m battery_dim=5m battery_screen_off=5m30s battery_lock=0s battery_idle_warn=0s battery_idle=6m30s ac_idle=suspend battery_idle=shutdown lid_closed=no-op use_audio=1 use_video=1 presentation_factor=2.0 user_activity_factor=2.0 wait_for_initial_user_activity=0 force_nonzero_brightness_for_user_activity=1 (Prefs)
[0925/062904:INFO:state_controller.cc(916)] Updated settings: dim=5m screen_off=5m30s lock=0s idle_warn=0s idle=6m30s (shutdown) lid_closed=no-op use_audio=1 use_video=1 wake_locks=
[0925/062904:INFO:state_controller.cc(1078)] Ready to perform lid-closed action (no-op)

So the policy is first lid_closed=shutdown and then seems to change to no-op, after the shutdown has already started. But that's probably usual behavior due to this code:
https://cs.chromium.org/chromium/src/chromeos/dbus/power_policy_controller.cc?rcl=a048280752fe6327990e38c2c957a1a38edf593f&l=408
so I don't think that's an issue.
Summary: chaps: Inhibit power_manager shutdown while TokenInitThread is running (was: chaps: Inhibit power_manager suspend while TokenInitThread is running)
Hrm, if we still want to do this, it seems that we could inhibit shutdown by creating a lockfile in /run/lock/power_override[1] before starting TokenInitThread, and deleting it when TokenInitThread is done (or after a timeout, whichever happens first).
This would be slightly hacky, as the lockfile mechanism seems to be intended for things such as updating firmware.

IIUC the "chaps" user should even have permissions to write a lockfile because the directory is created with |1777| in platform2/init/chromeos_startup.

The relevant handling in powerd is in platform2/power_manager/powerd/daemon.cc in |SuspendAndShutdownAreBlocked|.

[1] https://chromium.googlesource.com/chromiumos/platform2/+show/master/power_manager/docs/faq.md#96
Yeah, a lockfile is (sadly) currently the way to go if you need to block shutdown.
#7: Yes, your analysis is correct. :-)
Cc: pmarko@chromium.org
Owner: hendrich@chromium.org
Thanks Dan!

Let's go with the lockfile as a workaround for now then.
Example implementations are in:
https://chromium.googlesource.com/chromiumos/third_party/flashrom/+/master/power.c
and
https://chromium.googlesource.com/chromiumos/platform2/+/HEAD/modemfwd/modem_helper.cc

One thing we should be aware of is what happens in case:
- chapsd writes a lock file
- Lid is closed, powerd defers shutdown for 10s because of the presence of the lockfile
- chapsd removes its lock file
- Lid is opened

Will powerd cancel the shutdown, or will the system shut down after 10s anyway?
IIUC from power_manager/powerd/daemon.cc (handling of |retry_shutdown_for_lockfile_timer_|), the system will shut down after 10s anyway. This is not a big issue IMO, we should just be aware of this behavior.
Description: Show this description
#12: Yes, the system will shut down in that case, IIRC. The user experience around /run/lock/power_override is pretty poor; it's really just a hack to avoid bricking systems. :-(
Labels: Merge-Request-70
@geohsu: Hey, this CL (https://chromium-review.googlesource.com/c/chromiumos/platform2/+/1267037) is still stuck in CQ, but will hopefully land soon. It is part of one of our fixes regarding the disappearing device certificate (see crbug.com/844537) and we absolutely want to merge it to M70. I think our PM (marcuskoehler@) already discussed this with you. Thanks!
Project Member

Comment 17 by sheriffbot@chromium.org, Oct 10

Labels: -Merge-Request-70 Merge-Review-70 Hotlist-Merge-Review
This bug requires manual review: We are only 5 days from stable.
Please contact the milestone owner if you have questions.
Owners: benmason@(Android), kariahda@(iOS), geohsu@(ChromeOS), abdulsyed@(Desktop)

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Project Member

Comment 18 by bugdroid1@chromium.org, Oct 10

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform2/+/dbb9a2c61d3ea79f27eb061ef7309f615aaca329

commit dbb9a2c61d3ea79f27eb061ef7309f615aaca329
Author: Alexander Hendrich <hendrich@chromium.org>
Date: Wed Oct 10 16:45:01 2018

chaps: Block shutdown while TokenInitThread runs

Block powerd from emitting a shutdown signal while the TokenInitThread
is running (~1-2secs) to prevent an unclean chapsd shutdown. Powerd
shutdown signal is postponed by creating a lock file with chapsd's PID
in /run/lock/power_override/. This lock file will be removed again once
TokenInitThread finishes.

BUG= chromium:892539 
TEST=manual: Put some cert into system-wide slot, restart chapsd, close
     lid (=shutdown) while TokenInitThread is active (~1-2s). Logs
     should verify TokenInitThread completion (contain "Master key is
     ready for token ..." & "Postponing shutdown for lockfile(s):
     /run/lock/power_override/chapsd_token_init_slot_0.lock").
     Additional Thread.sleep() in TokenInitThread might help
     with manual testing since it only runs for ~1-2s.

Change-Id: I3e82581a2592fefb0abcf8b9684acb61a909b4f6
Reviewed-on: https://chromium-review.googlesource.com/1267037
Commit-Ready: Alexander Hendrich <hendrich@chromium.org>
Tested-by: Alexander Hendrich <hendrich@chromium.org>
Reviewed-by: Andrey Pronin <apronin@chromium.org>

[modify] https://crrev.com/dbb9a2c61d3ea79f27eb061ef7309f615aaca329/chaps/slot_manager_test.cc
[add] https://crrev.com/dbb9a2c61d3ea79f27eb061ef7309f615aaca329/chaps/system_shutdown_blocker.h
[modify] https://crrev.com/dbb9a2c61d3ea79f27eb061ef7309f615aaca329/chaps/slot_manager_impl.cc
[modify] https://crrev.com/dbb9a2c61d3ea79f27eb061ef7309f615aaca329/chaps/slot_manager_impl.h
[add] https://crrev.com/dbb9a2c61d3ea79f27eb061ef7309f615aaca329/chaps/system_shutdown_blocker.cc
[modify] https://crrev.com/dbb9a2c61d3ea79f27eb061ef7309f615aaca329/chaps/chapsd.cc
[modify] https://crrev.com/dbb9a2c61d3ea79f27eb061ef7309f615aaca329/chaps/BUILD.gn

Cc: geohsu@chromium.org
+geoshu@
Friendly ping on Comment #15 / Comment #16.
This is one of the parts we hope will help with bug 844537 .
Labels: -Merge-Review-70 Merge-Approved-70
Project Member

Comment 21 by bugdroid1@chromium.org, Oct 10

Labels: merge-merged-release-R70-11021.B
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform2/+/e885410ed8183b5fe9f906158f3a0e56666a40ce

commit e885410ed8183b5fe9f906158f3a0e56666a40ce
Author: Alexander Hendrich <hendrich@chromium.org>
Date: Wed Oct 10 21:05:48 2018

[Merge M70] chaps: Block shutdown while TokenInitThread runs

Block powerd from emitting a shutdown signal while the TokenInitThread
is running (~1-2secs) to prevent an unclean chapsd shutdown. Powerd
shutdown signal is postponed by creating a lock file with chapsd's PID
in /run/lock/power_override/. This lock file will be removed again once
TokenInitThread finishes.

BUG= chromium:892539 
TEST=manual: Put some cert into system-wide slot, restart chapsd, close
     lid (=shutdown) while TokenInitThread is active (~1-2s). Logs
     should verify TokenInitThread completion (contain "Master key is
     ready for token ..." & "Postponing shutdown for lockfile(s):
     /run/lock/power_override/chapsd_token_init_slot_0.lock").
     Additional Thread.sleep() in TokenInitThread might help
     with manual testing since it only runs for ~1-2s.

Change-Id: I3e82581a2592fefb0abcf8b9684acb61a909b4f6
Reviewed-on: https://chromium-review.googlesource.com/1267037
Commit-Ready: Alexander Hendrich <hendrich@chromium.org>
Tested-by: Alexander Hendrich <hendrich@chromium.org>
Reviewed-by: Andrey Pronin <apronin@chromium.org>
(cherry picked from commit dbb9a2c61d3ea79f27eb061ef7309f615aaca329)
Reviewed-on: https://chromium-review.googlesource.com/c/1274545
Commit-Queue: Alexander Hendrich <hendrich@chromium.org>
Reviewed-by: Alexander Hendrich <hendrich@chromium.org>
Reviewed-by: Maksim Ivanov <emaxx@chromium.org>

[modify] https://crrev.com/e885410ed8183b5fe9f906158f3a0e56666a40ce/chaps/slot_manager_test.cc
[add] https://crrev.com/e885410ed8183b5fe9f906158f3a0e56666a40ce/chaps/system_shutdown_blocker.h
[modify] https://crrev.com/e885410ed8183b5fe9f906158f3a0e56666a40ce/chaps/slot_manager_impl.cc
[modify] https://crrev.com/e885410ed8183b5fe9f906158f3a0e56666a40ce/chaps/slot_manager_impl.h
[modify] https://crrev.com/e885410ed8183b5fe9f906158f3a0e56666a40ce/chaps/chaps.gyp
[add] https://crrev.com/e885410ed8183b5fe9f906158f3a0e56666a40ce/chaps/system_shutdown_blocker.cc
[modify] https://crrev.com/e885410ed8183b5fe9f906158f3a0e56666a40ce/chaps/chapsd.cc

Status: Fixed (was: Started)
Project Member

Comment 23 by sheriffbot@chromium.org, Oct 15

Cc: geo...@google.com
This issue has been approved for a merge. Please merge the fix to any appropriate branches as soon as possible!

If all merges have been completed, please remove any remaining Merge-Approved labels from this issue.

Thanks for your time! To disable nags, add the Disable-Nags label.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Labels: -Hotlist-Merge-Review -Merge-Approved-70
Could you please provide steps, if this needs manual verification.

Thanks.!
This will be very hard to manually verify without adding artificial sleeps in the code. We weren't able to reproduce the initial problem in our manual tests without the additional timeouts. In theory this should happen if you close the lid right (=shutdown) right after booting, but it's very difficult to get the timing right. So I guess we'll just have to wait and 'verify' it by hopefully not getting any more bug reports on this issue.
Status: Verified (was: Fixed)
Marking as Verified based on the blocking bug's verification (https://bugs.chromium.org/p/chromium/issues/detail?id=844537#c123) and no customer reports of issues in the meantime (https://bugs.chromium.org/p/chromium/issues/detail?id=844537#c134). Please feel free to re-open if needed.

Sign in to add a comment