chaps: Inhibit power_manager shutdown while TokenInitThread is running |
|||||||||||||||
Issue descriptionWe're speculating that a trigger for bug 844537 is system shutdown being triggered early in chapsd initialization, specifically while TokenInitThread is running. A workaround would be to inhibit system shutdown during the runtime of TokenInitThread. We can do this at the moment by using lockfiles: see https://bugs.chromium.org/p/chromium/issues/detail?id=892539#c9 and https://bugs.chromium.org/p/chromium/issues/detail?id=892539#c12 . A clean solution for later will be to: - either change exit sequence in upstart scripts to ensure that chapsd's dependencies exit after chapsd, and that chapsd has enough time to shut down cleanly - or investigate changing TokenInitThread to an multi-step async process which can be aborted - or both. [[OBSOLETE -- the original description talked about using RegisterSuspendDelay / the SuspendImminent signal as described here: https://chromium.googlesource.com/chromiumos/platform/system_api/+/master/dbus/power_manager/suspend.proto#15 . But, as Dan mentioned in Comment #5, this would only prevent suspend, not shutdown ]]
,
Oct 5
,
Oct 5
,
Oct 5
BTW if someone has time to try to implement this idea, feel free to take it :-) Otherwise I'll try to have something working by Mon.
,
Oct 5
#1: The timeout was just intended as an escape hatch for buggy clients that fail to report readiness -- please don't depend on it. We try to minimize the time spent in the weird state where we would otherwise already have suspended but are waiting for other daemons, since it's easy to have races there. (Also, what would happen when token initialization takes longer than 4 seconds on some systems?) Please instead watch for SuspendImminent, and make your helper thread signal back to your main thread so you can report readiness when initialization is complete. But reading the initial description... is the problem suspend, or shutdown?
,
Oct 5
Hrm, the logs that were provided when the issue happened contain: [0925/062859:INFO:daemon.cc(307)] Lid closed at startup .. [0925/062901:INFO:state_controller.cc(1078)] Ready to perform lid-closed action (shutdown) so yes, the issue is shutdown (early after start, while chapsd is still doing some initialization). I assumed that the SuspendImminent signal could also be used to prevent shutdown (as shutdown > suspend in some sense), but that's wrong then I guess? :-) BTW, the plan was not to rely on the timeout - the chapsd initialization task (during which we'd like the system not to start shutting down) takes ~2s, that's why I suggested 4s as the "escape hatch" suspend delay timeout. But understood that the correct way to do this, if the issue was suspend, would be to listend for SuspendImminent. Now that suspend is not the issue, is there anything we could do to prevent shutdown during that init task?
,
Oct 5
BTW - unrelated, but slightly weird: the logs actually contain this: [0925/062859:INFO:daemon.cc(307)] Lid closed at startup (..) [0925/062901:INFO:daemon.cc(1095)] Received updated external policy: ac_dim=7m ac_screen_off=7m30s ac_lock=0s ac_idle_warn=0s ac_idle=30m battery_dim=5m battery_screen_off=5m30s battery_lock=0s battery_idle_warn=0s battery_idle=6m30s ac_idle=suspend battery_idle=shutdown lid_closed=shutdown use_audio=1 use_video=1 presentation_factor=2.0 user_activity_factor=2.0 wait_for_initial_user_activity=0 force_nonzero_brightness_for_user_activity=1 (Prefs) [0925/062901:INFO:state_controller.cc(916)] Updated settings: dim=5m screen_off=5m30s lock=0s idle_warn=0s idle=6m30s (shutdown) lid_closed=shutdown use_audio=1 use_video=1 wake_locks= [0925/062901:INFO:state_controller.cc(1078)] Ready to perform lid-closed action (shutdown) (..) [0925/062904:INFO:daemon.cc(1095)] Received updated external policy: ac_dim=7m ac_screen_off=7m30s ac_lock=0s ac_idle_warn=0s ac_idle=30m battery_dim=5m battery_screen_off=5m30s battery_lock=0s battery_idle_warn=0s battery_idle=6m30s ac_idle=suspend battery_idle=shutdown lid_closed=no-op use_audio=1 use_video=1 presentation_factor=2.0 user_activity_factor=2.0 wait_for_initial_user_activity=0 force_nonzero_brightness_for_user_activity=1 (Prefs) [0925/062904:INFO:state_controller.cc(916)] Updated settings: dim=5m screen_off=5m30s lock=0s idle_warn=0s idle=6m30s (shutdown) lid_closed=no-op use_audio=1 use_video=1 wake_locks= [0925/062904:INFO:state_controller.cc(1078)] Ready to perform lid-closed action (no-op) So the policy is first lid_closed=shutdown and then seems to change to no-op, after the shutdown has already started. But that's probably usual behavior due to this code: https://cs.chromium.org/chromium/src/chromeos/dbus/power_policy_controller.cc?rcl=a048280752fe6327990e38c2c957a1a38edf593f&l=408 so I don't think that's an issue.
,
Oct 5
,
Oct 5
Hrm, if we still want to do this, it seems that we could inhibit shutdown by creating a lockfile in /run/lock/power_override[1] before starting TokenInitThread, and deleting it when TokenInitThread is done (or after a timeout, whichever happens first). This would be slightly hacky, as the lockfile mechanism seems to be intended for things such as updating firmware. IIUC the "chaps" user should even have permissions to write a lockfile because the directory is created with |1777| in platform2/init/chromeos_startup. The relevant handling in powerd is in platform2/power_manager/powerd/daemon.cc in |SuspendAndShutdownAreBlocked|. [1] https://chromium.googlesource.com/chromiumos/platform2/+show/master/power_manager/docs/faq.md#96
,
Oct 5
Yeah, a lockfile is (sadly) currently the way to go if you need to block shutdown.
,
Oct 5
#7: Yes, your analysis is correct. :-)
,
Oct 6
Thanks Dan! Let's go with the lockfile as a workaround for now then. Example implementations are in: https://chromium.googlesource.com/chromiumos/third_party/flashrom/+/master/power.c and https://chromium.googlesource.com/chromiumos/platform2/+/HEAD/modemfwd/modem_helper.cc One thing we should be aware of is what happens in case: - chapsd writes a lock file - Lid is closed, powerd defers shutdown for 10s because of the presence of the lockfile - chapsd removes its lock file - Lid is opened Will powerd cancel the shutdown, or will the system shut down after 10s anyway? IIUC from power_manager/powerd/daemon.cc (handling of |retry_shutdown_for_lockfile_timer_|), the system will shut down after 10s anyway. This is not a big issue IMO, we should just be aware of this behavior.
,
Oct 6
,
Oct 6
https://chromium-review.googlesource.com/c/chromiumos/platform2/+/1267037 is out for review, thanks all!
,
Oct 6
#12: Yes, the system will shut down in that case, IIRC. The user experience around /run/lock/power_override is pretty poor; it's really just a hack to avoid bricking systems. :-(
,
Oct 10
@geohsu: Hey, this CL (https://chromium-review.googlesource.com/c/chromiumos/platform2/+/1267037) is still stuck in CQ, but will hopefully land soon. It is part of one of our fixes regarding the disappearing device certificate (see crbug.com/844537) and we absolutely want to merge it to M70. I think our PM (marcuskoehler@) already discussed this with you. Thanks!
,
Oct 10
This bug requires manual review: We are only 5 days from stable. Please contact the milestone owner if you have questions. Owners: benmason@(Android), kariahda@(iOS), geohsu@(ChromeOS), abdulsyed@(Desktop) For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Oct 10
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform2/+/dbb9a2c61d3ea79f27eb061ef7309f615aaca329 commit dbb9a2c61d3ea79f27eb061ef7309f615aaca329 Author: Alexander Hendrich <hendrich@chromium.org> Date: Wed Oct 10 16:45:01 2018 chaps: Block shutdown while TokenInitThread runs Block powerd from emitting a shutdown signal while the TokenInitThread is running (~1-2secs) to prevent an unclean chapsd shutdown. Powerd shutdown signal is postponed by creating a lock file with chapsd's PID in /run/lock/power_override/. This lock file will be removed again once TokenInitThread finishes. BUG= chromium:892539 TEST=manual: Put some cert into system-wide slot, restart chapsd, close lid (=shutdown) while TokenInitThread is active (~1-2s). Logs should verify TokenInitThread completion (contain "Master key is ready for token ..." & "Postponing shutdown for lockfile(s): /run/lock/power_override/chapsd_token_init_slot_0.lock"). Additional Thread.sleep() in TokenInitThread might help with manual testing since it only runs for ~1-2s. Change-Id: I3e82581a2592fefb0abcf8b9684acb61a909b4f6 Reviewed-on: https://chromium-review.googlesource.com/1267037 Commit-Ready: Alexander Hendrich <hendrich@chromium.org> Tested-by: Alexander Hendrich <hendrich@chromium.org> Reviewed-by: Andrey Pronin <apronin@chromium.org> [modify] https://crrev.com/dbb9a2c61d3ea79f27eb061ef7309f615aaca329/chaps/slot_manager_test.cc [add] https://crrev.com/dbb9a2c61d3ea79f27eb061ef7309f615aaca329/chaps/system_shutdown_blocker.h [modify] https://crrev.com/dbb9a2c61d3ea79f27eb061ef7309f615aaca329/chaps/slot_manager_impl.cc [modify] https://crrev.com/dbb9a2c61d3ea79f27eb061ef7309f615aaca329/chaps/slot_manager_impl.h [add] https://crrev.com/dbb9a2c61d3ea79f27eb061ef7309f615aaca329/chaps/system_shutdown_blocker.cc [modify] https://crrev.com/dbb9a2c61d3ea79f27eb061ef7309f615aaca329/chaps/chapsd.cc [modify] https://crrev.com/dbb9a2c61d3ea79f27eb061ef7309f615aaca329/chaps/BUILD.gn
,
Oct 10
+geoshu@ Friendly ping on Comment #15 / Comment #16. This is one of the parts we hope will help with bug 844537 .
,
Oct 10
,
Oct 10
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform2/+/e885410ed8183b5fe9f906158f3a0e56666a40ce commit e885410ed8183b5fe9f906158f3a0e56666a40ce Author: Alexander Hendrich <hendrich@chromium.org> Date: Wed Oct 10 21:05:48 2018 [Merge M70] chaps: Block shutdown while TokenInitThread runs Block powerd from emitting a shutdown signal while the TokenInitThread is running (~1-2secs) to prevent an unclean chapsd shutdown. Powerd shutdown signal is postponed by creating a lock file with chapsd's PID in /run/lock/power_override/. This lock file will be removed again once TokenInitThread finishes. BUG= chromium:892539 TEST=manual: Put some cert into system-wide slot, restart chapsd, close lid (=shutdown) while TokenInitThread is active (~1-2s). Logs should verify TokenInitThread completion (contain "Master key is ready for token ..." & "Postponing shutdown for lockfile(s): /run/lock/power_override/chapsd_token_init_slot_0.lock"). Additional Thread.sleep() in TokenInitThread might help with manual testing since it only runs for ~1-2s. Change-Id: I3e82581a2592fefb0abcf8b9684acb61a909b4f6 Reviewed-on: https://chromium-review.googlesource.com/1267037 Commit-Ready: Alexander Hendrich <hendrich@chromium.org> Tested-by: Alexander Hendrich <hendrich@chromium.org> Reviewed-by: Andrey Pronin <apronin@chromium.org> (cherry picked from commit dbb9a2c61d3ea79f27eb061ef7309f615aaca329) Reviewed-on: https://chromium-review.googlesource.com/c/1274545 Commit-Queue: Alexander Hendrich <hendrich@chromium.org> Reviewed-by: Alexander Hendrich <hendrich@chromium.org> Reviewed-by: Maksim Ivanov <emaxx@chromium.org> [modify] https://crrev.com/e885410ed8183b5fe9f906158f3a0e56666a40ce/chaps/slot_manager_test.cc [add] https://crrev.com/e885410ed8183b5fe9f906158f3a0e56666a40ce/chaps/system_shutdown_blocker.h [modify] https://crrev.com/e885410ed8183b5fe9f906158f3a0e56666a40ce/chaps/slot_manager_impl.cc [modify] https://crrev.com/e885410ed8183b5fe9f906158f3a0e56666a40ce/chaps/slot_manager_impl.h [modify] https://crrev.com/e885410ed8183b5fe9f906158f3a0e56666a40ce/chaps/chaps.gyp [add] https://crrev.com/e885410ed8183b5fe9f906158f3a0e56666a40ce/chaps/system_shutdown_blocker.cc [modify] https://crrev.com/e885410ed8183b5fe9f906158f3a0e56666a40ce/chaps/chapsd.cc
,
Oct 10
,
Oct 15
This issue has been approved for a merge. Please merge the fix to any appropriate branches as soon as possible! If all merges have been completed, please remove any remaining Merge-Approved labels from this issue. Thanks for your time! To disable nags, add the Disable-Nags label. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Oct 15
,
Oct 18
Could you please provide steps, if this needs manual verification. Thanks.!
,
Oct 22
This will be very hard to manually verify without adding artificial sleeps in the code. We weren't able to reproduce the initial problem in our manual tests without the additional timeouts. In theory this should happen if you close the lid right (=shutdown) right after booting, but it's very difficult to get the timing right. So I guess we'll just have to wait and 'verify' it by hopefully not getting any more bug reports on this issue.
,
Jan 11
Marking as Verified based on the blocking bug's verification (https://bugs.chromium.org/p/chromium/issues/detail?id=844537#c123) and no customer reports of issues in the meantime (https://bugs.chromium.org/p/chromium/issues/detail?id=844537#c134). Please feel free to re-open if needed. |
|||||||||||||||
►
Sign in to add a comment |
|||||||||||||||
Comment 1 by pmarko@chromium.org
, Oct 5