factory: toolkit should clean up better at shutdown/reboot |
||||
Issue descriptionForked from problems like this: https://issuetracker.google.com/76121632 I see that the factory init process starts up a bunch of scripts from /usr/local/factory/init/goofy.d/, and several of these try to mask out or replace system features via bind mounts. These mounts don't get cleaned up properly at shutdown/reboot, which can lead to unclean unmounts and either trigger storage corruption or else trigger rare kernel bugs. Folks are looking at valiant hacks like this: https://chromium-review.googlesource.com/c/chromiumos/platform/factory/+/1022050 which cause chromeos_shutdown to unmount recursively (and so cover things like unmounting /var/tmp, via 'umount -nR /var', etc.). This is a fragile solution though. Instead, the factory toolkit should explicitly tear down its own mounts. On a related note: I don't think the current factory init process ever actually calls '/usr/local/factory/bin/goofy_control stop', which I believe can leave some random python processes hanging around. These too should be cleaned up, so chromeos_shutdown doesn't have to reap them.
,
Apr 21 2018
,
Apr 24 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform/factory/+/160a4f76e290ad6a783391d417a3ccdae00b8a13 commit 160a4f76e290ad6a783391d417a3ccdae00b8a13 Author: Brian Norris <briannorris@chromium.org> Date: Tue Apr 24 06:08:44 2018 factory: unwind bind mounts on shutdown These are preventing unmounting the stateful partition, which yields an unclean unmount. To use this, we need something like this in the post-stop script for /etc/init/factory.conf: exec /usr/local/factory/init/startup stop BUG=b/76121632, chromium:835438 TEST=reboot; see clean shutdown Change-Id: I234b234759bb6c9cfefa58b3975d5772999e0a40 Signed-off-by: Brian Norris <briannorris@chromium.org> Reviewed-on: https://chromium-review.googlesource.com/1022981 Tested-by: Youcheng Syu <youcheng@chromium.org> Reviewed-by: Youcheng Syu <youcheng@chromium.org> Commit-Queue: Youcheng Syu <youcheng@chromium.org> [modify] https://crrev.com/160a4f76e290ad6a783391d417a3ccdae00b8a13/init/goofy.d/setup_chrome.sh [modify] https://crrev.com/160a4f76e290ad6a783391d417a3ccdae00b8a13/init/common.d/inhibit_jobs.sh [modify] https://crrev.com/160a4f76e290ad6a783391d417a3ccdae00b8a13/init/startup [modify] https://crrev.com/160a4f76e290ad6a783391d417a3ccdae00b8a13/init/goofy.d/jailbreak.sh [modify] https://crrev.com/160a4f76e290ad6a783391d417a3ccdae00b8a13/init/common.d/vt2_hacks.sh
,
Apr 24 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform2/+/d93ee4748e8a3d3807cb0c7b90eb0c05b133b6fc commit d93ee4748e8a3d3807cb0c7b90eb0c05b133b6fc Author: Brian Norris <briannorris@chromium.org> Date: Tue Apr 24 06:08:45 2018 init: factory: run teardown processes at shutdown We've implemented some 'stop' scripts which handle unmounting various bind mounts and performing other cleanup. Wait until ui is completely down, since the factory toolkit tends to be using directories we want to unmount. CQ-DEPEND=CL:1022981 BUG=b:76121632, chromium:835438 TEST=reboot; see clean shutdown Change-Id: I5da2095adf3d75f0c5f301b29379a8e871e2f46f Signed-off-by: Brian Norris <briannorris@chromium.org> Reviewed-on: https://chromium-review.googlesource.com/1022814 Reviewed-by: Youcheng Syu <youcheng@chromium.org> Commit-Queue: Youcheng Syu <youcheng@chromium.org> [modify] https://crrev.com/d93ee4748e8a3d3807cb0c7b90eb0c05b133b6fc/init/upstart/test-init/factory.conf
,
Apr 25 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform/factory/+/4de83db097487762384214c4807d35943d122809 commit 4de83db097487762384214c4807d35943d122809 Author: Brian Norris <briannorris@chromium.org> Date: Wed Apr 25 01:53:11 2018 goofy: don't reimplement upstart's shutdown procedures All of these services have appropriate 'stop on' conditions, which get triggered via the runlevel change that happens with the 'shutdown' command ('shutdown -{h,r} now'). This list is just a red herring for generating unnecessary cleanup. If daemons aren't stopping when they should, modifications should be made to their init files, not methods like this. This is inspired by stuff like b/76121632 and CL:1022050. BUG= chromium:835438 TEST=run factory toolkit reboot Change-Id: I2a296fde5da27e56e467c8705674c5e300f4cae6 Signed-off-by: Brian Norris <briannorris@chromium.org> Reviewed-on: https://chromium-review.googlesource.com/1022019 Reviewed-by: Gwendal Grignou <gwendal@chromium.org> Reviewed-by: Hung-Te Lin <hungte@chromium.org> Reviewed-by: Yilun Lin <yllin@chromium.org> [modify] https://crrev.com/4de83db097487762384214c4807d35943d122809/py/goofy/test_environment.py
,
Apr 25 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform/factory/+/fe918bdbd447cef00aa2fe311254c1e1f73f6543 commit fe918bdbd447cef00aa2fe311254c1e1f73f6543 Author: Brian Norris <briannorris@chromium.org> Date: Wed Apr 25 07:26:10 2018 factory: fixup style on output redirection Apparently this is the preferred style. It didn't block review for: 160a4f76e290 factory: unwind bind mounts on shutdown BUG= chromium:835438 TEST=none Change-Id: Ia5902def97a6731cd43ac4d3f9a604f32d4338c9 Signed-off-by: Brian Norris <briannorris@chromium.org> Reviewed-on: https://chromium-review.googlesource.com/1026163 Reviewed-by: Youcheng Syu <youcheng@chromium.org> [modify] https://crrev.com/fe918bdbd447cef00aa2fe311254c1e1f73f6543/init/goofy.d/setup_chrome.sh [modify] https://crrev.com/fe918bdbd447cef00aa2fe311254c1e1f73f6543/init/common.d/inhibit_jobs.sh [modify] https://crrev.com/fe918bdbd447cef00aa2fe311254c1e1f73f6543/init/goofy.d/jailbreak.sh [modify] https://crrev.com/fe918bdbd447cef00aa2fe311254c1e1f73f6543/init/common.d/vt2_hacks.sh
,
Apr 25 2018
Using the above CLs on Scarlet, I got through about 3000 reboot cycles on 3 devices with no unclean unmounts. I believe the latest failures were because minijail was still in use, so we couldn't unmount the factory-hacked version of minijail. I think we do need to ensure that more services are stopped before /etc/init/factory.conf gets to its 'post-stop'. That could mean augmenting our 'stop on' condition, or else we could force more services to stop in the "init/startup stop" command. That could essentially be something like reverting [1], except putting that into the new 'stop' command instead. [1] https://chromium-review.googlesource.com/c/chromiumos/platform/factory/+/1022019
,
Aug 15
are we done with this change? I'm marking this as fixed.
,
Aug 29
Not all those changes landed on ToT. I'm not going to bother. |
||||
►
Sign in to add a comment |
||||
Comment 1 by briannorris@chromium.org
, Apr 20 2018