New issue
Advanced search Search tips

Issue 835438 link

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Aug 15
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug



Sign in to add a comment

factory: toolkit should clean up better at shutdown/reboot

Project Member Reported by briannorris@chromium.org, Apr 20 2018

Issue description

Forked from problems like this: https://issuetracker.google.com/76121632

I see that the factory init process starts up a bunch of scripts from /usr/local/factory/init/goofy.d/, and several of these try to mask out 
or replace system features via bind mounts. These mounts don't get cleaned up properly at shutdown/reboot, which can lead to unclean unmounts and either trigger storage corruption or else trigger rare kernel bugs.

Folks are looking at valiant hacks like this:

https://chromium-review.googlesource.com/c/chromiumos/platform/factory/+/1022050

which cause chromeos_shutdown to unmount recursively (and so cover things like unmounting /var/tmp, via 'umount -nR /var', etc.). This is a fragile solution though.

Instead, the factory toolkit should explicitly tear down its own mounts.

On a related note: I don't think the current factory init process ever actually calls '/usr/local/factory/bin/goofy_control stop', which I believe can leave some random python processes hanging around. These too should be cleaned up, so chromeos_shutdown doesn't have to reap them.
 
Also, some of the existing factory shutdown logic looks like voodoo to me. I've posted this to remove some voodoo, but I'm not super familiar with the factory tests yet:

https://chromium-review.googlesource.com/c/chromiumos/platform/factory/+/1022019
Cc: marcochen@chromium.org
Project Member

Comment 3 by bugdroid1@chromium.org, Apr 24 2018

Labels: merge-merged-factory-scarlet-10211.B
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform/factory/+/160a4f76e290ad6a783391d417a3ccdae00b8a13

commit 160a4f76e290ad6a783391d417a3ccdae00b8a13
Author: Brian Norris <briannorris@chromium.org>
Date: Tue Apr 24 06:08:44 2018

factory: unwind bind mounts on shutdown

These are preventing unmounting the stateful partition, which yields an
unclean unmount.

To use this, we need something like this in the post-stop script for
/etc/init/factory.conf:

exec /usr/local/factory/init/startup stop

BUG=b/76121632, chromium:835438
TEST=reboot; see clean shutdown

Change-Id: I234b234759bb6c9cfefa58b3975d5772999e0a40
Signed-off-by: Brian Norris <briannorris@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/1022981
Tested-by: Youcheng Syu <youcheng@chromium.org>
Reviewed-by: Youcheng Syu <youcheng@chromium.org>
Commit-Queue: Youcheng Syu <youcheng@chromium.org>

[modify] https://crrev.com/160a4f76e290ad6a783391d417a3ccdae00b8a13/init/goofy.d/setup_chrome.sh
[modify] https://crrev.com/160a4f76e290ad6a783391d417a3ccdae00b8a13/init/common.d/inhibit_jobs.sh
[modify] https://crrev.com/160a4f76e290ad6a783391d417a3ccdae00b8a13/init/startup
[modify] https://crrev.com/160a4f76e290ad6a783391d417a3ccdae00b8a13/init/goofy.d/jailbreak.sh
[modify] https://crrev.com/160a4f76e290ad6a783391d417a3ccdae00b8a13/init/common.d/vt2_hacks.sh

Project Member

Comment 4 by bugdroid1@chromium.org, Apr 24 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform2/+/d93ee4748e8a3d3807cb0c7b90eb0c05b133b6fc

commit d93ee4748e8a3d3807cb0c7b90eb0c05b133b6fc
Author: Brian Norris <briannorris@chromium.org>
Date: Tue Apr 24 06:08:45 2018

init: factory: run teardown processes at shutdown

We've implemented some 'stop' scripts which handle unmounting various
bind mounts and performing other cleanup.

Wait until ui is completely down, since the factory toolkit tends to
be using directories we want to unmount.

CQ-DEPEND=CL:1022981
BUG=b:76121632,  chromium:835438 
TEST=reboot; see clean shutdown

Change-Id: I5da2095adf3d75f0c5f301b29379a8e871e2f46f
Signed-off-by: Brian Norris <briannorris@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/1022814
Reviewed-by: Youcheng Syu <youcheng@chromium.org>
Commit-Queue: Youcheng Syu <youcheng@chromium.org>

[modify] https://crrev.com/d93ee4748e8a3d3807cb0c7b90eb0c05b133b6fc/init/upstart/test-init/factory.conf

Project Member

Comment 5 by bugdroid1@chromium.org, Apr 25 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform/factory/+/4de83db097487762384214c4807d35943d122809

commit 4de83db097487762384214c4807d35943d122809
Author: Brian Norris <briannorris@chromium.org>
Date: Wed Apr 25 01:53:11 2018

goofy: don't reimplement upstart's shutdown procedures

All of these services have appropriate 'stop on' conditions, which get
triggered via the runlevel change that happens with the 'shutdown'
command ('shutdown -{h,r} now'). This list is just a red herring for
generating unnecessary cleanup.

If daemons aren't stopping when they should, modifications should be
made to their init files, not methods like this.

This is inspired by stuff like b/76121632 and CL:1022050.

BUG= chromium:835438 
TEST=run factory toolkit reboot

Change-Id: I2a296fde5da27e56e467c8705674c5e300f4cae6
Signed-off-by: Brian Norris <briannorris@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/1022019
Reviewed-by: Gwendal Grignou <gwendal@chromium.org>
Reviewed-by: Hung-Te Lin <hungte@chromium.org>
Reviewed-by: Yilun Lin <yllin@chromium.org>

[modify] https://crrev.com/4de83db097487762384214c4807d35943d122809/py/goofy/test_environment.py

Project Member

Comment 6 by bugdroid1@chromium.org, Apr 25 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform/factory/+/fe918bdbd447cef00aa2fe311254c1e1f73f6543

commit fe918bdbd447cef00aa2fe311254c1e1f73f6543
Author: Brian Norris <briannorris@chromium.org>
Date: Wed Apr 25 07:26:10 2018

factory: fixup style on output redirection

Apparently this is the preferred style. It didn't block review for:

160a4f76e290 factory: unwind bind mounts on shutdown

BUG= chromium:835438 
TEST=none

Change-Id: Ia5902def97a6731cd43ac4d3f9a604f32d4338c9
Signed-off-by: Brian Norris <briannorris@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/1026163
Reviewed-by: Youcheng Syu <youcheng@chromium.org>

[modify] https://crrev.com/fe918bdbd447cef00aa2fe311254c1e1f73f6543/init/goofy.d/setup_chrome.sh
[modify] https://crrev.com/fe918bdbd447cef00aa2fe311254c1e1f73f6543/init/common.d/inhibit_jobs.sh
[modify] https://crrev.com/fe918bdbd447cef00aa2fe311254c1e1f73f6543/init/goofy.d/jailbreak.sh
[modify] https://crrev.com/fe918bdbd447cef00aa2fe311254c1e1f73f6543/init/common.d/vt2_hacks.sh

Using the above CLs on Scarlet, I got through about 3000 reboot cycles on 3 devices with no unclean unmounts. I believe the latest failures were because minijail was still in use, so we couldn't unmount the factory-hacked version of minijail. I think we do need to ensure that more services are stopped before /etc/init/factory.conf gets to its 'post-stop'.

That could mean augmenting our 'stop on' condition, or else we could force more services to stop in the "init/startup stop" command. That could essentially be something like reverting [1], except putting that into the new 'stop' command instead.

[1] https://chromium-review.googlesource.com/c/chromiumos/platform/factory/+/1022019
Owner: briannorris@chromium.org
Status: Fixed (was: Untriaged)
are we done with this change? I'm marking this as fixed.
Not all those changes landed on ToT. I'm not going to bother.

Sign in to add a comment