New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 737051 link

Starred by 1 user

Issue metadata

Status: Archived
Owner:
Closed: Jul 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug



Sign in to add a comment

factory: connection_manager should not kill shill

Project Member Reported by hungte@chromium.org, Jun 27 2017

Issue description

The connection manager as goofy_plugin is causing many issues today, including:
 - when you restart factory, SSH connection will freeze for a while, and can't go back if goofy failed
 - in wiping, termination of factory (goofy) will stop shill and make it impossible to connect into shopfloor, also hard to debug since SSH is dropped.

Restarting shill is not a good option because (1) we'll lose all settings we've set in runtime, and (2) it may trigger additional setup jobs, including firewall, (3) for in-place-wiping, it's pretty hard to start shill in the chroot.

And no, we should not invoked dhclient / dhcpcd manually since it's not officially supported by CrOS and not guaranteed to work.

I think we should rewrite connection manager, since:
 - it currently calls DisableNetworking on plugin stop, which is not appropriate.

Ideally I think we should have a network manager that won't do "unexpected things" unless really needed. It should also restore system to "expected state" when destroyed.

And yes, I know there's some problems in current implementation because Goofy plugin systems now use "exclusive_resource" to handle life cycle of manager plugins like CPU & Network - and that's why it tries to kill shill when plugin is going to be sopped (because that's the way a pytest can manually configure network devices).

We need to think about what's the right way to handle resources.
 

Comment 1 by hungte@chromium.org, Jun 27 2017

Labels: -Pri-3 Pri-1
This is making every project to fail in finalization so I'd like to raise priority.
Project Member

Comment 2 by bugdroid1@chromium.org, Jun 27 2017

Labels: merge-merged-factory-eve-9667.B
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform/factory/+/87983a45d2eb025e4a4be14e4c555252d5f17c36

commit 87983a45d2eb025e4a4be14e4c555252d5f17c36
Author: Hung-Te Lin <hungte@chromium.org>
Date: Tue Jun 27 14:03:10 2017

goofy: Change connection_manager to keep network on Goofy shutdown.

Connection manager plugin used to disable network when plugin is going
to be stopped; however this creates a problem that factory_restart and
wipe_in_place will run without network (since plugins are "stopped"
before "destroyed").

Adding a EnableNetworking in OnDestroyed does not help because enabling
and disabling takes too long and Goofy process may be killed before
finished re-configuring network.

As a result, we have to peak Goofy status (check if TERMINATING) in
plugin OnStop to decide if we should enable or disable network.

For long term we should improve connection manager to "restore system
network state when connection manager is initialized" instead of simply
"enabling network".

BUG= chromium:737051 
TEST=gooftool wipe_in_place; See network still enabled.

Change-Id: Id6b1d54c157f81512079e9d174cce1042e00c202
Reviewed-on: https://chromium-review.googlesource.com/549534
Tested-by: Hung-Te Lin <hungte@chromium.org>
Reviewed-by: Wei-Han Chen <stimim@chromium.org>
Commit-Queue: Wei-Han Chen <stimim@chromium.org>

[modify] https://crrev.com/87983a45d2eb025e4a4be14e4c555252d5f17c36/py/goofy/plugins/connection_manager.py

Comment 3 by hungte@chromium.org, Jun 28 2017

In additional to fixing the network-drop issues, I'd like to make changes to connection manager so that:

 - Stations can configure the static IP in plugin options.
 - Connection manager will re-configure WLAN and static IP when doing EnableNetwork, no matter if reset is True of False.

In that way we can fix the problem "fixed IP is lost after running some tests, for example BT/WLAN tests".
Project Member

Comment 4 by bugdroid1@chromium.org, Jun 28 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform/factory/+/4c930774381f21ef2bc097bb2e4fcb123240500e

commit 4c930774381f21ef2bc097bb2e4fcb123240500e
Author: Hung-Te Lin <hungte@chromium.org>
Date: Wed Jun 28 10:09:47 2017

goofy: Change connection_manager to keep network on Goofy shutdown.

Connection manager plugin used to disable network when plugin is going
to be stopped; however this creates a problem that factory_restart and
wipe_in_place will run without network (since plugins are "stopped"
before "destroyed").

Adding a EnableNetworking in OnDestroyed does not help because enabling
and disabling takes too long and Goofy process may be killed before
finished re-configuring network.

As a result, we have to peak Goofy status (check if TERMINATING) in
plugin OnStop to decide if we should enable or disable network.

For long term we should improve connection manager to "restore system
network state when connection manager is initialized" instead of simply
"enabling network".

BUG= chromium:737051 , chromium:724085 
TEST=gooftool wipe_in_place; See network still enabled.

Change-Id: Id6b1d54c157f81512079e9d174cce1042e00c202
Reviewed-on: https://chromium-review.googlesource.com/549760
Commit-Ready: Hung-Te Lin <hungte@chromium.org>
Tested-by: Hung-Te Lin <hungte@chromium.org>
Reviewed-by: Ting Shen <phoenixshen@chromium.org>

[modify] https://crrev.com/4c930774381f21ef2bc097bb2e4fcb123240500e/py/goofy/plugins/connection_manager.py

Project Member

Comment 5 by bugdroid1@chromium.org, Jun 28 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform/factory/+/bc731f6389c5423e0991eccb60628eeffa5d1a56

commit bc731f6389c5423e0991eccb60628eeffa5d1a56
Author: Hung-Te Lin <hungte@chromium.org>
Date: Wed Jun 28 10:09:47 2017

goofy: Prevent resetting network in Goofy init.

When starting Goofy, the test/utils/connection_manager was implemented as
"EnableNetwork(reset=False) unless `override_blacklisted_network_devices`
is set" on first launch, and do reset=True when Goofy is switching
from a test requesting exclusive network resource back to normal state.

However when being migrated to Goofy Plugins, the OnStart, which is
invoked for both system init and exclusive-resource transition, is
implemented as always do reset=True.

This creates a problem that even if override_blacklisted_network_devices
is None, we'll see a network reset (disable and enable) when Goofy
starts. This is also causing problems when we do remote SSH.

As a result, we should skip the first OnStart (since the plugin is always
doing start_enabled=True) and only do EnableNetwork(reset=True) in
normal running state.

BUG= chromium:737051 , chromium:724085 
TEST=make test; manually tested on DUT.

Change-Id: Ib589c84ed793c866b675b107144dd8ec3336978d
Reviewed-on: https://chromium-review.googlesource.com/550455
Commit-Ready: Hung-Te Lin <hungte@chromium.org>
Tested-by: Hung-Te Lin <hungte@chromium.org>
Reviewed-by: Wei-Han Chen <stimim@chromium.org>

[modify] https://crrev.com/bc731f6389c5423e0991eccb60628eeffa5d1a56/py/goofy/plugins/connection_manager.py

Comment 6 by hungte@chromium.org, Jul 14 2017

Owner: hungte@chromium.org
Status: Fixed (was: Untriaged)

Comment 7 by dchan@chromium.org, Jan 22 2018

Status: Archived (was: Fixed)

Sign in to add a comment