New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 622129 link

Starred by 3 users

Issue metadata

Status: Fixed
Owner:
Closed: Nov 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 3
Type: Bug



Sign in to add a comment

On whirlwind, powerwash breaks stateful_update

Reported by jrbarnette@chromium.org, Jun 22 2016

Issue description

Consider the following repair job on a whirlwind:
    http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row10-rack12-host2/56738755-repair/

If you go through status.log from the repair, you find
that repair.powerwash was attempted, and failed with this
line:
		FAIL	----	repair.powerwash	timestamp=1466542915	localtime=Jun 21 14:01:55	Failed to perform stateful update on chromeos4-row10-rack12-host2

Looking in debug/autoserv.DEBUG, you find these lines:
06/21 14:01:02.456 DEBUG|          ssh_host:0180| Running (ssh) '/tmp/stateful_update http://100.107.160.4:8082/static/whirlwind-release/R50-7978.2.0 --stateful_change=clean 2>&1'
06/21 14:01:02.886 DEBUG|        base_utils:0278| [stdout] Downloading stateful payload from http://100.107.160.4:8082/static/whirlwind-release/R50-7978.2.0/stateful.tgz
06/21 14:01:02.888 DEBUG|        base_utils:0278| [stdout] eval: 1: wget: not found

Logging into the DUT (now running the repair test image), you
can see this:
    root@chromeos4-row10-rack12-host2 $ which wget
    /usr/local/bin/wget

So, Powerwash on whirlwind test images wipes out `wget`, which
means that repair.powerwash must always fail.

There are two possible fixes:
  * Change whirlwind base images to include wget.
  * Change stateful_update to use curl, or else to support a
    choice of either wget or curl.

 

Comment 1 by cbook@google.com, Jun 23 2016

How does this work on other chromeos images?  Is wget always included in the base image?

Does whirlwind not depend on some package that normally pulls this in?
> How does this work on other chromeos images?  Is wget always
> included in the base image?

Yes, on standard Chrome OS, both curl and wget are part of the
root file system in the base image..


> Does whirlwind not depend on some package that normally pulls this in?

That would be the expected reason for the difference.
There's also an off chance that including 'wget' depends
directly on the 'embedded' use flag.

Comment 3 by cbook@google.com, Jun 23 2016

I just wonder if wget should be added to a chromeos base package instead of specifically whirlwind or jetstream.

Since autotest seems to require it after a powerwash, and this applies to all boards, it seems like this should be pulled into chromeos generically.
> I just wonder if wget should be added to a chromeos base
> package instead of specifically whirlwind or jetstream.

Yes, making sure 'wget' is a dependency of the base package
would be a Good Thing.

Comment 5 by cbook@google.com, Jun 23 2016

I will make a patch.  But can you confirm that "repair.powerwash" is only used on test images?  ie this is part of autotest infrastructure?
> But can you confirm that "repair.powerwash" is only used
> on test images?  ie this is part of autotest infrastructure?

I'm not sure what you're asking here.  Powerwash is a product
feature, and it gets tested in contexts other than repair.
'repair.powerwash' _is_ specific to the autotest infrastructure,
but it's not the only way this bug can be triggered.  All of
the cases at work here apply only to test images in the test
lab.

However, if the fix is merely to make 'wget' part of the base
image, then the real risk you want to ask about is "is it OK to
include this package in the consumer product, even though it
won't be used (or needed) by the consumers?"

Comment 7 by cbook@chromium.org, Jun 23 2016

I intend to make 'wget' only part of the test image, such that this won't impact consumers.

I wanted to know if wget was only required by the test infrastructure itself, and it sounds like that is the case.
> I intend to make 'wget' only part of the test image,
> such that this won't impact consumers.

Ah.  No, that won't work.  'wget' is _already_ part
of the test image only.  That's the bug.

> I wanted to know if wget was only required by the
> test infrastructure itself, and it sounds like that
> is the case.

The problem is that test-image specific bits are stored in
stateful, under /usr/local.  When you run 'powerwash', those
bits get wiped out.  In the case of whirlwind, the bits that
get wiped out include the 'wget' command, which breaks our
ability to update the device in the lab.

There's a powerwash test that runs regularly with every release
build, so the only thing protecting us from a total whirlwind
outage is that for whirlwind, servo reliably repairs the devices
by re-installing from USB.  But we'd rather not make lab reliability
only as good as servo repair.

Comment 9 by ra...@google.com, Jun 24 2016

I'm a little confused. If wget is part of the test image, why is powerwash having problems restoring it?

I think python may have a similar issue, I think I've seen python commands fail when Whirlwind DUTs are in this state. I initially thought that issue and the wget issue were linked.
> I'm a little confused. If wget is part of the test image,
> why is powerwash having problems restoring it?

Powerwash restores nothing.  It wipes out all content in
the stateful partition.  With the exception of certain upstart
jobs, all test-image specific content is stored in the stateful
partition.  So, powerwash wipes out all test-image only commands,
including 'python' and 'wget'.

For purposes of updating a DUT, we can live without python; it's
not needed for update.  However, the 'stateful_update' script
relies on 'wget'. So, because powerwash obliterates 'wget', we
can't apply updates after powerwash.

Being unable to apply an update isn't 100% fatal; servo repair
knows how to fix it, but that's undesirable.  One of the standard
repair procedures is "power wash, then apply an update".  That
procedure breaks on whirlwind.

Status: Untriaged (was: Unconfirmed)
Components: OS>Embedded
Status: Available (was: Untriaged)

Comment 13 by cbook@chromium.org, Jun 29 2016

https://chromium-review.googlesource.com/#/c/356826/

> If wget is part of the test image, why is powerwash having problems restoring it?

I only meant to include it on the root file system for test images.  See above patch.

Are you able to test with this patch?

Comment 14 Deleted

Owner: cbook@chromium.org
Status: Assigned (was: Untriaged)
Cc: ra...@google.com

Comment 17 Deleted

Comment 18 by cbook@chromium.org, Jul 17 2017

Cc: -cbook@chromium.org
Owner: ----
Removing myself as I nolonger work on this.
Updating this bug with status.  There are two ways to fix this
problem:
 1) Add 'wget' to the root file system in test images, so that
    'wget' is present even after powerwash.  This in the approach
    of the CL in c#13.
 2) Change 'stateful_update' to use 'curl' instead of 'wget'.
    The 'curl' command is already present in the root file system
    of all images, because it's used by chromeos-base/dev-install,
    which is part of the base Chromium OS image.

I'm partial to option 2).  It has the advantage that once the change
is deployed to the lab, the lab will be able to repair any build.
That means the change doesn't have to be merged to beta or stable
brances, and we don't have to worry about old FSI images that must
be installed from time-to-time.

Note that option 2) works in part because 'stateful_update' is part
of stateful(!), so after powerwash, we install the script from the
devserver.

Option 1) has the advantage of not depending on the quirks of the
lab installation flows, but is otherwise inferior because it won't
allow repairing the old builds without 'wget' that will still be
present in the lab from time-to-time.

let's use curl to fetch things.  we usually need libcurl anyways by system packages.
Components: -OS>Embedded Infra>Client>ChromeOS
Status: Untriaged (was: Assigned)
Hrmph!  'wget' was added to jetstream images with crosreview.com/i/383874,
which landed in R61-9610.0.0 as a fix for b/62098277.

Changing stateful_update to use 'curl' is arguably still better
(see c#19 and c#20), so let's leave this open.

Changing (and testing) stateful_update is more a thing requiring
expertise from the Infra team, though, so let's also change ownership.

Owner: haddowk@chromium.org
Status: Fixed (was: Untriaged)
Should be fixed with the CL for  bug 785487 :
    https://chromium-review.googlesource.com/773114

Comment 23 by dchan@chromium.org, Jan 22 2018

Status: Archived (was: Fixed)

Comment 24 by dchan@chromium.org, Jan 23 2018

Status: Fixed (was: Archived)

Sign in to add a comment