Issue 732588 link

Issue metadata

Status: Fixed
Owner:
Closed: Jul 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug



servo V4 repair fails with message "<host> is not in an installable state"

Reported by jrbarnette@chromium.org, Jun 12 2017

Issue description

From time to time, I see servo v4 repair failing with this error:
	FAIL	----	verify.update	timestamp=1495621253	localtime=May 24 03:20:53	chromeos6-row1-rack10-labstation is not in an installable state

That error message comes from client/common_lib/cros/autoupdater.py:
    def _wait_for_update_service(self):
        # ...

        # Expect the update engine to be idle.
        if status != UPDATER_IDLE:
            raise ChromiumOSError('%s is not in an installable state' %
                                  self.host.hostname)


This should _not_ cause the servo v4 to report failure.  We need to
fix the repair flow so that this condition is a warning buried in a
log file somewhere.
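
One way to read that request, as a minimal sketch rather than the actual autotest change: catch the error where the servo verifier asks for an update, and log it as a warning. `ChromiumOSError` below is a stand-in class, and `check_servo_update` and its `trigger_update` argument are hypothetical names used only for illustration.

```python
import logging


class ChromiumOSError(Exception):
    """Stand-in for the autotest exception of the same name."""


def check_servo_update(trigger_update):
    """Run an update check, demoting a failure to a logged warning.

    `trigger_update` is any callable that may raise ChromiumOSError
    (hypothetical; it stands in for the real update trigger).
    Returns True if the update was triggered, False if it was skipped.
    """
    try:
        trigger_update()
    except ChromiumOSError as e:
        # The servo host could not take an update right now.  Record
        # the reason, but do not fail servo verification over it.
        logging.warning('Skipping servo host update: %s', e)
        return False
    return True
```

With this shape, a servo host that cannot update leaves a trace in the logs while verification carries on.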

 
Owner: jrbarnette@chromium.org
Status: ExternalDependency (was: Available)
If you want action on this bug, please add more context. At least:
- Link to test logs where this failure happened.

Also, why is this Pri-1? If it is Pri-1, please give us some idea of the impact of this failure mode. (How often does it happen? Or when is it expected to happen, from your understanding of the problem?)
Added note: Just saw the "Chase-Pending" label.
The requirements for Chase-Pending are even stricter. You have to justify (to some degree) that this either causes or extends outages. (I think the fix should be easy, so scope argument isn't needed really).

See akeshet@'s email about this label.
Status: Available (was: ExternalDependency)
Re impact:  This causes DUTs that are broken not to be able to repair,
even if servo is able to do the work.

Regarding context:  Part of the servo verification flow includes
"check whether the servo host needs update".  That check invokes the
code above, which causes the failure.  But "check whether the servo
needs update" shouldn't be fatal if the servo can't actually update.

In the specific case I looked at, the servo host state was "needs
update".  A servo host sitting around in that state is also a bug,
but it's a separate bug.

Owner: ----
Regarding the potential for outages:  For the cited labstation,
there are three failed falco DUTs.  Given time, I expect I can
find other failed DUTs for other boards.  Enough failed DUTs
will cause outages.

As for whether there's an actual, historic outage I can point
to from this, that's less clear.  I can only say it would be nice
to address this problem before it contributes to an actual outage.

Owner: pprabhu@chromium.org
Status: Assigned (was: Available)
Re #5. We can take this up in the next Chase-Pending bash. I'll note that you need to show real potential for outage / historic impact for bugs to be accepted to Chase.

We have 100s of things that can cause outages. For Chase, you must convince everybody in the room that this bug is likely to cause one before the other 99.

Assigning to myself just to remove from triage queue.
Cc: pprabhu@chromium.org jrbarnette@chromium.org
Labels: -Chase-Pending Chase
Owner: ayatane@chromium.org
Have 1-line fix, with risk uncertain. Allen to upload CL based on idea, with Richard for review.
Project Member

Comment 9 by bugdroid1@chromium.org, Jun 29 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/66aa254e6ce449cfd9b49b552c606f95ad53d612

commit 66aa254e6ce449cfd9b49b552c606f95ad53d612
Author: Allen Li <ayatane@chromium.org>
Date: Thu Jun 29 14:54:57 2017

[autotest] Skip servo update if pending reboot

If the servo is already pending reboot for an update, we skip trying
to trigger another update.

BUG= chromium:732588 
TEST=None

Change-Id: I1c0b3ba7fa52c2ef98294ad6584e223f1ddaf70c
Reviewed-on: https://chromium-review.googlesource.com/549089
Reviewed-by: Richard Barnette <jrbarnette@google.com>
Commit-Queue: Richard Barnette <jrbarnette@google.com>
Tested-by: Richard Barnette <jrbarnette@google.com>

[modify] https://crrev.com/66aa254e6ce449cfd9b49b552c606f95ad53d612/server/hosts/servo_host.py
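
The idea in this commit message can be sketched as follows. This is an illustration under assumptions, not the contents of the CL: `UPDATE_STATUS_UPDATED_NEED_REBOOT` is the status string the update engine reports when an update is applied and waiting for a reboot, while `should_trigger_update` and its `get_status` argument are hypothetical names.

```python
UPDATER_IDLE = 'UPDATE_STATUS_IDLE'
UPDATER_NEED_REBOOT = 'UPDATE_STATUS_UPDATED_NEED_REBOOT'


def should_trigger_update(get_status):
    """Decide whether to ask the update engine for a new update.

    `get_status` is a hypothetical callable returning the update
    engine's current operation as a string.  If an update is already
    applied and only waiting for a reboot, triggering another one
    would fail the "must be idle" check, so skip it.
    """
    return get_status() != UPDATER_NEED_REBOOT
```

A caller would check `should_trigger_update(...)` first and only then trigger the update, so the "not in an installable state" error is never raised for a host that is merely waiting to reboot.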

Project Member

Comment 10 by bugdroid1@chromium.org, Jun 30 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/1a5cc0a2acf66ed9a93fd270ca1e34c005107fde

commit 1a5cc0a2acf66ed9a93fd270ca1e34c005107fde
Author: Allen Li <ayatane@chromium.org>
Date: Fri Jun 30 23:02:42 2017

[autotest] Get rid of to_raise exception storing

BUG= chromium:732588 
TEST=None

Change-Id: Ie0dbaa1692e1d3fd4d57978e4ad8d24da3d70045
Reviewed-on: https://chromium-review.googlesource.com/549087
Commit-Ready: Allen Li <ayatane@chromium.org>
Tested-by: Allen Li <ayatane@chromium.org>
Reviewed-by: Allen Li <ayatane@chromium.org>

[modify] https://crrev.com/1a5cc0a2acf66ed9a93fd270ca1e34c005107fde/client/common_lib/cros/autoupdater.py

Project Member

Comment 11 by bugdroid1@chromium.org, Jun 30 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/b5420a7d676be0c152ca6aa91cafbadb145d5bcd

commit b5420a7d676be0c152ca6aa91cafbadb145d5bcd
Author: Allen Li <ayatane@chromium.org>
Date: Fri Jun 30 23:02:43 2017

[autotest] Extract _get_metric_fields()

BUG= chromium:732588 
TEST=None

Change-Id: Ifd78cc0c44bccf2f2a0d811005821030f8b335bf
Reviewed-on: https://chromium-review.googlesource.com/549088
Commit-Ready: Allen Li <ayatane@chromium.org>
Tested-by: Allen Li <ayatane@chromium.org>
Reviewed-by: Allen Li <ayatane@chromium.org>

[modify] https://crrev.com/b5420a7d676be0c152ca6aa91cafbadb145d5bcd/client/common_lib/cros/autoupdater.py

Status: fii (was: Assigned)
Status: Fixed (was: fii)
For reference, the relevant part of the logs:

06/23 12:04:25.387 DEBUG|          ssh_host:0297| Running (ssh) '/usr/bin/update_engine_client -status | grep CURRENT_OP'
06/23 12:04:25.542 ERROR|             utils:0280| [stderr] [0623/120425:INFO:update_engine_client.cc(493)] Querying Update Engine status...
06/23 12:04:25.543 DEBUG|             utils:0280| [stdout] CURRENT_OP=UPDATE_STATUS_UPDATED_NEED_REBOOT
06/23 12:04:25.547 ERROR|            repair:0332| Failed: servo host software is up-to-date
Traceback (most recent call last):
  File "/usr/local/autotest/client/common_lib/hosts/repair.py", line 329, in _verify_host
    self.verify(host)
  File "/usr/local/autotest/server/hosts/servo_repair.py", line 28, in verify
    host.update_image(wait_for_update=False)
  File "/usr/local/autotest/server/hosts/servo_host.py", line 582, in update_image
    updater.trigger_update()
  File "/usr/local/autotest/client/common_lib/cros/autoupdater.py", line 248, in trigger_update
    self._wait_for_update_service()
  File "/usr/local/autotest/client/common_lib/cros/autoupdater.py", line 237, in _wait_for_update_service
    self.host.hostname)
ChromiumOSError: chromeos6-row2-rack21-labstation2 is not in an installable state
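
The log above shows the status being read by running `update_engine_client -status` and picking out the `CURRENT_OP` line. A minimal sketch of that parsing step, assuming the `KEY=VALUE` output format seen in the log; `parse_current_op` is a hypothetical helper, not a function from autotest:

```python
def parse_current_op(status_output):
    """Return the CURRENT_OP value from status output, or None.

    `status_output` is the text printed by the status command, one
    KEY=VALUE pair per line (format assumed from the log excerpt).
    """
    for line in status_output.splitlines():
        key, sep, value = line.strip().partition('=')
        if sep and key == 'CURRENT_OP':
            return value
    return None
```

For the run in this log, the parsed value would be `UPDATE_STATUS_UPDATED_NEED_REBOOT`, which is not the idle state and so trips the check in `_wait_for_update_service`.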

Status: Fixed (was: Available)
Never mind.  That failure pre-dates the fix... :-(