servo V4 repair fails with message "<host> is not in an installable state"
Reported by jrbarnette@chromium.org, Jun 12 2017
Issue description
From time to time, I see servo v4 repair failing with this error:
FAIL ---- verify.update timestamp=1495621253 localtime=May 24 03:20:53 chromeos6-row1-rack10-labstation is not in an installable state
That error message comes from client/common_lib/cros/autoupdater.py:
    def _wait_for_update_service(self):
        # ...
        # Expect the update engine to be idle.
        if status != UPDATER_IDLE:
            raise ChromiumOSError('%s is not in an installable state' %
                                  self.host.hostname)
This should _not_ cause the servo v4 to report failure. We need to
fix the repair flow so that this condition is a warning buried in a
log file somewhere.
,
Jun 13 2017
Added note: Just saw the "Chase-Pending" label. The requirements for Chase-Pending are even stricter. You have to justify (to some degree) that this either causes or extends outages. (I think the fix should be easy, so scope argument isn't needed really). See akeshet@'s email about this label.
,
Jun 13 2017
Re impact: This prevents broken DUTs from being repaired, even if servo is able to do the work. Regarding context: Part of the servo verification flow includes "check whether the servo host needs update". That check invokes the code above, which causes the failure. But "check whether the servo needs update" shouldn't be fatal if the servo can't actually update. In the specific case I looked at, the servo host state was "needs update". A servo host sitting around in that state is also a bug, but it's a separate bug.
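To illustrate the "shouldn't be fatal" point, here is a minimal sketch, not the actual autotest code. It assumes a hypothetical wrapper, check_servo_host_update(), that takes the update trigger as a callable; ChromiumOSError below is a local stand-in for the autotest exception of the same name.

```python
import logging


class ChromiumOSError(Exception):
    """Local stand-in for the autotest exception of the same name."""


def check_servo_host_update(trigger_update, hostname):
    """Hypothetical wrapper: attempt an update without failing verification.

    trigger_update is any callable that may raise ChromiumOSError, for
    example when the update engine is not idle. Returns True if the
    update was triggered, False if it was skipped with a warning.
    """
    try:
        trigger_update()
    except ChromiumOSError as e:
        # Downgrade the failure to a log entry so that servo
        # verification can still succeed.
        logging.warning('Skipping servo host update on %s: %s', hostname, e)
        return False
    return True
```

With this shape, a labstation whose update engine is busy yields a warning in the log rather than a FAIL for the verifier.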
,
Jun 13 2017
,
Jun 13 2017
Regarding the potential for outages: For the cited labstation, there are three failed falco DUTs. Given time, I expect I can find other failed DUTs for other boards. Enough failed DUTs will cause outages. As for whether there's an actual, historic outage I can point to from this, that's less clear. I can only say IWBN to address this problem before it contributes to an actual outage.
,
Jun 13 2017
Re #5. We can take this up in the next Chase-Pending bash. I'll note that you need to show real potential for outage / historic impact for bugs to be accepted to Chase. We have 100s of things that can cause outages. For Chase, you must convince everybody in the room that this bug is likely to cause one before the other 99. Assigning to myself just to remove from triage queue.
,
Jun 19 2017
,
Jun 26 2017
Have 1-line fix, with risk uncertain. Allen to upload CL based on idea, with Richard for review.
,
Jun 29 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/autotest/+/66aa254e6ce449cfd9b49b552c606f95ad53d612
commit 66aa254e6ce449cfd9b49b552c606f95ad53d612
Author: Allen Li <ayatane@chromium.org>
Date: Thu Jun 29 14:54:57 2017
[autotest] Skip servo update if pending reboot
If the servo is already pending reboot for an update, we skip trying to trigger another update.
BUG= chromium:732588
TEST=None
Change-Id: I1c0b3ba7fa52c2ef98294ad6584e223f1ddaf70c
Reviewed-on: https://chromium-review.googlesource.com/549089
Reviewed-by: Richard Barnette <jrbarnette@google.com>
Commit-Queue: Richard Barnette <jrbarnette@google.com>
Tested-by: Richard Barnette <jrbarnette@google.com>
[modify] https://crrev.com/66aa254e6ce449cfd9b49b552c606f95ad53d612/server/hosts/servo_host.py
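For readers unfamiliar with the check, the idea of skipping the update when a reboot is already pending can be sketched roughly as follows. This is an illustration only, not the contents of the CL: parse_current_op() and should_trigger_update() are hypothetical helper names, and the input is assumed to be the output of `update_engine_client -status`, which contains a CURRENT_OP=... line.

```python
# Status string reported when an update is installed and awaiting reboot.
UPDATER_NEED_REBOOT = 'UPDATE_STATUS_UPDATED_NEED_REBOOT'


def parse_current_op(status_output):
    """Return the CURRENT_OP value from the status output, or None."""
    for line in status_output.splitlines():
        if line.startswith('CURRENT_OP='):
            return line.split('=', 1)[1].strip()
    return None


def should_trigger_update(status_output):
    """Hypothetical helper: skip the update if a reboot is already pending."""
    return parse_current_op(status_output) != UPDATER_NEED_REBOOT
```

A caller would consult should_trigger_update() before calling the updater, so a host that is only waiting for a reboot never reaches the "not in an installable state" check.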
,
Jun 30 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/autotest/+/1a5cc0a2acf66ed9a93fd270ca1e34c005107fde
commit 1a5cc0a2acf66ed9a93fd270ca1e34c005107fde
Author: Allen Li <ayatane@chromium.org>
Date: Fri Jun 30 23:02:42 2017
[autotest] Get rid of to_raise exception storing
BUG= chromium:732588
TEST=None
Change-Id: Ie0dbaa1692e1d3fd4d57978e4ad8d24da3d70045
Reviewed-on: https://chromium-review.googlesource.com/549087
Commit-Ready: Allen Li <ayatane@chromium.org>
Tested-by: Allen Li <ayatane@chromium.org>
Reviewed-by: Allen Li <ayatane@chromium.org>
[modify] https://crrev.com/1a5cc0a2acf66ed9a93fd270ca1e34c005107fde/client/common_lib/cros/autoupdater.py
,
Jun 30 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/autotest/+/b5420a7d676be0c152ca6aa91cafbadb145d5bcd
commit b5420a7d676be0c152ca6aa91cafbadb145d5bcd
Author: Allen Li <ayatane@chromium.org>
Date: Fri Jun 30 23:02:43 2017
[autotest] Extract _get_metric_fields()
BUG= chromium:732588
TEST=None
Change-Id: Ifd78cc0c44bccf2f2a0d811005821030f8b335bf
Reviewed-on: https://chromium-review.googlesource.com/549088
Commit-Ready: Allen Li <ayatane@chromium.org>
Tested-by: Allen Li <ayatane@chromium.org>
Reviewed-by: Allen Li <ayatane@chromium.org>
[modify] https://crrev.com/b5420a7d676be0c152ca6aa91cafbadb145d5bcd/client/common_lib/cros/autoupdater.py
,
Jul 5 2017
,
Jul 5 2017
,
Jul 24 2017
<sigh> Not fixed. See for instance here:
https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/hosts/chromeos6-row2-rack21-host2/814249-repair/20172306120343/
,
Jul 24 2017
For reference, the relevant part of the logs:
06/23 12:04:25.387 DEBUG| ssh_host:0297| Running (ssh) '/usr/bin/update_engine_client -status | grep CURRENT_OP'
06/23 12:04:25.542 ERROR| utils:0280| [stderr] [0623/120425:INFO:update_engine_client.cc(493)] Querying Update Engine status...
06/23 12:04:25.543 DEBUG| utils:0280| [stdout] CURRENT_OP=UPDATE_STATUS_UPDATED_NEED_REBOOT
06/23 12:04:25.547 ERROR| repair:0332| Failed: servo host software is up-to-date
Traceback (most recent call last):
  File "/usr/local/autotest/client/common_lib/hosts/repair.py", line 329, in _verify_host
    self.verify(host)
  File "/usr/local/autotest/server/hosts/servo_repair.py", line 28, in verify
    host.update_image(wait_for_update=False)
  File "/usr/local/autotest/server/hosts/servo_host.py", line 582, in update_image
    updater.trigger_update()
  File "/usr/local/autotest/client/common_lib/cros/autoupdater.py", line 248, in trigger_update
    self._wait_for_update_service()
  File "/usr/local/autotest/client/common_lib/cros/autoupdater.py", line 237, in _wait_for_update_service
    self.host.hostname)
ChromiumOSError: chromeos6-row2-rack21-labstation2 is not in an installable state
,
Jul 25 2017
n.m. That failure pre-dates the fix... :-(
Comment 1 by pprabhu@chromium.org, Jun 13 2017
Status: ExternalDependency (was: Available)