Issue 677296

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug


version labeling on Google_Kevin.8785.94.6 broke automated firmware update

Reported by semenzato@chromium.org, Dec 28 2016

Issue description

See also auto-filed issue 677273.

GS bucket and DUT name:

https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/93082186-chromeos-test/chromeos2-row6-rack5-host3/

stdio link and snippet:

https://uberchromegw.corp.google.com/i/chromeos/builders/kevin-release/builds/715/steps/HWTest%20%5Bsanity%5D/logs/stdio

chromeos-server22-37: 335e54c068b69410 3
  Autotest instance: cautotest
  12-28-2016 [04:59:03] Created suite job: http://cautotest.corp.google.com/afe/#tab_id=view_job&object_id=93082088
  @@@STEP_LINK@Link to suite@http://cautotest.corp.google.com/afe/#tab_id=view_job&object_id=93082088@@@
  Suite job   [ PASSED ]
  provision   [ FAILED ]
  provision     FAIL: DUT firmware requires update from Google_Kevin.8785.118.0 to Google_Kevin.8785.94.6, completed successfully
 
Owner: jrbarnette@chromium.org
Not sure if this is specific to this DUT or if there's an issue with the process, the image, or something else.

For chromeos2-row6-rack5-host3, it fails verify.rwfw and goes on to update the firmware. However, the verify step afterwards passes even though the firmware is still the old version.

repair job: http://chromeos-server7.mtv.corp.google.com/results/hosts/chromeos2-row6-rack5-host3/59311898-repair/debug/autoserv.DEBUG

Checking the fw on the dut after the fw update:
$ ssh root@chromeos2-row6-rack5-host3 "crossystem fwid"                                                  
Warning: Permanently added 'chromeos2-row6-rack5-host3' (RSA) to the list of known hosts.
Google_Kevin.8785.118.0

Not sure why the verify.rwfw succeeds after the repair when it should fail.
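The manual check above can be scripted. A minimal sketch, assuming root ssh access to the DUT as in the command shown; `read_fwid` and `firmware_matches` are hypothetical helper names, not autotest APIs:

```python
import subprocess


def read_fwid(host: str) -> str:
    """Return the firmware ID reported by `crossystem fwid` on a DUT.

    Assumes passwordless root ssh to the DUT (hypothetical helper).
    The ssh host-key warning goes to stderr, so stdout holds only the fwid.
    """
    result = subprocess.run(
        ["ssh", f"root@{host}", "crossystem fwid"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def firmware_matches(fwid: str, stable_version: str) -> bool:
    """True only if the running firmware is exactly the assigned stable version."""
    return fwid.strip() == stable_version.strip()


if __name__ == "__main__":
    # Values from this issue: the DUT reports .118.0 while .94.6 is assigned.
    print(firmware_matches("Google_Kevin.8785.118.0", "Google_Kevin.8785.94.6"))  # prints False
```

A verify step built on a comparison like `firmware_matches` would have failed on this DUT after the repair, which is the behavior expected here.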
Labels: -Pri-2 Pri-1
This is still making the kevin canary fail.

Richard is probably not around.  Can someone else fix this?

https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/93264700-chromeos-test/chromeos2-row6-rack5-host3/debug/

12/29 05:21:19.857 DEBUG|        base_utils:0280| [stdout] CHROMEOS_RELEASE_VERSION=9132.0.0
12/29 05:21:19.857 DEBUG|        base_utils:0280| [stdout] CHROMEOS_AUSERVER=https://tools.google.com/service/update2
12/29 05:21:19.894 INFO |        server_job:0153| 	GOOD	----	verify.cros	timestamp=1483017679	localtime=Dec 29 05:21:19	
12/29 05:21:19.895 INFO |            repair:0105| Skipping this operation: All host verification checks pass
12/29 05:21:19.896 DEBUG|            repair:0106| The following dependencies failed:
12/29 05:21:19.896 DEBUG|            repair:0108|     The firmware on this DUT is up-to-date
12/29 05:21:19.896 ERROR|           control:0071| DUT firmware requires update from Google_Kevin.8785.118.0 to Google_Kevin.8785.94.6
Traceback (most recent call last):


I'll lock that DUT in the meantime so it doesn't cause more issues.
Actually, that won't do anything; they're all at that version. To get the release back to green, I'll temporarily set the stable firmware version to 8785.118.0 until Richard or someone on the Kevin team can figure out what the expected firmware should be.
$ ./atest stable_version modify --board kevin/rwfw --version Google_Kevin.8785.118.0
Stable version for board kevin/rwfw is changed from Google_Kevin.8785.94.6 to Google_Kevin.8785.118.0.
> $ ./atest stable_version modify --board kevin/rwfw --version Google_Kevin.8785.118.0
> Stable version for board kevin/rwfw is changed from Google_Kevin.8785.94.6 to Google_Kevin.8785.118.0.

Ai Ya!  Why was this necessary?  It's very likely that this
will cause more trouble, not less.

All the DUTs in the bvt pool were at that firmware version, so the release builders kept failing the HWTest stage. Since the firmware stable_version bump, the builder has gone green:

https://uberchromegw.corp.google.com/i/chromeos/builders/kevin-release

I can change it back, but the builder will go red again (maybe that's OK given the situation?).
The firmware assigned to a board must match the firmware bundled
with the current repair image.  The current kevin repair image
is R56-9000.35.0, which bundles firmware Google_Kevin.8785.94.6.
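The rule above amounts to an invariant between a board's two stable versions. A sketch, assuming a hypothetical `BUNDLED_FIRMWARE` table in place of however the lab actually determines which firmware a build bundles; the entries are only the pairings described in this issue:

```python
# Hypothetical lookup: repair image -> firmware version bundled in that image.
BUNDLED_FIRMWARE = {
    "R56-9000.35.0": "Google_Kevin.8785.94.6",
    # Assumed from a later comment: builds from R57-9129.0.0 on bundle .122.0.
    "R57-9135.0.0": "Google_Kevin.8785.122.0",
}


def check_stable_versions(repair_image: str, rwfw_version: str) -> None:
    """Raise if the board's rwfw stable version isn't the firmware its repair image bundles."""
    bundled = BUNDLED_FIRMWARE.get(repair_image)
    if bundled is None:
        raise KeyError(f"unknown repair image {repair_image}")
    if bundled != rwfw_version:
        raise ValueError(
            f"rwfw {rwfw_version} does not match firmware {bundled} "
            f"bundled in repair image {repair_image}"
        )
```

With these entries, the temporary workaround (repair image R56-9000.35.0 with rwfw Google_Kevin.8785.118.0) raises ValueError, while R57-9135.0.0 with Google_Kevin.8785.122.0 passes.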

One of two things can/will eventually go wrong:
 1) A DUT that doesn't already have 8785.118.0 can't be brought
    up to that version, because the verify/repair sequence can't
    work if the target firmware isn't bundled in the repair image.
 2) Next Tuesday, the firmware will be automatically reset to
    the firmware bundled in the (next) repair image, which, if
    it isn't 8785.118.0, will put us back where we were at the
    outset.

The fix requires two things:
 1) Set the repair image to a build that bundles the firmware
    we want (the .118.0 version).
 2) Figure out how all those kevins got the .118.0 firmware
    in the first place, and fix it so it won't happen again.

It looks like the 8785.94.6 firmware bundle doesn't identify
itself as expected.

Tracking the history of chromeos2-row6-rack5-host3, you see
this:

A) Last firmware check with Google_Kevin.8785.94.4 assigned:
    http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row6-rack5-host3/59294741-reset/

B) First firmware check after assigning Google_Kevin.8785.94.6:
    http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row6-rack5-host3/59294874-reset/

C) Second firmware check after assigning Google_Kevin.8785.94.6:
    http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row6-rack5-host3/59294926-reset/

A) and B) look like I'd expect; A) reports no firmware update, and
B) reports updating to the bundle in the build.

Looking at C, we discover a problem:
12/27 04:12:12.306 INFO |            repair:0327| Verifying this condition: The firmware on this DUT is up-to-date
12/27 04:12:12.471 DEBUG|          ssh_host:0177| Running (ssh) 'crossystem fwid'
12/27 04:12:12.898 DEBUG|        base_utils:0299| [stdout] Google_Kevin.8785.118.0

Here's how the bundle identified itself:
12/27 04:12:12.899 DEBUG|          ssh_host:0177| Running (ssh) 'chromeos-firmwareupdate -V'
12/27 04:12:14.383 DEBUG|        base_utils:0280| [stdout] 
12/27 04:12:14.384 DEBUG|        base_utils:0280| [stdout] flashrom(8): fe63e6a6f2431040d9cb7a62fdb6b11d */build/kevin/usr/sbin/flashrom
12/27 04:12:14.384 DEBUG|        base_utils:0280| [stdout]              ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), statically linked, for GNU/Linux 2.6.16, BuildID[sha1]=9537c66614f060e027e143d229586f66d4bfc902, stripped
12/27 04:12:14.384 DEBUG|        base_utils:0280| [stdout]              0.9.4  : 65be03a : Nov 08 2016 10:51:57 UTC
12/27 04:12:14.385 DEBUG|        base_utils:0280| [stdout] 
12/27 04:12:14.385 DEBUG|        base_utils:0280| [stdout] BIOS image:   02b38affb90cd9c07dd8425bfffb3be7 */build/kevin/tmp/portage/chromeos-base/chromeos-firmware-kevin-0.0.1-r54/work/chromeos-firmware-kevin-0.0.1/.dist/kevin_fw_8785.94.6_8785.118.0.tbz2/image.bin
12/27 04:12:14.385 DEBUG|        base_utils:0280| [stdout] BIOS version: Google_Kevin.8785.94.6
12/27 04:12:14.385 DEBUG|        base_utils:0280| [stdout] EC image:     17a2133f4872e7da717bfaaba8026baa */build/kevin/tmp/portage/chromeos-base/chromeos-firmware-kevin-0.0.1-r54/work/chromeos-firmware-kevin-0.0.1/.dist/kevin_ec_8785.94.6_8785.118.0.tbz2/ec.bin
12/27 04:12:14.385 DEBUG|        base_utils:0280| [stdout] EC version:   kevin_v1.10.116-b2d1ab0

The bundle plainly calls itself Google_Kevin.8785.94.6, but the
bits seem to think they're really 8785.118.0.  Hence the trouble.
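The mismatch can be detected mechanically: take the `BIOS version:` line from the `chromeos-firmwareupdate -V` output and compare it with `crossystem fwid` after the update has run. A minimal sketch, assuming the output format shown in the log above; the function names are hypothetical:

```python
import re
from typing import Optional

# Finds "BIOS version: <version>" anywhere in the text, so it works on raw
# updater output as well as on autotest log lines with "[stdout]" prefixes.
_BIOS_VERSION_RE = re.compile(r"BIOS version:\s*(\S+)")


def bundled_bios_version(updater_output: str) -> Optional[str]:
    """Return the BIOS version the firmware bundle claims to contain, if any."""
    match = _BIOS_VERSION_RE.search(updater_output)
    return match.group(1) if match else None


def label_matches_flashed(updater_output: str, fwid_after_update: str) -> bool:
    """True if the bundle's self-reported version equals what the DUT now runs."""
    return bundled_bios_version(updater_output) == fwid_after_update.strip()


# Version lines from the -V output in the log above (autotest prefixes removed).
SAMPLE = """\
BIOS version: Google_Kevin.8785.94.6
EC version:   kevin_v1.10.116-b2d1ab0
"""

print(bundled_bios_version(SAMPLE))                                # Google_Kevin.8785.94.6
print(label_matches_flashed(SAMPLE, "Google_Kevin.8785.118.0"))    # False
```

Against this bundle, the label says .94.6 while the DUT ends up reporting .118.0, which is the disagreement behind the failed provision.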

The reason that this problem showed up now is that as
of R57-9129.0.0, kevin bundles Google_Kevin.8785.122.0.
That changed the failure mode.

For now, this is the fix:

$ atest stable_version modify -b kevin -i R57-9135.0.0
Stable version for board kevin is changed from R56-9000.35.0 to R57-9135.0.0.
$ atest stable_version modify -b kevin/rwfw -i Google_Kevin.8785.122.0
Stable version for board kevin/rwfw is changed from Google_Kevin.8785.118.0 to Google_Kevin.8785.122.0.

The full root cause can wait until later.  Someone who understands
firmware builds needs to weigh in with an explanation.

Cc: aaboagye@chromium.org
Components: OS>Firmware
Owner: ----
Status: Available (was: Untriaged)
Summary: version labeling on Google_Kevin.8785.94.6 broke automated firmware update (was: kevin: DUT firmware requires update)
Ultimately, this is a bug in the firmware bundle for Google_Kevin.8785.94.6.

I think that means the firmware team has to own the problem.


Comment 12 by sheriffbot@chromium.org, Feb 15 2018

Labels: Hotlist-Recharge-Cold
Status: Untriaged (was: Available)
This issue has been Available for over a year. If it's no longer important or seems unlikely to be fixed, please consider closing it out. If it is important, please re-triage the issue.

Sorry for the inconvenience if the bug really should have been left as Available. If you change it back, also remove the "Hotlist-Recharge-Cold" label.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Owner: snanda@chromium.org
Status: Assigned (was: Untriaged)
AFAIK, whatever firmware update process caused this problem in
the first place could still happen again.  In that case, some
day in the future, some new hardware model will suffer the same
fate.  That is, all testing for the model will suddenly start
failing until developers intervene and work around the problem.

So, our options are:
 A) Go figure out how to ensure that the version strings
    printed by "chromeos-firmwareupdate -V" will always match
    the version string reported by "crossystem fwid" after the
    firmware is installed.
 B) Continue to ignore this, and find out (the hard way) how
    long it takes until this problem causes another lab outage.

Option A) ain't free, and the cost of option B) is offset by
the event being unlikely.  So, if we judge that the expected
cost of option B) is cheaper, we can just close this as WontFix.
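The trade-off above is an expected-cost comparison: option B) costs roughly the probability of another mislabeled bundle times the cost of the resulting outage, while option A) costs the engineering work up front. A sketch of that decision rule; the parameter names are illustrative, and the issue gives no actual estimates:

```python
def prefer_wontfix(cost_of_fix: float, p_recurrence: float, cost_of_outage: float) -> bool:
    """Return True if ignoring the problem (option B) has the lower expected cost.

    cost_of_fix:    up-front cost of option A (making the version strings agree).
    p_recurrence:   estimated probability the mislabeling happens again over
                    the period being considered, between 0 and 1.
    cost_of_outage: cost of one lab outage, including developer time spent
                    working around it.
    """
    if not 0.0 <= p_recurrence <= 1.0:
        raise ValueError("p_recurrence must be between 0 and 1")
    expected_cost_b = p_recurrence * cost_of_outage
    return expected_cost_b < cost_of_fix
```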