
Issue 823789


Issue metadata

Status: WontFix
Owner:
Closed: Mar 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug


Hana CQ run failed: "firmware requires update from Google_Hana.8438.120.0 to Google_Hana.8438.111.0"

Reported by jrbarnette@chromium.org, Mar 20 2018

Issue description

This CQ slave run failed:
    https://luci-milo.appspot.com/buildbot/chromeos/hana-paladin/2601

The provision suite had this complaint (in various forms) repeated
six times:
    [Test-Logs]: provision: FAIL: DUT firmware requires update from Google_Hana.8438.120.0 to Google_Hana.8438.111.0, [Errno 104] Connection reset by peer, completed successfully, servohost_Reboot is not unique.

There are seven DUTs in the pool, so that means virtually every
DUT made this complaint.

The correct firmware for hana is in fact Google_Hana.8438.111.0; no DUT
should have had Google_Hana.8438.120.0 installed.

The CQ pool seems to have recovered since; I'm not sure how.

Logs for the six failures are here:
    http://cautotest-prod/tko/retrieve_logs.cgi?job=/results/185054930-chromeos-test/
    http://cautotest-prod/tko/retrieve_logs.cgi?job=/results/185054934-chromeos-test/
    http://cautotest-prod/tko/retrieve_logs.cgi?job=/results/185054928-chromeos-test/
    http://cautotest-prod/tko/retrieve_logs.cgi?job=/results/185054932-chromeos-test/
    http://cautotest-prod/tko/retrieve_logs.cgi?job=/results/185054924-chromeos-test/
    http://cautotest-prod/tko/retrieve_logs.cgi?job=/results/185054936-chromeos-test/

 
The DUTs recovered because when repair was invoked, the DUTs were
allowed to downgrade from .120.0 to .111.0.

As for why the DUTs were running .120.0, apparently that was the
selected firmware revision prior to this morning's version assignment
run.  This provision job on one of the problem DUTs shows that the
.120.0 version was installed and accepted immediately before the failure:
    http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row3-rack3-host21/583754-provision/

According to the version assignment logs, last night hana upgraded from
R65-10323.46.0 to R65-10323.58.0.  Checking that:

    $ get_firmware_version hana R65-10323.46.0
    hana-release/R65-10323.46.0              Google_Hana.8438.120.0
    $ get_firmware_version hana R65-10323.58.0
    hana-release/R65-10323.58.0              Google_Hana.8438.111.0

So, in that sense, the event is the system working as intended (WAI).
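
For anyone wanting to check a transition like this mechanically, here is a
minimal sketch.  The helper names (parse_firmware_version, is_downgrade) are
hypothetical and not part of any lab tool, and it assumes that comparing the
dotted numeric components in order reflects build order, which the firmware
version strings themselves don't guarantee:

```python
import re

# Assumed shape: "<name>.<num>.<num>...", e.g. "Google_Hana.8438.111.0".
_FW_RE = re.compile(r"^(?P<name>[A-Za-z0-9_]+)\.(?P<nums>\d+(?:\.\d+)*)$")


def parse_firmware_version(version):
    """Split a firmware version string into (name, tuple of ints)."""
    match = _FW_RE.match(version)
    if match is None:
        raise ValueError("unrecognized firmware version: %r" % version)
    nums = tuple(int(part) for part in match.group("nums").split("."))
    return match.group("name"), nums


def is_downgrade(installed, target):
    """Return True if moving from `installed` to `target` lowers the version.

    Assumption: numeric components compare left to right.
    """
    inst_name, inst_nums = parse_firmware_version(installed)
    tgt_name, tgt_nums = parse_firmware_version(target)
    if inst_name != tgt_name:
        raise ValueError("firmware names differ: %s vs %s" % (inst_name, tgt_name))
    return tgt_nums < inst_nums
```

With the two versions above, is_downgrade("Google_Hana.8438.120.0",
"Google_Hana.8438.111.0") reports a downgrade under that assumption.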

There are still two anomalies:
  * Why did the Beta channel downgrade the firmware?  Is that normal?
  * The logs from the version assignment didn't log the hana firmware
    change; it should have.

Cc: bhthompson@chromium.org
>   * The logs from the version assignment didn't log the hana firmware
>     change; it should have.

Oh, <sigh> It _was_ logged:

    Applying firmware updates:
    Failed to get firmware version for board kevin-arcnext: No JSON object could be decoded.
    Failed to get firmware version for board caroline-arcnext: No JSON object could be decoded.
       hana                   Google_Hana.8438.120.0 -> Google_Hana.8438.111.0
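
Since the change line was easy to miss among the error messages, a small
filter could pull it out.  This is only a sketch: firmware_changes is a
hypothetical helper, and the "board  old -> new" line shape is assumed from
the single line quoted above, not from the version assignment script itself:

```python
import re

# Assumed line shape: "<board>   <old version> -> <new version>".
_CHANGE_RE = re.compile(
    r"^\s*(?P<board>\S+)\s+(?P<old>\S+)\s+->\s+(?P<new>\S+)\s*$")


def firmware_changes(log_text):
    """Map board name to (old, new) for every change line in the log."""
    changes = {}
    for line in log_text.splitlines():
        match = _CHANGE_RE.match(line)
        if match:
            changes[match.group("board")] = (
                match.group("old"), match.group("new"))
    return changes
```

Lines without a "->" separator, such as the "Failed to get firmware
version" messages, are skipped.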

Status: WontFix (was: Assigned)
I've confirmed that the downgrade was deliberate: it was a response to
a late-breaking bug that forced reverting from the .120 build back to
the .111 build.

So, this is all WAI.
