New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 782272 link

Starred by 5 users

Issue metadata

Status: Fixed
Owner:
Last visit > 30 days ago
Closed: Mar 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug

Blocking:
issue 809585
issue 810050



Sign in to add a comment

Fix the firmware update verifier to support unibuilds

Project Member Reported by akes...@chromium.org, Nov 7 2017

Issue description

https://luci-milo.appspot.com/buildbot/chromeos/reef-uni-paladin/937

suite timeline: https://viceroy.corp.google.com/chromeos/suite_details?job_id=154570472

All boards failed provision on their first attempt with "FAIL	----	verify.rwfw	timestamp=1510058396	localtime=Nov 07 04:39:56	DUT firmware requires update from Google_Snappy.9042.136.0 to Google_Snappy.9042.154.0"

Subsequent repair job installed the new firmware, and subsequent tests ran.

This corresponds closely with an email I received "Stable version update summary 2017-45" which did update snappy "Google_Snappy.9042.136.0 -> Google_Snappy.9042.154.0"


My question: is it working as intended that this caused a provision failure and thus a suite failure?

Did we just get very unluck in the race between stable versions and CQ hwtests?
 
> My question: is it working as intended that this caused a
> provision failure and thus a suite failure?

Mostly, no.  Normally, we'd expect a sequence like this:
  * Firmware version X is released in a bundle on ToT.
  * The bundle update is merged to Beta, and released.
  * Automated assignment assigns version X to the board.

Once those steps complete, it would be normal (not necessarily
_required_, but still _normal_) that any build actually installed
on a DUT would have the proper bundle.  In that case, when the
firmware update check triggered, the sequence would look like this:
  * The firmware verifier would look for firmware version X on the
    DUT, and find it out-of-date.
  * The verifier would then check the firmware bundle, and find that
    it contained version X.
  * The verifier would run "chromeos-firmwareupdate", and version X
    would be installed.
  * The verifier would pass.

The reason that it failed in this case is that this is a unibuild, and
the firmware verifier doesn't support unibuilds (yet).  The problem is
that the output from the multi-model bundle doesn't match the output
looked for by the verifier code.

The relevant logs are attached.  What they show is that the
build that had been provisioned actually included Google_Snappy.9042.154.0:

Here's an excerpt:
11/07 04:39:50.662 INFO |            repair:0349| Verifying this condition: The firmware on this DUT is up-to-date
11/07 04:39:50.834 DEBUG|          ssh_host:0296| Running (ssh) 'crossystem fwid' from '_verify_list|_verify_host|verify|_get_rw_firmware|run|run_very_slowly'
11/07 04:39:51.413 DEBUG|             utils:0299| [stdout] Google_Snappy.9042.136.0
11/07 04:39:51.444 DEBUG|          ssh_host:0296| Running (ssh) 'chromeos-firmwareupdate -V' from '_verify_list|_verify_host|verify|_get_available_firmware|run|run_very_slowly'
11/07 04:39:53.919 DEBUG|             utils:0280| [stdout] 
[ ... ]
11/07 04:39:53.935 DEBUG|             utils:0280| [stdout] Model:        snappy
11/07 04:39:53.936 DEBUG|             utils:0280| [stdout] BIOS image:   b562fc23c114d70905faa59f103b836a */build/reef-uni/tmp/portage/chromeos-base/chromeos-firmware-reef-0.0.1-r119/temp/tmpXhrHJU.pack_firmware-19137/models/snappy/image.bin
11/07 04:39:53.936 DEBUG|             utils:0280| [stdout] BIOS version: Google_Snappy.9042.154.0
11/07 04:39:53.936 DEBUG|             utils:0280| [stdout] EC image:     5a16cfba7fd5c52ccb8200e9a34afb77 */build/reef-uni/tmp/portage/chromeos-base/chromeos-firmware-reef-0.0.1-r119/temp/tmpXhrHJU.pack_firmware-19137/models/snappy/ec.bin
11/07 04:39:53.936 DEBUG|             utils:0280| [stdout] EC version:   snappy_v1.1.5935-5b2a89d

firmware-check.log
12.3 KB View Download
Cc: shapiroc@chromium.org
Owner: ----
Status: Available (was: Assigned)
Summary: Fix the firmware update verifier to support unibuilds (was: new firmware (stable image release) caused a CQ failure)
I might be the right person to do this, but it's not on my top
priority list.

P2 for now, since it could (in theory) cause infrequent CQ failures.

This problem only resolved itself because of the existence of
a non-unibuild solution for the affected board(s).  I don't know
how coral based models will cope.  If they cope too badly, this
could break all coral boards.  If that happens, this bug will
overnight become P1 or P0.  We should strive to fix the problem
before that happens.

Owner: shapiroc@chromium.org
Status: Assigned (was: Available)
Hmmm...  It looks like the "reef-uni" builders are all gone.
And checking coral build metadata, I see that coral unibuilds
bundle only a single firmware image for all models.

So, two questions:
  * Are there any remaining unibuilds that bundle more than
    one firmware image?
  * Assuming the answer to the first question is "no", are we
    committed for the foreseeable future to delivering only a
    single firmware image in a given unibuild image, as is
    currently done for coral?

Passing ownership to shapiroc@ to provide the answers.  I'm pretty
sure that if the answers to the questions are "no" and "yes",
respectively, then this bug should be WontFix.

Owner: jrbarnette@chromium.org
1 - not right now, but there may be ... we may do version pinning and if we do migrations, this will be a valid use case
Owner: ----
Status: Available (was: Assigned)
I still anticipate that I won't be the one to make the
code changes required to address this issue.

Blocking: 809585
This has become a problem on coral; see bug 809585.

Some of the technical details regarding the changes required to fix
this can be found here:
    https://docs.google.com/document/d/1cXbuL3sQRibHngSZ6NXDyUasGfoqcMqt_TnCEwvV4r0/edit#heading=h.ikzt5grcbq1w

Labels: -Pri-2 Pri-1
Owner: shapiroc@chromium.org
Status: Assigned (was: Available)
This is no longer a "when we can get to it" kind of priority.

Owner: akes...@chromium.org
Let me know what questions you have around the structure of the shellball in determining the version.
Owner: jrbarnette@chromium.org
There's a lot of text on this bug. I believe it distills down to:

> [T]he firmware verifier doesn't support unibuilds (yet).  The problem is
that the output from the multi-model bundle doesn't match the output
looked for by the verifier code.

jrbarnette@ can you briefly describe the fix that is required to the verifier?
Owner: akes...@chromium.org
> jrbarnette@ can you briefly describe the fix that is required to the verifier?

There's slightly more to it than that.  Full details are in the
doc cited in comment#8.

The relevant bullet items from that doc:
  * Update chromite so that the metadata file for a unibuild
    includes the firmware version for all models bundled with
    the build.
  * Update automated firmware version assignment to use the
    model information in the metadata produced by chromite.
  * Update the firmware update verifier in Autotest to use
    model instead of board as the lookup key.

Comments on the doc say that chromite changes have been made.
I haven't actually checked the metadata from chromite to confirm
that everything needed is actually there.

One other change that would have to be made part of the
verifier (third bullet) is that the verifier must learn to
parse the new-style output from "chromeos-firmwareupdate -V".

> I haven't actually checked the metadata from chromite to confirm
> that everything needed is actually there.

I've looked at metadata from a recent coral build, and it appears to
have everything that would be needed by automated version assignment.

To clarify, there are two things left to do:

1) Update assign_stable_versions to change to use models in place
   of boards for firmware version mapping.
2) Update the firmware update verifier so that a) it uses model
   instead of board, and b) it correctly parses the output of
   `chromeos-firmwareupdate -V`.

Blocking: 810050
This bug is now blocking all deployments of new coral models.
See bug 810050 for details.

Cc: akes...@chromium.org
Owner: nxia@chromium.org
-> nxia@ per discussion will work on this.

See comment 16 for the brief summary of TODOs.
A note regarding roll-out and urgency:  If we address only the changes
to assign_stable_images (step 1) before next Tuesday at 4:00 AM, we'll
avoid a repeat of coral failures from yesterday and today.  Then, step 2
can be completed without a looming deadline.

I've identified one other necessary code change:
The "deployment_test" and "repair_test" scripts perform firmware
updates, and will need to accommodate this change.

The dependency shows up in this code in
site_utils/deployment/install.py:
    from autotest_lib.site_utils.stable_images import assign_stable_images
[ ... ]
    def _update_build(afe, report_log, arguments):
[ ... ]
        if fw_version is None:
            fw_version = assign_stable_images.get_firmware_version(
                    cros_version_map, arguments.board, cros_version)

Project Member

Comment 22 by bugdroid1@chromium.org, Feb 15 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/7692d7eb924590b76b57e188fdbd06fb503ef764

commit 7692d7eb924590b76b57e188fdbd06fb503ef764
Author: Ningning Xia <nxia@google.com>
Date: Thu Feb 15 06:44:25 2018

cros_firmware: FirmwareVersionVerifier returns the right firmware versions.

Change FirmwareVersionVerifier._get_available_firmware to parse the
output of chromeos-firmwareupdate script and get the right available
firmware versions for unibuild and non-unibuild boards.

BUG= chromium:782272 
TEST=unit_tests; test new methods against duts.

Change-Id: I9bb493217e3c03b7e3178f3fd7ccf3a0c6f47db6
Reviewed-on: https://chromium-review.googlesource.com/917536
Tested-by: Ningning Xia <nxia@chromium.org>
Trybot-Ready: Ningning Xia <nxia@chromium.org>
Reviewed-by: Richard Barnette <jrbarnette@google.com>

[modify] https://crrev.com/7692d7eb924590b76b57e188fdbd06fb503ef764/server/afe_utils.py
[add] https://crrev.com/7692d7eb924590b76b57e188fdbd06fb503ef764/server/hosts/cros_firmware_unittest.py
[modify] https://crrev.com/7692d7eb924590b76b57e188fdbd06fb503ef764/server/hosts/cros_firmware.py

Project Member

Comment 23 by bugdroid1@chromium.org, Feb 17 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/4bd97b099b78aba8cf29f09185540b32da4160f9

commit 4bd97b099b78aba8cf29f09185540b32da4160f9
Author: Richard Barnette <jrbarnette@chromium.org>
Date: Sat Feb 17 05:42:16 2018

[autotest] Add 'model' to HostInfo.

We need HostInfo objects to have a 'model' attribute, so we can ask
about the 'model' label.

BUG= chromium:782272 
TEST=None

Change-Id: I8b991c0e32b658b3e064f679d1ed6b4a25f470d2
Reviewed-on: https://chromium-review.googlesource.com/923407
Commit-Ready: Richard Barnette <jrbarnette@chromium.org>
Tested-by: Richard Barnette <jrbarnette@google.com>
Reviewed-by: Ningning Xia <nxia@chromium.org>

[modify] https://crrev.com/4bd97b099b78aba8cf29f09185540b32da4160f9/server/hosts/host_info.py

Project Member

Comment 24 by bugdroid1@chromium.org, Feb 20 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/05af740dee511066430772a955096e70d7fb673c

commit 05af740dee511066430772a955096e70d7fb673c
Author: Ningning Xia <nxia@google.com>
Date: Tue Feb 20 18:19:52 2018

cros_firmware: FirmwareVersionVerifier returns the right firmware versions.

Change FirmwareVersionVerifier._get_available_firmware to parse the
output of chromeos-firmwareupdate script and get the right available
firmware versions for unibuild and non-unibuild boards.

BUG= chromium:782272 
TEST=unit_tests; test new methods against duts.
CQ-DEPEND=CL:923407

Change-Id: I9aa4b68e4b5becbc5f5f0853582280060911d691
Reviewed-on: https://chromium-review.googlesource.com/924469
Reviewed-by: Richard Barnette <jrbarnette@google.com>
Tested-by: Ningning Xia <nxia@chromium.org>

[modify] https://crrev.com/05af740dee511066430772a955096e70d7fb673c/server/afe_utils.py
[add] https://crrev.com/05af740dee511066430772a955096e70d7fb673c/server/hosts/cros_firmware_unittest.py
[modify] https://crrev.com/05af740dee511066430772a955096e70d7fb673c/server/hosts/cros_firmware.py

This seems to have just broken all active coral (astronaut) DUTs in the CQ.

As a quick workaround, I've just done a force rebalance, and am now asking the secondary to do a lab push to get the real fix into production.
Cc: dgarr...@chromium.org
 Issue 813802  has been merged into this issue.
OK.  At this point, I've confirmed that the changes in #c24 is in
the lab, and is blocking firmware updates for all coral boards.
That means it _should_ be safe to restart deploying nasher, and any
other coral (or fizz) based board.

Note that the problem isn't completely fixed until we unblock firmware
updates, by fixing the automated firmware version assignment.  That's
in progress.


Project Member

Comment 28 by bugdroid1@chromium.org, Feb 26 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/7459c5dbc4f07e0122c406c0d174e1a7f3bea489

commit 7459c5dbc4f07e0122c406c0d174e1a7f3bea489
Author: Ningning Xia <nxia@google.com>
Date: Mon Feb 26 09:17:20 2018

assign_stable_images: assign the right firmware version to models.

Previously, unibuild boards weren't assigned the right firmware versions
to update. Parse the metadata.json file generated by builds, get the
right firmware versions for regular and unibuild boards.

BUG= chromium:782272 
TEST=assign_stable_images -n; unit_tests

Change-Id: I795f60f2d2ff854495a102626db3b96443a41877
Reviewed-on: https://chromium-review.googlesource.com/912613
Commit-Ready: Ningning Xia <nxia@chromium.org>
Tested-by: Ningning Xia <nxia@chromium.org>
Reviewed-by: Ningning Xia <nxia@chromium.org>

[modify] https://crrev.com/7459c5dbc4f07e0122c406c0d174e1a7f3bea489/site_utils/stable_images/assign_stable_images.py
[modify] https://crrev.com/7459c5dbc4f07e0122c406c0d174e1a7f3bea489/site_utils/stable_images/assign_stable_images_unittest.py
[add] https://crrev.com/7459c5dbc4f07e0122c406c0d174e1a7f3bea489/site_utils/deployment/install_unittest.py
[modify] https://crrev.com/7459c5dbc4f07e0122c406c0d174e1a7f3bea489/site_utils/deployment/install.py

Comment 29 by nxia@chromium.org, Mar 19 2018

Status: Fixed (was: Assigned)

Sign in to add a comment