Fix the firmware update verifier to support unibuilds |
|||||||||||||||
Issue descriptionhttps://luci-milo.appspot.com/buildbot/chromeos/reef-uni-paladin/937 suite timeline: https://viceroy.corp.google.com/chromeos/suite_details?job_id=154570472 All boards failed provision on their first attempt with "FAIL ---- verify.rwfw timestamp=1510058396 localtime=Nov 07 04:39:56 DUT firmware requires update from Google_Snappy.9042.136.0 to Google_Snappy.9042.154.0" Subsequent repair job installed the new firmware, and subsequent tests ran. This corresponds closely with an email I received "Stable version update summary 2017-45" which did update snappy "Google_Snappy.9042.136.0 -> Google_Snappy.9042.154.0" My question: is it working as intended that this caused a provision failure and thus a suite failure? Did we just get very unluck in the race between stable versions and CQ hwtests?
,
Nov 27 2017
,
Dec 5 2017
I might be the right person to do this, but it's not on my top priority list. P2 for now, since it could (in theory) cause infrequent CQ failures. This problem only resolved itself because of the existence of a non-unibuild solution for the affected board(s). I don't know how coral based models will cope. If they cope too badly, this could break all coral boards. If that happens, this bug will overnight become P1 or P0. We should strive to fix the problem before that happens.
,
Dec 6 2017
Hmmm... It looks like the "reef-uni" builders are all gone.
And checking coral build metadata, I see that coral unibuilds
bundle only a single firmware image for all models.
So, two questions:
* Are there any remaining unibuilds that bundle more than
one firmware image?
* Assuming the answer to the first question is "no", are we
committed for the foreseeable future to delivering only a
single firmware image in a given unibuild image, as is
currently done for coral?
Passing ownership to shapiroc@ to provide the answers. I'm pretty
sure that if the answers to the questions are "no" and "yes",
respectively, then this bug should be WontFix.
,
Dec 6 2017
1 - not right now, but there may be ... we may do version pinning and if we do migrations, this will be a valid use case
,
Dec 6 2017
I still anticipate that I won't be the one to make the code changes required to address this issue.
,
Feb 6 2018
,
Feb 6 2018
This has become a problem on coral; see bug 809585.
Some of the technical details regarding the changes required to fix
this can be found here:
https://docs.google.com/document/d/1cXbuL3sQRibHngSZ6NXDyUasGfoqcMqt_TnCEwvV4r0/edit#heading=h.ikzt5grcbq1w
,
Feb 6 2018
,
Feb 6 2018
This is no longer a "when we can get to it" kind of priority.
,
Feb 6 2018
Let me know what questions you have around the structure of the shellball in determining the version.
,
Feb 6 2018
There's a lot of text on this bug. I believe it distills down to: > [T]he firmware verifier doesn't support unibuilds (yet). The problem is that the output from the multi-model bundle doesn't match the output looked for by the verifier code. jrbarnette@ can you briefly describe the fix that is required to the verifier?
,
Feb 6 2018
> jrbarnette@ can you briefly describe the fix that is required to the verifier?
There's slightly more to it than that. Full details are in the
doc cited in comment#8.
The relevant bullet items from that doc:
* Update chromite so that the metadata file for a unibuild
includes the firmware version for all models bundled with
the build.
* Update automated firmware version assignment to use the
model information in the metadata produced by chromite.
* Update the firmware update verifier in Autotest to use
model instead of board as the lookup key.
Comments on the doc say that chromite changes have been made.
I haven't actually checked the metadata from chromite to confirm
that everything needed is actually there.
,
Feb 6 2018
One other change that would have to be made part of the verifier (third bullet) is that the verifier must learn to parse the new-style output from "chromeos-firmwareupdate -V".
,
Feb 6 2018
> I haven't actually checked the metadata from chromite to confirm > that everything needed is actually there. I've looked at metadata from a recent coral build, and it appears to have everything that would be needed by automated version assignment.
,
Feb 7 2018
To clarify, there are two things left to do: 1) Update assign_stable_versions to change to use models in place of boards for firmware version mapping. 2) Update the firmware update verifier so that a) it uses model instead of board, and b) it correctly parses the output of `chromeos-firmwareupdate -V`.
,
Feb 7 2018
,
Feb 7 2018
This bug is now blocking all deployments of new coral models. See bug 810050 for details.
,
Feb 7 2018
-> nxia@ per discussion will work on this. See comment 16 for the brief summary of TODOs.
,
Feb 8 2018
A note regarding roll-out and urgency: If we address only the changes to assign_stable_images (step 1) before next Tuesday at 4:00 AM, we'll avoid a repeat of coral failures from yesterday and today. Then, step 2 can be completed without a looming deadline.
,
Feb 8 2018
I've identified one other necessary code change:
The "deployment_test" and "repair_test" scripts perform firmware
updates, and will need to accommodate this change.
The dependency shows up in this code in
site_utils/deployment/install.py:
from autotest_lib.site_utils.stable_images import assign_stable_images
[ ... ]
def _update_build(afe, report_log, arguments):
[ ... ]
if fw_version is None:
fw_version = assign_stable_images.get_firmware_version(
cros_version_map, arguments.board, cros_version)
,
Feb 15 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/autotest/+/7692d7eb924590b76b57e188fdbd06fb503ef764 commit 7692d7eb924590b76b57e188fdbd06fb503ef764 Author: Ningning Xia <nxia@google.com> Date: Thu Feb 15 06:44:25 2018 cros_firmware: FirmwareVersionVerifier returns the right firmware versions. Change FirmwareVersionVerifier._get_available_firmware to parse the output of chromeos-firmwareupdate script and get the right available firmware versions for unibuild and non-unibuild boards. BUG= chromium:782272 TEST=unit_tests; test new methods against duts. Change-Id: I9bb493217e3c03b7e3178f3fd7ccf3a0c6f47db6 Reviewed-on: https://chromium-review.googlesource.com/917536 Tested-by: Ningning Xia <nxia@chromium.org> Trybot-Ready: Ningning Xia <nxia@chromium.org> Reviewed-by: Richard Barnette <jrbarnette@google.com> [modify] https://crrev.com/7692d7eb924590b76b57e188fdbd06fb503ef764/server/afe_utils.py [add] https://crrev.com/7692d7eb924590b76b57e188fdbd06fb503ef764/server/hosts/cros_firmware_unittest.py [modify] https://crrev.com/7692d7eb924590b76b57e188fdbd06fb503ef764/server/hosts/cros_firmware.py
,
Feb 17 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/autotest/+/4bd97b099b78aba8cf29f09185540b32da4160f9 commit 4bd97b099b78aba8cf29f09185540b32da4160f9 Author: Richard Barnette <jrbarnette@chromium.org> Date: Sat Feb 17 05:42:16 2018 [autotest] Add 'model' to HostInfo. We need HostInfo objects to have a 'model' attribute, so we can ask about the 'model' label. BUG= chromium:782272 TEST=None Change-Id: I8b991c0e32b658b3e064f679d1ed6b4a25f470d2 Reviewed-on: https://chromium-review.googlesource.com/923407 Commit-Ready: Richard Barnette <jrbarnette@chromium.org> Tested-by: Richard Barnette <jrbarnette@google.com> Reviewed-by: Ningning Xia <nxia@chromium.org> [modify] https://crrev.com/4bd97b099b78aba8cf29f09185540b32da4160f9/server/hosts/host_info.py
,
Feb 20 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/autotest/+/05af740dee511066430772a955096e70d7fb673c commit 05af740dee511066430772a955096e70d7fb673c Author: Ningning Xia <nxia@google.com> Date: Tue Feb 20 18:19:52 2018 cros_firmware: FirmwareVersionVerifier returns the right firmware versions. Change FirmwareVersionVerifier._get_available_firmware to parse the output of chromeos-firmwareupdate script and get the right available firmware versions for unibuild and non-unibuild boards. BUG= chromium:782272 TEST=unit_tests; test new methods against duts. CQ-DEPEND=CL:923407 Change-Id: I9aa4b68e4b5becbc5f5f0853582280060911d691 Reviewed-on: https://chromium-review.googlesource.com/924469 Reviewed-by: Richard Barnette <jrbarnette@google.com> Tested-by: Ningning Xia <nxia@chromium.org> [modify] https://crrev.com/05af740dee511066430772a955096e70d7fb673c/server/afe_utils.py [add] https://crrev.com/05af740dee511066430772a955096e70d7fb673c/server/hosts/cros_firmware_unittest.py [modify] https://crrev.com/05af740dee511066430772a955096e70d7fb673c/server/hosts/cros_firmware.py
,
Feb 20 2018
This seems to have just broken all active coral (astronaut) DUTs in the CQ. As a quick workaround, I've just done a force rebalance, and am now asking the secondary to do a lab push to get the real fix into production.
,
Feb 20 2018
,
Feb 21 2018
OK. At this point, I've confirmed that the changes in #c24 is in the lab, and is blocking firmware updates for all coral boards. That means it _should_ be safe to restart deploying nasher, and any other coral (or fizz) based board. Note that the problem isn't completely fixed until we unblock firmware updates, by fixing the automated firmware version assignment. That's in progress.
,
Feb 26 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/autotest/+/7459c5dbc4f07e0122c406c0d174e1a7f3bea489 commit 7459c5dbc4f07e0122c406c0d174e1a7f3bea489 Author: Ningning Xia <nxia@google.com> Date: Mon Feb 26 09:17:20 2018 assign_stable_images: assign the right firmware version to models. Previously, unibuild boards weren't assigned the right firmware versions to update. Parse the metadata.json file generated by builds, get the right firmware versions for regular and unibuild boards. BUG= chromium:782272 TEST=assign_stable_images -n; unit_tests Change-Id: I795f60f2d2ff854495a102626db3b96443a41877 Reviewed-on: https://chromium-review.googlesource.com/912613 Commit-Ready: Ningning Xia <nxia@chromium.org> Tested-by: Ningning Xia <nxia@chromium.org> Reviewed-by: Ningning Xia <nxia@chromium.org> [modify] https://crrev.com/7459c5dbc4f07e0122c406c0d174e1a7f3bea489/site_utils/stable_images/assign_stable_images.py [modify] https://crrev.com/7459c5dbc4f07e0122c406c0d174e1a7f3bea489/site_utils/stable_images/assign_stable_images_unittest.py [add] https://crrev.com/7459c5dbc4f07e0122c406c0d174e1a7f3bea489/site_utils/deployment/install_unittest.py [modify] https://crrev.com/7459c5dbc4f07e0122c406c0d174e1a7f3bea489/site_utils/deployment/install.py
,
Mar 19 2018
|
|||||||||||||||
►
Sign in to add a comment |
|||||||||||||||
Comment 1 by jrbarnette@chromium.org
, Nov 27 2017> My question: is it working as intended that this caused a > provision failure and thus a suite failure? Mostly, no. Normally, we'd expect a sequence like this: * Firmware version X is released in a bundle on ToT. * The bundle update is merged to Beta, and released. * Automated assignment assigns version X to the board. Once those steps complete, it would be normal (not necessarily _required_, but still _normal_) that any build actually installed on a DUT would have the proper bundle. In that case, when the firmware update check triggered, the sequence would look like this: * The firmware verifier would look for firmware version X on the DUT, and find it out-of-date. * The verifier would then check the firmware bundle, and find that it contained version X. * The verifier would run "chromeos-firmwareupdate", and version X would be installed. * The verifier would pass. The reason that it failed in this case is that this is a unibuild, and the firmware verifier doesn't support unibuilds (yet). The problem is that the output from the multi-model bundle doesn't match the output looked for by the verifier code. The relevant logs are attached. What they show is that the build that had been provisioned actually included Google_Snappy.9042.154.0: Here's an excerpt: 11/07 04:39:50.662 INFO | repair:0349| Verifying this condition: The firmware on this DUT is up-to-date 11/07 04:39:50.834 DEBUG| ssh_host:0296| Running (ssh) 'crossystem fwid' from '_verify_list|_verify_host|verify|_get_rw_firmware|run|run_very_slowly' 11/07 04:39:51.413 DEBUG| utils:0299| [stdout] Google_Snappy.9042.136.0 11/07 04:39:51.444 DEBUG| ssh_host:0296| Running (ssh) 'chromeos-firmwareupdate -V' from '_verify_list|_verify_host|verify|_get_available_firmware|run|run_very_slowly' 11/07 04:39:53.919 DEBUG| utils:0280| [stdout] [ ... ] 11/07 04:39:53.935 DEBUG| utils:0280| [stdout] Model: snappy 11/07 04:39:53.936 DEBUG| utils:0280| [stdout] BIOS image: b562fc23c114d70905faa59f103b836a */build/reef-uni/tmp/portage/chromeos-base/chromeos-firmware-reef-0.0.1-r119/temp/tmpXhrHJU.pack_firmware-19137/models/snappy/image.bin 11/07 04:39:53.936 DEBUG| utils:0280| [stdout] BIOS version: Google_Snappy.9042.154.0 11/07 04:39:53.936 DEBUG| utils:0280| [stdout] EC image: 5a16cfba7fd5c52ccb8200e9a34afb77 */build/reef-uni/tmp/portage/chromeos-base/chromeos-firmware-reef-0.0.1-r119/temp/tmpXhrHJU.pack_firmware-19137/models/snappy/ec.bin 11/07 04:39:53.936 DEBUG| utils:0280| [stdout] EC version: snappy_v1.1.5935-5b2a89d12.3 KB
12.3 KB View Download