
Issue 874477


Issue metadata

Status: Fixed
Owner:
Closed: Aug 30
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug




Nautilus DUT outage due to firmware update failure

Reported by jrbarnette@chromium.org, Aug 15

Issue description

The entire fleet of nautilus DUTs is failing repair.

The basic symptom in the repair log looks like this:
	FAIL	----	verify.rwfw	timestamp=1534238033	localtime=Aug 14 02:13:53	chromeos-firmwareupdate failed: from Google_Nautilus.10431.58.0 to Google_Nautilus.10431.71.0
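
For anyone scanning many repair logs for this symptom, here is a minimal Python sketch that splits a status line like the one above into fields. It assumes the fields are tab-separated, as they appear in this log; the function name and the "subdir" key are hypothetical labels chosen for illustration, not autotest names.

```python
# Sample copied from the repair log above (fields separated by tabs).
STATUS_LINE = (
    "\tFAIL\t----\tverify.rwfw\ttimestamp=1534238033\t"
    "localtime=Aug 14 02:13:53\t"
    "chromeos-firmwareupdate failed: from Google_Nautilus.10431.58.0 "
    "to Google_Nautilus.10431.71.0"
)


def parse_status_line(line):
    """Split one tab-separated status line into a dict (hypothetical helper)."""
    fields = line.strip().split("\t")
    record = {
        "status": fields[0],
        "subdir": fields[1],      # assumed meaning of the "----" column
        "operation": fields[2],
        "message": "",
    }
    for field in fields[3:]:
        key, sep, value = field.partition("=")
        if sep and key in ("timestamp", "localtime"):
            record[key] = value
        else:
            # Anything that is not a known key=value pair is the message.
            record["message"] = field
    return record


record = parse_status_line(STATUS_LINE)
print(record["status"], record["operation"], record["message"])
```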

Digging into autoserv.DEBUG, you find these log entries:
08/13 22:11:59.614 INFO |            repair:0349| Verifying this condition: The firmware on this DUT is up-to-date
08/13 22:11:59.848 DEBUG|          ssh_host:0301| Running (ssh) 'crossystem fwid' from '_verify_list|_verify_host|verify|_get_rw_firmware|run|run_very_slowly'
08/13 22:12:00.325 DEBUG|             utils:0305| [stdout] Google_Nautilus.10431.58.0
08/13 22:12:00.337 DEBUG|          ssh_host:0301| Running (ssh) 'chromeos-firmwareupdate -V' from '_verify_list|_verify_host|verify|_get_available_firmware|run|run_very_slowly'
08/13 22:12:01.992 DEBUG|             utils:0286| [stdout] 
08/13 22:12:01.992 DEBUG|             utils:0286| [stdout] flashrom(8): 2668a68fc653ddb6e4213b26502572c5 */build/nautilus/usr/sbin/flashrom
08/13 22:12:01.993 DEBUG|             utils:0286| [stdout]              ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=f4e1a54f988428d20e0df356bac0136e95ae02c2, stripped
08/13 22:12:01.993 DEBUG|             utils:0286| [stdout]              0.9.9  : a7a062b : Jul 31 2018 23:37:21 UTC
08/13 22:12:01.993 DEBUG|             utils:0286| [stdout] 
08/13 22:12:01.993 DEBUG|             utils:0286| [stdout] Model:        nautilus
08/13 22:12:01.993 DEBUG|             utils:0286| [stdout] BIOS image:   6c41593a613d03f5e62eb3bbd9f86587 */models/nautilus/image.bin
08/13 22:12:01.993 DEBUG|             utils:0286| [stdout] BIOS version: Google_Nautilus.10431.32.0
08/13 22:12:01.993 DEBUG|             utils:0286| [stdout] BIOS (RW) image:   165dbfd475f735d8316be81c9e2100cd */models/nautilus/image.binrw
08/13 22:12:01.993 DEBUG|             utils:0286| [stdout] BIOS (RW) version: Google_Nautilus.10431.71.0
08/13 22:12:01.993 DEBUG|             utils:0286| [stdout] EC image:     53013d55b461845c76e428f545e3a247 */models/nautilus/ec.bin
08/13 22:12:01.993 DEBUG|             utils:0286| [stdout] EC version:   nautilus_v1.1.8047-2c5917d7d
08/13 22:12:01.994 DEBUG|             utils:0286| [stdout] EC (RW) version: nautilus_v1.1.8083-6d4235d89
08/13 22:12:01.994 DEBUG|             utils:0286| [stdout] 
08/13 22:12:01.994 DEBUG|             utils:0286| [stdout] Model:        nautiluslte
08/13 22:12:01.994 DEBUG|             utils:0286| [stdout] BIOS image:   165dbfd475f735d8316be81c9e2100cd */models/nautiluslte/image.bin
08/13 22:12:01.994 DEBUG|             utils:0286| [stdout] BIOS version: Google_Nautilus.10431.71.0
08/13 22:12:01.994 DEBUG|             utils:0286| [stdout] BIOS (RW) image:   165dbfd475f735d8316be81c9e2100cd */models/nautiluslte/image.binrw
08/13 22:12:01.994 DEBUG|             utils:0286| [stdout] BIOS (RW) version: Google_Nautilus.10431.71.0
08/13 22:12:01.994 DEBUG|             utils:0286| [stdout] EC image:     b4437257a09bd16c99930627e4cbd183 */models/nautiluslte/ec.bin
08/13 22:12:01.995 DEBUG|             utils:0286| [stdout] EC version:   nautilus_v1.1.8083-6d4235d89
08/13 22:12:01.995 DEBUG|             utils:0286| [stdout] EC (RW) version: nautilus_v1.1.8083-6d4235d89
08/13 22:12:01.995 DEBUG|             utils:0286| [stdout] 

That is, the requested firmware version is for Google_Nautilus.10431.71.0,
which is supplied for model "nautiluslte" but not for model "nautilus".

Moreover, the attempt to update fails:
08/13 22:12:02.004 INFO |     cros_firmware:0314| Updating firmware from Google_Nautilus.10431.58.0 to Google_Nautilus.10431.71.0
08/13 22:12:02.013 DEBUG|          ssh_host:0301| Running (ssh) 'chromeos-firmwareupdate --mode=autoupdate' from '_repair_host|_verify_list|_verify_host|verify|run|run_very_slowly'
08/13 22:12:03.808 ERROR|             utils:0286| [stderr] cros_config_fdt_err: find mapping: FDT_ERR_NOTFOUND
08/13 22:12:03.822 ERROR|             utils:0286| [stderr] cros_config_read_sku_info: Failed to read master configuration
08/13 22:12:03.823 ERROR|             utils:0286| [stderr] Platform not supported
08/13 22:12:03.823 ERROR|             utils:0286| [stderr] Application error: Platform not supported
08/13 22:12:03.847 ERROR|             utils:0286| [stderr] ERROR: Cannot get model from mosys.
08/13 22:12:03.848 ERROR|             utils:0286| [stderr] ERROR: Execution failed: ./updater4.sh (error code = 1)
08/13 22:12:03.849 ERROR|     cros_firmware:0322| chromeos-firmwareupdate failed: from Google_Nautilus.10431.58.0 to Google_Nautilus.10431.71.0
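
To pull just the updater's stderr out of a long autoserv.DEBUG file, something like the following sketch may help. The regular expression is inferred from the log lines quoted above and is an assumption about the format, not a documented one; the function name is hypothetical.

```python
import re

# Layout inferred from the entries above:
#   MM/DD HH:MM:SS.mmm LEVEL|   module:line| message
LOG_LINE = re.compile(
    r"^(?P<time>\d\d/\d\d \d\d:\d\d:\d\d\.\d{3}) "
    r"(?P<level>\w+)\s*\|\s*(?P<module>\S+?):(?P<lineno>\d+)\| ?"
    r"(?P<message>.*)$"
)

STDERR_TAG = "[stderr] "


def stderr_messages(log_text):
    """Return the text of every '[stderr]' entry, in log order."""
    messages = []
    for line in log_text.splitlines():
        match = LOG_LINE.match(line)
        if match and match.group("message").startswith(STDERR_TAG):
            messages.append(match.group("message")[len(STDERR_TAG):])
    return messages


SAMPLE = """\
08/13 22:12:02.004 INFO |     cros_firmware:0314| Updating firmware from Google_Nautilus.10431.58.0 to Google_Nautilus.10431.71.0
08/13 22:12:03.823 ERROR|             utils:0286| [stderr] Platform not supported
08/13 22:12:03.847 ERROR|             utils:0286| [stderr] ERROR: Cannot get model from mosys.
08/13 22:12:03.849 ERROR|     cros_firmware:0322| chromeos-firmwareupdate failed: from Google_Nautilus.10431.58.0 to Google_Nautilus.10431.71.0
"""

for message in stderr_messages(SAMPLE):
    print(message)
```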

 
Owner: gmeinke@chromium.org
I believe this is the bug you are already working on, Greg. I think the other is tracked in Buganizer, though?
Components: Infra>Client>ChromeOS>Build
Hmmm...

> That is, the requested firmware version is for Google_Nautilus.10431.71.0,
> which is supplied for model "nautiluslte" but not for model "nautilus".

This ain't so.  It's the RW firmware version that matters.  For both models,
the RW version is Google_Nautilus.10431.71.0.
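
To make that concrete, here is a rough Python sketch that groups the "chromeos-firmwareupdate -V" output by model and picks out the "BIOS (RW) version" field. It assumes the log prefixes have already been stripped and that each section starts with a "Model:" line, as in the output quoted in the description; the function name is hypothetical and this is not the code autotest itself uses.

```python
# Excerpt of the "-V" output from the issue description, log prefixes removed.
SAMPLE_OUTPUT = """\
Model:        nautilus
BIOS image:   6c41593a613d03f5e62eb3bbd9f86587 */models/nautilus/image.bin
BIOS version: Google_Nautilus.10431.32.0
BIOS (RW) image:   165dbfd475f735d8316be81c9e2100cd */models/nautilus/image.binrw
BIOS (RW) version: Google_Nautilus.10431.71.0

Model:        nautiluslte
BIOS image:   165dbfd475f735d8316be81c9e2100cd */models/nautiluslte/image.bin
BIOS version: Google_Nautilus.10431.71.0
BIOS (RW) image:   165dbfd475f735d8316be81c9e2100cd */models/nautiluslte/image.binrw
BIOS (RW) version: Google_Nautilus.10431.71.0
"""


def firmware_by_model(output):
    """Map each model name to a dict of its 'key: value' fields."""
    models = {}
    current = None
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank or free-form line
        key, value = key.strip(), value.strip()
        if key == "Model":
            current = models.setdefault(value, {})
        elif current is not None:
            current[key] = value
    return models


for model, info in firmware_by_model(SAMPLE_OUTPUT).items():
    print(model, info["BIOS version"], info["BIOS (RW) version"])
```

Only the read-only "BIOS version" differs between the two models in this excerpt; the RW versions match.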

So, the reason that DUTs fail repair is that "chromeos-firmwareupdate" fails.

There are two distinct problems to be dealt with, so we may need two bugs.
 1) We need to mitigate the nautilus repair failures in the lab, so that the
    DUTs can resume testing.
 2) We need to fix chromeos-firmwareupdate so that it no longer fails.

For item 1), the thing to do is to stop the updates temporarily.  Instructions
are on the Test Infra Team's sites page:
    https://sites.google.com/a/google.com/chromeos/for-team-members/infrastructure/chromeos-admin/manage-stable-version#TOC-Delete-a-Version-Setting

For item 2), the thing to do is to fix chromeos-firmwareupdate before the
next automatic firmware update (scheduled for Tuesday, 4:00AM).

> I believe this is the bug you are already working on, Greg. I think
> the other is tracked in Buganizer, though?

Can someone post a reference to the bugs mentioned here?

Summary: Nautilus DUT outage due to firmware update failure (was: Nautilus DUT outage due to mismatch in firmware bundle)
Need the summary to reflect the real problem...

Cc: englab-sys-cros@google.com
+englab-sys-cros@

The folks in Stierlin Ct need to know that nautilus devices are all
broken, and can't be fixed with a manual repair.

Issue 874242 has been merged into this issue.
This bug is also blocking deployment of nautiluslte devices into
the test lab.

The nautiluslte deployment can't be unblocked without a fix in
some canary build.

Yes, this should be fixed by https://chromium-review.googlesource.com/c/chromiumos/platform/mosys/+/1176090 and is tracked by b/112319097. It has been in the CQ for well over a day.
Owner: xixuan@chromium.org
> yes, this should be fixed by [ ... ]

OK.  There's still work to be done with this bug that goes
beyond just "fix the code".  In order to unblock the nautiluslte
deployment, we have to update the stable repair image for nautilus
to a build with the fix.  To be safe, that build must pass BVT on
hardware.  But BVT testing is blocked because existing nautilus
DUTs can't upgrade.

So, the deputy needs to do these things:
  * Disable firmware updates for nautilus, per #c3.  This will allow
    testing the fix on hardware.
  * After the fix passes BVT, manually update the stable version to
    the canary that passes.

First step: disable firmware updates:

xixuan@xixuan0:~/chromiumos/src/third_party/autotest/files$ stable_version -d -t cros nautilus
Delete    Chrome OS  nautilus     -> R70-10950.0.0
Delete    Firmware   nautilus     -> Google_Nautilus.10431.71.0
Delete    Firmware   nautiluslte  -> Google_Nautilus.10431.71.0


Owner: gu...@chromium.org
For task 2 ("After the fix passes BVT, manually update the stable version to
the canary that passes."), which was passed on to the next deputy:

I just updated the stable version to 10981, where the fix landed.

$ stable_version -t   cros nautilus R70-10981.0.0
Updating  Chrome OS  nautilus     -> R69-10895.21.0 to R70-10981.0.0
Updating  Firmware   nautilus     -> Google_Nautilus.10431.67.0 to Google_Nautilus.10431.71.0
Adding    Firmware   nautiluslte  -> Google_Nautilus.10431.71.0

Owner: jrbarnette@chromium.org
-> To the current deputy, to continue the follow-up.
Status: Fixed (was: Assigned)
The only follow-up was that the various "nautiluslte" DUTs were
labeled with "model:nautilus".

To fix the labels, I've removed the "model:nautilus" label from all
of those DUTs, and then run repair on them.
