Nautilus DUT outage due to firmware update failure
Reported by
jrbarnette@chromium.org,
Aug 15
Issue description
The entire fleet of nautilus DUTs is failing repair. The basic symptom in the repair log looks like this:
    FAIL ---- verify.rwfw timestamp=1534238033 localtime=Aug 14 02:13:53 chromeos-firmwareupdate failed: from Google_Nautilus.10431.58.0 to Google_Nautilus.10431.71.0
Digging in to autoserv.DEBUG, you find these log entries:
    08/13 22:11:59.614 INFO | repair:0349| Verifying this condition: The firmware on this DUT is up-to-date
    08/13 22:11:59.848 DEBUG| ssh_host:0301| Running (ssh) 'crossystem fwid' from '_verify_list|_verify_host|verify|_get_rw_firmware|run|run_very_slowly'
    08/13 22:12:00.325 DEBUG| utils:0305| [stdout] Google_Nautilus.10431.58.0
    08/13 22:12:00.337 DEBUG| ssh_host:0301| Running (ssh) 'chromeos-firmwareupdate -V' from '_verify_list|_verify_host|verify|_get_available_firmware|run|run_very_slowly'
    08/13 22:12:01.992 DEBUG| utils:0286| [stdout]
    08/13 22:12:01.992 DEBUG| utils:0286| [stdout] flashrom(8): 2668a68fc653ddb6e4213b26502572c5 */build/nautilus/usr/sbin/flashrom
    08/13 22:12:01.993 DEBUG| utils:0286| [stdout] ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=f4e1a54f988428d20e0df356bac0136e95ae02c2, stripped
    08/13 22:12:01.993 DEBUG| utils:0286| [stdout] 0.9.9 : a7a062b : Jul 31 2018 23:37:21 UTC
    08/13 22:12:01.993 DEBUG| utils:0286| [stdout]
    08/13 22:12:01.993 DEBUG| utils:0286| [stdout] Model: nautilus
    08/13 22:12:01.993 DEBUG| utils:0286| [stdout] BIOS image: 6c41593a613d03f5e62eb3bbd9f86587 */models/nautilus/image.bin
    08/13 22:12:01.993 DEBUG| utils:0286| [stdout] BIOS version: Google_Nautilus.10431.32.0
    08/13 22:12:01.993 DEBUG| utils:0286| [stdout] BIOS (RW) image: 165dbfd475f735d8316be81c9e2100cd */models/nautilus/image.binrw
    08/13 22:12:01.993 DEBUG| utils:0286| [stdout] BIOS (RW) version: Google_Nautilus.10431.71.0
    08/13 22:12:01.993 DEBUG| utils:0286| [stdout] EC image: 53013d55b461845c76e428f545e3a247 */models/nautilus/ec.bin
    08/13 22:12:01.993 DEBUG| utils:0286| [stdout] EC version: nautilus_v1.1.8047-2c5917d7d
    08/13 22:12:01.994 DEBUG| utils:0286| [stdout] EC (RW) version: nautilus_v1.1.8083-6d4235d89
    08/13 22:12:01.994 DEBUG| utils:0286| [stdout]
    08/13 22:12:01.994 DEBUG| utils:0286| [stdout] Model: nautiluslte
    08/13 22:12:01.994 DEBUG| utils:0286| [stdout] BIOS image: 165dbfd475f735d8316be81c9e2100cd */models/nautiluslte/image.bin
    08/13 22:12:01.994 DEBUG| utils:0286| [stdout] BIOS version: Google_Nautilus.10431.71.0
    08/13 22:12:01.994 DEBUG| utils:0286| [stdout] BIOS (RW) image: 165dbfd475f735d8316be81c9e2100cd */models/nautiluslte/image.binrw
    08/13 22:12:01.994 DEBUG| utils:0286| [stdout] BIOS (RW) version: Google_Nautilus.10431.71.0
    08/13 22:12:01.994 DEBUG| utils:0286| [stdout] EC image: b4437257a09bd16c99930627e4cbd183 */models/nautiluslte/ec.bin
    08/13 22:12:01.995 DEBUG| utils:0286| [stdout] EC version: nautilus_v1.1.8083-6d4235d89
    08/13 22:12:01.995 DEBUG| utils:0286| [stdout] EC (RW) version: nautilus_v1.1.8083-6d4235d89
    08/13 22:12:01.995 DEBUG| utils:0286| [stdout]
That is, the requested firmware version is for Google_Nautilus.10431.71.0,
which is supplied for model "nautiluslte" but not for model "nautilus".
Moreover, the attempt to update fails:
    08/13 22:12:02.004 INFO | cros_firmware:0314| Updating firmware from Google_Nautilus.10431.58.0 to Google_Nautilus.10431.71.0
    08/13 22:12:02.013 DEBUG| ssh_host:0301| Running (ssh) 'chromeos-firmwareupdate --mode=autoupdate' from '_repair_host|_verify_list|_verify_host|verify|run|run_very_slowly'
    08/13 22:12:03.808 ERROR| utils:0286| [stderr] cros_config_fdt_err: find mapping: FDT_ERR_NOTFOUND
    08/13 22:12:03.822 ERROR| utils:0286| [stderr] cros_config_read_sku_info: Failed to read master configuration
    08/13 22:12:03.823 ERROR| utils:0286| [stderr] Platform not supported
    08/13 22:12:03.823 ERROR| utils:0286| [stderr] Application error: Platform not supported
    08/13 22:12:03.847 ERROR| utils:0286| [stderr] ERROR: Cannot get model from mosys.
    08/13 22:12:03.848 ERROR| utils:0286| [stderr] ERROR: Execution failed: ./updater4.sh (error code = 1)
    08/13 22:12:03.849 ERROR| cros_firmware:0322| chromeos-firmwareupdate failed: from Google_Nautilus.10431.58.0 to Google_Nautilus.10431.71.0
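For anyone triaging similar failures, here is a rough sketch of pulling just the ERROR entries out of autoserv.DEBUG. The line layout ("MM/DD HH:MM:SS.mmm LEVEL| module:line| message") is assumed from the entries quoted in this report, and the names LOG_LINE and error_messages are made up for illustration; this is not an autotest helper.

```python
import re

# Layout assumed from the log entries quoted in this report:
#   "MM/DD HH:MM:SS.mmm LEVEL| module:line| message"
LOG_LINE = re.compile(
    r"^(?P<stamp>\d\d/\d\d \d\d:\d\d:\d\d\.\d{3}) "
    r"(?P<level>[A-Z]+)\s*\|\s*"
    r"(?P<module>\w+):(?P<lineno>\d+)\|\s?"
    r"(?P<message>.*)$"
)


def error_messages(lines):
    """Return the message text of every ERROR-level log entry."""
    found = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and match.group("level") == "ERROR":
            found.append(match.group("message"))
    return found
```

Run over the entries above, this would keep the "[stderr]" lines from the failed updater run and drop the INFO and DEBUG entries.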
,
Aug 15
,
Aug 15
Hmmm...
> That is, the requested firmware version is for Google_Nautilus.10431.71.0,
> which is supplied for model "nautiluslte" but not for model "nautilus".
This ain't so. It's the RW firmware version that matters. For both models,
the RW version is Google_Nautilus.10431.71.0.
So, the reason that DUTs fail repair is that "chromeos-firmwareupdate" fails.
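To make the RW-version point concrete, here is a minimal sketch that reads the "BIOS (RW) version" per model out of the "chromeos-firmwareupdate -V" output and compares it with the "crossystem fwid" value. The function names are hypothetical and the parsing is based only on the output quoted in the description; it is not the actual cros_firmware code in autotest.

```python
import re


def rw_versions_by_model(updater_output):
    """Map each model name to its 'BIOS (RW) version' string."""
    versions = {}
    model = None
    for line in updater_output.splitlines():
        line = line.strip()
        found = re.match(r"Model:\s+(\S+)$", line)
        if found:
            model = found.group(1)
            continue
        found = re.match(r"BIOS \(RW\) version:\s+(\S+)$", line)
        if found and model is not None:
            versions[model] = found.group(1)
    return versions


def needs_update(current_fwid, model, updater_output):
    """True when the RW version shipped for `model` differs from the DUT's fwid."""
    available = rw_versions_by_model(updater_output).get(model)
    return available is not None and available != current_fwid
```

With the output from the description, both "nautilus" and "nautiluslte" map to Google_Nautilus.10431.71.0, so a DUT reporting Google_Nautilus.10431.58.0 counts as needing an update for either model.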
There are two distinct problems to be dealt with, so we may need two bugs.
1) We need to mitigate the nautilus repair failures in the lab, so that the
DUTs can resume testing.
2) We need to fix chromeos-firmwareupdate not to fail.
For item 1), the thing to do is to stop the updates temporarily. Instructions
are on the Test Infra Team's sites page:
https://sites.google.com/a/google.com/chromeos/for-team-members/infrastructure/chromeos-admin/manage-stable-version#TOC-Delete-a-Version-Setting
For item 2), the thing to do is to fix chromeos-firmwareupdate before the
next automatic firmware update (scheduled for Tuesday, 4:00AM).
,
Aug 15
> I believe this is the bug you are already working on, Greg. I think
> the other is tracked in Buganizer, though?
Can someone post a reference to the bugs mentioned here?
,
Aug 15
Need the summary to reflect the real problem...
,
Aug 15
+englab-sys-cros@
The folks in Stierlin Ct need to know that nautilus devices are all broken, and can't be fixed with a manual repair.
,
Aug 15
Issue 874242 has been merged into this issue.
,
Aug 15
This bug is also blocking deployment of nautiluslte devices into the test lab. The nautiluslte deployment can't be unblocked without a fix in some canary build.
,
Aug 17
yes, this should be fixed by https://chromium-review.googlesource.com/c/chromiumos/platform/mosys/+/1176090 and is tracked by b/112319097. It has been in the CQ for well over a day.
,
Aug 17
> yes, this should be fixed by [ ... ]
OK. There's still work to be done with this bug that goes
beyond just "fix the code". In order to unblock the nautiluslte
deployment, we have to update the stable repair image for nautilus
to a build with the fix. To be safe, that build must pass BVT on
hardware. But BVT testing is blocked because existing nautilus
DUTs can't upgrade.
So, the deputy needs to do these things:
* Disable firmware updates for nautilus, per #c3. This will allow
testing the fix on hardware.
* After the fix passes BVT, manually update the stable version to
the canary that passes.
,
Aug 17
First step: disable firmware updates:
    xixuan@xixuan0:~/chromiumos/src/third_party/autotest/files$ stable_version -d -t cros nautilus
    Delete Chrome OS nautilus -> R70-10950.0.0
    Delete Firmware nautilus -> Google_Nautilus.10431.71.0
    Delete Firmware nautiluslte -> Google_Nautilus.10431.71.0
,
Aug 17
Task 2 ("After the fix passes BVT, manually update the stable version to the canary that passes.") is passed to the next deputy.
,
Aug 24
Just updated the stable version to 10981, the build in which the fix landed.
    $ stable_version -t cros nautilus R70-10981.0.0
    Updating Chrome OS nautilus -> R69-10895.21.0 to R70-10981.0.0
    Updating Firmware nautilus -> Google_Nautilus.10431.67.0 to Google_Nautilus.10431.71.0
    Adding Firmware nautiluslte -> Google_Nautilus.10431.71.0
,
Aug 27
-> to current deputy to continue follow up.
,
Aug 30
The only follow-up was that the various "nautiluslte" DUTs were labeled with "model:nautilus". To fix the labels, I've removed the "model:nautilus" label from all of those DUTs, and then run repair on them.
Comment 1 by jclinton@chromium.org
, Aug 15