HWTest failure (provision) on reef-chrome-pfq
Issue description:

reef-chrome-pfq: https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8938699915606146368

provision FAIL: Failure in build R70-10952.0.0-rc1: System rolled back to previous build, No answer to ssh from chromeos6-row4-rack10-labstation2

08/09 00:01:07.974 WARNI| test:0637| The test failed with the following exception
Traceback (most recent call last):
  File "/usr/local/autotest/client/common_lib/test.py", line 631, in _exec
    _call_test_function(self.execute, *p_args, **p_dargs)
  File "/usr/local/autotest/client/common_lib/test.py", line 831, in _call_test_function
    return func(*args, **dargs)
  File "/usr/local/autotest/client/common_lib/test.py", line 495, in execute
    dargs)
  File "/usr/local/autotest/client/common_lib/test.py", line 362, in _call_run_once_with_retry
    postprocess_profiled_run, args, dargs)
  File "/usr/local/autotest/client/common_lib/test.py", line 400, in _call_run_once
    self.run_once(*args, **dargs)
  File "/usr/local/autotest/server/site_tests/provision_AutoUpdate/provision_AutoUpdate.py", line 191, in run_once
    host, url, use_quick_provision, with_cheets)
  File "/usr/local/autotest/server/afe_utils.py", line 112, in machine_install_and_update_labels
    image_name, host_attributes = updater.run_update()
  File "/usr/local/autotest/server/cros/autoupdater.py", line 1011, in run_update
    self._complete_update(expected_kernel)
  File "/usr/local/autotest/server/cros/autoupdater.py", line 960, in _complete_update
    expected_kernel, NewBuildUpdateError.ROLLBACK_FAILURE)
  File "/usr/local/autotest/server/cros/autoupdater.py", line 780, in verify_boot_expectations
    raise NewBuildUpdateError(self.update_version, rollback_message)
NewBuildUpdateError: Failure in build R70-10952.0.0-rc1: System rolled back to previous build

There are a bunch of exceptions here too:
https://00e9e64bac908d4c630a6c4b76f2b12279e1f1a802cf370aac-apidata.googleusercontent.com/download/storage/v1/b/chromeos-autotest-results/o/225224673-chromeos-test%2Fchromeos6-row4-rack10-host10%2Fdebug%2Fautoserv.DEBUG?qk=AD5uMEvWV-augbr5trTi-RLmYjWh3Z2Va7ZtnLPovujMFdWKPEkl2yMFu1YwC4pDeNGW_fHMVo7oMVtnsr79kauKGEZxv1eTjrH65yV2EeZUtDovgRQQaCXkaC3cHZN6xphQYUbwzGxhnS0bj7yWmYlMPQXdDZEm8kEQ41HE29Z-dTDb1lihSfb1XGfQoLc6TfHnjKXEzr3gRCPDQ1EoH098ilwALAijcoKMsRyuLfyYJkhXgEn4a52QdhENSGX9Q8XQ63G0qD4EozaRQ4Pbv8_YRpHOMtmdt3nsVI5Bp353-FDRKgAYCD8gdMKtAzaaJmiwlK0s6eIe9d6CQZJZmN8DeEAuRYErRjS10yP5kKWKiWMun-YxFnE5o2Ro6-HV_67IiyWsZVkwID665unguM0W13uMrYfzM8oPBdoiyWP_BqoHe1AMmWYSeHdXEgrWCkdXoNpP0-6u8tonF_vB7-pAqutax75mp3KuYabLvouFAwpXmdnCP_JRaDFrI9F1w7hsISvIEnDYJQpIcVOylJxxql5VrvCePZHx-9smaagcmHcUNuSb9HM6M08cwbtp5idgXuwn2a45RXgra2BIBSMr3594NdXkzdV1RjCdgXnruI9sPzMrpzBNsLzYxL-qFW-ILc3xe0DQXqjsteQl-BOsNuo4F0F0tlhxi7LJVB5-Ca6LtKAillmQQsjjeU96h7tgjLsuQsPQLJwJ3iQ8z7lqeb4bzJsX1md74P-de0lDwRAZDMeqy42PqRa_n1oghnTOTjJP_f3OGvYrchY4jJ3EWRpWKex5REknS1pJvPEAEm6PI8pA1jeTNEt0an8XGUOmXWIZVzjL5sfdFqrDPS-UuBYSSfIbVA
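For context on the bottom frame of the traceback: `verify_boot_expectations` is where the updater decides the DUT rolled back, by comparing what the DUT is actually running after the post-update reboot against the build it just installed. A minimal sketch of that kind of check (not the actual `autoupdater.py` code; the `get_booted_release` helper and the error class here are illustrative only):

```python
class NewBuildUpdateError(Exception):
    """Raised when the DUT is not running the build it was provisioned with."""
    ROLLBACK_FAILURE = 'System rolled back to previous build'


def verify_boot_expectations(expected_release, get_booted_release):
    """Fail if the DUT came back up on something other than the new build.

    expected_release: version string the update installed,
        e.g. 'R70-10952.0.0-rc1'.
    get_booted_release: illustrative callable returning the version the
        DUT reports after the post-update reboot.
    """
    booted = get_booted_release()
    if booted != expected_release:
        # The DUT booted the previous image, so the update did not stick;
        # this is the "System rolled back to previous build" failure.
        raise NewBuildUpdateError('Failure in build %s: %s' % (
            expected_release, NewBuildUpdateError.ROLLBACK_FAILURE))
```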
Aug 9
I will look into it... Just need to file several other issues.
Aug 9
There are a couple of kernel panics in `eventlog.txt`, but the timing doesn't seem to be aligned with the test time:

145 | 2018-08-08 16:00:11 | Kernel Event | Panic
...
162 | 2018-08-08 16:13:00 | Kernel Event | Panic

`console-ramoops-0` doesn't show any panics. `messages`, however, shows several crashes for `coreutils`:

2018-08-09T03:14:34.170426+00:00 INFO crash_reporter[9794]: libminijail[9794]: mount /dev/log -> /dev/log type ''
2018-08-09T03:14:34.182845+00:00 WARNING crash_reporter[9794]: [user] Received crash notification for coreutils[9793] sig 31, user 0 (developer build - not testing - always dumping)
2018-08-09T03:14:34.186212+00:00 INFO crash_reporter[9794]: State of crashed process [9793]: S (sleeping)
2018-08-09T03:14:34.186710+00:00 INFO crash_reporter[9794]: Accessing crash dir '/var/spool/crash' via symlinked handle '/proc/self/fd/5'
2018-08-09T03:14:34.197910+00:00 INFO metrics_daemon[3144]: [INFO:metrics_daemon.cc(427)] Got org.chromium.CrashReporter.UserCrash D-Bus signal
2018-08-09T03:14:34.208089+00:00 INFO crash_reporter[9794]: Stored minidump to /var/spool/crash/coreutils.20180808.201434.9793.dmp
2018-08-09T03:14:34.208537+00:00 INFO crash_reporter[9794]: Leaving core file at /proc/self/fd/5/coreutils.20180808.201434.9793.core due to developer image
Aug 9
It would help to get the minidump output for these crashes. `coreutils` is a multicall binary, so it's not clear from the log alone which tool was actually run. Also, sig 31 is SIGSYS, so a seccomp violation there is weird ... these don't normally run under seccomp.
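For reference, signal 31 on Linux is SIGSYS, which is what seccomp delivers when a filtered process makes a disallowed syscall; a quick illustrative check of that mapping:

```python
import signal

# On Linux, signal number 31 is SIGSYS; seccomp uses it to kill or trap a
# process that makes a syscall its filter does not allow.
print(signal.Signals(31).name)  # prints 'SIGSYS' on Linux
```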
Aug 9
Here's the contents of the eventlog from the time of the failure:

242 | 2018-08-08 23:57:25 | Kernel Event | Clean Shutdown
243 | 2018-08-08 23:57:28 | System boot | 7218
244 | 2018-08-08 23:57:28 | System Reset
245 | 2018-08-09 00:00:03 | Kernel Event | Clean Shutdown
246 | 2018-08-09 00:00:04 | System boot | 7219
247 | 2018-08-09 00:00:04 | System Reset
248 | 2018-08-09 00:00:04 | cr50 Update Reset
249 | 2018-08-09 00:00:04 | ACPI Enter | S5
250 | 2018-08-09 00:00:07 | System boot | 7220
251 | 2018-08-09 00:00:07 | Chrome OS Recovery Mode | Recovery Button Pressed | 0x02
252 | 2018-08-09 00:00:07 | Log area cleared | 1022
253 | 2018-08-09 00:00:07 | EC Event | Keyboard Recovery
254 | 2018-08-09 00:00:07 | Memory Cache Update | Variable | Success

Part of that I understand. Part of it is... odd. +nsanders@ because he's answered questions about Cr50 for me before.
Aug 9
> would help to get the minidump output for these crashes.
> `coreutils` is a multicall binary, so it's not clear from
> that what was actually run.

The crashes appear to be timestamped 2 hours before the failure here; they're not related to this bug.
Aug 9
"cr50 Update Reset" is when a deferred cr50 update takes place and forces a double reboot. I could see it interfering in a recovery boot (which may be the case here?), but it's not clear how it could cause a rollback. I guess ATL will try USB install in this case, hence the recovery request? Are there EC logs?
Aug 10
> I guess ATL will try USB install in this case, hence the recovery request? Are there EC logs?

The recovery request happened in the middle of the reboot; it was too soon to be attributable to any attempt to repair with servo. That's one of the things I can't explain.
Aug 10
> [ ... ] Are there EC logs?
Full logs of the failure are here:
https://stainless.corp.google.com/browse/chromeos-autotest-results/225224673-chromeos-test/chromeos6-row4-rack10-host10/
Unfortunately, checking status.log, I see that the labstation was
down:
FAIL ---- verify.servo_ssh timestamp=1533797723 localtime=Aug 08 23:55:23 No answer to ssh from chromeos6-row4-rack10-labstation2
"No labstation" means "no servo" and therefore "no EC logs." :-(
Aug 17
I think this is no longer failing, so closing.
Comment 1 by jrbarnette@chromium.org, Aug 9
Owner: afakhry@chromium.org