
Issue 872813

Starred by 1 user

Issue metadata

Status: WontFix
Owner:
Closed: Aug 17
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug




HWTest failure (provision) on reef-chrome-pfq

Project Member Reported by afakhry@chromium.org, Aug 9

Issue description

- reef-chrome-pfq: https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8938699915606146368

provision     FAIL: Failure in build R70-10952.0.0-rc1: System rolled back to previous build, No answer to ssh from chromeos6-row4-rack10-labstation2


08/09 00:01:07.974 WARNI|              test:0637| The test failed with the following exception
Traceback (most recent call last):
  File "/usr/local/autotest/client/common_lib/test.py", line 631, in _exec
    _call_test_function(self.execute, *p_args, **p_dargs)
  File "/usr/local/autotest/client/common_lib/test.py", line 831, in _call_test_function
    return func(*args, **dargs)
  File "/usr/local/autotest/client/common_lib/test.py", line 495, in execute
    dargs)
  File "/usr/local/autotest/client/common_lib/test.py", line 362, in _call_run_once_with_retry
    postprocess_profiled_run, args, dargs)
  File "/usr/local/autotest/client/common_lib/test.py", line 400, in _call_run_once
    self.run_once(*args, **dargs)
  File "/usr/local/autotest/server/site_tests/provision_AutoUpdate/provision_AutoUpdate.py", line 191, in run_once
    host, url, use_quick_provision, with_cheets)
  File "/usr/local/autotest/server/afe_utils.py", line 112, in machine_install_and_update_labels
    image_name, host_attributes = updater.run_update()
  File "/usr/local/autotest/server/cros/autoupdater.py", line 1011, in run_update
    self._complete_update(expected_kernel)
  File "/usr/local/autotest/server/cros/autoupdater.py", line 960, in _complete_update
    expected_kernel, NewBuildUpdateError.ROLLBACK_FAILURE)
  File "/usr/local/autotest/server/cros/autoupdater.py", line 780, in verify_boot_expectations
    raise NewBuildUpdateError(self.update_version, rollback_message)
NewBuildUpdateError: Failure in build R70-10952.0.0-rc1: System rolled back to previous build




There are a bunch of exceptions here too: https://00e9e64bac908d4c630a6c4b76f2b12279e1f1a802cf370aac-apidata.googleusercontent.com/download/storage/v1/b/chromeos-autotest-results/o/225224673-chromeos-test%2Fchromeos6-row4-rack10-host10%2Fdebug%2Fautoserv.DEBUG?qk=AD5uMEvWV-augbr5trTi-RLmYjWh3Z2Va7ZtnLPovujMFdWKPEkl2yMFu1YwC4pDeNGW_fHMVo7oMVtnsr79kauKGEZxv1eTjrH65yV2EeZUtDovgRQQaCXkaC3cHZN6xphQYUbwzGxhnS0bj7yWmYlMPQXdDZEm8kEQ41HE29Z-dTDb1lihSfb1XGfQoLc6TfHnjKXEzr3gRCPDQ1EoH098ilwALAijcoKMsRyuLfyYJkhXgEn4a52QdhENSGX9Q8XQ63G0qD4EozaRQ4Pbv8_YRpHOMtmdt3nsVI5Bp353-FDRKgAYCD8gdMKtAzaaJmiwlK0s6eIe9d6CQZJZmN8DeEAuRYErRjS10yP5kKWKiWMun-YxFnE5o2Ro6-HV_67IiyWsZVkwID665unguM0W13uMrYfzM8oPBdoiyWP_BqoHe1AMmWYSeHdXEgrWCkdXoNpP0-6u8tonF_vB7-pAqutax75mp3KuYabLvouFAwpXmdnCP_JRaDFrI9F1w7hsISvIEnDYJQpIcVOylJxxql5VrvCePZHx-9smaagcmHcUNuSb9HM6M08cwbtp5idgXuwn2a45RXgra2BIBSMr3594NdXkzdV1RjCdgXnruI9sPzMrpzBNsLzYxL-qFW-ILc3xe0DQXqjsteQl-BOsNuo4F0F0tlhxi7LJVB5-Ca6LtKAillmQQsjjeU96h7tgjLsuQsPQLJwJ3iQ8z7lqeb4bzJsX1md74P-de0lDwRAZDMeqy42PqRa_n1oghnTOTjJP_f3OGvYrchY4jJ3EWRpWKex5REknS1pJvPEAEm6PI8pA1jeTNEt0an8XGUOmXWIZVzjL5sfdFqrDPS-UuBYSSfIbVA
 
Components: -Infra>Client>ChromeOS OS>Kernel
Owner: afakhry@chromium.org
> provision     FAIL: Failure in build R70-10952.0.0-rc1: System rolled back to previous build, No answer to ssh from chromeos6-row4-rack10-labstation2

This symptom occurs when the newly installed build crashes repeatedly,
leading the firmware to roll back to the previous version.

Usually, the crashes are caused by the new software, so we need some
debugging from a sheriff to figure out why the build crashed.
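
For context, the check that raised the error above boils down to comparing
the partition the DUT actually booted from against the one the update was
installed to.  A rough sketch of that idea (illustrative only, not the real
verify_boot_expectations() code; the helper name and arguments are
assumptions):

# Illustrative sketch; the real logic lives in
# server/cros/autoupdater.py:verify_boot_expectations().
class NewBuildUpdateError(Exception):
    """The DUT is not running the build we just installed."""

def check_rollback(host, expected_root, update_version):
    # 'rootdev -s' reports the root partition the system actually booted
    # from (e.g. /dev/mmcblk0p3 vs /dev/mmcblk0p5).
    booted_root = host.run('rootdev -s').stdout.strip()
    if booted_root != expected_root:
        raise NewBuildUpdateError(
            'Failure in build %s: System rolled back to previous build'
            % update_version)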

Cc: geohsu@chromium.org
I will look into it... Just need to file several other issues.
Cc: vapier@chromium.org de...@chromium.org
There are a couple of kernel panics in `eventlog.txt`, but the timing doesn't seem to line up with the test time:

145 | 2018-08-08 16:00:11 | Kernel Event | Panic
...
162 | 2018-08-08 16:13:00 | Kernel Event | Panic

`console-ramoops-0` doesn't show any panics.

`messages`, however, shows several crashes for `coreutils`:

2018-08-09T03:14:34.170426+00:00 INFO crash_reporter[9794]: libminijail[9794]: mount /dev/log -> /dev/log type ''
2018-08-09T03:14:34.182845+00:00 WARNING crash_reporter[9794]: [user] Received crash notification for coreutils[9793] sig 31, user 0 (developer build - not testing - always dumping)
2018-08-09T03:14:34.186212+00:00 INFO crash_reporter[9794]: State of crashed process [9793]: S (sleeping)
2018-08-09T03:14:34.186710+00:00 INFO crash_reporter[9794]: Accessing crash dir '/var/spool/crash' via symlinked handle '/proc/self/fd/5'
2018-08-09T03:14:34.197910+00:00 INFO metrics_daemon[3144]: [INFO:metrics_daemon.cc(427)] Got org.chromium.CrashReporter.UserCrash D-Bus signal
2018-08-09T03:14:34.208089+00:00 INFO crash_reporter[9794]: Stored minidump to /var/spool/crash/coreutils.20180808.201434.9793.dmp
2018-08-09T03:14:34.208537+00:00 INFO crash_reporter[9794]: Leaving core file at /proc/self/fd/5/coreutils.20180808.201434.9793.core due to developer image
It would help to get the minidump output for these crashes.  `coreutils` is a multicall binary, so it's not clear from that alone what was actually run.

Also, sig 31 is SIGSYS, which would mean a seccomp violation; that's weird too, since these don't normally run under seccomp.
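
To dig further, one could double-check what signal 31 is and symbolize the
stored dump; a rough sketch (not taken from this failure's logs; the symbol
directory path is made up):

import signal
import subprocess

# On x86-64 Linux, signal 31 is SIGSYS, the signal seccomp delivers on a
# filtered syscall.
print(signal.Signals(31).name)  # -> SIGSYS

# Symbolize the minidump named in the messages log with Breakpad's
# minidump_stackwalk; the crashing thread's top frames should hint at which
# coreutils applet was actually running.  The symbol directory is hypothetical.
subprocess.run(
    ['minidump_stackwalk',
     '/var/spool/crash/coreutils.20180808.201434.9793.dmp',
     '/tmp/breakpad-symbols'],
    check=True)
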
Cc: nsanders@chromium.org
Here's the contents of the eventlog from the time of the failure:
242 | 2018-08-08 23:57:25 | Kernel Event | Clean Shutdown
243 | 2018-08-08 23:57:28 | System boot | 7218
244 | 2018-08-08 23:57:28 | System Reset
245 | 2018-08-09 00:00:03 | Kernel Event | Clean Shutdown
246 | 2018-08-09 00:00:04 | System boot | 7219
247 | 2018-08-09 00:00:04 | System Reset
248 | 2018-08-09 00:00:04 | cr50 Update Reset
249 | 2018-08-09 00:00:04 | ACPI Enter | S5
250 | 2018-08-09 00:00:07 | System boot | 7220
251 | 2018-08-09 00:00:07 | Chrome OS Recovery Mode | Recovery Button Pressed | 0x02
252 | 2018-08-09 00:00:07 | Log area cleared | 1022
253 | 2018-08-09 00:00:07 | EC Event | Keyboard Recovery
254 | 2018-08-09 00:00:07 | Memory Cache Update | Variable | Success

Part of that I understand.  Part of it is... odd.

+nsanders@ because he's answered questions about Cr50 for me before.

> would help to get the minidump output for these crashes.
> `coreutils` is a multicall binary, so it's not clear from
> that what was actually run.

The crashes appear to be timestamped 2 hours before the failure here; they're
not related to this bug.


"cr50 Update Reset" is when a deferred cr50 update takes place and forces a double reboot. I could see it interfering in a recovery boot (which may be the case here?), but it's not clear how it could cause a rollback. 

I guess ATL will try USB install in this case, hence the recovery request? Are there EC logs?
> I guess ATL will try USB install in this case, hence the recovery request? Are there EC logs?

The recovery request happened in the middle of the reboot; it was too
soon to be attributable to any attempt to repair with servo.  That's
one of the things I can't explain.

> [ ... ] Are there EC logs?

Full logs of the failure are here:
    https://stainless.corp.google.com/browse/chromeos-autotest-results/225224673-chromeos-test/chromeos6-row4-rack10-host10/

Unfortunately, checking status.log, I see that the labstation was
down:
	FAIL	----	verify.servo_ssh	timestamp=1533797723	localtime=Aug 08 23:55:23	No answer to ssh from chromeos6-row4-rack10-labstation2

"No labstation" means "no servo" and therefore "no EC logs." :-(

Cc: michae...@chromium.org
Status: WontFix (was: Assigned)
I think this is no longer failing, so closing.
