Issue 870022

Issue metadata

Status: Fixed
Owner: jrbarnette@chromium.org
Closed: Aug 7
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug




chromeos4-row1-rack8-host3 fails to come back after reboot

Project Member Reported by evgreen@chromium.org, Aug 1

Issue description

This has happened consistently for the last three release builds. For instance:

https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8939401675686074096

Provision reports:
ABORT: Check ribbon cable: 'pwr_button' is stuck, Host did not return from reboot


However, when I SSH into the machine, it's there.

...Actually, I rebooted it just for a lark. I rebooted it at 11:16AM, and then made a loop on my dev machine trying to SSH into it, which didn't succeed until 11:58AM. /var/log/messages also shows that the kernel seemed to boot at that time. So where was this machine in the meantime? The machine was chromeos4-row1-rack8-host3.cros.
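A polling loop like the one described above could look roughly like this. It is only a sketch, not the loop that was actually run: the function name `wait_for_ssh` and its parameters are made up for illustration, and it only checks that TCP port 22 accepts a connection, which is weaker than a successful SSH login.

```python
import socket
import time


def wait_for_ssh(host, port=22, interval=10.0, timeout=3600.0,
                 probe=None, clock=time.monotonic, sleep=time.sleep):
    """Poll until `host` accepts a TCP connection on `port`.

    Returns the number of seconds that elapsed before the first
    successful probe, or None if `timeout` seconds pass first.
    `probe`, `clock` and `sleep` are injectable so the loop can be
    exercised without a real DUT (hypothetical helper, not lab tooling).
    """
    if probe is None:
        def probe(h, p):
            # A plain TCP connect; an open port 22 is assumed to mean
            # sshd is up. This does not authenticate.
            try:
                with socket.create_connection((h, p), timeout=5):
                    return True
            except OSError:
                return False

    start = clock()
    while True:
        if probe(host, port):
            return clock() - start
        if clock() - start >= timeout:
            return None
        sleep(interval)


# Example call (hits the network, so it is left commented out):
# elapsed = wait_for_ssh("chromeos4-row1-rack8-host3.cros")
# print("unreachable for roughly", elapsed, "seconds")
```

Recording the elapsed time rather than just the success makes it easy to compare the outage seen from the dev machine against the timestamps in the DUT's own logs.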
 
Components: -Infra>Client>ChromeOS>CI Infra>Client>ChromeOS>Test
Owner: jrbarnette@chromium.org
Status: Assigned (was: Untriaged)
Summary: chromeos4-row1-rack8-host3 fails to come back after reboot (was: leon fails to come back after provisioning)
Hmmm...  The DUT has been consistently failing provisioning with
this symptom, starting yesterday morning:
$ dut-status -d 48 -f chromeos4-row1-rack8-host3 | grep provision
    2018-08-01 06:28:36  -- http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row1-rack8-host3/571066-provision/
    2018-07-31 23:39:53  -- http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row1-rack8-host3/570865-provision/
    2018-07-31 21:09:24  -- http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row1-rack8-host3/570678-provision/
    2018-07-31 14:02:44  -- http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row1-rack8-host3/570607-provision/
    2018-07-31 09:45:27  -- http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row1-rack8-host3/570378-provision/
    2018-07-31 01:42:21  OK http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row1-rack8-host3/570176-provision/
    2018-07-31 01:11:40  OK http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row1-rack8-host3/570156-provision/
    2018-07-30 22:07:15  OK http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row1-rack8-host3/569843-provision/
    2018-07-30 19:31:58  OK http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row1-rack8-host3/569328-provision/
    2018-07-30 19:04:54  OK http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row1-rack8-host3/569251-provision/
    2018-07-30 15:17:18  OK http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row1-rack8-host3/569158-provision/
    2018-07-30 14:46:28  OK http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row1-rack8-host3/569132-provision/
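To pin down when the failures started without reading the list by eye, the output can be parsed mechanically. The following is a rough sketch: `parse_history` and `failure_streak` are hypothetical helpers, not part of dut-status, and it assumes that `--` in the status column marks a provision task that did not succeed. The sample holds the seven most recent lines from the output above.

```python
import re
from datetime import datetime

# One history line: timestamp, status column ("OK" or "--"), log URL.
LINE_RE = re.compile(
    r"^\s*(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+(\S+)\s+(\S+)\s*$")

SAMPLE = """
    2018-08-01 06:28:36  -- http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row1-rack8-host3/571066-provision/
    2018-07-31 23:39:53  -- http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row1-rack8-host3/570865-provision/
    2018-07-31 21:09:24  -- http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row1-rack8-host3/570678-provision/
    2018-07-31 14:02:44  -- http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row1-rack8-host3/570607-provision/
    2018-07-31 09:45:27  -- http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row1-rack8-host3/570378-provision/
    2018-07-31 01:42:21  OK http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row1-rack8-host3/570176-provision/
    2018-07-31 01:11:40  OK http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row1-rack8-host3/570156-provision/
"""


def parse_history(text):
    """Return [(timestamp, succeeded, url)] sorted oldest first."""
    entries = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            when = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
            entries.append((when, m.group(2) == "OK", m.group(3)))
    entries.sort(key=lambda e: e[0])
    return entries


def failure_streak(entries):
    """Describe the run of failures at the end of the history.

    Returns (last_ok_time, first_failure_time, streak_length).
    """
    streak = 0
    first_failure = None
    for when, ok, _url in reversed(entries):
        if ok:
            return when, first_failure, streak
        streak += 1
        first_failure = when
    return None, first_failure, streak


last_ok, first_failure, streak = failure_streak(parse_history(SAMPLE))
print("last OK:", last_ok)
print("first failure:", first_failure)
print("consecutive failures:", streak)
```

On this sample the streak is five tasks long, with the last success at 2018-07-31 01:42:21 and the first failure at 2018-07-31 09:45:27, which narrows the onset to that roughly eight-hour window.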

The "status.log" file has the summary of the trouble.

A recent repair task did capture eventlog.txt:
    2018-08-01 12:24:02  OK http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row1-rack8-host3/571155-repair/

Here's what the EC saw:
254 | 2018-08-01 11:13:31 | Kernel Event | Clean Shutdown
255 | 2018-08-01 11:13:32 | System boot | 6193
256 | 2018-08-01 11:13:32 | System Reset
257 | 2018-08-01 11:26:20 | EC Event | Battery
258 | 2018-08-01 11:33:12 | EC Event | Battery
259 | 2018-08-01 11:45:15 | EC Event | Battery

This was taken from /var/log/messages:
2018-08-01T11:13:30.323182-07:00 NOTICE pre-shutdown[13326]: Shutting down for reboot: not-via-powerd
2018-08-01T11:13:30.331116-07:00 INFO kernel: [ 7995.578390] EXT4-fs (dm-0): re-mounted. Opts: (null)
2018-08-01T11:13:30.332470-07:00 WARNING chapsd[907]: SRK does not exist - this is normal when the TPM is not yet owned.
2018-08-01T11:13:30.332520-07:00 WARNING chapsd[907]: SRK does not exist - this is normal when the TPM is not yet owned.
2018-08-01T11:13:30.332539-07:00 INFO chapsd[907]: Unloading keys for all slots.
2018-08-01T11:13:30.333837-07:00 INFO btdispatch[1643]: Power manager becomes not available
2018-08-01T11:57:58.601741-07:00 INFO kernel: [    0.000000] Initializing cgroup subsys cpu
2018-08-01T11:57:58.601772-07:00 NOTICE kernel: [    0.000000] Linux version 3.8.11 (chrome-bot@cros-beefy144-c2) (gcc version 4.9.x 20150123 (prerelease) (4.9.2_cos_gg_4.9.2-r191-71959ce8f47f676a26bb21da7117101d9d73867e_4.9.2-r191) ) #1 SMP Mon Jul 30 22:22:46 PDT 2018

I've moved the DUT out of the BVT pool, so it won't cause failures on
the release builders.

Status: ExternalDependency (was: Assigned)
I've filed b/112103600 and locked the unit pending further examination.

Status: Fixed (was: ExternalDependency)
A replacement has been ordered on the lab ticket, so we're done here.
