chromeos4-row1-rack8-host3 fails to come back after reboot
Issue description
This has happened consistently for the last three release builds. For instance:
https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8939401675686074096
Provision reports:
ABORT: Check ribbon cable: 'pwr_button' is stuck, Host did not return from reboot
However, when I SSH into the machine, it's there.
...Actually, I rebooted it just for a lark. I rebooted it at 11:16 AM and then ran a loop on my dev machine trying to SSH into it, which didn't succeed until 11:58 AM. /var/log/messages also shows that the kernel seemed to boot at around that time. So where was this machine in the meantime? The machine was chromeos4-row1-rack8-host3.cros.
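The SSH polling loop mentioned above isn't included in the report. A minimal sketch of that kind of loop, assuming that a successful TCP connection to port 22 is a good enough proxy for "the DUT is back"; `wait_for_ssh` is a hypothetical helper, not an existing lab tool:

```python
import socket
import time


def wait_for_ssh(host, port=22, poll_interval=5.0, timeout=3600.0, connect_timeout=3.0):
    """Poll host:port until a TCP connection succeeds (hypothetical helper).

    Returns the number of seconds waited, or None if `timeout` expires first.
    A successful connect only shows that something is listening on the port,
    not that an SSH login would actually work.
    """
    start = time.monotonic()
    while time.monotonic() - start <= timeout:
        try:
            with socket.create_connection((host, port), timeout=connect_timeout):
                return time.monotonic() - start
        except OSError:
            # Refused, unreachable, or timed out: the host is not back yet.
            time.sleep(poll_interval)
    return None
```

Called with the DUT's hostname right after issuing the reboot, it returns roughly how long the machine was unreachable, which is the number this report is asking about.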
Aug 1
Hmmm... The DUT has been consistently failing provisioning with
this symptom, starting yesterday morning:
$ dut-status -d 48 -f chromeos4-row1-rack8-host3 | grep provision
2018-08-01 06:28:36 -- http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row1-rack8-host3/571066-provision/
2018-07-31 23:39:53 -- http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row1-rack8-host3/570865-provision/
2018-07-31 21:09:24 -- http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row1-rack8-host3/570678-provision/
2018-07-31 14:02:44 -- http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row1-rack8-host3/570607-provision/
2018-07-31 09:45:27 -- http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row1-rack8-host3/570378-provision/
2018-07-31 01:42:21 OK http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row1-rack8-host3/570176-provision/
2018-07-31 01:11:40 OK http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row1-rack8-host3/570156-provision/
2018-07-30 22:07:15 OK http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row1-rack8-host3/569843-provision/
2018-07-30 19:31:58 OK http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row1-rack8-host3/569328-provision/
2018-07-30 19:04:54 OK http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row1-rack8-host3/569251-provision/
2018-07-30 15:17:18 OK http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row1-rack8-host3/569158-provision/
2018-07-30 14:46:28 OK http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row1-rack8-host3/569132-provision/
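To pin down when the failure streak started, the listing can be scanned from newest to oldest until the first success. A hedged sketch, assuming the "date time status URL" layout shown above and treating `--` as a failed job, which is how the listing appears to use it; `first_failure_in_streak` is a hypothetical helper, not part of dut-status:

```python
from datetime import datetime


def first_failure_in_streak(lines):
    """Return the timestamp of the oldest failure in the current run of failures.

    Assumes dut-status style lines, newest first:
        "YYYY-MM-DD HH:MM:SS STATUS URL"
    where STATUS is "OK" for success and "--" otherwise.
    Returns None if the most recent job succeeded (or there are no jobs).
    """
    oldest_failure = None
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        date, time_of_day, status = fields[0], fields[1], fields[2]
        if status == "OK":
            break  # the streak ends at the most recent success
        oldest_failure = datetime.strptime(f"{date} {time_of_day}", "%Y-%m-%d %H:%M:%S")
    return oldest_failure
```

Fed the output of `dut-status -d 48 -f chromeos4-row1-rack8-host3 | grep provision`, this would point at the 2018-07-31 09:45:27 job, matching "starting yesterday morning".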
The "status.log" file has the summary of the trouble.
A recent repair task did capture eventlog.txt:
2018-08-01 12:24:02 OK http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row1-rack8-host3/571155-repair/
Here's what the EC saw:
254 | 2018-08-01 11:13:31 | Kernel Event | Clean Shutdown
255 | 2018-08-01 11:13:32 | System boot | 6193
256 | 2018-08-01 11:13:32 | System Reset
257 | 2018-08-01 11:26:20 | EC Event | Battery
258 | 2018-08-01 11:33:12 | EC Event | Battery
259 | 2018-08-01 11:45:15 | EC Event | Battery
This was taken from /var/log/messages:
2018-08-01T11:13:30.323182-07:00 NOTICE pre-shutdown[13326]: Shutting down for reboot: not-via-powerd
2018-08-01T11:13:30.331116-07:00 INFO kernel: [ 7995.578390] EXT4-fs (dm-0): re-mounted. Opts: (null)
2018-08-01T11:13:30.332470-07:00 WARNING chapsd[907]: SRK does not exist - this is normal when the TPM is not yet owned.
2018-08-01T11:13:30.332520-07:00 WARNING chapsd[907]: SRK does not exist - this is normal when the TPM is not yet owned.
2018-08-01T11:13:30.332539-07:00 INFO chapsd[907]: Unloading keys for all slots.
2018-08-01T11:13:30.333837-07:00 INFO btdispatch[1643]: Power manager becomes not available
2018-08-01T11:57:58.601741-07:00 INFO kernel: [ 0.000000] Initializing cgroup subsys cpu
2018-08-01T11:57:58.601772-07:00 NOTICE kernel: [ 0.000000] Linux version 3.8.11 (chrome-bot@cros-beefy144-c2) (gcc version 4.9.x 20150123 (prerelease) (4.9.2_cos_gg_4.9.2-r191-71959ce8f47f676a26bb21da7117101d9d73867e_4.9.2-r191) ) #1 SMP Mon Jul 30 22:22:46 PDT 2018
I've moved the DUT out of the BVT pool, so it won't cause failures on
the release builders.
Aug 1
I've filed b/112103600 and locked the unit pending further examination.
Aug 7
A replacement has been ordered on the lab ticket, so we're done here.
Comment 1 by jclinton@chromium.org, Aug 1
Owner: jrbarnette@chromium.org
Status: Assigned (was: Untriaged)