
Issue 854064


Issue metadata

Status: Fixed
Owner:
Closed: Jun 2018
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug




chromeos4-row8-rack3-host7 is sick

Reported by jrbarnette@chromium.org, Jun 19 2018

Issue description

The `candy` DUT chromeos4-row8-rack3-host7 is in a bad way.  Here's its
history:
    $ dut-status -f chromeos4-row8-rack3-host7
    chromeos4-row8-rack3-host7
        2018-06-18 19:55:06  OK http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row8-rack3-host7/1085989-repair/
        2018-06-18 19:54:02  -- http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row8-rack3-host7/1085977-verify/
        2018-06-18 19:23:15  NO http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row8-rack3-host7/1085650-repair/
        2018-06-18 19:20:14  -- http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row8-rack3-host7/1085614-provision/
        2018-06-18 16:16:07  OK http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row8-rack3-host7/1083781-repair/
        2018-06-18 16:10:32  -- http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row8-rack3-host7/1083715-provision/
        2018-06-18 10:25:49  OK http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row8-rack3-host7/1081998-repair/
        2018-06-18 10:19:57  -- http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row8-rack3-host7/1081978-provision/
        2018-06-18 06:22:59  OK http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row8-rack3-host7/1080965-repair/
        2018-06-18 06:18:13  -- http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row8-rack3-host7/1080897-provision/
        2018-06-18 04:53:51  OK http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row8-rack3-host7/1080275-repair/
        2018-06-18 04:49:00  -- http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row8-rack3-host7/1080210-provision/
        2018-06-18 02:15:05  OK http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row8-rack3-host7/1078889-repair/
        2018-06-18 02:09:40  -- http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row8-rack3-host7/1078836-provision/

The host is (mostly) stuck in a repair loop.
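
To make the loop visible at a glance, the history above can be tallied mechanically. The following is a minimal sketch, assuming the line layout seen in the pasted `dut-status` output (timestamp, then `OK`/`NO`/`--`, then a log URL ending in `<job-id>-<task>/`). The function names `parse_history` and `count_provision_repair_cycles` are hypothetical helpers, not part of `dut-status` or autotest.

```python
import re
from typing import List, Tuple

# Matches history lines such as:
#   2018-06-18 19:23:15  NO http://.../1085650-repair/
# The layout is inferred from the output pasted in this bug, not from
# any documented dut-status format.
LINE_RE = re.compile(
    r"^\s*(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+(OK|NO|--)\s+"
    r"\S*?/(\d+)-(\w+)/?\s*$"
)


def parse_history(text: str) -> List[Tuple[str, str, str]]:
    """Return (timestamp, status, task) tuples sorted oldest first."""
    events = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            timestamp, status, _job_id, task = match.groups()
            events.append((timestamp, status, task))
    # Timestamps are in YYYY-MM-DD HH:MM:SS form, so string order is time order.
    events.sort(key=lambda event: event[0])
    return events


def count_provision_repair_cycles(events: List[Tuple[str, str, str]]) -> int:
    """Count provision tasks that are immediately followed by a repair task."""
    return sum(
        1
        for prev, cur in zip(events, events[1:])
        if prev[2] == "provision" and cur[2] == "repair"
    )
```

A provision task that is always followed directly by a repair task is the pattern described here as a repair loop; a healthy DUT would normally show test jobs between provisions.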

With the new provision code, the error is relatively easy to see; it's
in the attached `status.log`.  Basically, "postinst" is failing.

The most likely explanation is that there's a hardware problem with the
DUT's storage.

Someone (the deputy) should log in to the host, check /var/log/messages
for storage complaints, and if the storage looks bad, file a ticket with
englab-sys-cros to decommission/replace the unit.
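
For the log check, a quick filter can narrow down what to read. This is only a sketch: the pattern list below is an assumption (strings that often appear in Linux kernel logs when a block device misbehaves), not an official or exhaustive list, and `find_storage_complaints` is a hypothetical helper name.

```python
import re
from typing import Iterable, List

# Assumed patterns only; adjust for the storage hardware actually in the DUT.
STORAGE_PATTERNS = re.compile(
    r"I/O error"
    r"|EXT4-fs error"
    r"|mmc\d+: .*(timeout|error)"
    r"|ata\d+.*(failed|error)",
    re.IGNORECASE,
)


def find_storage_complaints(lines: Iterable[str]) -> List[str]:
    """Return the log lines that match any of the assumed storage patterns."""
    return [line.rstrip("\n") for line in lines if STORAGE_PATTERNS.search(line)]


if __name__ == "__main__":
    # Run on the DUT itself, or against a copy of its /var/log/messages.
    with open("/var/log/messages", errors="replace") as log_file:
        for hit in find_storage_complaints(log_file):
            print(hit)
```

An empty result does not prove the storage is healthy; it only means none of these particular strings were logged.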

 
Attachment: status.log (4.6 KB)
I've locked the host pending action:

$ atest host mod --lock -r  crbug.com/854064  chromeos4-row8-rack3-host7
Locked host: 
	chromeos4-row8-rack3-host7

... Following up on possible causes of the problem.  The relevant
error is this:
  Filesystem hash verification failed
  Expected c33e6ed2484e10dbb8b42398b4cf2bd758825a42 != actual 9fec6d8fa491064d53b7075b34f82c571cc8fed9

A storage failure that corrupts data is an obvious explanation of
the problem.  Other possible problems would be memory or CPU failure,
or maybe network corruption...  A software bug is least likely, although
it might be unwise to rule it out right away.
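
To illustrate why any of those failures would produce this error, here is a minimal sketch of a hash check of the same shape. It assumes SHA-1 purely because the digests quoted above are 40 hex digits long; the algorithm the provision code really uses is not stated in this bug. `sha1_hex` and `check_hash` are hypothetical names.

```python
import hashlib
from typing import Iterable


def sha1_hex(chunks: Iterable[bytes]) -> str:
    """Hash a stream of byte chunks and return the hex digest."""
    digest = hashlib.sha1()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def check_hash(chunks: Iterable[bytes], expected_hex: str) -> None:
    """Raise ValueError if the data does not hash to expected_hex."""
    actual = sha1_hex(chunks)
    if actual != expected_hex.lower():
        raise ValueError(
            "hash verification failed: expected %s != actual %s"
            % (expected_hex, actual)
        )


if __name__ == "__main__":
    good = [b"partition ", b"contents"]
    expected = sha1_hex(good)
    check_hash(good, expected)  # passes silently
    # A single changed byte anywhere in the stream changes the digest.
    corrupted = [b"partition ", b"contentz"]
    try:
        check_hash(corrupted, expected)
    except ValueError as err:
        print(err)
```

The check cannot tell where the bytes went wrong: data corrupted on disk, in RAM, in the CPU, or in transit all yield the same mismatch, which is why the list of suspects above is so broad.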

Comment 3 by jkop@chromium.org, Jun 20 2018

Status: Started (was: Assigned)
Sent it a repair job to get more current logs.

Comment 4 by jkop@chromium.org, Jun 20 2018

Status: Assigned (was: Started)
Repair job completed; repair failed. /var/log/messages attached.

It does not look to me like there is a storage failure.
Attachment: messages (116 KB)

Comment 5 by jkop@chromium.org, Jun 20 2018

Status: Fixed (was: Assigned)
Passed to englab-sys-cros, b/110488394
