
Issue 818759


Issue metadata

Status: Fixed
Closed: May 2018
Pri: 3
Type: Bug




Deputy Tracking: Large number of devices failing automated repair

Project Member Reported by pprabhu@chromium.org, Mar 5 2018

Issue description

I started the week with a large number of devices that had died off gradually and were not auto-repaired.
I'm using this bug to track all of them; specific root causes will be chased separately.

This bug should be closed at the end of the week as this is only an incident report bug.
 
nasher:

pprabhu@pprabhu:chromiumos$ dut-status -m nasher -p bvt -n | xargs -i dut-status -g {}
chromeos2-row4-rack9-host2
    2018-03-01 16:27:16  NO http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row4-rack9-host2/296414-repair/
    2018-03-01 15:49:45  -- http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row4-rack9-host2/295732-provision/
    2018-03-01 15:47:21  OK http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row4-rack9-host2/295678-repair/
chromeos2-row4-rack9-host11
    2018-02-24 02:07:11  NO http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row4-rack9-host11/187019-repair/
    2018-02-24 01:27:28  -- http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row4-rack9-host11/186423-provision/
    2018-02-24 01:26:50  OK http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row4-rack9-host11/186405-cleanup/
chromeos2-row4-rack9-host12
    2018-02-28 00:15:18  NO http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row4-rack9-host12/270827-repair/
    2018-02-27 23:22:11  -- http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row4-rack9-host12/269683-provision/
    2018-02-27 22:41:31  OK http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row4-rack9-host12/269017-repair/
chromeos2-row4-rack9-host17
    2018-02-28 22:58:15  NO http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row4-rack9-host17/286368-repair/
    2018-02-28 20:50:45  -- http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row4-rack9-host17/284215-provision/
    2018-02-28 19:42:07  OK http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row4-rack9-host17/283516-cleanup/
chromeos2-row4-rack10-host16
    2018-02-27 22:28:21  NO http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row4-rack10-host16/268811-repair/
    2018-02-27 21:54:58  -- http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row4-rack10-host16/268138-provision/
    2018-02-27 21:28:17  OK http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row4-rack10-host16/267657-repair/


2 of these are issue 695151 where chromeos-install fails due to disk corruption.
In one instance, the image failed to boot from USB.
Another one had no servo support (the servo was likely down).
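
For triaging listings like the one above without clicking through every log link, here is a minimal sketch that parses pasted `dut-status -g` output and reports which task ran immediately before each failed repair. The hostname pattern and the column layout are inferred from the output pasted in this bug, not from the tool's source, so treat both as assumptions. `parse_dut_status` and `task_before_failed_repair` are hypothetical helper names.

```python
import re
from datetime import datetime

# Assumption: hostnames look like the ones pasted in this bug.
HOST_RE = re.compile(r"^chromeos\d+-row\d+-rack\d+-host\d+$")
# Assumption: event lines are "<indent><timestamp>  <OK|NO|-->  <url>".
EVENT_RE = re.compile(
    r"^\s+(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+(OK|NO|--)\s+(\S+)\s*$"
)


def parse_dut_status(text):
    """Return {hostname: [(timestamp, status, task), ...]} in listing order."""
    hosts = {}
    current = None
    for line in text.splitlines():
        if HOST_RE.match(line.strip()):
            current = line.strip()
            hosts[current] = []
            continue
        m = EVENT_RE.match(line)
        if m and current is not None:
            when = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
            # The task type is the suffix of the last path component,
            # e.g. ".../296414-repair/" gives "repair".
            last = m.group(3).rstrip("/").rsplit("/", 1)[-1]
            hosts[current].append((when, m.group(2), last.rsplit("-", 1)[-1]))
        # Anything else (blank lines, hand-written notes) is ignored.
    return hosts


def task_before_failed_repair(events):
    """Events are newest first; return the task listed right after the first NO."""
    for i, (_, status, _) in enumerate(events):
        if status == "NO" and i + 1 < len(events):
            return events[i + 1][2]
    return None


BASE = "http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts"
SAMPLE = f"""\
chromeos2-row4-rack9-host2
    2018-03-01 16:27:16  NO {BASE}/chromeos2-row4-rack9-host2/296414-repair/
    2018-03-01 15:49:45  -- {BASE}/chromeos2-row4-rack9-host2/295732-provision/
    2018-03-01 15:47:21  OK {BASE}/chromeos2-row4-rack9-host2/295678-repair/
chromeos6-row1-rack5-host15
    2018-03-01 22:50:13  NO {BASE}/chromeos6-row1-rack5-host15/301344-repair/
    2018-03-01 22:32:09  -- {BASE}/chromeos6-row1-rack5-host15/301070-reset/
    2018-03-01 22:31:44  OK {BASE}/chromeos6-row1-rack5-host15/301059-cleanup/
No servo
"""

hosts = parse_dut_status(SAMPLE)
for name, events in hosts.items():
    print(name, task_before_failed_repair(events))
```

Grouping hosts by the returned task (provision, reset, cleanup) is one quick way to see whether the failures share a trigger before opening individual repair logs.
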

banjo:

pprabhu@pprabhu:chromiumos$ dut-status -m banjo -p bvt -n | xargs -i dut-status -g {}
chromeos6-row1-rack5-host15
    2018-03-01 22:50:13  NO http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row1-rack5-host15/301344-repair/
    2018-03-01 22:32:09  -- http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row1-rack5-host15/301070-reset/
    2018-03-01 22:31:44  OK http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row1-rack5-host15/301059-cleanup/
No servo

chromeos6-row1-rack5-host17
    2018-02-26 12:27:46  NO http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row1-rack5-host17/239190-repair/
    2018-02-26 12:19:14  -- http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row1-rack5-host17/239149-provision/
    2018-02-26 07:30:31  OK http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row1-rack5-host17/235829-cleanup/
No servo

chromeos6-row1-rack5-host21
    2018-02-15 16:24:32  NO http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row1-rack5-host21/79293-repair/
    2018-02-15 16:19:01  -- http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row1-rack5-host21/79260-cleanup/
    2018-02-15 15:36:49  OK http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row1-rack5-host21/78965-reset/
No servo

chromeos6-row1-rack5-host9
    2018-02-19 21:23:03  NO http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row1-rack5-host9/127082-repair/
    2018-02-19 21:18:10  -- http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row1-rack5-host9/127065-cleanup/
    2018-02-19 20:27:41  OK http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row1-rack5-host9/126841-reset/

chromeos6-row1-rack7-host3
    2018-02-19 12:27:50  NO http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row1-rack7-host3/124659-repair/
    2018-02-19 12:19:08  -- http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row1-rack7-host3/124627-provision/
    2018-02-19 09:45:16  OK http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row1-rack7-host3/124137-cleanup/
No servo

chromeos6-row1-rack7-host9
    2018-02-26 22:37:32  NO http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row1-rack7-host9/247828-repair/
    2018-02-26 22:19:28  -- http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row1-rack7-host9/247417-reset/
    2018-02-26 22:19:04  OK http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row1-rack7-host9/247393-cleanup/
No servo


(1) Not a catastrophic failure of all DUTs together.
(2) Does banjo have no servo support overall?
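
To put a number on (1), here is a rough sketch that measures how spread out the failed-repair timestamps are. The timestamps are copied from the banjo listing above; the 24-hour window used to decide what counts as "failing together" is an arbitrary assumption, and `failure_spread` and `max_failures_within` are hypothetical helper names.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"

# Timestamps of the NO (failed repair) entries in the banjo listing.
BANJO_FAILED_REPAIRS = [
    "2018-03-01 22:50:13",
    "2018-02-26 12:27:46",
    "2018-02-15 16:24:32",
    "2018-02-19 21:23:03",
    "2018-02-19 12:27:50",
    "2018-02-26 22:37:32",
]


def parse_times(stamps):
    """Parse timestamp strings and return them in chronological order."""
    return sorted(datetime.strptime(s, FMT) for s in stamps)


def failure_spread(stamps):
    """Return (time between first and last failure, number of distinct dates)."""
    times = parse_times(stamps)
    return times[-1] - times[0], len({t.date() for t in times})


def max_failures_within(stamps, window):
    """Largest number of failures that fit inside any span of length `window`."""
    times = parse_times(stamps)
    best = 0
    start = 0
    for end, t in enumerate(times):
        # Shrink the window from the left until it spans at most `window`.
        while t - times[start] > window:
            start += 1
        best = max(best, end - start + 1)
    return best


spread, distinct_days = failure_spread(BANJO_FAILED_REPAIRS)
worst_day = max_failures_within(BANJO_FAILED_REPAIRS, timedelta(hours=24))
print(f"spread={spread}, distinct days={distinct_days}, max in 24h={worst_day}")
```

If most of the failures landed inside a single short window, that would point toward a shared cause; a wide spread with small clusters is consistent with the DUTs failing independently.
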



Cc: jrbarnette@chromium.org ayatane@chromium.org

stout:

pprabhu@pprabhu:chromiumos$ dut-status -m stout -p bvt -n | xargs -i dut-status -g {}
chromeos6-row2-rack4-host20
    2018-02-21 21:32:38  NO http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row2-rack4-host20/537458-repair/
    2018-02-21 21:27:48  -- http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row2-rack4-host20/537371-cleanup/
    2018-02-21 20:54:20  OK http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row2-rack4-host20/537091-reset/
No servo.

chromeos6-row2-rack4-host17
    2018-02-26 20:26:53  NO http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row2-rack4-host17/569922-repair/
    2018-02-26 20:18:21  -- http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row2-rack4-host17/569869-provision/
    2018-02-26 15:05:44  OK http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row2-rack4-host17/569214-cleanup/
No servo.

chromeos6-row2-rack5-host11
    2018-03-04 13:50:06  NO http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row2-rack5-host11/599388-repair/
    2018-03-04 13:47:47  -- http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row2-rack5-host11/599382-reset/
    2018-03-04 13:39:45  -- http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/180872353-chromeos-test/
    2018-03-04 13:39:20  OK http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row2-rack5-host11/599332-reset/
No servo.


(1) Not a catastrophic failure of all DUTs together.
(2) Does stout have no servo support overall?

> pprabhu@pprabhu:chromiumos$ dut-status -m stout -p bvt -n | xargs -i dut-status -g {}

That's a lot of unnecessary circumlocution.  This is equivalent:
    $ dut-status -m stout -p bvt -n -g

Looking at the failures, they seem spread out over several days with
distinct patterns, so probably there's not a single root cause.


> (2) Does stout have no servo support overall?

Stout has no servo support at all.

scarlet:

No servo support in any of these. Didn't fail together => rebalance.

pprabhu@pprabhu:chromiumos$ dut-status -m scarlet -p bvt -n | xargs -i dut-status -g {}
chromeos2-row1-rack11-host1
    2018-03-02 22:12:26  NO http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row1-rack11-host1/62624061-repair/
    2018-03-02 21:29:52  -- http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row1-rack11-host1/62623691-provision/
    2018-03-02 11:10:27  OK http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row1-rack11-host1/62617495-cleanup/
chromeos2-row2-rack11-host8
    2018-03-02 06:59:25  NO http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row2-rack11-host8/62614736-repair/
    2018-03-02 06:14:01  -- http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row2-rack11-host8/62614372-provision/
    2018-03-02 06:09:47  OK http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row2-rack11-host8/62614319-cleanup/
chromeos2-row2-rack11-host10
    2018-02-27 06:07:27  NO http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row2-rack11-host10/62567002-repair/
    2018-02-27 05:25:35  -- http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row2-rack11-host10/62566608-provision/
    2018-02-27 05:24:14  OK http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row2-rack11-host10/62566593-cleanup/

sand: 3 failed DUTs, separate times, no servo support => rebalance.

pprabhu@pprabhu:chromiumos$ dut-status -n -g -p bvt -m robo360
chromeos2-row4-rack11-host8
    2018-03-05 15:43:45  NO http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row4-rack11-host8/341518-repair/
    2018-03-05 15:10:06  -- http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row4-rack11-host8/341129-provision/
    2018-03-05 14:42:13  OK http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row4-rack11-host8/340901-repair/

Servo repair failed because python is missing (this is weird).

chromeos2-row4-rack11-host16
    2018-03-04 10:17:23  NO http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row4-rack11-host16/327906-repair/
    2018-03-04 09:39:20  -- http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row4-rack11-host16/327575-provision/
    2018-03-04 09:38:13  OK http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row4-rack11-host16/327561-cleanup/

issue 695151

chromeos2-row4-rack11-host21
    2018-02-25 13:20:23  NO http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row4-rack11-host21/218192-repair/
    2018-02-25 12:37:49  -- http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row4-rack11-host21/217554-provision/
    2018-02-25 10:26:56  OK http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row4-rack11-host21/216611-cleanup/

Device failed to boot from USB.

Different times => rebalance.
Status: Fixed (was: Assigned)
