Issue 640715

Starred by 3 users

Issue metadata

Status: Fixed
Owner:
Closed: Aug 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug




Add Kevin DVT2 to pool:cts

Reported by alexpau@chromium.org (Project Member), Aug 24 2016

Issue description

Please add the remaining 5 devices to pool:cts.

Locations for the kevin devices are as follows:
chromeos2-row6-rack5-host1  -- pool:bvt
chromeos2-row6-rack5-host2  -- pool:bvt
chromeos2-row6-rack5-host3  -- pool:bvt
chromeos2-row6-rack5-host4  -- pool:bvt
chromeos2-row6-rack5-host5  -- pool:bvt
chromeos2-row6-rack5-host6  
chromeos2-row6-rack6-host20
chromeos2-row6-rack6-host21 -- pool:bvt
chromeos2-row6-rack6-host22
chromeos2-row6-rack5-host7
chromeos2-row6-rack5-host8
 
Cc: jrbarnette@chromium.org
That will leave us with 0 spares. Is that acceptable?
Yes, we'll stock the lab with PVTs early next week, but we'd like to get coverage ASAP.
I've moved 3 DUTs into pool:cts, because that's all of the working spares we had. Looking into the broken ones.
0 spares means that the DUTs will no longer qualify as "managed",
which means failures won't be automatically scheduled for manual
repairs.  That's not a problem for the Infra team, and it's not
a problem for the team in Stierlin Ct:  The moment you remove the
spares, the devices aren't our problem anymore.

For anyone that cares about the health of the kevin test pools,
you'll have to track the supply yourself, and file the tickets
when there are problems.

Please note that tickets to fix only 1-2 devices may get low
priority.
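
A minimal spot-check sketch for anyone tracking the pool by hand, assuming dut-status accepts -b and prints the same status columns shown in the invocations later in this thread (the grep filter is just an illustration, not part of the tool):

$ dut-status -b kevin | grep -v ' OK '
# Any host whose status column reads "NO" or "??" is a candidate for a repair ticket.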

jrbarnette - Hoping we don't end up with too many unhealthy devices between now and next week, when a large number of PVTs are deployed to the lab.

dgarrett - what can we do to help diagnose broken cts pool candidates?
These are the two broken DUTs I was starting to look at. Diagnosing broken DUTs is NOT my specialty.

dgarrett$ dut-status chromeos2-row6-rack5-host20 chromeos2-row6-rack5-host6
hostname                       S   last checked         URL
chromeos2-row6-rack5-host20    OK  2016-08-24 14:50:34  http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row6-rack5-host20/58235032-provision/
chromeos2-row6-rack5-host6     ??  2016-08-19 01:33:35  http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row6-rack5-host6/58213971-repair/

> what can we do to help diagnose broken cts pool candidates?

Links to the repair logs/failure history of the problem children
are below.  You can click on these, and see if the problem is
something other than "the DUT went offline without explanation."
Otherwise, the fix is to file a ticket to request repair and logs.


$ dut-status -n -g -b kevin
chromeos2-row6-rack5-host5
    2016-08-24 14:16:16  NO http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row6-rack5-host5/58234862-repair/
    2016-08-24 14:11:37  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row6-rack5-host5/58234848-cleanup/
    2016-08-24 13:36:17  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/74299154-chromeos-test/
    2016-08-24 13:35:28  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row6-rack5-host5/58234672-reset/
chromeos2-row6-rack6-host20
    2016-08-22 07:20:55  NO http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row6-rack6-host20/58225511-repair/
    2016-08-22 07:16:20  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row6-rack6-host20/58225508-reset/
    2016-08-21 14:02:25  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row6-rack6-host20/58223022-repair/
chromeos2-row6-rack6-host21
    2016-08-24 13:22:59  NO http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row6-rack6-host21/58234607-repair/
    2016-08-24 13:13:12  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row6-rack6-host21/58234536-provision/
    2016-08-24 06:40:52  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/74285306-chromeos-test/
    2016-08-24 06:40:07  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row6-rack6-host21/58233475-reset/

Regarding the repair logs:  Start with "status.log".  There's a
summary there of what worked and what failed.
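
A hedged sketch of pulling one of those status.log files from the command line; it assumes the results directory behind the retrieve_logs.cgi link is also reachable directly under http://cautotest/results/ (an assumption about this lab's setup) and filters on the usual autotest status keywords:

$ curl -sL 'http://cautotest/results/hosts/chromeos2-row6-rack5-host5/58234862-repair/status.log' \
    | grep -E 'GOOD|WARN|FAIL|ERROR|ABORT'
# GOOD lines are steps that passed; FAIL/ERROR/ABORT lines point at the step that broke the repair.
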
The two that I linked above both appear to have repaired successfully in the linked status.log messages.
Cc: moch@chromium.org atul.mog...@synerzip.com

Comment 11 by moch@chromium.org, Aug 25 2016

Are all devices working as expected or are there still failures?

Alex, if devices are still failing, can I get one to investigate?
Wait, not all failures justify a full-scale debug effort.

Is there something in the logs to make you suspect the two
down devices are more than merely a generic bug in the software?

If you're not certain, you can request an inspection and logs
here:
    go/cros-lab-device-repair

Comment 13 by moch@chromium.org, Aug 25 2016

I was told that broken devices were delivered to the lab. Can you confirm this is not the case?

Katie/Alex, I'll let you pitch in here regarding debugging required.
> I was told that broken devices were delivered to the
> lab. Can you confirm this is not the case?

Some recently installed kevin DUTs have failed repair.
The failure history is in c#7.

The DUTs were working when deployed.  There's currently no
reason to believe that the reason for failing repair is
anything other than "just a software bug."

Have the remaining units been added to pool:cts?

Also, are there any additional failures aside from the 2 that repaired successfully?
I've moved the remaining failed pool:suites DUTs to pool:cts.
I think if there are actual issues because of DUT failures,
they should be addressed with a new bug.

$ balance_pool -s cts -t 0 suites kevin

Balancing kevin suites pool:
Total 2 DUTs, 0 working, 2 broken, 0 reserved.
Target is 0 working DUTs; no change to pool size.
kevin suites pool has 3 spares available.
kevin suites pool will return 2 broken DUTs, leaving 0 still in the pool.
Transferring 2 DUTs from suites to cts.
Updating host: chromeos2-row6-rack5-host6.
Removing labels ['pool:suites'] from host chromeos2-row6-rack5-host6
Adding labels ['pool:cts'] to host chromeos2-row6-rack5-host6
Updating host: chromeos2-row6-rack6-host20.
Removing labels ['pool:suites'] from host chromeos2-row6-rack6-host20
Adding labels ['pool:cts'] to host chromeos2-row6-rack6-host20
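
To double-check the transfer and the current health of the two moved hosts, one could re-run the status query against just those DUTs (same tool and calling convention shown earlier in this thread):

$ dut-status chromeos2-row6-rack5-host6 chromeos2-row6-rack6-host20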

Status: Fixed (was: Assigned)
