
Issue 770806

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Last visit > 30 days ago
Closed: Nov 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug




wizpigs in the lab are in a repair/verify failure loop

Reported by davidri...@chromium.org (Project Member), Oct 2 2017

Issue description

There are a number of wizpigs in the lab that appear to be failing repair/verify repeatedly.

The first three I checked from the suites pool are exhibiting this issue:
chromeos2-row8-rack8-host10
chromeos2-row8-rack8-host13
chromeos2-row8-rack8-host14

While they need repairing, they aren't causing tests to fail or anything like that. Since they are in the suites pool (which is the dumping ground), there is no need to lock or further isolate them.

However, they do need to be fixed.

Won't they potentially pick up out-of-band test suites?

Also, if there is some systemic issue with the software, the product team should be involved to identify and fix it.

Still working on the push, but the one I looked at is failing USB repair; I didn't understand why at first glance.

And no, since they are in the repair-failed state, they won't be used for any tests.

This likely should be WontFix:
  * When devices fail repair, the system periodically
    re-verifies them automatically, so the "loop"
    behavior described is WAI (working as intended);
    see the sketch after this list.
  * As dgarrett@ noted, devices that fail repair aren't
    used for testing, and so don't cause test failures.
  * There are automated processes for identifying failed
    devices and requesting manual fixes.  The CrOS Infra
    team doesn't get involved unless the volume of failures
    reduces supply so much that tests can't run.
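
A minimal sketch of that repair/verify cycle, for anyone unfamiliar with the lab automation. The names (Host, HostStatus, run_verify, run_repair) and the one-hour recheck interval are illustrative assumptions, not the actual autotest scheduler code:

    import time
    from enum import Enum

    class HostStatus(Enum):
        READY = "Ready"
        REPAIR_FAILED = "Repair Failed"

    class Host:
        def __init__(self, hostname):
            self.hostname = hostname
            self.status = HostStatus.REPAIR_FAILED

    def run_verify(host):
        """Placeholder for the real verify task; returns True on success."""
        return False

    def run_repair(host):
        """Placeholder for the real repair task; returns True on success."""
        return False

    def periodic_recheck(hosts, interval_sec=3600):
        """Re-verify failed hosts on a timer; this is the "loop" that is WAI."""
        while True:
            for host in hosts:
                if host.status is HostStatus.REPAIR_FAILED:
                    # A host in Repair Failed is never handed to the test
                    # scheduler, so it can't cause test failures in this state.
                    if run_verify(host) or run_repair(host):
                        host.status = HostStatus.READY
                    # Otherwise it stays in Repair Failed until a manual fix
                    # (e.g. the repair ticket filed later in this bug).
            time.sleep(interval_sec)

The point is just that the periodic re-verify timer, not a defect, produces the observed "loop", and that hosts in this state are excluded from scheduling.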

Labels: -Pri-3 Pri-1
We've got 3/3 of the devices I checked in this state, on a board that is experiencing 3-4% provision failures (as of my last check), which is much higher than other boards, and which mirrors another board from the same family that is causing issues (cyan). From looking at the host list, I'm pretty sure the actual count is at least 6-8 wizpigs.

We want to get to the bottom of these provisioning failures, which are causing CQ failures, and understanding these repair failures would be helpful to that end. Alternatively, since we have a lot of wizpigs, if they were up and running and I could run tests on 10 devices at once instead of 2, it would potentially make reproducing issue 639301 much easier.
PS: The shard for wizpig was having DB corruption issues yesterday that are believed to be fixed.

I haven't yet rechecked the devices to see what state they are in, since I'm still trying to get the software push to work.
dgarrett: what was the bug for the shard corruption issue? Wizpig is on chromeos-server104, but I don't see a bug mentioning that server recently.
It was the skunk1 issue that Aviv worked on yesterday. I don't know if there was a bug, only that he declared victory (I was working on skunk-1 at the same time, which was very confusing).
Or do I have it confused? There were many shard issues yesterday, all fixed to the best of my knowledge.
Cc: akes...@chromium.org
Perhaps #7/#9 refer to issue 770865, "shard_client crashlooping on chromeos-skunk-1 | board:auron_paine outage".

Status: Assigned (was: Untriaged)
Issue 771257 may be related, since it shows two wizpigs continuously failing in pool:cq; this does block master-paladin.

Host chromeos6-row2-rack20-host20 continuously fails in Verify/Repair.
Host chromeos6-row2-rack20-host4 fails in Provision.


Owner: nxia@chromium.org
Transferring outstanding deputy bugs.

Comment 13 by nxia@chromium.org, Nov 13 2017

Cc: dgarr...@chromium.org
Filed the repair ticket at b/69254297

Comment 14 by nxia@chromium.org, Nov 13 2017

Status: Fixed (was: Assigned)

Comment 15 by dchan@chromium.org, Jan 22 2018

Status: Archived (was: Fixed)

Comment 16 by dchan@chromium.org, Jan 23 2018

Status: Fixed (was: Archived)
