
Issue 862387

Starred by 2 users

Issue metadata

Status: Fixed
Closed: Oct 31
Pri: 1
Type: Bug




Handful of K swarming devices going bad

Project Member Reported by bsheedy@chromium.org, Jul 10

Issue description

It looks like a handful of the Nexus 5s running KitKat on swarming are going bad. Tasks on them fail at a very high rate, which results in CQ flakes.

For example, compare the number of failed tasks on a good device (https://chromium-swarm.appspot.com/bot?id=build22-b4--device4&show_all_tasks=true&show_full_names=true&sort_stats=total%3Adesc) to one of the bad devices (https://chromium-swarm.appspot.com/bot?id=build935-m4--device3&show_full_names=true&sort_stats=total%3Adesc).

So far, I've found the following devices to be bad, but there may be more:
build935-m4--device3
build95-b4--device1
build259-m1--device4
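
A rough sketch of the good-vs-bad comparison above, assuming the per-bot failed/total task counts have already been read off the bot pages. The function names, the 50% threshold, the minimum task count, and the bot ids in the usage note are all made up for illustration; none of this is real swarming data or swarming code.

```python
from typing import Dict, List, Tuple


def failure_rate(failed: int, total: int) -> float:
    """Fraction of tasks that failed; 0.0 when the bot has run nothing."""
    return failed / total if total else 0.0


def find_bad_bots(
    stats: Dict[str, Tuple[int, int]],
    threshold: float = 0.5,   # assumed cutoff, not a swarming setting
    min_tasks: int = 20,      # skip bots with too few tasks to judge
) -> List[str]:
    """Return the ids of bots whose failure rate is at or above threshold.

    stats maps a bot id to (failed_tasks, total_tasks).
    """
    bad = []
    for bot_id, (failed, total) in stats.items():
        if total >= min_tasks and failure_rate(failed, total) >= threshold:
            bad.append(bot_id)
    return sorted(bad)
```

Called with placeholder counts such as `find_bad_bots({"bot-a": (2, 100), "bot-b": (80, 100)})`, only `bot-b` would be flagged. The `min_tasks` guard is there so a bot that has only run a few tasks doesn't get flagged on noise.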
 
Cc: bsheedy@chromium.org tiborg@chromium.org bajones@chromium.org
Issue 862362 has been merged into this issue.
Issue 862361 has been merged into this issue.
Issue 862333 has been merged into this issue.
Issue 862332 has been merged into this issue.
Issue 862330 has been merged into this issue.
Damn... that's not good. Thanks for filing this, Brian.

These bots should be quarantining themselves if they fail too many tasks. Clearly that mechanism is broken. For now I'll trigger reflashes on all the affected bots:
https://chromium-swarm.appspot.com/task?id=3e9f0ccc053d8010
https://chromium-swarm.appspot.com/task?id=3e9f0803b200c210
https://chromium-swarm.appspot.com/task?id=3e9f1d4f36830810

Usually that heals them. If not, I'll take them offline.
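
For reference, the kind of "quarantine after too many failed tasks" check mentioned above could look roughly like the sketch below. This is not the actual swarming bot code. The class name, window size, and failure limit are assumptions picked for illustration.

```python
from collections import deque


class FailureWindow:
    """Tracks the outcome of the most recent tasks on one bot (hypothetical helper)."""

    def __init__(self, window_size: int = 20, max_failures: int = 15):
        # deque with maxlen drops the oldest result once the window is full.
        self._results = deque(maxlen=window_size)
        self._max_failures = max_failures

    def record(self, succeeded: bool) -> None:
        """Store the outcome of a finished task."""
        self._results.append(succeeded)

    def should_quarantine(self) -> bool:
        """True once the window is full and enough of its tasks failed."""
        window_full = len(self._results) == self._results.maxlen
        failures = sum(1 for ok in self._results if not ok)
        return window_full and failures >= self._max_failures
```

The bot would call `record()` after each task and stop taking work once `should_quarantine()` returns True. Waiting for a full window keeps one or two early failures on a fresh bot from tripping it.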
Issue 862447 has been merged into this issue.
Reflashes didn't help. Took them all offline.
Status: Fixed (was: Assigned)
Going to mark this as fixed since AFAIK all the devices that were causing issues have been culled.
