Handful of K swarming devices going bad
Issue description
It looks like a handful of the Nexus 5s running KitKat on swarming are going bad, causing very high failure rates and, as a result, CQ flakes. For example, compare the number of failed tasks on a good device (https://chromium-swarm.appspot.com/bot?id=build22-b4--device4&show_all_tasks=true&show_full_names=true&sort_stats=total%3Adesc) to one of the bad devices (https://chromium-swarm.appspot.com/bot?id=build935-m4--device3&show_full_names=true&sort_stats=total%3Adesc). So far, I've found the following devices to be bad, but there may be more:
build935-m4--device3
build95-b4--device1
build259-m1--device4
Jul 10
Issue 862361 has been merged into this issue.
Jul 10
Issue 862333 has been merged into this issue.
Jul 10
Issue 862332 has been merged into this issue.
Jul 10
Issue 862330 has been merged into this issue.
Jul 10
Damn... that's not good. Thanks for filing this, Brian. These bots should be quarantining themselves if they fail too many tasks. Clearly that mechanism is broken. For now I'll trigger reflashes on all the affected bots:
https://chromium-swarm.appspot.com/task?id=3e9f0ccc053d8010
https://chromium-swarm.appspot.com/task?id=3e9f0803b200c210
https://chromium-swarm.appspot.com/task?id=3e9f1d4f36830810
Usually that heals them. If not, I'll take them offline.
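For anyone following along, here is a rough sketch of what "quarantine after too many failures" could look like. This is not the actual swarming bot code: the class name `FailureWindow`, the window size, and the failure-rate threshold are all made up for illustration.

```python
from collections import deque


class FailureWindow:
    """Tracks recent task outcomes for one bot (hypothetical helper)."""

    def __init__(self, window=20, max_failure_rate=0.5):
        # Keep only the last `window` outcomes; older ones fall off the left.
        self.results = deque(maxlen=window)
        self.max_failure_rate = max_failure_rate

    def record(self, succeeded):
        """Record one finished task: True for success, False for failure."""
        self.results.append(bool(succeeded))

    def should_quarantine(self):
        # Wait until the window is full so a single early failure
        # on a freshly booted bot does not trip the check.
        if len(self.results) < self.results.maxlen:
            return False
        failures = sum(1 for ok in self.results if not ok)
        return failures / len(self.results) > self.max_failure_rate
```

A bot would call `record()` after each task and stop picking up new work once `should_quarantine()` returns True. Using a sliding window rather than a lifetime counter means a device that recovers (say, after a reflash) clears itself once enough good results come in.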
Jul 10
Dremel'ed the swarming_tasks table and found a few more bad apples:
https://chromium-swarm.appspot.com/bot?id=build11-b4--device2
https://chromium-swarm.appspot.com/bot?id=build92-b4--device5
https://chromium-swarm.appspot.com/bot?id=build62-b4--device1
Triggered reflashes on them as well.
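The kind of aggregation involved can be sketched roughly as below. This is not the Dremel query that was run, and the real swarming_tasks schema isn't shown in this thread; the sketch assumes each task has been reduced to a `(bot_id, failed)` pair, and `find_bad_bots`, `min_tasks`, and `max_failure_rate` are hypothetical names and thresholds.

```python
from collections import defaultdict


def find_bad_bots(tasks, min_tasks=10, max_failure_rate=0.5):
    """Return {bot_id: failure_rate} for bots above the threshold.

    `tasks` is an iterable of (bot_id, failed) pairs, an assumed shape
    rather than the real table layout.
    """
    totals = defaultdict(int)
    failures = defaultdict(int)
    for bot_id, failed in tasks:
        totals[bot_id] += 1
        if failed:
            failures[bot_id] += 1

    bad = {}
    for bot_id, total in totals.items():
        # Skip bots with too few tasks to say anything meaningful.
        if total < min_tasks:
            continue
        rate = failures[bot_id] / total
        if rate > max_failure_rate:
            bad[bot_id] = rate
    # Worst offenders first.
    return dict(sorted(bad.items(), key=lambda kv: kv[1], reverse=True))


# Made-up sample data, not real task results.
sample = (
    [("bot-a", True)] * 8 + [("bot-a", False)] * 2
    + [("bot-b", False)] * 10
    + [("bot-c", True)] * 3
)
print(find_bad_bots(sample))
```

The `min_tasks` cutoff is there so a device that failed its only two or three tasks doesn't get flagged alongside devices with a long record of failures.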
Jul 11
Issue 862447 has been merged into this issue.
Jul 11
Reflashes didn't help. Took them all offline.
Oct 31
Going to mark this as fixed since AFAIK all the devices that were causing issues have been culled.
Comment 1 by bsheedy@chromium.org, Jul 10