
Issue 739180

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Closed: Jul 2017
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug




Flaky unreachable devices on KitKat Tablet Tester

Project Member Reported by perezju@chromium.org, Jul 4 2017

Issue description

chrome_public_test_apk has often been failing on KitKat Tablet Tester [1] because devices become unreachable (either NoDevicesError or DeviceUnreachableError).

[1]: https://luci-milo.appspot.com/buildbot/chromium.android/KitKat%20Tablet%20Tester/

Looking at which devices fail across shards in the 40 most recent builds, we see:

                     latest_build                                    status
device                                                                     
build42-b1--device1          8109  2---1F--------1---1---?--1--1111-111----
build42-b1--device2          8109  11--1---1--F--1---11--2-11---11F--?1--1-
build42-b1--device3          8109  21----1-1-1-1-1---1---2--?--11----1---1-
build42-b1--device4          8109  21--1-1-1-1-1-1---1F1-2-12---1----311-1-
build42-b1--device5          8109  ----------------------------------------
build42-b1--device6          8109  ----------------------------------------
build42-b1--device7          8109  ----------------------------------------


Where:
 1, 2, 3: number of shards failing with device issues.
 F: some shards failed due to other reasons. 
 ?: shard output not found.
 -: shard was successful.
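The issue does not say how the status rows were produced. As a rough illustration of the legend, the sketch below collapses the shards one device ran in one build into a single character. The outcome labels and function names are hypothetical, and the precedence when a build mixes outcomes (device issues, then missing output, then other failures) is an assumption, since the legend does not specify it.

```python
from typing import Sequence

# Hypothetical labels for the outcome of one shard that ran on a device.
PASS = "pass"
DEVICE_ERROR = "device"   # e.g. NoDevicesError or DeviceUnreachableError
OTHER_FAILURE = "other"   # shard failed for some non-device reason
MISSING = "missing"       # shard output not found


def status_char(shards: Sequence[str]) -> str:
    """Collapse one device's shards in one build into a single legend character.

    Precedence is an assumption (the legend does not state it):
    device issues, then missing output, then other failures, then success.
    """
    device_failures = sum(1 for outcome in shards if outcome == DEVICE_ERROR)
    if device_failures:
        return str(device_failures)  # "1", "2", "3": shards failing with device issues
    if MISSING in shards:
        return "?"
    if OTHER_FAILURE in shards:
        return "F"
    return "-"  # no device issues, no missing output, no other failures


def status_row(builds: Sequence[Sequence[str]]) -> str:
    """Build one table row: one character per build, in the order given."""
    return "".join(status_char(shards) for shards in builds)


if __name__ == "__main__":
    row = status_row([[PASS], [DEVICE_ERROR, DEVICE_ERROR], [OTHER_FAILURE, PASS], [MISSING]])
    print(row)  # -2F?
```

Encoding each build as one character keeps a 40-build history readable on a single line per device, at the cost of hiding which specific shards failed.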

It clearly seems that devices 1-4 are the ones having most of the problems. Maybe they should be replaced?
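To put numbers on that observation, a small sketch could tally the digits in each status row of the first table (copied verbatim below). The function names are hypothetical. Counting single digit characters assumes no device ever had 10 or more shards fail with device issues in one build, which holds for these rows (the maximum is 3).

```python
# Status rows copied verbatim from the first table in this issue.
STATUS = {
    "build42-b1--device1": "2---1F--------1---1---?--1--1111-111----",
    "build42-b1--device2": "11--1---1--F--1---11--2-11---11F--?1--1-",
    "build42-b1--device3": "21----1-1-1-1-1---1---2--?--11----1---1-",
    "build42-b1--device4": "21--1-1-1-1-1-1---1F1-2-12---1----311-1-",
    "build42-b1--device5": "----------------------------------------",
    "build42-b1--device6": "----------------------------------------",
    "build42-b1--device7": "----------------------------------------",
}


def builds_with_device_issues(status: str) -> int:
    """Count builds in which at least one shard on this device hit a device issue."""
    return sum(1 for ch in status if ch.isdigit())


def shards_with_device_issues(status: str) -> int:
    """Sum the per-build shard counts (the digits) across all builds."""
    return sum(int(ch) for ch in status if ch.isdigit())


def rank_devices(status_by_device: dict[str, str]) -> list[tuple[str, int, int]]:
    """Return (device, affected builds, affected shards), worst device first."""
    rows = [
        (device, builds_with_device_issues(s), shards_with_device_issues(s))
        for device, s in status_by_device.items()
    ]
    return sorted(rows, key=lambda row: (row[1], row[2]), reverse=True)


if __name__ == "__main__":
    for device, builds, shards in rank_devices(STATUS):
        print(f"{device}: {builds} builds, {shards} shards with device issues")
```

Ranking by affected builds first (and shards second) favours devices that fail often over devices that fail rarely but take several shards down at once; swapping the sort key would reverse that emphasis.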
 
Cc: -jbudorick@chromium.org
Components: Infra>Client>Android
Owner: jbudorick@chromium.org
Status: Started (was: Untriaged)
Looking into it. I'll hand this over to labs if necessary.
Labels: -Pri-3 Pri-2
Triggered provision_devices tasks on all four devices this morning.
Shards are still failing, but no longer for device-related reasons, so... yay? :D

                     latest_build                                   status
device                                                                    
build42-b1--device1          8150  -----F-F--F-1-1---1---F-12------1----1F
build42-b1--device2          8150  F---F----FF-F--F---F-----21----1-----1-
build42-b1--device3          8150  ----F-FFF--F11----F-----121-----1-----1
build42-b1--device4          8150  -----FF-FF-F1F-FFF1-----11-------------
build42-b1--device5          8150  -----F-------F--F--F-------------------
build42-b1--device6          8150  -----F------F-----F--------------------
build42-b1--device7          8150  --F--F-----F--F--F---------------------

Is provision_devices something that now needs to be triggered manually every so often?
It shouldn't be, as swarming does some amount of device resetting between tasks. If that isn't sufficient over time, the right approach is to identify which parts of provision_devices should be ported over.
Status: Fixed (was: Started)
In recent builds, the primary issues have been:
 - a UrlOverridingTest failing
 - shards hitting the 30-minute (!) timeout

Marking this issue as fixed; I will work on the others separately.
