Android swarming tasks occasionally run on slaves without live devices
Issue description

I've seen a few intermittent failures on the Android buildbots where all tests appear to have been run (and passed / skipped as appropriate) but the run fails. In each case the failure seemed to be related to an exit code returned from collect_cmd. Here is an example from this run: https://build.chromium.org/p/chromium.linux/builders/Android%20Tests/builds/44513

Last N lines of stdio for the failing webview_instrumentation_test_apk tests:

I 511.628s Main All tests completed.
C 511.629s Main ********************************************************************************
C 511.629s Main Summary
C 511.629s Main ********************************************************************************
C 511.629s Main [==========] 123 tests ran.
C 511.629s Main [ PASSED ] 121 tests.
C 511.629s Main [ SKIPPED ] Skipped 2 tests, listed below:
C 511.629s Main [ SKIPPED ] org.chromium.android_webview.test.AcceptLanguageTest#testAcceptLanguagesWithenUS
C 511.629s Main [ SKIPPED ] org.chromium.android_webview.test.AcceptLanguageTest#testAcceptLanguagesWithenUS with --webview-sandboxed-renderer
C 511.629s Main ********************************************************************************
I 511.630s TimeoutThread-1-for-individual_device_tear_down(06b9c48513c86b6d) [host]> /b/swarming/w/ir/third_party/android_tools/sdk/platform-tools/adb -s 06b9c48513c86b6d shell '( rm -f /data/local/tmp/android-webview-command-line );echo %$?'
I 511.731s TimeoutThread-1-for-individual_device_tear_down(06b9c48513c86b6d) [host]> /b/swarming/w/ir/third_party/android_tools/sdk/platform-tools/adb -s 06b9c48513c86b6d shell '( test -e /data/local/tmp/android-webview-command-line );echo %$?'
I 511.794s individual_device_tear_down(06b9c48513c86b6d) Flags now set on the device: []
I 511.795s TimeoutThread-1-for-individual_device_tear_down(06b9c48513c86b6d) [host]> /b/swarming/w/ir/third_party/android_tools/sdk/platform-tools/adb -s 06b9c48513c86b6d shell '( am clear-debug-app );echo %$?'
I 512.427s TimeoutThread-1-for-individual_device_tear_down(06b9c48513c86b6d) [host]> /b/swarming/w/ir/third_party/android_tools/sdk/platform-tools/adb -s 06b9c48513c86b6d shell '( rm -f /data/local/tmp/chrome_timeout_scale );echo %$?'
I 512.495s tear_down_device(06b9c48513c86b6d) Wrote device cache: /b/swarming/w/ir/out/Release/device_cache_06b9c48513c86b6d.json
I 512.760s Main Opening text logdog stream, unified_logcats
C 512.812s Main Logcat: https://luci-logdog.appspot.com/v/?s=chromium%2Fandroid%2Fswarming%2Flogcats%2F37d6a9e2ff918511%2F%2B%2Funified_logcats
[I2017-08-07T18:06:28.091800Z 18786 0 main.go:349] Terminating. {"returnCode":0}
+------------------------------------------------------------------------------------+
| End of shard 0 Pending: 0.4s Duration: 514.2s Bot: build85-b4--device3 Exit: 0 |
+------------------------------------------------------------------------------------+
Total duration: 4016.5s
WARNING:root:collect_cmd had non-zero return code: 87
step returned non-zero exit code: 87
Aug 7 2017
Aug 7 2017
Aug 7 2017
+John specifically for triage
Aug 7 2017
The examples you linked are two separate issues, neither of which is related to the exit code.

#0 has one of the shards (https://chromium-swarm.appspot.com/task?id=37d6a9ece09a5010) failing because it got assigned a bot without a live device. +bpastene and I were talking about this last week -- something about the device dying in on_before_task? Repurposing this bug for that.

#1 is symptomatic of a problem I noticed looking at the bots this morning -- runtimes for unit_tests on Android Tests (dbg) have been rising conspicuously since July 27. I'm not yet sure why. Filed as https://bugs.chromium.org/p/chromium/issues/detail?id=753059
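For readers unfamiliar with the "bot without a live device" failure in #0, here is a minimal sketch of how a pre-task check could tell live devices from dead ones by parsing plain `adb devices` output. The function name `live_devices` is hypothetical and this is not the swarming bot's actual code. It assumes the usual two-column `serial<TAB>state` layout, where only the state `device` means the device is usable.

```python
def live_devices(adb_devices_output):
    """Return the serials whose state is exactly 'device'.

    Assumes plain `adb devices` output (not `adb devices -l`), where each
    device line has two whitespace-separated fields: serial and state.
    """
    serials = []
    for line in adb_devices_output.splitlines():
        parts = line.split()
        # The "List of devices attached" header, blank lines and daemon
        # notices do not have exactly two fields, so they are skipped here.
        if len(parts) == 2 and parts[1] == "device":
            serials.append(parts[0])
    return serials


if __name__ == "__main__":
    sample = (
        "List of devices attached\n"
        "06b9c48513c86b6d\tdevice\n"
        "0123456789abcdef\toffline\n"
    )
    print(live_devices(sample))
```

A device that shows up as `offline` or `unauthorized` is visible to adb but cannot run tests, so a bot with an empty result from a check like this should arguably not accept a task.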
Aug 8 2017
I'll take this for the no devices problem.
Aug 8 2017
The following revision refers to this bug:
https://chrome-internal.googlesource.com/infradata/config/+/d2aefb6d321d8f24eba47bbaf185214d1f164c63

commit d2aefb6d321d8f24eba47bbaf185214d1f164c63
Author: Benjamin Pastene <bpastene@chromium.org>
Date: Tue Aug 08 18:43:55 2017
Aug 8 2017
The following revision refers to this bug:
https://chrome-internal.googlesource.com/infradata/config/+/182ff74d29ba7f082d8311fc74821ee9faa2a080

commit 182ff74d29ba7f082d8311fc74821ee9faa2a080
Author: Benjamin Pastene <bpastene@chromium.org>
Date: Tue Aug 08 21:34:46 2017
Aug 14 2017
Do you have an update on this issue? We still appear to be running into "all devices are blacklisted" errors: https://uberchromegw.corp.google.com/i/chromium.linux/builders/Android%20Tests/builds/44770
Aug 14 2017
I've got another change coming down that should fix that particular problem. It looks like we're rebooting the device but not waiting for it to fully complete the boot-up process before scheduling a new test. That explains why we can see the device and can run some simple commands, but can't do anything more complex like touch the sd card or talk with the package manager. Raising pri since this looks to be responsible for quite a bit of flake on https://build.chromium.org/p/chromium.linux/builders/Android%20Tests
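As an illustration of "waiting for the device to fully complete the boot-up process", the sketch below polls the standard Android system property `sys.boot_completed` over adb until it reads `1` or a timeout expires. The function names, timeout values and the injectable `read_prop` parameter are assumptions made for this example. It is not the check the swarming bot uses, and the thread does not say what that check looks at.

```python
import subprocess
import time


def read_boot_prop(serial):
    """Return the raw value of sys.boot_completed for one device ('' on error)."""
    try:
        result = subprocess.run(
            ["adb", "-s", serial, "shell", "getprop", "sys.boot_completed"],
            capture_output=True, text=True, timeout=30, check=False)
    except subprocess.TimeoutExpired:
        # A hung adb call is treated the same as "not booted yet".
        return ""
    return result.stdout


def wait_until_booted(serial, timeout_s=300.0, poll_s=5.0,
                      read_prop=read_boot_prop, sleep=time.sleep,
                      clock=time.monotonic):
    """Poll until the device reports boot completion or the timeout expires.

    Returns True if the property read '1' in time, False otherwise.
    read_prop, sleep and clock are injectable so the loop can be tested
    without a real device.
    """
    deadline = clock() + timeout_s
    while True:
        if read_prop(serial).strip() == "1":
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)
```

A bot would call something like `wait_until_booted("06b9c48513c86b6d")` after a reboot and refuse new tasks if it returns False. Note that `sys.boot_completed` alone may not cover the symptoms described above, so a stricter check would probably also probe the package manager and the sd card before declaring the device ready.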
Aug 15 2017
The following revision refers to this bug:
https://chromium.googlesource.com/infra/luci/luci-py.git/+/f57737b20a79c7b9672f08de0f58c98c647c0cb6

commit f57737b20a79c7b9672f08de0f58c98c647c0cb6
Author: Benjamin Pastene <bpastene@chromium.org>
Date: Tue Aug 15 19:00:21 2017

swarming: Roll python-adb to 55aea2...

Also quarantine a device if IsFullyBooted is false.

Bug: 753046
Change-Id: I81c2d6d40a99d48f4f47c311f2df2c315891c155
Reviewed-on: https://chromium-review.googlesource.com/614622
Reviewed-by: Kevin Lubick <kjlubick@chromium.org>
Reviewed-by: Marc-Antoine Ruel <maruel@chromium.org>
Commit-Queue: Benjamin Pastene <bpastene@chromium.org>

[modify] https://crrev.com/f57737b20a79c7b9672f08de0f58c98c647c0cb6/appengine/swarming/swarming_bot/api/os_utilities.py
[modify] https://crrev.com/f57737b20a79c7b9672f08de0f58c98c647c0cb6/appengine/third_party/python-adb/README.swarming
[modify] https://crrev.com/f57737b20a79c7b9672f08de0f58c98c647c0cb6/appengine/third_party/python-adb/adb/contrib/high.py
Aug 16 2017
Just had the same issue (?) when trying to submit a change to the CQ: https://build.chromium.org/p/tryserver.chromium.android/builders/linux_android_rel_ng/builds/363734

The chrome_public_test_apk step is red, but all tests appear to have passed.
Aug 16 2017
Yeah, same issue. That test ran on a bot that should have been quarantined: https://chromium-swarm.appspot.com/bot?id=build65-b4--device2 There was a string of test failures that lasted a few hours when the device was unsuitable for tasks. Our bot-health checks should have caught that the device was wonkers and quarantined it. It was eventually quarantined but not soon enough: http://shortn/_sCDaHUj0sP I need to look into why we didn't catch that sooner.
Aug 16 2017
Duping this into bug 748145 since it's the same root cause.
Comment 1 by joedow@google.com, Aug 7 2017