Purple tests on many android devices
Issue description

Randy, I'm not sure if you have any ideas on how to triage. I know you wrote some of the code to turn android perf steps purple, but I'm not sure if this is related.

Seeing this on several bots:
https://build.chromium.org/p/chromium.perf/builders/Android%20Nexus5%20Perf%20%281%29/builds/4638
https://build.chromium.org/p/chromium.perf/builders/Android%20Nexus5%20Perf%20%282%29/builds/4284
https://build.chromium.org/p/chromium.perf/builders/Android%20Nexus7v2%20Perf%20%282%29/builds/3231
https://build.chromium.org/p/chromium.perf/builders/Android%20Nexus9%20Perf%20%282%29/builds/3515

Some, but not all, of the tests on a device go purple. It's my understanding that the order in which the tests are shown on the buildbot status page is not necessarily the order they were run in, so my guess is that something goes wrong and the rest of the runs on the device fail afterwards. But all I see in the logs is basically a message that telemetry was never run:

/usr/bin/python /b/rr/tmpCPIpw6/w/src/build/android/test_runner.py perf --print-step page_cycler_v2.intl_ko_th_vi --verbose --adb-path /b/rr/tmpCPIpw6/w/src/third_party/catapult/devil/bin/deps/linux2/x86_64/bin/adb --blacklist-file /b/rr/tmpCPIpw6/w/src/out/bad_devices.json --output-chartjson-data=/tmp/tmpoV_HSE
I 0.003s Main command: /b/rr/tmpCPIpw6/w/src/build/android/test_runner.py perf --print-step page_cycler_v2.intl_ko_th_vi --verbose --adb-path /b/rr/tmpCPIpw6/w/src/third_party/catapult/devil/bin/deps/linux2/x86_64/bin/adb --blacklist-file /b/rr/tmpCPIpw6/w/src/out/bad_devices.json --output-chartjson-data=/tmp/tmpoV_HSE
E 0.003s Main File not found /b/rr/tmpCPIpw6/w/src/out/step_results/page_cycler_v2.intl_ko_th_vi
E 0.004s Main Error occurred.
Traceback (most recent call last):
  File "/b/rr/tmpCPIpw6/w/src/build/android/test_runner.py", line 869, in main
    return RunTestsCommand(args)
  File "/b/rr/tmpCPIpw6/w/src/build/android/test_runner.py", line 693, in RunTestsCommand
    return RunTestsInPlatformMode(args)
  File "/b/rr/tmpCPIpw6/w/src/build/android/test_runner.py", line 760, in RunTestsInPlatformMode
    raw_results = test_run.RunTests()
  File "/b/rr/tmpCPIpw6/w/src/build/android/pylib/local/device/local_device_perf_test_run.py", line 498, in RunTests
    result_type = self._test_instance.PrintTestOutput()
  File "/b/rr/tmpCPIpw6/w/src/build/android/pylib/perf/perf_test_instance.py", line 124, in PrintTestOutput
    raise PersistentDataError('No data for test %s found.' % self._print_step)
PersistentDataError: No data for test page_cycler_v2.intl_ko_th_vi found.
<Thread(Thread-1, started 140700227208960)> ProcessRead: proc.stdout finished.
<Thread(Thread-1, started 140700227208960)> ProcessRead: cleaning up.
<Thread(Thread-2, started daemon 140700218816256)> TimedFlush: Finished
<Thread(Thread-1, started 140700227208960)> ProcessRead: finished.
exit code (as seen by runtest.py): 87
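To illustrate why the log above only says that no data was found: the --print-step invocation appears to read results that an earlier run step persisted under step_results, so if the step never ran there is nothing to print. The following is a minimal sketch of that behavior, not the real perf_test_instance.py code. The function name print_step_output and the use of a temporary directory are assumptions made for illustration; only the error class name and message text come from the traceback.

```python
import os
import tempfile


class PersistentDataError(Exception):
    """Raised when no persisted results exist for a requested step."""


def print_step_output(step_results_dir, step_name):
    # Hypothetical stand-in for the --print-step path: it only reads what an
    # earlier run wrote to disk, so a step that never ran has nothing to print.
    path = os.path.join(step_results_dir, step_name)
    if not os.path.exists(path):
        raise PersistentDataError('No data for test %s found.' % step_name)
    with open(path) as f:
        return f.read()


# An empty results directory models a device whose run step never happened.
with tempfile.TemporaryDirectory() as results_dir:
    try:
        print_step_output(results_dir, 'page_cycler_v2.intl_ko_th_vi')
        outcome = 'printed'
    except PersistentDataError as e:
        outcome = str(e)
```

Under this reading, the purple print steps are a symptom: the thing to look for is why the run step on that device never produced a results file.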
Nov 9 2016
Making the print order match the execution order would be great. Right now it seems to be alphabetical within a shard at print time.

As for why it's getting blacklisted multiple times: it seems like we continue trying to use the device within RecoverDevice even if we've already blacklisted it with a reboot timeout on the first call.
https://codesearch.chromium.org/chromium/src/third_party/catapult/devil/devil/android/tools/device_recovery.py?rcl=0&l=89
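One way to avoid the repeated blacklisting would be to check the blacklist before each recovery step and stop as soon as the device is on it. This is a rough sketch of that idea only; recover_device, the step functions, and the plain dict used as a blacklist are all hypothetical and are not devil's actual API.

```python
def recover_device(serial, blacklist, steps):
    """Run recovery steps in order, stopping once the device is blacklisted.

    `blacklist` is a plain dict of serial -> reason (an assumption for this
    sketch). `steps` is a list of (name, callable) pairs.
    """
    attempted = []
    for name, step in steps:
        if serial in blacklist:
            # Already blacklisted by an earlier step: do not touch it again.
            break
        attempted.append(name)
        try:
            step(serial)
        except TimeoutError:
            blacklist[serial] = '%s_timeout' % name
    return attempted


def reboot(serial):
    # Models the reboot timing out on the first call.
    raise TimeoutError('reboot timed out')


def wait_until_ready(serial):
    pass


blacklist = {}
attempted = recover_device(
    '0accace943e4af1c', blacklist,
    [('reboot', reboot), ('wait_until_ready', wait_until_ready)])
```

Here the second step is never attempted, so the device is blacklisted exactly once, with the reboot timeout as the reason.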
Nov 9 2016
We never get to the sysrq reboot because we time out on the preceding Root call. If we get to that point, maybe we should just try the sysrq reboot anyway. That is, this log:

W 13397.414s device_shard_helper(4) Timed out while attempting to reboot 0accace943e4af1c normally.Attempting alternative reboot.
I 13397.415s TimeoutThread-1-for-device_shard_helper(4) [host]> /b/rr/tmpq9NLiP/w/src/third_party/catapult/devil/bin/deps/linux2/x86_64/bin/adb -s 0accace943e4af1c root
C 13427.456s device_shard_helper(4) Timed out. Dumping threads.

corresponds to this code:
https://codesearch.chromium.org/chromium/src/third_party/catapult/devil/devil/android/tools/device_recovery.py?rcl=0&l=74
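The control flow being proposed can be sketched roughly as follows: treat a timeout on the root call as non-fatal and still attempt the sysrq reboot. FakeDevice and its methods are hypothetical stand-ins used to model the timeouts seen in the log, not devil's DeviceUtils. Whether a sysrq reboot can succeed without root on a given device is a separate question this sketch does not answer.

```python
class FakeDevice(object):
    """Hypothetical device whose normal reboot and root call both time out."""

    def __init__(self):
        self.calls = []

    def reboot(self):
        self.calls.append('reboot')
        raise TimeoutError('normal reboot timed out')

    def enable_root(self):
        self.calls.append('root')
        raise TimeoutError('adb root timed out')

    def sysrq_reboot(self):
        self.calls.append('sysrq')


def reboot_with_fallback(device):
    try:
        device.reboot()
        return 'normal'
    except TimeoutError:
        pass  # Fall through to the alternative reboot.
    try:
        device.enable_root()
    except TimeoutError:
        # Previously a timeout here ended recovery; now we keep going.
        pass
    device.sysrq_reboot()
    return 'sysrq'


device = FakeDevice()
result = reboot_with_fallback(device)
```

With both earlier calls timing out, the sysrq reboot is still the last call made on the device.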
Nov 9 2016
Nov 9 2016
There is a CL out for running them in alphabetical order:
https://codereview.chromium.org/2486993003/

And a CL for trying to reboot via sysrq even if root times out:
https://codereview.chromium.org/2491493003/
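The idea behind the first CL can be sketched as follows: if the steps for each shard are sorted by name before they run, the execution order matches the alphabetical order used at print time. This is only an illustration and not the code in the CL. The function name steps_by_shard, the 'device_affinity' key, and the affinity values are assumptions made for this sketch.

```python
def steps_by_shard(steps):
    """Group step names by shard and sort each group alphabetically."""
    shards = {}
    for name, config in steps.items():
        shards.setdefault(config['device_affinity'], []).append(name)
    return {shard: sorted(names) for shard, names in shards.items()}


# Step names are taken from the logs in this bug; the affinities are made up.
steps = {
    'page_cycler_v2.intl_ko_th_vi': {'device_affinity': 2},
    'page_cycler_v2.basic_oopif': {'device_affinity': 2},
}
run_order = steps_by_shard(steps)
```

With a shared ordering, the first purple step on a device's column is also the first step that failed to run there.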
Nov 9 2016
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/4fea9e224b390c824a12c579a97fa2c75d0e420c

commit 4fea9e224b390c824a12c579a97fa2c75d0e420c
Author: rnephew <rnephew@chromium.org>
Date: Wed Nov 09 20:27:12 2016

[Android] Make android test runner run perf tests in alphabetical order.

BUG= 663748
Review-Url: https://codereview.chromium.org/2486993003
Cr-Commit-Position: refs/heads/master@{#431023}

[modify] https://crrev.com/4fea9e224b390c824a12c579a97fa2c75d0e420c/build/android/pylib/local/device/local_device_perf_test_run.py
Nov 9 2016
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/43d8001b1061e74089931af72646a94199c3ae73

commit 43d8001b1061e74089931af72646a94199c3ae73
Author: catapult-deps-roller <catapult-deps-roller@chromium.org>
Date: Wed Nov 09 21:45:24 2016

Roll src/third_party/catapult/ 63e5a71b1..c93c05da3 (2 commits).

https://chromium.googlesource.com/external/github.com/catapult-project/catapult.git/+log/63e5a71b1a49..c93c05da3068

$ git log 63e5a71b1..c93c05da3 --date=short --no-merges --format='%ad %ae %s'
2016-11-09 rnephew [Devil] Attempt to reboot via sysrq even if root fails.
2016-11-09 jbudorick [devil] Check that the process hasn't exited before killing it.

BUG= 663748

Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+/master/autoroll/README.md

If the roll is causing failures, see:
http://www.chromium.org/developers/tree-sheriffs/sheriff-details-chromium#TOC-Failures-due-to-DEPS-rolls

CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.android:android_optional_gpu_tests_rel
TBR=catapult-sheriff@chromium.org
Review-Url: https://codereview.chromium.org/2489093002
Cr-Commit-Position: refs/heads/master@{#431041}

[modify] https://crrev.com/43d8001b1061e74089931af72646a94199c3ae73/DEPS
Nov 10 2016
The tests are run in order now, which should help us see what is happening.
https://build.chromium.org/p/chromium.perf/builders/Android%20Nexus9%20Perf%20%282%29/builds/3523

page_cycler_v2.basic_oopif fails and every test after that is purple on that device. The first bad thing that happens is during the fifa page:

ERROR:root:Problem when trying to gather stack trace: (device: HT4B7JT01059) adb shell '( uiautomator dump /data/local/tmp/temp_file-64d76ce8e3254 );echo %$?': failed with exit status 255 and output:
- error: device 'HT4B7JT01059' not found
INFO:root:*************** BROWSER STANDARD OUTPUT ***************
INFO:root:Cannot get standard output on Android
INFO:root:*********** END OF BROWSER STANDARD OUTPUT ************
INFO:root:********************* BROWSER LOG *********************
INFO:root:No log file
INFO:root:***************** END OF BROWSER LOG ******************

Then it looks like telemetry thinks devtools crashed:

Traceback (most recent call last):
  _RunStoryAndProcessErrorIfNeeded at /b/rr/tmpJQd1OL/w/src/third_party/catapult/telemetry/telemetry/internal/story_runner.py:110
    test.DidRunStory(state.platform)
  DidRunStory at /b/rr/tmpJQd1OL/w/src/third_party/catapult/telemetry/telemetry/web_perf/timeline_based_measurement.py:306
    platform.tracing_controller.StopTracing()
  StopTracing at /b/rr/tmpJQd1OL/w/src/third_party/catapult/telemetry/telemetry/core/tracing_controller.py:47
    return self._tracing_controller_backend.StopTracing()
  StopTracing at /b/rr/tmpJQd1OL/w/src/third_party/catapult/telemetry/telemetry/internal/platform/tracing_controller_backend.py:108
    self._IssueClockSyncMarker()
  _IssueClockSyncMarker at /b/rr/tmpJQd1OL/w/src/third_party/catapult/telemetry/telemetry/internal/platform/tracing_controller_backend.py:203
    self._RecordIssuerClockSyncMarker)
  RecordClockSyncMarker at /b/rr/tmpJQd1OL/w/src/third_party/catapult/telemetry/telemetry/internal/platform/tracing_agent/chrome_tracing_agent.py:175
    raise ChromeClockSyncError('Cannot issue clock sync. No devtools clients')
ChromeClockSyncError: Cannot issue clock sync. No devtools clients
********************************************************************************
(/b/rr/tmpJQd1OL/w/src/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py:397 _ConvertExceptionFromInspectorWebsocket)
Original exception:
********************************************************************************
(/b/rr/tmpJQd1OL/w/src/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py:418 _AddDebuggingInformation)
Received a socket error in the browser connection and the tab no longer exists. The tab probably crashed.
********************************************************************************
(/b/rr/tmpJQd1OL/w/src/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py:419 _AddDebuggingInformation)
Debugger url: ws://127.0.0.1:54462/devtools/page/0
Found Minidump: False
Stack Trace:
********************************************************************************
********************************************************************************
Standard output:
********************************************************************************

From the test runner logs we get this after the failure:

I 5375.715s device_shard_helper(2) page_cycler_v2.basic_oopif : exit_code=255 in 688 secs on device HT4B7JT01059
I 5375.718s device_shard_helper(2) Unmapping device ports for HT4B7JT01059.
I 5375.719s device_shard_helper(2) [host]> /b/rr/tmpJQd1OL/w/src/out/Release/host_forwarder --adb=/b/rr/tmpJQd1OL/w/src/third_party/catapult/devil/bin/deps/linux2/x86_64/bin/adb --serial-id=HT4B7JT01059 --unmap-all
E 5375.760s device_shard_helper(2) Exception when resetting ports.
Traceback (most recent call last):
  File "/b/rr/tmpJQd1OL/w/src/build/android/pylib/local/device/local_device_perf_test_run.py", line 289, in _TestTearDown
    forwarder.Forwarder.UnmapAllDevicePorts(self._device)
  File "/b/rr/tmpJQd1OL/w/src/third_party/catapult/devil/devil/android/forwarder.py", line 218, in UnmapAllDevicePorts
    raise HostForwarderError('\n'.join(error_msg))
HostForwarderError: `/b/rr/tmpJQd1OL/w/src/out/Release/host_forwarder --adb=/b/rr/tmpJQd1OL/w/src/third_party/catapult/devil/bin/deps/linux2/x86_64/bin/adb --serial-id=HT4B7JT01059 --unmap-all` exited with 1
[1110/072724:ERROR:host_forwarder_main.cc(477)] ERROR: could not get adb port for device. You might need to add 'adb' to your PATH or provide the device serial id.

For this one at least, it really just looks like adb cannot find the device. I am not 100% sure what to do about that. The next run is going and the device appears to be fine now.
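For triaging other bots, a small script can pull the first failing step per device out of the test runner log. This is a sketch that assumes result lines follow the "exit_code=... in ... secs on device ..." shape shown in the log above. The helper name and the two extra sample lines (the exit_code=0 line and the later failing line) are made up for illustration.

```python
import re

# Matches lines such as:
#   ... page_cycler_v2.basic_oopif : exit_code=255 in 688 secs on device HT4B7JT01059
STEP_RESULT_RE = re.compile(
    r'(?P<step>\S+) : exit_code=(?P<code>-?\d+) in (?P<secs>\d+) secs '
    r'on device (?P<serial>\S+)')


def first_failure_per_device(log_lines):
    """Return {serial: step} for the first non-zero exit code on each device."""
    first = {}
    for line in log_lines:
        match = STEP_RESULT_RE.search(line)
        if not match:
            continue
        serial = match.group('serial')
        if int(match.group('code')) != 0 and serial not in first:
            first[serial] = match.group('step')
    return first


sample_log = [
    # Hypothetical passing step, added for illustration.
    'I 4687.001s device_shard_helper(2) some.earlier_step : exit_code=0 '
    'in 120 secs on device HT4B7JT01059',
    # Line taken from the log in this comment.
    'I 5375.715s device_shard_helper(2) page_cycler_v2.basic_oopif : '
    'exit_code=255 in 688 secs on device HT4B7JT01059',
    # Hypothetical later failure on the same device.
    'I 5380.000s device_shard_helper(2) some.later_step : exit_code=1 '
    'in 4 secs on device HT4B7JT01059',
]
failures = first_failure_per_device(sample_log)
```

Now that steps run in alphabetical order, the step reported here should line up with the first purple step on the status page for that device.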
Dec 9 2016
Jan 18 2017
Ping - please provide an update to your high priority bug. This bug is stale. Is it really P-1?
Aug 4 2017
This issue was created > 6 months ago. The perf waterfall has changed significantly since then. If this bug is still relevant, please re-open.
Comment 1 by rnep...@chromium.org, Nov 9 2016