
Issue 663748 link

Starred by 1 user

Issue metadata

Status: Archived
Owner:
Closed: Aug 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Android
Pri: 1
Type: ----

Blocked on:
issue 667470
issue 691654




Purple tests on many android devices

Project Member Reported by sullivan@chromium.org, Nov 9 2016

Issue description

Randy, I'm not sure if you have any ideas on how to triage this. I know you wrote some of the code that turns android perf steps purple, but I'm not sure whether this is related.

Seeing this on several bots:
https://build.chromium.org/p/chromium.perf/builders/Android%20Nexus5%20Perf%20%281%29/builds/4638
https://build.chromium.org/p/chromium.perf/builders/Android%20Nexus5%20Perf%20%282%29/builds/4284
https://build.chromium.org/p/chromium.perf/builders/Android%20Nexus7v2%20Perf%20%282%29/builds/3231
https://build.chromium.org/p/chromium.perf/builders/Android%20Nexus9%20Perf%20%282%29/builds/3515

Some, but not all, of the tests on a device go purple. It's my understanding that the order in which the tests are shown on the buildbot status page is not necessarily the order they were run in, so my guess is that something goes wrong and the rest of the runs on the device fail afterwards. But all I see in the logs is basically a message that telemetry was never run:

/usr/bin/python /b/rr/tmpCPIpw6/w/src/build/android/test_runner.py perf --print-step page_cycler_v2.intl_ko_th_vi --verbose --adb-path /b/rr/tmpCPIpw6/w/src/third_party/catapult/devil/bin/deps/linux2/x86_64/bin/adb --blacklist-file /b/rr/tmpCPIpw6/w/src/out/bad_devices.json --output-chartjson-data=/tmp/tmpoV_HSE
I    0.003s Main  command: /b/rr/tmpCPIpw6/w/src/build/android/test_runner.py perf --print-step page_cycler_v2.intl_ko_th_vi --verbose --adb-path /b/rr/tmpCPIpw6/w/src/third_party/catapult/devil/bin/deps/linux2/x86_64/bin/adb --blacklist-file /b/rr/tmpCPIpw6/w/src/out/bad_devices.json --output-chartjson-data=/tmp/tmpoV_HSE
E    0.003s Main  File not found /b/rr/tmpCPIpw6/w/src/out/step_results/page_cycler_v2.intl_ko_th_vi
E    0.004s Main  Error occurred.
Traceback (most recent call last):
  File "/b/rr/tmpCPIpw6/w/src/build/android/test_runner.py", line 869, in main
    return RunTestsCommand(args)
  File "/b/rr/tmpCPIpw6/w/src/build/android/test_runner.py", line 693, in RunTestsCommand
    return RunTestsInPlatformMode(args)
  File "/b/rr/tmpCPIpw6/w/src/build/android/test_runner.py", line 760, in RunTestsInPlatformMode
    raw_results = test_run.RunTests()
  File "/b/rr/tmpCPIpw6/w/src/build/android/pylib/local/device/local_device_perf_test_run.py", line 498, in RunTests
    result_type = self._test_instance.PrintTestOutput()
  File "/b/rr/tmpCPIpw6/w/src/build/android/pylib/perf/perf_test_instance.py", line 124, in PrintTestOutput
    raise PersistentDataError('No data for test %s found.' % self._print_step)
PersistentDataError: No data for test page_cycler_v2.intl_ko_th_vi found.
<Thread(Thread-1, started 140700227208960)> ProcessRead: proc.stdout finished.
<Thread(Thread-1, started 140700227208960)> ProcessRead: cleaning up.
<Thread(Thread-2, started daemon 140700218816256)> TimedFlush: Finished
<Thread(Thread-1, started 140700227208960)> ProcessRead: finished.
exit code (as seen by runtest.py): 87

 
That error means the test runner cannot find the output of the test run. This happens when a device is blacklisted and can no longer run tests.
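As a triage aid, the check behind that error can be approximated outside the test runner. This is only a sketch, assuming (from the "File not found" line in the log above) that each step's output is persisted as a file named after the step inside out/step_results; the helper name missing_step_results is made up for illustration and is not part of test_runner.py.

```python
import os


def missing_step_results(step_results_dir, step_names):
    """Return the step names that have no persisted result file.

    Assumption: one file per step, named exactly like the step, in
    step_results_dir (as suggested by the 'File not found' log line).
    """
    return [
        name for name in step_names
        if not os.path.isfile(os.path.join(step_results_dir, name))
    ]
```

Running it over the list of steps assigned to a shard would show which steps never produced output, which are the ones that show up purple with PersistentDataError.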

The tests are not guaranteed to run in the order they are printed out. I will look into fixing that. I think it would make debugging things easier.

v8.browsing_mobile fails with a chrome crash.
v8.browsing_mobile_ignition fails with a chrome crash.

If you look at the sharded perf output, a device is blacklisted for timing out during reboot.
Adding 0accace943e4af1c to blacklist /b/rr/tmpq9NLiP/w/src/out/bad_devices.json for reason: reboot_timeout

The same device is also added to the blacklist in other spots. I will investigate why it's being added multiple times, but I don't think that's what is causing the device problems. We can try increasing the reboot timeout on perf bots.
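To see how often a device gets blacklisted in one run, the sharded perf output can be scanned for lines shaped like the "Adding ... to blacklist" line quoted above. A rough sketch, assuming that line format is stable; the function name count_blacklist_additions is hypothetical.

```python
import re
from collections import Counter

# Matches lines like:
#   Adding <serial> to blacklist <path> for reason: <reason>
BLACKLIST_RE = re.compile(
    r"Adding (?P<serial>\S+) to blacklist (?P<path>\S+) "
    r"for reason: (?P<reason>\S+)"
)


def count_blacklist_additions(log_lines):
    """Count blacklist additions per (device serial, reason) pair."""
    counts = Counter()
    for line in log_lines:
        match = BLACKLIST_RE.search(line)
        if match:
            counts[(match.group("serial"), match.group("reason"))] += 1
    return counts
```

A count above 1 for the same serial would confirm the repeated additions described here.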
Making the print order match the execution order would be great. Right now it seems to be alphabetical within a shard at print time.

As for why it's getting blacklisted multiple times: seems like we continue trying to use the device within RecoverDevice even if we've blacklisted it with a reboot timeout on the first call. https://codesearch.chromium.org/chromium/src/third_party/catapult/devil/devil/android/tools/device_recovery.py?rcl=0&l=89
We never get to the sysrq reboot because we time out on the preceding Root call. If we get to that point, maybe we should just try the sysrq reboot anyway.

i.e., this:

W 13397.414s device_shard_helper(4)  Timed out while attempting to reboot 0accace943e4af1c normally.Attempting alternative reboot.
I 13397.415s TimeoutThread-1-for-device_shard_helper(4)  [host]> /b/rr/tmpq9NLiP/w/src/third_party/catapult/devil/bin/deps/linux2/x86_64/bin/adb -s 0accace943e4af1c root
C 13427.456s device_shard_helper(4)  Timed out. Dumping threads.

is: https://codesearch.chromium.org/chromium/src/third_party/catapult/devil/devil/android/tools/device_recovery.py?rcl=0&l=74
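The proposed control flow (still try the sysrq reboot when the Root call times out) can be sketched roughly as follows. This is not the devil device_recovery.py code: StepTimeout and the four callables are hypothetical stand-ins, and blacklisting only once at the end is a simplification chosen for the sketch.

```python
class StepTimeout(Exception):
    """Hypothetical stand-in for a command timeout error."""


def recover_device(reboot, enable_root, sysrq_reboot, blacklist):
    """Try a normal reboot, then fall back to a sysrq reboot.

    All four arguments are caller-supplied callables (assumptions for
    this sketch). Returns a short string naming the outcome.
    """
    try:
        reboot()
        return "rebooted"
    except StepTimeout:
        pass  # Normal reboot timed out; try the alternative path.

    try:
        enable_root()
    except StepTimeout:
        # Root timed out. Instead of giving up here, fall through and
        # attempt the sysrq reboot anyway.
        pass

    try:
        sysrq_reboot()
        return "sysrq_rebooted"
    except StepTimeout:
        # Every recovery attempt failed: blacklist the device once.
        blacklist("reboot_timeout")
        return "blacklisted"
```

With this shape, a timeout in the root step no longer prevents the sysrq attempt, and the device is added to the blacklist at most once per recovery call.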
Owner: rnep...@chromium.org
Status: Assigned (was: Untriaged)
There is a CL out for running them in alphabetical order:
https://codereview.chromium.org/2486993003/

And a CL for trying to reboot via sysrq even if root times out:
https://codereview.chromium.org/2491493003/
Project Member

Comment 6 by bugdroid1@chromium.org, Nov 9 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/4fea9e224b390c824a12c579a97fa2c75d0e420c

commit 4fea9e224b390c824a12c579a97fa2c75d0e420c
Author: rnephew <rnephew@chromium.org>
Date: Wed Nov 09 20:27:12 2016

[Android] Make android test runner run perf tests in alphabetical order.

BUG= 663748 

Review-Url: https://codereview.chromium.org/2486993003
Cr-Commit-Position: refs/heads/master@{#431023}

[modify] https://crrev.com/4fea9e224b390c824a12c579a97fa2c75d0e420c/build/android/pylib/local/device/local_device_perf_test_run.py

Project Member

Comment 7 by bugdroid1@chromium.org, Nov 9 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/43d8001b1061e74089931af72646a94199c3ae73

commit 43d8001b1061e74089931af72646a94199c3ae73
Author: catapult-deps-roller <catapult-deps-roller@chromium.org>
Date: Wed Nov 09 21:45:24 2016

Roll src/third_party/catapult/ 63e5a71b1..c93c05da3 (2 commits).

https://chromium.googlesource.com/external/github.com/catapult-project/catapult.git/+log/63e5a71b1a49..c93c05da3068

$ git log 63e5a71b1..c93c05da3 --date=short --no-merges --format='%ad %ae %s'
2016-11-09 rnephew [Devil] Attempt to reboot via sysrq even if root fails.
2016-11-09 jbudorick [devil] Check that the process hasn't exited before killing it.

BUG= 663748 

Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+/master/autoroll/README.md

If the roll is causing failures, see:
http://www.chromium.org/developers/tree-sheriffs/sheriff-details-chromium#TOC-Failures-due-to-DEPS-rolls

CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.android:android_optional_gpu_tests_rel
TBR=catapult-sheriff@chromium.org

Review-Url: https://codereview.chromium.org/2489093002
Cr-Commit-Position: refs/heads/master@{#431041}

[modify] https://crrev.com/43d8001b1061e74089931af72646a94199c3ae73/DEPS

The tests are run in order now, so that should help us see what is happening.

https://build.chromium.org/p/chromium.perf/builders/Android%20Nexus9%20Perf%20%282%29/builds/3523

page_cycler_v2.basic_oopif fails and every test after that is purple on that device. 


The first bad thing happens during the fifa page.
ERROR:root:Problem when trying to gather stack trace: (device: HT4B7JT01059) adb shell '( uiautomator dump /data/local/tmp/temp_file-64d76ce8e3254 );echo %$?': failed with exit status 255 and output:
- error: device 'HT4B7JT01059' not found

INFO:root:*************** BROWSER STANDARD OUTPUT ***************
INFO:root:Cannot get standard output on Android
INFO:root:*********** END OF BROWSER STANDARD OUTPUT ************
INFO:root:********************* BROWSER LOG *********************
INFO:root:No log file
INFO:root:***************** END OF BROWSER LOG ******************


Then it looks like telemetry thinks devtools crashed:
Traceback (most recent call last):
  _RunStoryAndProcessErrorIfNeeded at /b/rr/tmpJQd1OL/w/src/third_party/catapult/telemetry/telemetry/internal/story_runner.py:110
    test.DidRunStory(state.platform)
  DidRunStory at /b/rr/tmpJQd1OL/w/src/third_party/catapult/telemetry/telemetry/web_perf/timeline_based_measurement.py:306
    platform.tracing_controller.StopTracing()
  StopTracing at /b/rr/tmpJQd1OL/w/src/third_party/catapult/telemetry/telemetry/core/tracing_controller.py:47
    return self._tracing_controller_backend.StopTracing()
  StopTracing at /b/rr/tmpJQd1OL/w/src/third_party/catapult/telemetry/telemetry/internal/platform/tracing_controller_backend.py:108
    self._IssueClockSyncMarker()
  _IssueClockSyncMarker at /b/rr/tmpJQd1OL/w/src/third_party/catapult/telemetry/telemetry/internal/platform/tracing_controller_backend.py:203
    self._RecordIssuerClockSyncMarker)
  RecordClockSyncMarker at /b/rr/tmpJQd1OL/w/src/third_party/catapult/telemetry/telemetry/internal/platform/tracing_agent/chrome_tracing_agent.py:175
    raise ChromeClockSyncError('Cannot issue clock sync. No devtools clients')
ChromeClockSyncError: Cannot issue clock sync. No devtools clients

********************************************************************************
(/b/rr/tmpJQd1OL/w/src/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py:397 _ConvertExceptionFromInspectorWebsocket) Original exception:

********************************************************************************
(/b/rr/tmpJQd1OL/w/src/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py:418 _AddDebuggingInformation) Received a socket error in the browser connection and the tab no longer exists. The tab probably crashed.
********************************************************************************
(/b/rr/tmpJQd1OL/w/src/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py:419 _AddDebuggingInformation) Debugger url: ws://127.0.0.1:54462/devtools/page/0
Found Minidump: False
Stack Trace:
********************************************************************************
********************************************************************************
Standard output:
********************************************************************************


From the test runner logs we get this after the failure:
I 5375.715s device_shard_helper(2)  page_cycler_v2.basic_oopif : exit_code=255 in 688 secs on device HT4B7JT01059
I 5375.718s device_shard_helper(2)  Unmapping device ports for HT4B7JT01059.
I 5375.719s device_shard_helper(2)  [host]> /b/rr/tmpJQd1OL/w/src/out/Release/host_forwarder --adb=/b/rr/tmpJQd1OL/w/src/third_party/catapult/devil/bin/deps/linux2/x86_64/bin/adb --serial-id=HT4B7JT01059 --unmap-all
E 5375.760s device_shard_helper(2)  Exception when resetting ports.
Traceback (most recent call last):
  File "/b/rr/tmpJQd1OL/w/src/build/android/pylib/local/device/local_device_perf_test_run.py", line 289, in _TestTearDown
    forwarder.Forwarder.UnmapAllDevicePorts(self._device)
  File "/b/rr/tmpJQd1OL/w/src/third_party/catapult/devil/devil/android/forwarder.py", line 218, in UnmapAllDevicePorts
    raise HostForwarderError('\n'.join(error_msg))
HostForwarderError: `/b/rr/tmpJQd1OL/w/src/out/Release/host_forwarder --adb=/b/rr/tmpJQd1OL/w/src/third_party/catapult/devil/bin/deps/linux2/x86_64/bin/adb --serial-id=HT4B7JT01059 --unmap-all` exited with 1
[1110/072724:ERROR:host_forwarder_main.cc(477)] ERROR: could not get adb port for device. You might need to add 'adb' to your PATH or provide the device serial id.


For this one at least, it really just looks like adb cannot find the device. I am not 100% sure what to do about that. The next run is in progress and the device appears to be fine now.
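Now that steps run in a known order, the first failing step on each device can be pulled out of the test runner log. A small sketch, assuming per-step lines keep the "<step> : exit_code=N in S secs on device <serial>" shape shown above; first_failure_per_device is a made-up helper name.

```python
import re

# Matches lines like:
#   ... page_cycler_v2.basic_oopif : exit_code=255 in 688 secs on device HT4B7JT01059
STEP_RE = re.compile(
    r"(?P<step>\S+) : exit_code=(?P<code>-?\d+) in (?P<secs>\d+) secs "
    r"on device (?P<serial>\S+)"
)


def first_failure_per_device(log_lines):
    """Map each device serial to the first step with a nonzero exit code."""
    first = {}
    for line in log_lines:
        match = STEP_RE.search(line)
        if not match:
            continue
        serial = match.group("serial")
        if int(match.group("code")) != 0 and serial not in first:
            first[serial] = match.group("step")
    return first
```

Since log lines are processed in file order, the recorded step is the earliest failure for that device, which is the one worth reading the telemetry output for.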

Comment 9 by pasko@chromium.org, Dec 9 2016

Blocking: 667470
Ping - please provide an update to your high priority bug. This bug is stale. Is it really P-1?
Blockedon: 691654 667470
Blocking: -667470
Status: Archived (was: Assigned)
This issue was created > 6 months ago. The perf waterfall has changed significantly since then. If this bug is still relevant, please re-open.
