Issue 650674

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Nov 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Android
Pri: 1
Type: ----

Blocked on:
issue 634052




Device offline or forwarder failure on chromium.perf

Project Member Reported by sullivan@chromium.org, Sep 27 2016

Issue description

Link to buildbot status page:
https://build.chromium.org/p/chromium.perf/builders/Android%20Nexus5%20Perf%20%281%29

All the tests on device 0d88c7fd25995e62 attached to build13-b1 are failing.

Randy, can you help triage since:
* The tests show as red, not purple
* Error logs show forwarder failures, but also a lot that looks like device failures
* This is only happening on one device.

Sample log:
https://build.chromium.org/p/chromium.perf/builders/Android%20Nexus5%20Perf%20%281%29/builds/4326/steps/v8.browsing_mobile_ignition/logs/stdio


Traceback (most recent call last):
  RunBenchmark at /b/rr/tmpqo6Lrl/w/src/third_party/catapult/telemetry/telemetry/internal/story_runner.py:336
    benchmark.ShouldTearDownStateAfterEachStorySetRun())
  Run at /b/rr/tmpqo6Lrl/w/src/third_party/catapult/telemetry/telemetry/internal/story_runner.py:222
    test, finder_options.Copy(), story_set)
  traced_function at /b/rr/tmpqo6Lrl/w/src/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py:52
    return func(*args, **kwargs)
  __init__ at /b/rr/tmpqo6Lrl/w/src/third_party/catapult/telemetry/telemetry/page/shared_page_state.py:101
    use_live_traffic=use_live_traffic)
  traced_function at /b/rr/tmpqo6Lrl/w/src/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py:52
    return func(*args, **kwargs)
  InitializeIfNeeded at /b/rr/tmpqo6Lrl/w/src/third_party/catapult/telemetry/telemetry/core/network_controller.py:21
    self._network_controller_backend.InitializeIfNeeded(use_live_traffic)
  InitializeIfNeeded at /b/rr/tmpqo6Lrl/w/src/third_party/catapult/telemetry/telemetry/internal/platform/network_controller_backend.py:64
    self._platform_backend.GetPortPairForForwarding(local_port))
  Create at /b/rr/tmpqo6Lrl/w/src/third_party/catapult/telemetry/telemetry/internal/forwarders/android_forwarder.py:25
    return AndroidForwarder(self._device, port_pair)
  __init__ at /b/rr/tmpqo6Lrl/w/src/third_party/catapult/telemetry/telemetry/internal/forwarders/android_forwarder.py:60
    [(port_pair.remote_port, port_pair.local_port)], self._device)
  Map at /b/rr/tmpqo6Lrl/w/src/third_party/catapult/devil/devil/android/forwarder.py:150
    exit_code, '\n'.join(output)))
HostForwarderError: /b/rr/tmpqo6Lrl/w/src/third_party/catapult/devil/bin/deps/linux2/x86_64/forwarder_host exited with 1:

 
Cc: jbudorick@chromium.org
* The tests show as red, not purple
The test is failing during the run_benchmark invocation, so as far as the infra code is concerned this is not an infra issue. The only way the step could turn purple is if run_benchmark returned an infra error code. Right now it is returning 1:
"exit code (as seen by runtest.py): 1"
"step returned non-zero exit code: 1"

* Error logs show forwarder failures, but also a lot that looks like device failures
An error on the device could be what is causing the forwarder to fail.

* This is only happening on one device.
That leads me to believe it might be a device issue, i.e. the device is somehow getting into a bad state, since it is happening on single devices and not on entire setups.

John was doing something about the forwarder recently, but I am not sure what exactly. Adding him.
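
The red-versus-purple distinction described in this comment can be sketched roughly as follows. This is a minimal illustration, not the actual buildbot or recipe code: the function name `step_color` and the value of `INFRA_FAILURE_EXIT_CODE` are hypothetical, since the thread does not say which exit code the infra would treat as an infra failure.

```python
# Hypothetical value: the thread does not state which exit code, if any,
# the infra code interprets as an infra failure.
INFRA_FAILURE_EXIT_CODE = 87


def step_color(exit_code, infra_failure_code=INFRA_FAILURE_EXIT_CODE):
    """Map a step's exit code to the color shown on the build page."""
    if exit_code == 0:
        return "green"
    if exit_code == infra_failure_code:
        # Only a dedicated infra error code would mark the step purple.
        return "purple"
    # Any other non-zero code, including the 1 seen here, is a test failure.
    return "red"
```

Under these assumptions, `step_color(1)` yields "red", which matches what the bot reported for the failing steps.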
Blockedon: 634052

The N5 (at least) looks almost exactly like issue 634052. Investigating that was the reason for the changes I was making. Haven't solved it yet, though.

Marking as blocked on that as I'm not sure it's a complete dup.
Cc: -jbudorick@chromium.org rnep...@chromium.org
Owner: jbudorick@chromium.org
Status: Assigned (was: Untriaged)
... and self-assigning.
Cc: sullivan@chromium.org
Issue 651116 has been merged into this issue.
One other case that's a bit strange:

https://build.chromium.org/p/chromium.perf/builders/Android%20Nexus6%20WebView%20Perf%20%283%29/builds/138

Some of the tests on ZX1G22KZXV are passing, and some are failing with the HostForwarderError.
johnw re-flashed and re-connected all the devices except the N5 in bug 651180. Two of them are better, but we're still seeing the problem with 070b074f: https://build.chromium.org/p/chromium.perf/builders/Android%20Nexus7v2%20Perf%20%281%29/builds/3830

Should I ask for a device replacement? Any ideas?
I think this has something to do with page_cycler_v2.typical_25, which in some cases appears to time out and then leave the device unusable.

This is hard to see, though, because we currently print tests on the build page in alphabetical order within each shard rather than in execution order within each shard.

(Still investigating why page_cycler_v2.typical_25 times out and why we don't clean up properly after it does.)
Status: Started (was: Assigned)
nvm, I know what's going on here.

We attempt to recover the device after a test failure (in this case, a timeout): https://codesearch.chromium.org/chromium/src/build/android/pylib/local/device/local_device_perf_test_run.py?rcl=0&l=239

This includes rebooting the device, among other things, so when the device comes back, the device forwarder daemon is no longer running. However, when we next try to use the forwarder, we think we've already initialized the device: https://codesearch.chromium.org/chromium/src/third_party/catapult/devil/devil/android/forwarder.py?rcl=0&l=323

so we never restart the device forwarder daemon and the forwarding fails.

Should be able to get a fix in today or tomorrow.
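
The failure mode hypothesized in this comment (a cached "already initialized" flag that outlives a device reboot) can be modeled with a small sketch. Everything here is a stand-in: `FakeDevice`, `CachingForwarder`, and `forget` are hypothetical names, not devil's API. Note also that the diagnosis is walked back later in the thread, so this only illustrates the suspected mechanism.

```python
class FakeDevice:
    """Stand-in for a device; not devil's DeviceUtils."""

    def __init__(self, serial):
        self.serial = serial
        self.daemon_running = False

    def start_daemon(self):
        self.daemon_running = True

    def reboot(self):
        # A reboot kills every process on the device, including the daemon.
        self.daemon_running = False


class CachingForwarder:
    """Remembers which devices it has initialized, keyed by serial."""

    def __init__(self):
        self._initialized = set()

    def map_port(self, device):
        # The daemon is only started the first time a serial is seen.
        if device.serial not in self._initialized:
            device.start_daemon()
            self._initialized.add(device.serial)
        if not device.daemon_running:
            raise RuntimeError("device forwarder daemon is not running")
        return True

    def forget(self, device):
        # Hypothetical remedy: drop the cached state after a reboot.
        self._initialized.discard(device.serial)
```

In this model, calling `map_port` after `reboot` raises until `forget` is called for that device, which mirrors the "we never restart the device forwarder daemon" behavior described above.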
(...though this will not include a fix for page_cycler_v2.typical_25)
I think I was wrong in #10. Still looking.
Project Member

Comment 13 by bugdroid1@chromium.org, Sep 30 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/3d77e97fb46b4f0a9f9255c30b617abf6721ad33

commit 3d77e97fb46b4f0a9f9255c30b617abf6721ad33
Author: jbudorick <jbudorick@chromium.org>
Date: Fri Sep 30 14:59:17 2016

[Android] Add --unmap-all to forwarder2.

In some scenarios (e.g., single-device restart), we want to unmap all
ports forwarded from a given device up to the host and clear the existing
cached adb port for that device. We want to be able to do this even if
the calling process doesn't know all of those ports. This change adds
the --unmap-all command to forwarder2 to support such use cases.

BUG=634052,650674

Review-Url: https://codereview.chromium.org/2381063004
Cr-Commit-Position: refs/heads/master@{#422113}

[modify] https://crrev.com/3d77e97fb46b4f0a9f9255c30b617abf6721ad33/tools/android/forwarder2/host_forwarder_main.cc
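
The idea behind --unmap-all, as the commit message describes it, can be sketched with a toy bookkeeping model. This is not the forwarder2 implementation (which is C++ in host_forwarder_main.cc). `PortMapRegistry` and its methods are hypothetical names used only to show why a per-device "unmap everything" operation helps a caller that does not know which ports are mapped.

```python
from collections import defaultdict


class PortMapRegistry:
    """Toy model of host-side forwarding state; hypothetical, for illustration."""

    def __init__(self):
        self._maps = defaultdict(dict)  # serial -> {device_port: host_port}
        self._adb_ports = {}            # serial -> cached adb port

    def map(self, serial, device_port, host_port):
        self._maps[serial][device_port] = host_port

    def cache_adb_port(self, serial, adb_port):
        self._adb_ports[serial] = adb_port

    def unmap(self, serial, device_port):
        # The caller must already know which device port to remove.
        del self._maps[serial][device_port]

    def unmap_all(self, serial):
        # Removes every mapping for the device and clears its cached adb
        # port, without the caller having to name any ports.
        removed = sorted(self._maps.pop(serial, {}))
        self._adb_ports.pop(serial, None)
        return removed

    def mappings(self, serial):
        return dict(self._maps.get(serial, {}))

    def adb_port(self, serial):
        return self._adb_ports.get(serial)
```

A process that restarts a single device could call `unmap_all(serial)` for just that device and leave the mappings of other devices untouched.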

Project Member

Comment 14 by bugdroid1@chromium.org, Oct 1 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/dccd754c3b5cc5be5c809ffd6a9b742053f25c76

commit dccd754c3b5cc5be5c809ffd6a9b742053f25c76
Author: jbudorick <jbudorick@chromium.org>
Date: Sat Oct 01 01:51:20 2016

[Android] Run shell commands from the forwarder without passing fds.

The forwarder daemon was running commands with system(). This would give
the newly forked process copies of the same file handles held by the
daemon, notably including the unix domain socket.

If the adb server wasn't already running and the daemon called an adb
command, the adb server would be forked from the adb client process
with those same file handles -- including the unix domain socket. This
would interfere both with shutting down the host forwarder daemon
(as we'd see the unix domain socket still held by the adb server) and
with subsequent attempts to bring it up (same reason).

BUG=634052,650674

Review-Url: https://codereview.chromium.org/2374183008
Cr-Commit-Position: refs/heads/master@{#422263}

[modify] https://crrev.com/dccd754c3b5cc5be5c809ffd6a9b742053f25c76/tools/android/forwarder2/host_forwarder_main.cc
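
The file-handle leak described in this commit message can be reproduced in miniature. The sketch below is a Python analogue on a POSIX host, not the C++ change itself: an inheritable unix domain socket is visible to a child process when descriptors are passed through (roughly what system() does), and invisible when the child is started with descriptors closed. The helper names `child_sees_fd` and `demo` are made up for this example.

```python
import fcntl
import os
import socket
import subprocess
import sys

# Child program: exits 0 if the given fd number is open, 1 otherwise.
_PROBE = """
import os, sys
try:
    os.fstat(int(sys.argv[1]))
except OSError:
    sys.exit(1)
"""


def child_sees_fd(fd, close_fds):
    """Spawn a child and report whether fd is open inside it."""
    result = subprocess.run(
        [sys.executable, "-c", _PROBE, str(fd)], close_fds=close_fds)
    return result.returncode == 0


def demo():
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    # Duplicate onto a high fd number so it cannot collide with anything
    # the child opens for itself.
    fd = fcntl.fcntl(sock.fileno(), fcntl.F_DUPFD, 100)
    try:
        # Python marks fds non-inheritable by default; C does not, so
        # opt in explicitly to mimic the daemon's situation.
        os.set_inheritable(fd, True)
        leaked = child_sees_fd(fd, close_fds=False)
        isolated = child_sees_fd(fd, close_fds=True)
    finally:
        os.close(fd)
        sock.close()
    return leaked, isolated
```

If the child in the first case were a long-lived process such as an adb server, it would keep the socket open after the parent exits, which is the interference the commit message describes.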

Project Member

Comment 15 by bugdroid1@chromium.org, Oct 1 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/84526ade9b6d246a8834309d0519d2255c0db91d

commit 84526ade9b6d246a8834309d0519d2255c0db91d
Author: catapult-deps-roller <catapult-deps-roller@chromium.org>
Date: Sat Oct 01 08:06:33 2016

Roll src/third_party/catapult/ f00b66029..507bed462 (2 commits).

https://chromium.googlesource.com/external/github.com/catapult-project/catapult.git/+log/f00b66029517..507bed4626dd

$ git log f00b66029..507bed462 --date=short --no-merges --format='%ad %ae %s'
2016-09-30 jbudorick [telemetry] Update {device,host}_forwarder binaries.
2016-09-30 jbudorick [devil] Use --unmap-all in Forwarder.UnmapAllDevicePorts.

BUG=634052,650674,634052,650674

CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.android:android_optional_gpu_tests_rel
TBR=catapult-sheriff@chromium.org

Review-Url: https://codereview.chromium.org/2378773016
Cr-Commit-Position: refs/heads/master@{#422308}

[modify] https://crrev.com/84526ade9b6d246a8834309d0519d2255c0db91d/DEPS

From looking at the logs this morning, I think this is fixed now, but a bunch of the bots are failing to upload to the dashboard.
The dashboard upload failures are fixed, so you should be able to verify now.
Perf bothealth Sheriff Ping:
Pri-1 bugs should be pinged daily, and checked to make sure someone is following up.
John, can we close this bug as fixed?
Status: Fixed (was: Started)
I think so. We'll open new forwarder bugs as necessary.
