
Issue 859571


Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: 2018-07-03
OS: ----
Pri: 1
Type: ----




host forward failing to connect to daemon Unix socket when running blink_perf.bindings

Project Member Reported by sheriff-...@appspot.gserviceaccount.com, Jul 2

Issue description

Filed by sheriff-o-matic@appspot.gserviceaccount.com on behalf of charliea@chromium.org

blink_perf.bindings/append-child.html and 38 other(s) in blink_perf.bindings failing on chromium.perf/Android Nexus5X WebView Perf

Builders failed on: 
- Android Nexus5X WebView Perf: 
  https://ci.chromium.org/buildbot/chromium.perf/Android%20Nexus5X%20WebView%20Perf

It looks like this has happened periodically for the last couple of days, at least on this bot. The failure appears related to the host forwarder:

https://logs.chromium.org/v/?s=chrome%2Fbb%2Fchromium.perf%2FAndroid_Nexus5X_WebView_Perf%2F2039%2F%2B%2Frecipes%2Fsteps%2Fblink_perf.bindings_on_Android_device_Nexus_5X%2F0%2Fstdout

Traceback (most recent call last):
  RunBenchmark at /b/swarming/w/ir/third_party/catapult/telemetry/telemetry/internal/story_runner.py:366
    expectations=expectations, max_num_values=benchmark.MAX_NUM_VALUES)
  Run at /b/swarming/w/ir/third_party/catapult/telemetry/telemetry/internal/story_runner.py:214
    test, finder_options.Copy(), story_set)
  traced_function at /b/swarming/w/ir/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py:52
    return func(*args, **kwargs)
  __init__ at /b/swarming/w/ir/third_party/catapult/telemetry/telemetry/page/shared_page_state.py:86
    self.platform.network_controller.Open(wpr_mode)
  traced_function at /b/swarming/w/ir/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py:52
    return func(*args, **kwargs)
  Open at /b/swarming/w/ir/third_party/catapult/telemetry/telemetry/core/network_controller.py:28
    self._network_controller_backend.Open(wpr_mode)
  Open at /b/swarming/w/ir/third_party/catapult/telemetry/telemetry/internal/platform/network_controller_backend.py:72
    local_port=local_port, remote_port=None)
  Create at /b/swarming/w/ir/third_party/catapult/telemetry/telemetry/internal/forwarders/android_forwarder.py:29
    return AndroidForwarder(self._device, local_port, remote_port)
  __init__ at /b/swarming/w/ir/third_party/catapult/telemetry/telemetry/internal/forwarders/android_forwarder.py:90
    forwarder.Forwarder.Map([(remote_port or 0, local_port)], self._device)
  Map at /b/swarming/w/ir/third_party/catapult/devil/devil/android/forwarder.py:178
    formatted_output))
HostForwarderError: `/b/swarming/w/ir/third_party/catapult/devil/bin/deps/linux2/x86_64/forwarder_host --adb=/b/swarming/w/ir/third_party/catapult/devil/bin/deps/linux2/x86_64/bin/adb --serial-id=008671d42589ad4c --map 0 44104` exited with 1:
[0702/023708.436694:ERROR:daemon.cc(215)] Could not connect to daemon's Unix Daemon socket
Locals:
  device           : <devil.android.device_utils.DeviceUtils object at 0x7f754cab5250>
  device_port      : 0
  device_serial    : '008671d42589ad4c'
  exit_code        : 1
  formatted_output : "[0702/023708.436694:ERROR:daemon.cc(215)] Could not connect to daemon's Unix Daemon socket\n"
  host_port        : 44104
  instance         : <devil.android.forwarder.Forwarder object at 0x7f7561e3e610>
  map_arg_list     : [u'--adb=/b/swarming/w/ir/third_party/catapult/devil/bin/deps/linux2/x86_64/bin/adb', '--serial-id=008671d42589ad4c', '--map', '0', '44104']
  map_arg_lists    : [[u'--adb=/b/swarming/w/ir/third_party/catapult/devil/bin/deps/linux2/x86_64/bin/adb', '--serial-id=008671d42589ad4c', '--map', '0', '44104']]
  map_cmd          : [u'/b/swarming/w/ir/third_party/catapult/devil/bin/deps/linux2/x86_64/forwarder_host', u'--adb=/b/swarming/w/ir/third_party/catapult/devil/bin/deps/linux2/x86_64/bin/adb', '--serial-id=008671d42589ad4c', '--map', '0', '44104']
  output           : "[0702/023708.436694:ERROR:daemon.cc(215)] Could not connect to daemon's Unix Daemon socket\n"
  port_pairs       : [(0, 44104)]
  tool             : <devil.android.valgrind_tools.base_tool.BaseTool object at 0x7f7561e3e810>
 
Components: Speed>Benchmarks>Waterfall
NextAction: 2018-07-03
Owner: bpastene@chromium.org
Status: Assigned (was: Available)
Summary: host forward failing to connect to daemon Unix socket when running blink_perf.bindings (was: blink_perf.bindings/append-child.html and 38 other(s) in blink_perf.bindings failing on chromium.perf/Android Nexus5X WebView Perf)
Ben, can you take a look at this? If I recall correctly, you know the most about this area of code.

If I see other instances where this looks like a problem, I'll add them to this bug. This specific case is on Android Nexus5X WebView Perf.

Cc: jbudorick@chromium.org
This is the device forwarder, which I'm not too familiar with. +jbud who likely is.

Looks like forwarder_host is having trouble connecting to the daemon's Unix socket on the host. Maybe a process from the previous task is lingering and hogging it? This sounds potentially similar to bug 845510, where the task has trouble acquiring a flock. One potential option is to restart the swarming bot's container after every task. This would give each task a clean slate, but increase the overhead between two consecutive tasks on a single bot.

I can look into that if no other solution presents itself.
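To narrow down whether the daemon's socket simply has no listener (as opposed to something else in forwarder_host failing), one could probe the socket directly before running the `--map` command. The helper below is a hypothetical diagnostic sketch, not part of devil or forwarder_host, and the socket name is an assumption: the log doesn't show whether the daemon listens on a filesystem path or a Linux abstract-namespace name (in Python, an abstract name is passed as a string starting with `"\0"`).

```python
import socket


def can_connect_unix_socket(path, timeout=1.0):
    """Return True if something is accepting connections on the Unix socket `path`.

    Hypothetical diagnostic helper (not a devil API). It only reports whether
    a listener exists; it does not say which process owns the socket.
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(path)
        return True
    except OSError:
        # Covers a missing path (FileNotFoundError), a stale socket file with
        # no listener (ConnectionRefusedError), and timeouts.
        return False
    finally:
        sock.close()
```

A `False` on a socket file that still exists points at a daemon that died and left its socket behind, while a `True` followed by the same `daemon.cc` error would suggest the problem lies elsewhere.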

#0: you mention this happening periodically; do you have links to builds other than 2039? I'm primarily seeing pseudo-lock acquisition timeouts.

The NextAction date has arrived: 2018-07-03

/ping bpastene@

OK, I'm going to make the containers reboot after every task. That's guaranteed to clear out any stale processes that are still holding locks.

It'll increase the overhead between two consecutive swarming tasks on a single bot, but FWIU, there's only one swarming task per swarming bot per build in OBBS now, so that's not really a problem.
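Restarting the container works because an flock is released once every file descriptor referring to the locking open file description is closed, which includes the holding process exiting. To confirm that a lingering process is really the culprit before paying the restart cost, a task could check for a held lock without blocking. The sketch below assumes a hypothetical lock file path; the issue doesn't say which file the pseudo-lock (or the flock from bug 845510) uses.

```python
import fcntl
import os


def lock_is_held_elsewhere(lock_path):
    """Return True if another open file description holds an flock on lock_path.

    Hypothetical helper: `lock_path` is whatever lock file the task contends on.
    """
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # LOCK_NB makes the probe fail fast instead of hanging like a timeout.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        # EWOULDBLOCK: someone else (e.g. a stale process) holds the lock.
        return True
    else:
        # We acquired it, so nobody else held it; release it immediately.
        fcntl.flock(fd, fcntl.LOCK_UN)
        return False
    finally:
        os.close(fd)
```

Using a non-blocking probe keeps the check cheap, so it could run at the start of every task and only fall back to a restart when it reports a held lock.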
