host forward failing to connect to daemon Unix socket when running blink_perf.bindings
Issue description

Filed by sheriff-o-matic@appspot.gserviceaccount.com on behalf of charliea@chromium.org

blink_perf.bindings/append-child.html and 38 other(s) in blink_perf.bindings failing on chromium.perf/Android Nexus5X WebView Perf

Builders failed on:
- Android Nexus5X WebView Perf: https://ci.chromium.org/buildbot/chromium.perf/Android%20Nexus5X%20WebView%20Perf

It looks like this has happened periodically for the last couple of days, at least on this bot. The failure appears related to the host forwarder:

https://logs.chromium.org/v/?s=chrome%2Fbb%2Fchromium.perf%2FAndroid_Nexus5X_WebView_Perf%2F2039%2F%2B%2Frecipes%2Fsteps%2Fblink_perf.bindings_on_Android_device_Nexus_5X%2F0%2Fstdout

Traceback (most recent call last):
  RunBenchmark at /b/swarming/w/ir/third_party/catapult/telemetry/telemetry/internal/story_runner.py:366
    expectations=expectations, max_num_values=benchmark.MAX_NUM_VALUES)
  Run at /b/swarming/w/ir/third_party/catapult/telemetry/telemetry/internal/story_runner.py:214
    test, finder_options.Copy(), story_set)
  traced_function at /b/swarming/w/ir/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py:52
    return func(*args, **kwargs)
  __init__ at /b/swarming/w/ir/third_party/catapult/telemetry/telemetry/page/shared_page_state.py:86
    self.platform.network_controller.Open(wpr_mode)
  traced_function at /b/swarming/w/ir/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py:52
    return func(*args, **kwargs)
  Open at /b/swarming/w/ir/third_party/catapult/telemetry/telemetry/core/network_controller.py:28
    self._network_controller_backend.Open(wpr_mode)
  Open at /b/swarming/w/ir/third_party/catapult/telemetry/telemetry/internal/platform/network_controller_backend.py:72
    local_port=local_port, remote_port=None)
  Create at /b/swarming/w/ir/third_party/catapult/telemetry/telemetry/internal/forwarders/android_forwarder.py:29
    return AndroidForwarder(self._device, local_port, remote_port)
  __init__ at /b/swarming/w/ir/third_party/catapult/telemetry/telemetry/internal/forwarders/android_forwarder.py:90
    forwarder.Forwarder.Map([(remote_port or 0, local_port)], self._device)
  Map at /b/swarming/w/ir/third_party/catapult/devil/devil/android/forwarder.py:178
    formatted_output))
HostForwarderError: `/b/swarming/w/ir/third_party/catapult/devil/bin/deps/linux2/x86_64/forwarder_host --adb=/b/swarming/w/ir/third_party/catapult/devil/bin/deps/linux2/x86_64/bin/adb --serial-id=008671d42589ad4c --map 0 44104` exited with 1:
[0702/023708.436694:ERROR:daemon.cc(215)] Could not connect to daemon's Unix Daemon socket

Locals:
  device : <devil.android.device_utils.DeviceUtils object at 0x7f754cab5250>
  device_port : 0
  device_serial : '008671d42589ad4c'
  exit_code : 1
  formatted_output : "[0702/023708.436694:ERROR:daemon.cc(215)] Could not connect to daemon's Unix Daemon socket\n"
  host_port : 44104
  instance : <devil.android.forwarder.Forwarder object at 0x7f7561e3e610>
  map_arg_list : [u'--adb=/b/swarming/w/ir/third_party/catapult/devil/bin/deps/linux2/x86_64/bin/adb', '--serial-id=008671d42589ad4c', '--map', '0', '44104']
  map_arg_lists : [[u'--adb=/b/swarming/w/ir/third_party/catapult/devil/bin/deps/linux2/x86_64/bin/adb', '--serial-id=008671d42589ad4c', '--map', '0', '44104']]
  map_cmd : [u'/b/swarming/w/ir/third_party/catapult/devil/bin/deps/linux2/x86_64/forwarder_host', u'--adb=/b/swarming/w/ir/third_party/catapult/devil/bin/deps/linux2/x86_64/bin/adb', '--serial-id=008671d42589ad4c', '--map', '0', '44104']
  output : "[0702/023708.436694:ERROR:daemon.cc(215)] Could not connect to daemon's Unix Daemon socket\n"
  port_pairs : [(0, 44104)]
  tool : <devil.android.valgrind_tools.base_tool.BaseTool object at 0x7f7561e3e810>
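For anyone trying to reproduce the "Could not connect to daemon's Unix Daemon socket" error by hand, a minimal sketch of a connectivity probe for a Unix domain socket follows. It is not taken from forwarder_host or devil: the function name `can_connect_unix_socket` and the `daemon.sock` path are hypothetical, and the log above does not show which socket name the forwarder daemon actually uses.

```python
import os
import socket
import tempfile


def can_connect_unix_socket(path, timeout=1.0):
    """Return True if something accepts connections on the Unix socket at `path`.

    Hypothetical helper for illustration only; `path` is an assumption, since
    the real daemon's socket name does not appear in the log.
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(path)
        return True
    except OSError:
        # Covers a missing socket file, a refused connection (nobody is
        # listening), and a connect timeout.
        return False
    finally:
        sock.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "daemon.sock")  # hypothetical socket path
        print("before daemon starts:", can_connect_unix_socket(path))

        # Stand-in for the daemon: bind and listen on the socket.
        server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        server.bind(path)
        server.listen(1)
        print("while daemon listens:", can_connect_unix_socket(path))
        server.close()
```

A probe like this only distinguishes "a listener is present" from "no listener is reachable"; it says nothing about why the daemon is missing.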
Jul 2
This is the device forwarder, which I'm not too familiar with. +jbud who likely is. Looks like it's having trouble opening a socket on the host. Maybe a process from the previous task is lingering and hogging it? This sounds potentially similar to bug 845510 where the task has trouble acquiring a flock. One potential option is to restart the swarming bot's container after every task. This would induce a clean slate for each test, but increase the overhead in between two consecutive tasks on a single bot. I can look into that if no other solution presents itself.
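To illustrate the "lingering process hogging a lock" theory, here is a rough sketch of how a held flock blocks a second taker until the holder goes away. This is not the swarming bot's or devil's locking code; `try_lock` and the `device.lock` file name are made up for the example, and it assumes a Linux host where `fcntl.flock` is available.

```python
import fcntl
import os
import tempfile


def try_lock(path):
    """Try to take an exclusive, non-blocking flock on `path`.

    Returns the open file object on success (keep it open to hold the lock),
    or None if another open file description already holds the lock.
    Hypothetical helper for illustration only.
    """
    f = open(path, "a")
    try:
        fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        # The lock is held elsewhere, e.g. by a process left over from a
        # previous task.
        f.close()
        return None
    return f


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        lock_path = os.path.join(tmp, "device.lock")  # hypothetical lock file

        holder = try_lock(lock_path)  # stands in for the lingering process
        print("second taker blocked:", try_lock(lock_path) is None)

        holder.close()  # the holder exits, which releases the lock
        second = try_lock(lock_path)
        print("second taker succeeds after release:", second is not None)
        second.close()
```

The lock is released as soon as the holding file is closed or its process exits, which is why restarting the container would clear this kind of stale state.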
Jul 2
#0: you mention this happening periodically; do you have links to builds other than 2039? I'm primarily seeing pseudo-lock acquisition timeouts.
Jul 3
The NextAction date has arrived: 2018-07-03
Jul 3
John, sorry about that - I looked at a single failure and assumed it was representative of the problem. Far more runs seem to be failing with pseudo lock problems:
- https://logs.chromium.org/v/?s=chrome%2Fbb%2Fchromium.perf%2FAndroid_Nexus5X_WebView_Perf%2F2040%2F%2B%2Frecipes%2Fsteps%2Fblink_perf.bindings_on_Android_device_Nexus_5X%2F0%2Fstdout
- https://logs.chromium.org/v/?s=chrome%2Fbb%2Fchromium.perf%2FAndroid_Nexus5X_WebView_Perf%2F2036%2F%2B%2Frecipes%2Fsteps%2Fblink_perf.bindings_on_Android_device_Nexus_5X%2F0%2Fstdout
- https://logs.chromium.org/v/?s=chrome%2Fbb%2Fchromium.perf%2FAndroid_Nexus5X_WebView_Perf%2F2035%2F%2B%2Frecipes%2Fsteps%2Fblink_perf.bindings_on_Android_device_Nexus_5X%2F0%2Fstdout
IIRC, bpastene@ knows the most about this class of failure.
Jul 9
/ping bpastene@
Jul 10
Ok, I'm going to make the containers reboot after every task. That's guaranteed to clear out any stale processes hanging around holding locks. It'll increase the overhead between two consecutive swarming tasks on a single bot, but FWIU, there's only one swarming task per swarming bot per build in OBBS now, so that's not really a problem.
Comment 1 by charliea@chromium.org, Jul 2
NextAction: 2018-07-03
Owner: bpastene@chromium.org
Status: Assigned (was: Available)
Summary: host forward failing to connect to daemon Unix socket when running blink_perf.bindings (was: blink_perf.bindings/append-child.html and 38 other(s) in blink_perf.bindings failing on chromium.perf/Android Nexus5X WebView Perf)