LOCK_ACQUISITION_TIMEOUT in some perf android swarming tasks |
||||||
Issue descriptionFiled by sheriff-o-matic@appspot.gserviceaccount.com on behalf of sullivan@google.com Three failures in a row: blink_perf.canvas failing on chromium.perf/Android Nexus5X WebView Perf Builders failed on: - Android Nexus5X WebView Perf: https://ci.chromium.org/buildbot/chromium.perf/Android%20Nexus5X%20WebView%20Perf Can the trooper help triage? Sample log: https://logs.chromium.org/v/?s=chrome%2Fbb%2Fchromium.perf%2FAndroid_Nexus5X_WebView_Perf%2F1873%2F%2B%2Frecipes%2Fsteps%2Fblink_perf.canvas_on_Android%2F0%2Fstdout Traceback (most recent call last): File "../../third_party/catapult/devil/devil/android/tools/system_app.py", line 218, in <module> sys.exit(main(sys.argv[1:])) File "../../third_party/catapult/devil/devil/android/tools/system_app.py", line 208, in main devices = script_common.GetDevices(args.devices, args.blacklist_file) File "/b/swarming/w/ir/third_party/catapult/devil/devil/android/tools/script_common.py", line 45, in GetDevices devices = device_utils.DeviceUtils.HealthyDevices(blacklist_file) File "/b/swarming/w/ir/third_party/catapult/devil/devil/android/device_utils.py", line 2815, in HealthyDevices return _get_devices() File "/b/swarming/w/ir/third_party/catapult/devil/devil/android/device_utils.py", line 2804, in _get_devices for adb in adb_wrapper.AdbWrapper.Devices(): File "/b/swarming/w/ir/third_party/catapult/devil/devil/android/sdk/adb_wrapper.py", line 387, in Devices retries=retries) File "/b/swarming/w/ir/third_party/catapult/devil/devil/android/sdk/adb_wrapper.py", line 407, in _RawDevices output = cls._RunAdbCmd(cmd, timeout=timeout, retries=retries) File "/b/swarming/w/ir/third_party/catapult/devil/devil/android/decorators.py", line 57, in timeout_retry_wrapper retry_if_func=retry_if_func) File "/b/swarming/w/ir/third_party/catapult/devil/devil/utils/timeout_retry.py", line 158, in Run error_log_func=error_log_func) File "/b/swarming/w/ir/third_party/catapult/devil/devil/utils/reraiser_thread.py", line 189, in JoinAll self._JoinAll(watcher, timeout) File "/b/swarming/w/ir/third_party/catapult/devil/devil/utils/reraiser_thread.py", line 161, in _JoinAll thread.ReraiseIfException() File "/b/swarming/w/ir/third_party/catapult/devil/devil/utils/reraiser_thread.py", line 84, in run self._ret = self._func(*self._args, **self._kwargs) File "/b/swarming/w/ir/third_party/catapult/devil/devil/utils/timeout_retry.py", line 151, in <lambda> child_thread = reraiser_thread.ReraiserThread(lambda: func(*args, **kwargs), File "/b/swarming/w/ir/third_party/catapult/devil/devil/android/decorators.py", line 47, in impl return f(*args, **kwargs) File "/b/swarming/w/ir/third_party/catapult/devil/devil/android/sdk/adb_wrapper.py", line 264, in _RunAdbCmd cls._BuildAdbCmd(args, device_serial, cpu_affinity=cpu_affinity), File "/b/swarming/w/ir/third_party/catapult/devil/devil/android/sdk/adb_wrapper.py", line 250, in _BuildAdbCmd cmd.append(cls.GetAdbPath()) File "/b/swarming/w/ir/third_party/catapult/devil/devil/android/sdk/adb_wrapper.py", line 238, in GetAdbPath return cls._adb_path.read() File "/b/swarming/w/ir/third_party/catapult/devil/devil/utils/lazy/weak_constant.py", line 39, in read self._val = initializer_thread.GetReturnValue() File "/b/swarming/w/ir/third_party/catapult/devil/devil/utils/reraiser_thread.py", line 77, in GetReturnValue self.ReraiseIfException() File "/b/swarming/w/ir/third_party/catapult/devil/devil/utils/reraiser_thread.py", line 84, in run self._ret = self._func(*self._args, **self._kwargs) File "/b/swarming/w/ir/third_party/catapult/devil/devil/android/sdk/adb_wrapper.py", line 83, in _FindAdb return devil_env.config.FetchPath('adb') File "/b/swarming/w/ir/third_party/catapult/devil/devil/devil_env.py", line 175, in FetchPath return self._dm.FetchPath(dependency, GetPlatform(arch, device)) File "/b/swarming/w/ir/third_party/catapult/dependency_manager/dependency_manager/manager.py", line 130, in FetchPath path, _ = self.FetchPathWithVersion(dependency, platform) File "/b/swarming/w/ir/third_party/catapult/dependency_manager/dependency_manager/manager.py", line 93, in FetchPathWithVersion path = dependency_info.GetRemotePath() File "/b/swarming/w/ir/third_party/catapult/dependency_manager/dependency_manager/dependency_info.py", line 84, in GetRemotePath return self._cloud_storage_info.GetRemotePath() File "/b/swarming/w/ir/third_party/catapult/dependency_manager/dependency_manager/cloud_storage_info.py", line 80, in GetRemotePath self._cs_hash) File "/b/swarming/w/ir/third_party/catapult/common/py_utils/py_utils/cloud_storage.py", line 395, in GetIfHashChanged with _FileLock(download_path): File "/b/swarming/w/ir/.swarming_module/lib/python2.7/contextlib.py", line 17, in __enter__ return self.gen.next() File "/b/swarming/w/ir/third_party/catapult/common/py_utils/py_utils/cloud_storage.py", line 267, in _FileLock LOCK_ACQUISITION_TIMEOUT) File "/b/swarming/w/ir/third_party/catapult/common/py_utils/py_utils/__init__.py", line 148, in WaitFor (timeout, GetConditionString())) py_utils.TimeoutException: Timed out while waiting 10s for py_utils.WaitFor(lambda: _AttemptPseudoLockAcquisition(pseudo_lock_path, pseudo_lock_fd_return), LOCK_ACQUISITION_TIMEOUT).
,
May 24 2018
+Ned in case you have insight into comment 1 (or know who to cc)
,
May 24 2018
#1: I highly suspect that there was some left over locked files. The locking stuffs is quite tricky and I honestly think we should just try to prefetch all the files needed (we do that for telemetry by invoking https://cs.chromium.org/chromium/src/DEPS?rcl=f85585b37189469ba6c9b53dfea57d88f1a9fe0f&l=1806)
,
May 24 2018
Strange how it only happens in blink_perf.canvas on that bot, and not in any other tests... Just happened again as well: https://chrome-swarming.appspot.com/task?id=3dac7411d624d910 Maybe it's a consequence of the test that runs right before it? I see that it's always smoothness.image_decoding_cases. Lemme keep digging... the fact that it keeps happening (and only in blink_perf.canvas) is pretty weird.
,
Jun 11 2018
,
Jun 20 2018
Any update on this? I just saw this again perf sheriffing: https://ci.chromium.org/buildbot/chromium.perf/Android%20Nexus6%20WebView%20Perf/1966 https://logs.chromium.org/v/?s=chrome%2Fbb%2Fchromium.perf%2FAndroid_Nexus6_WebView_Perf%2F1966%2F%2B%2Frecipes%2Fsteps%2Fblink_perf.svg_on_Android_device_Nexus_6%2F0%2Fstdout
,
Jun 20 2018
This was happening frequently on https://chrome-swarming.appspot.com/bot?id=build188-b7--device7 I clobbered its entire checkout (see the task I leased) and it hasn't happened since. I'll do the same to the other bots up above
,
Jun 25 2018
Rate of occurrences seems to have gone down, so lowering pri, but it still seems to be happening occasionally: https://chrome-swarming.appspot.com/task?id=3e515bb034a1db10 https://chrome-swarming.appspot.com/task?id=3e5089ca3c3f6310 I tried investigating but had trouble repro'ing it on the bot. I don't think this is an infra issue (at least from what I could find.) More of an issue with telemetry's/catapult's file-locking lib I think, so moving the bug there.
,
Jul 24
This just happen again in https://ci.chromium.org/buildbot/chromium.perf/Android%20Nexus6%20WebView%20Perf/2126
,
Jan 16
(6 days ago)
,
Jan 16
(6 days ago)
|
||||||
►
Sign in to add a comment |
||||||
Comment 1 by bpastene@chromium.org
, May 24 2018Status: Assigned (was: Available)