New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 845510 link

Starred by 2 users

Issue metadata

Status: Available
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: ----



Sign in to add a comment

LOCK_ACQUISITION_TIMEOUT in some perf android swarming tasks

Project Member Reported by sheriff-...@appspot.gserviceaccount.com, May 22 2018

Issue description

Filed by sheriff-o-matic@appspot.gserviceaccount.com on behalf of sullivan@google.com

Three failures in a row:
blink_perf.canvas failing on chromium.perf/Android Nexus5X WebView Perf

Builders failed on: 
- Android Nexus5X WebView Perf: 
  https://ci.chromium.org/buildbot/chromium.perf/Android%20Nexus5X%20WebView%20Perf

Can the trooper help triage? Sample log: https://logs.chromium.org/v/?s=chrome%2Fbb%2Fchromium.perf%2FAndroid_Nexus5X_WebView_Perf%2F1873%2F%2B%2Frecipes%2Fsteps%2Fblink_perf.canvas_on_Android%2F0%2Fstdout

Traceback (most recent call last):
  File "../../third_party/catapult/devil/devil/android/tools/system_app.py", line 218, in <module>
    sys.exit(main(sys.argv[1:]))
  File "../../third_party/catapult/devil/devil/android/tools/system_app.py", line 208, in main
    devices = script_common.GetDevices(args.devices, args.blacklist_file)
  File "/b/swarming/w/ir/third_party/catapult/devil/devil/android/tools/script_common.py", line 45, in GetDevices
    devices = device_utils.DeviceUtils.HealthyDevices(blacklist_file)
  File "/b/swarming/w/ir/third_party/catapult/devil/devil/android/device_utils.py", line 2815, in HealthyDevices
    return _get_devices()
  File "/b/swarming/w/ir/third_party/catapult/devil/devil/android/device_utils.py", line 2804, in _get_devices
    for adb in adb_wrapper.AdbWrapper.Devices():
  File "/b/swarming/w/ir/third_party/catapult/devil/devil/android/sdk/adb_wrapper.py", line 387, in Devices
    retries=retries)
  File "/b/swarming/w/ir/third_party/catapult/devil/devil/android/sdk/adb_wrapper.py", line 407, in _RawDevices
    output = cls._RunAdbCmd(cmd, timeout=timeout, retries=retries)
  File "/b/swarming/w/ir/third_party/catapult/devil/devil/android/decorators.py", line 57, in timeout_retry_wrapper
    retry_if_func=retry_if_func)
  File "/b/swarming/w/ir/third_party/catapult/devil/devil/utils/timeout_retry.py", line 158, in Run
    error_log_func=error_log_func)
  File "/b/swarming/w/ir/third_party/catapult/devil/devil/utils/reraiser_thread.py", line 189, in JoinAll
    self._JoinAll(watcher, timeout)
  File "/b/swarming/w/ir/third_party/catapult/devil/devil/utils/reraiser_thread.py", line 161, in _JoinAll
    thread.ReraiseIfException()
  File "/b/swarming/w/ir/third_party/catapult/devil/devil/utils/reraiser_thread.py", line 84, in run
    self._ret = self._func(*self._args, **self._kwargs)
  File "/b/swarming/w/ir/third_party/catapult/devil/devil/utils/timeout_retry.py", line 151, in <lambda>
    child_thread = reraiser_thread.ReraiserThread(lambda: func(*args, **kwargs),
  File "/b/swarming/w/ir/third_party/catapult/devil/devil/android/decorators.py", line 47, in impl
    return f(*args, **kwargs)
  File "/b/swarming/w/ir/third_party/catapult/devil/devil/android/sdk/adb_wrapper.py", line 264, in _RunAdbCmd
    cls._BuildAdbCmd(args, device_serial, cpu_affinity=cpu_affinity),
  File "/b/swarming/w/ir/third_party/catapult/devil/devil/android/sdk/adb_wrapper.py", line 250, in _BuildAdbCmd
    cmd.append(cls.GetAdbPath())
  File "/b/swarming/w/ir/third_party/catapult/devil/devil/android/sdk/adb_wrapper.py", line 238, in GetAdbPath
    return cls._adb_path.read()
  File "/b/swarming/w/ir/third_party/catapult/devil/devil/utils/lazy/weak_constant.py", line 39, in read
    self._val = initializer_thread.GetReturnValue()
  File "/b/swarming/w/ir/third_party/catapult/devil/devil/utils/reraiser_thread.py", line 77, in GetReturnValue
    self.ReraiseIfException()
  File "/b/swarming/w/ir/third_party/catapult/devil/devil/utils/reraiser_thread.py", line 84, in run
    self._ret = self._func(*self._args, **self._kwargs)
  File "/b/swarming/w/ir/third_party/catapult/devil/devil/android/sdk/adb_wrapper.py", line 83, in _FindAdb
    return devil_env.config.FetchPath('adb')
  File "/b/swarming/w/ir/third_party/catapult/devil/devil/devil_env.py", line 175, in FetchPath
    return self._dm.FetchPath(dependency, GetPlatform(arch, device))
  File "/b/swarming/w/ir/third_party/catapult/dependency_manager/dependency_manager/manager.py", line 130, in FetchPath
    path, _ = self.FetchPathWithVersion(dependency, platform)
  File "/b/swarming/w/ir/third_party/catapult/dependency_manager/dependency_manager/manager.py", line 93, in FetchPathWithVersion
    path = dependency_info.GetRemotePath()
  File "/b/swarming/w/ir/third_party/catapult/dependency_manager/dependency_manager/dependency_info.py", line 84, in GetRemotePath
    return self._cloud_storage_info.GetRemotePath()
  File "/b/swarming/w/ir/third_party/catapult/dependency_manager/dependency_manager/cloud_storage_info.py", line 80, in GetRemotePath
    self._cs_hash)
  File "/b/swarming/w/ir/third_party/catapult/common/py_utils/py_utils/cloud_storage.py", line 395, in GetIfHashChanged
    with _FileLock(download_path):
  File "/b/swarming/w/ir/.swarming_module/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/b/swarming/w/ir/third_party/catapult/common/py_utils/py_utils/cloud_storage.py", line 267, in _FileLock
    LOCK_ACQUISITION_TIMEOUT)
  File "/b/swarming/w/ir/third_party/catapult/common/py_utils/py_utils/__init__.py", line 148, in WaitFor
    (timeout, GetConditionString()))
py_utils.TimeoutException: Timed out while waiting 10s for py_utils.WaitFor(lambda: _AttemptPseudoLockAcquisition(pseudo_lock_path,
                                                         pseudo_lock_fd_return),
                   LOCK_ACQUISITION_TIMEOUT).
 
Owner: bpastene@chromium.org
Status: Assigned (was: Available)
Looks like it's trying to download adb from google storage via "dependency_manager". It tries to grab a flock before doing so, but times out. Not sure why else could be holding the lock... The bot should be the only thing running in the container. Maybe the task before it didn't properly release the flock?

Comment 2 by sullivan@google.com, May 24 2018

Cc: nednguyen@chromium.org
+Ned in case you have insight into comment 1 (or know who to cc)
Cc: jbudorick@chromium.org
#1: I highly suspect that there was some left over locked files. The locking stuffs is quite tricky and I honestly think we should just try to prefetch all the files needed (we do that for telemetry by invoking https://cs.chromium.org/chromium/src/DEPS?rcl=f85585b37189469ba6c9b53dfea57d88f1a9fe0f&l=1806)
Strange how it only happens in blink_perf.canvas on that bot, and not in any other tests...

Just happened again as well:
https://chrome-swarming.appspot.com/task?id=3dac7411d624d910

Maybe it's a consequence of the test that runs right before it? I see that it's always smoothness.image_decoding_cases. Lemme keep digging... the fact that it keeps happening (and only in blink_perf.canvas) is pretty weird.

Comment 5 by hzl@chromium.org, Jun 11 2018

Cc: bpastene@chromium.org charliea@chromium.org
 Issue 851552  has been merged into this issue.
This was happening frequently on https://chrome-swarming.appspot.com/bot?id=build188-b7--device7

I clobbered its entire checkout (see the task I leased) and it hasn't happened since. I'll do the same to the other bots up above
Components: Tests>Telemetry
Labels: -Pri-1 -Infra-Troopers Pri-2
Owner: ----
Status: Available (was: Assigned)
Summary: LOCK_ACQUISITION_TIMEOUT in some perf android swarming tasks (was: blink_perf.canvas failing with LOCK_ACQUISITION_TIMEOUT on chromium.perf/Android Nexus5X WebView Perf)
Rate of occurrences seems to have gone down, so lowering pri, but it still seems to be happening occasionally:
https://chrome-swarming.appspot.com/task?id=3e515bb034a1db10
https://chrome-swarming.appspot.com/task?id=3e5089ca3c3f6310

I tried investigating but had trouble repro'ing it on the bot. I don't think this is an infra issue (at least from what I could find.) More of an issue with telemetry's/catapult's file-locking lib I think, so moving the bug there.

Comment 15 by benhenry@google.com, Jan 16 (6 days ago)

Components: Test>Telemetry

Comment 16 by benhenry@google.com, Jan 16 (6 days ago)

Components: -Tests>Telemetry

Sign in to add a comment