Issue 682005

Issue metadata

Status: Fixed
Owner:
Closed: Jan 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: All
Pri: 2
Type: Bug



BattOr benchmarks time out waiting for the cloud storage lock file to become available

Project Member Reported by charliea@chromium.org, Jan 17 2017

Issue description

I first saw this on the win-high-dpi bot on Friday. I've only seen it on Windows so far, but I have no reason to believe that it can't also occur on other platforms.

Link to logs: https://luci-logdog.appspot.com/v/?s=chrome%2Fbb%2Fchromium.perf%2FWin_10_High-DPI_Perf%2F192%2F%2B%2Frecipes%2Fsteps%2Fbattor.power_cases_on_Intel_GPU_on_Windows_on_Windows-10-10240%2F0%2Fstdout

Relevant callstack: 

"""
Traceback (most recent call last):
  File "c:\b\s\w\irlcwumq\third_party\catapult\telemetry\telemetry\internal\story_runner.py", line 86, in _RunStoryAndProcessErrorIfNeeded
    test.WillRunStory(state.platform)
  File "c:\b\s\w\irlcwumq\third_party\catapult\telemetry\telemetry\web_perf\timeline_based_measurement.py", line 285, in WillRunStory
    platform.tracing_controller.StartTracing(self._tbm_options.config)
  File "c:\b\s\w\irlcwumq\third_party\catapult\telemetry\telemetry\core\tracing_controller.py", line 43, in StartTracing
    self._tracing_controller_backend.StartTracing(tracing_config, timeout)
  File "c:\b\s\w\irlcwumq\third_party\catapult\telemetry\telemetry\internal\platform\tracing_controller_backend.py", line 88, in StartTracing
    if agent.StartAgentTracing(config, timeout):
  File "c:\b\s\w\irlcwumq\third_party\catapult\telemetry\telemetry\internal\platform\tracing_agent\battor_tracing_agent.py", line 73, in StartAgentTracing
    self._battor.StartTracing()
  File "c:\b\s\w\irlcwumq\third_party\catapult\common\battor\battor\battor_wrapper.py", line 215, in StartTracing
    self._FlashBattOr()
  File "c:\b\s\w\irlcwumq\third_party\catapult\common\battor\battor\battor_wrapper.py", line 145, in _FlashBattOr
    'battor_firmware', 'default')
  File "c:\b\s\w\irlcwumq\third_party\catapult\dependency_manager\dependency_manager\manager.py", line 93, in FetchPathWithVersion
    path = dependency_info.GetRemotePath()
  File "c:\b\s\w\irlcwumq\third_party\catapult\dependency_manager\dependency_manager\dependency_info.py", line 84, in GetRemotePath
    return self._cloud_storage_info.GetRemotePath()
  File "c:\b\s\w\irlcwumq\third_party\catapult\dependency_manager\dependency_manager\cloud_storage_info.py", line 80, in GetRemotePath
    self._cs_hash)
  File "c:\b\s\w\irlcwumq\third_party\catapult\common\py_utils\py_utils\cloud_storage.py", line 329, in GetIfHashChanged
    with _FileLock(download_path):
  File "c:\b\depot_tools\python276_bin\lib\contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "c:\b\s\w\irlcwumq\third_party\catapult\common\py_utils\py_utils\cloud_storage.py", line 246, in _FileLock
    PSEUDO_LOCK_ACQUISITION_TIMEOUT)
  File "c:\b\s\w\irlcwumq\third_party\catapult\common\py_utils\py_utils\__init__.py", line 132, in WaitFor
    (timeout, GetConditionString()))
TimeoutException: Timed out while waiting 10s for py_utils.WaitFor(lambda: not os.path.exists(pseudo_lock_path),
                   PSEUDO_LOCK_ACQUISITION_TIMEOUT).

INFO:root:Try printing formatted exception: None None None

Exception raised when cleaning story run: 

Traceback (most recent call last):
  _RunStoryAndProcessErrorIfNeeded at c:\b\s\w\irlcwumq\third_party\catapult\telemetry\telemetry\internal\story_runner.py:113
    state.DidRunStory(results)
  traced_function at c:\b\s\w\irlcwumq\third_party\catapult\common\py_trace_event\py_trace_event\trace_event_impl\decorators.py:75
    return func(*args, **kwargs)
  DidRunStory at c:\b\s\w\irlcwumq\third_party\catapult\telemetry\telemetry\page\shared_page_state.py:155
    if self._current_page.credentials and self._did_login_for_current_page:
AttributeError: 'NoneType' object has no attribute 'credentials'
"""

It looks like the BattOr firmware lock file isn't being released, which causes the benchmark to time out while waiting for it. I believe the current timeout is 10 seconds, but because the firmware is small and downloads quickly, I don't expect that increasing the timeout would help. My guess is that a stale lock file isn't getting cleaned up, which is possible if, for example, Telemetry is killed in the middle of a download.
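
To illustrate the suspected failure mode, here is a minimal sketch of an existence-based lock. It is not the catapult code: `pseudo_file_lock`, `LockTimeout`, and the `.pseudo_lock` suffix are hypothetical names, and the only detail taken from the traceback above is that the waiter polls until `os.path.exists(pseudo_lock_path)` becomes false. The assumption is that the lock is just a file that the holder deletes on exit.

```python
import contextlib
import os
import time


class LockTimeout(Exception):
    """Raised when the lock file does not disappear within the timeout."""


@contextlib.contextmanager
def pseudo_file_lock(target_path, timeout=10.0, poll_interval=0.05):
    # Hypothetical lock file name; the real naming scheme may differ.
    lock_path = target_path + '.pseudo_lock'
    deadline = time.time() + timeout
    # Wait until no other holder's lock file exists.
    while os.path.exists(lock_path):
        if time.time() >= deadline:
            raise LockTimeout(
                'Timed out after %ss waiting for %s' % (timeout, lock_path))
        time.sleep(poll_interval)
    # O_EXCL makes creation fail with OSError if another process won the
    # race between the check above and this call.
    os.close(os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY))
    try:
        yield
    finally:
        # This cleanup never runs if the process is hard-killed, which
        # leaves a stale lock file behind for every later run.
        os.remove(lock_path)
```

If a process holding such a lock is killed before the `finally` clause runs, the file stays on disk and every later acquisition times out no matter how long the timeout is, which would match the behavior in the log.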

I believe these lock files should be cleaned up between benchmark runs. Ned suggested that, in the swarming world, putting the lock file in /tmp should get it cleaned up this way. I'm going to verify with maruel@ before proceeding with the fix.
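
A rough sketch of the /tmp idea, assuming the task's temp directory really is wiped between runs (which is exactly what still needs to be verified). The helper name `lock_path_for` and the `.pseudo_lock` suffix are hypothetical and are not taken from catapult; this is not necessarily how the eventual fix works.

```python
import hashlib
import os
import tempfile


def lock_path_for(target_path, lock_dir=None):
    """Returns a lock file path inside a temp directory for target_path."""
    if lock_dir is None:
        # Assumption: this directory is cleaned up between benchmark runs.
        lock_dir = tempfile.gettempdir()
    # Hash the absolute path so each download target maps to exactly one
    # flat file name inside lock_dir.
    absolute = os.path.abspath(target_path)
    digest = hashlib.sha1(absolute.encode('utf-8')).hexdigest()
    return os.path.join(lock_dir, digest + '.pseudo_lock')
```

Keying the lock name on the target's absolute path keeps the mutual exclusion per download target while moving the file itself somewhere that a killed run cannot leave it behind permanently.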
 
Project Member

Comment 1 by bugdroid1@chromium.org, Jan 26 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/c3b4a2f32560ad82a6892a93f64bb260def54c51

commit c3b4a2f32560ad82a6892a93f64bb260def54c51
Author: catapult-deps-roller <catapult-deps-roller@chromium.org>
Date: Thu Jan 26 20:04:33 2017

Roll src/third_party/catapult/ e1e778d78..7a2a837ac (29 commits).

https://chromium.googlesource.com/external/github.com/catapult-project/catapult.git/+log/e1e778d78de1..7a2a837ac3ae

$ git log e1e778d78..7a2a837ac --date=short --no-merges --format='%ad %ae %s'
2017-01-26 benjhayden Translate RelatedHistogramSet to python.
2017-01-26 benjhayden Translate RelatedEventSet to python.
2017-01-26 yolandyan Revert of Change apk_helper.py for apk with multi instrumentations and JUnit4 (patchset #10 id:180001 of https://codereview.chromium.org/2632763003/ )
2017-01-26 nednguyen Update labels to tag in story_set_smoke_test
2017-01-26 hjd [tracing] Cache number formatters in Unit
2017-01-25 dtu [pinpoint] RunTest (Swarming) Quest and Execution.
2017-01-25 charliea Set DISABLE_CLOUD_STORAGE_IO back after psuedo lock tests
2017-01-25 alexandermont Make the whole story power metric not depend on Chrome trace.
2017-01-25 benjhayden Fix TelemetryInfo.
2017-01-25 benjhayden Redesign breakdown-span.
2017-01-25 benjhayden Translate DeviceInfo to python.
2017-01-25 benjhayden Translate TelemetryInfo to python.
2017-01-25 benjhayden Allow metrics to resegment the UserModel.
2017-01-25 benjhayden Translate BuildbotInfo to python.
2017-01-24 simonhatch Dashboard - Remove some old queues.
2017-01-24 sullivan Add ref build back into charts on /group_report page.
2017-01-24 benjhayden Translate Diagnostics to Python.
2017-01-24 benjhayden Make trace2html accept gzipped trace json files in addition to unzipped files.
2017-01-24 benjhayden Add Segments to the UserModel.
2017-01-24 alexandermont Fix function scope bug in tquery.
2017-01-24 charliea Fix bug where stale lock file can cause cloud storage timeouts
2017-01-24 benjhayden Improve BarChart and ColumnChart hover boxes.
2017-01-24 yolandyan Change apk_helper.py for apk with multi instrumentations and JUnit4
2017-01-24 kbr Only display 200 lines of syslog upon sub-process crash on macOS.
2017-01-24 kraynov Fix wrong upload of memtrack_helper for arm64 CPU.
2017-01-23 zheda.chen Change smoothness frame-times metrics on CrOS
2017-01-23 benjhayden Delete systemHealthMetrics meta-metric.
2017-01-23 simonhatch Dashboard - Fix output when tests fail to produce output.
2017-01-23 nednguyen [Telemetry] Remove labels field from story.Story constructor & labels related flags

BUG=682005, 682005, 682819, 672780, 675846, 683998

Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+/master/autoroll/README.md

If the roll is causing failures, see:
http://www.chromium.org/developers/tree-sheriffs/sheriff-details-chromium#TOC-Failures-due-to-DEPS-rolls

CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.android:android_optional_gpu_tests_rel
TBR=catapult-sheriff@chromium.org

Review-Url: https://codereview.chromium.org/2654253003
Cr-Commit-Position: refs/heads/master@{#446416}

[modify] https://crrev.com/c3b4a2f32560ad82a6892a93f64bb260def54c51/DEPS

Status: Fixed (was: Assigned)