New issue
Advanced search Search tips

Issue 754746 link

Starred by 2 users

Issue metadata

Status: Duplicate
Merged: issue 757508
Owner:
Closed: Aug 2017
Cc:
EstimatedDays: ----
NextAction: ----
OS: Mac
Pri: ----
Type: ----



Sign in to add a comment

Many BattOr benchmarks failing with crashed BattOr agent but without any serial logs or debug information

Project Member Reported by sullivan@chromium.org, Aug 11 2017

Issue description

Charlie, feel free to dupe this.

battor.steady_state failing on chromium.perf/Mac Retina Perf

Builders failed on: 
- Mac Retina Perf: 
  https://build.chromium.org/p/chromium.perf/builders/Mac%20Retina%20Perf


https://luci-logdog.appspot.com/v/?s=chrome%2Fbb%2Fchromium.perf%2FMac_Retina_Perf%2F996%2F%2B%2Frecipes%2Fsteps%2Fbattor.steady_state_on_Intel_GPU_on_Mac_on_Mac-10.11%2F0%2Fstdout

[ RUN      ] http://www.slideshare.net/patrickmeenan
(INFO) 2017-08-10 23:12:38,723 chrome_tracing_agent._CreateTraceConfigFile:286  Trace config file string: {"trace_config": {"included_categories": ["toplevel", "rail", "blink.console"], "record_mode": "record-as-much-as-possible"}}
(INFO) 2017-08-10 23:12:38,738 cloud_storage._GetLocked:339  Downloading gs://chromium-telemetry/binary_dependencies/battor/battor_agent_binary_2d1610374495130d8eefc731f3f38be63bb21f2e to /b/s/w/ir/third_party/catapult/common/battor/bin/darwin/x86_64/battor_agent
(ERROR) 2017-08-10 23:13:19,311 battor_wrapper._FlashBattOr:175  Git hash returned from BattOr was not as expected: [0810/231319.310947:FATAL:battor_agent_bin.cc(91)] Fatal error when communicating with the BattOr: TOO MANY COMMAND RETRIES
Traceback (most recent call last):
  File "/b/s/w/ir/third_party/catapult/common/battor/battor/battor_wrapper.py", line 162, in _FlashBattOr
    device_git_hash = self.GetFirmwareGitHash()
  File "/b/s/w/ir/third_party/catapult/common/battor/battor/battor_wrapper.py", line 393, in GetFirmwareGitHash
    int(self._git_hash, 16)
ValueError: invalid literal for int() with base 16: '[0810/231319.310947:FATAL:battor_agent_bin.cc(91)] Fatal error when communicating with the BattOr: TOO MANY COMMAND RETRIES'
(WARNING) 2017-08-10 23:13:19,313 shared_page_state.DumpStateUponFailure:140  Cannot dump browser state: No browser.
Traceback (most recent call last):
  File "/b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/story_runner.py", line 97, in _RunStoryAndProcessErrorIfNeeded
    test.WillRunStory(state.platform)
  File "/b/s/w/ir/third_party/catapult/telemetry/telemetry/web_perf/timeline_based_measurement.py", line 280, in WillRunStory
    platform.tracing_controller.StartTracing(self._tbm_options.config)
  File "/b/s/w/ir/third_party/catapult/telemetry/telemetry/core/tracing_controller.py", line 43, in StartTracing
    self._tracing_controller_backend.StartTracing(tracing_config, timeout)
  File "/b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/platform/tracing_controller_backend.py", line 91, in StartTracing
    started = agent.StartAgentTracing(config, timeout)
  File "/b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/platform/tracing_agent/battor_tracing_agent.py", line 73, in StartAgentTracing
    self._battor.StartTracing()
  File "/b/s/w/ir/third_party/catapult/common/battor/battor/battor_wrapper.py", line 234, in StartTracing
    self._FlashBattOr()
  File "/b/s/w/ir/third_party/catapult/common/battor/battor/battor_wrapper.py", line 176, in _FlashBattOr
    self.StopShell()
  File "/b/s/w/ir/third_party/catapult/common/battor/battor/battor_wrapper.py", line 221, in StopShell
    self._SendBattOrCommand(self._EXIT_CMD, check_return=False)
  File "/b/s/w/ir/third_party/catapult/common/battor/battor/battor_wrapper.py", line 346, in _SendBattOrCommand
    status = self._SendBattOrCommandImpl(cmd)
  File "/b/s/w/ir/third_party/catapult/common/battor/battor/battor_wrapper.py", line 341, in _SendBattOrCommandImpl
    self._battor_shell.stdin.write('%s\n' % cmd)
IOError: [Errno 32] Broken pipe
(INFO) 2017-08-10 23:13:19,491 exception_formatter.PrintFormattedException:19  Try printing formatted exception: None None None
Exception raised when cleaning story run: 
Traceback (most recent call last):
  _RunStoryAndProcessErrorIfNeeded at /b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/story_runner.py:130
    test.DidRunStory(state.platform, results)
  DidRunStory at /b/s/w/ir/third_party/catapult/telemetry/telemetry/web_perf/timeline_based_measurement.py:310
    trace_result = platform.tracing_controller.StopTracing()
  StopTracing at /b/s/w/ir/third_party/catapult/telemetry/telemetry/core/tracing_controller.py:47
    return self._tracing_controller_backend.StopTracing()
  StopTracing at /b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/platform/tracing_controller_backend.py:109
    self._IssueClockSyncMarker()
  _IssueClockSyncMarker at /b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/platform/tracing_controller_backend.py:213
    self._RecordIssuerClockSyncMarker)
  RecordClockSyncMarker at /b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/platform/tracing_agent/chrome_tracing_agent.py:178
    raise ChromeClockSyncError('Cannot issue clock sync. No devtools clients')
ChromeClockSyncError: Cannot issue clock sync. No devtools clients
Locals:
  devtools_clients                             : []
  record_controller_clock_sync_marker_callback : <bound method TracingControllerBackend._RecordIssuerClockSyncMarker of <telemetry.internal.platform.tracing_controller_backend.TracingControllerBackend object at 0x10688b710>>
  sync_id                                      : '7d5ac5fc-52df-4d16-a3c7-1a514ba05b19'
[  FAILED  ] http://www.slideshare.net/patrickmeenan (40769 ms)
(INFO) 2017-08-10 23:13:20,411 exception_formatter.PrintFormattedException:19  Try printing formatted exception: None None None
Traceback (most recent call last):
  RunBenchmark at /b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/story_runner.py:340
    expectations=expectations, metadata=benchmark.GetMetadata())
  Run at /b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/story_runner.py:203
    _RunStoryAndProcessErrorIfNeeded(story, results, state, test)
  _RunStoryAndProcessErrorIfNeeded at /b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/story_runner.py:97
    test.WillRunStory(state.platform)
  WillRunStory at /b/s/w/ir/third_party/catapult/telemetry/telemetry/web_perf/timeline_based_measurement.py:280
    platform.tracing_controller.StartTracing(self._tbm_options.config)
  StartTracing at /b/s/w/ir/third_party/catapult/telemetry/telemetry/core/tracing_controller.py:43
    self._tracing_controller_backend.StartTracing(tracing_config, timeout)
  StartTracing at /b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/platform/tracing_controller_backend.py:91
    started = agent.StartAgentTracing(config, timeout)
  StartAgentTracing at /b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/platform/tracing_agent/battor_tracing_agent.py:73
    self._battor.StartTracing()
  StartTracing at /b/s/w/ir/third_party/catapult/common/battor/battor/battor_wrapper.py:234
    self._FlashBattOr()
  _FlashBattOr at /b/s/w/ir/third_party/catapult/common/battor/battor/battor_wrapper.py:176
    self.StopShell()
  StopShell at /b/s/w/ir/third_party/catapult/common/battor/battor/battor_wrapper.py:221
    self._SendBattOrCommand(self._EXIT_CMD, check_return=False)
  _SendBattOrCommand at /b/s/w/ir/third_party/catapult/common/battor/battor/battor_wrapper.py:346
    status = self._SendBattOrCommandImpl(cmd)
  _SendBattOrCommandImpl at /b/s/w/ir/third_party/catapult/common/battor/battor/battor_wrapper.py:341
    self._battor_shell.stdin.write('%s\n' % cmd)
IOError: [Errno 32] Broken pipe
Locals:
  cmd : 'Exit'
(INFO) 2017-08-10 23:13:20,412 cloud_storage.Insert:377  Uploading /b/s/w/itZwyXBu/tmpr_zM1h.png to gs://chrome-telemetry-output/profiler-file-id_0-2017-08-10_23-13-2062972.png
View generated profiler files online at https://console.developers.google.com/m/cloudstorage/b/chrome-telemetry-output/o/profiler-file-id_0-2017-08-10_23-13-2062972.png for page http://www.slideshare.net/patrickmeenan
[  PASSED  ] 1 test.
[  FAILED  ] 1 test, listed below:
[  FAILED  ]  http://www.slideshare.net/patrickmeenan
1 FAILED TEST
View result at file:///b/s/w/itZwyXBu/tmpIizmkBtelemetry/results-chart.json
View result at file:///b/s/w/itZwyXBu/tmpIizmkBtelemetry/test-results.json
+------------------------------------------------------------------------------+
| End of shard 0  Pending: 1.3s  Duration: 3650.3s  Bot: build30-b4  Exit: -15 |
+------------------------------------------------------------------------------+
Total duration: 3650.3s
WARNING:root:collect_cmd had non-zero return code: 241
Traceback (most recent call last):
  File "/b/rr/tmp6VnB9N/rw/checkout/scripts/slave/recipe_modules/swarming/resources/standard_isolated_script_merge.py", line 45, in <module>
    sys.exit(main())
  File "/b/rr/tmp6VnB9N/rw/checkout/scripts/slave/recipe_modules/swarming/resources/standard_isolated_script_merge.py", line 41, in main
    return StandardIsolatedScriptMerge(args.output_json, args.jsons_to_merge)
  File "/b/rr/tmp6VnB9N/rw/checkout/scripts/slave/recipe_modules/swarming/resources/standard_isolated_script_merge.py", line 24, in StandardIsolatedScriptMerge
    shard_results_list.append(json.load(f))
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 290, in load
    **kw)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 338, in loads
    return _default_decoder.decode(s)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 366, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 384, in raw_decode
    raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
 
Summary: Many BattOr benchmarks failing with crashed BattOr agent but without any serial logs or debug information (was: battor.steady_state failing on chromium.perf/Mac Retina Perf)
Cc: charliea@chromium.org
 Issue 754749  has been merged into this issue.
It looks like when this happens, it may take a _really_ long time (~1h) to manifest, at which point these tests are timing out. I have no idea why this might be the case.
Project Member

Comment 5 by bugdroid1@chromium.org, Aug 16 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/d7a7e1777dfe861049b97ec606594202ea31ef0b

commit d7a7e1777dfe861049b97ec606594202ea31ef0b
Author: catapult-deps-roller@chromium.org <catapult-deps-roller@chromium.org>
Date: Wed Aug 16 15:02:40 2017

Roll src/third_party/catapult/ b153f210e..b520c7627 (1 commit)

https://chromium.googlesource.com/external/github.com/catapult-project/catapult.git/+log/b153f210e275..b520c76271ee

$ git log b153f210e..b520c7627 --date=short --no-merges --format='%ad %ae %s'
2017-08-16 charliea Make the BattOr wrapper gracefully handle broken pipe during shutdown

Created with:
  roll-dep src/third_party/catapult
BUG= 754746 


Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+/master/autoroll/README.md

If the roll is causing failures, see:
http://www.chromium.org/developers/tree-sheriffs/sheriff-details-chromium#TOC-Failures-due-to-DEPS-rolls


CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.android:android_optional_gpu_tests_rel
TBR=sullivan@chromium.org

Change-Id: I697a3a29cd990af7e2edca358256e60e5068b88f
Reviewed-on: https://chromium-review.googlesource.com/616800
Reviewed-by: <catapult-deps-roller@chromium.org>
Commit-Queue: <catapult-deps-roller@chromium.org>
Cr-Commit-Position: refs/heads/master@{#494781}
[modify] https://crrev.com/d7a7e1777dfe861049b97ec606594202ea31ef0b/DEPS

Mergedinto: 757508
Status: Duplicate (was: Available)
Going to mark this as a duplicate now that the serial logs confirm such.

Sign in to add a comment