Issue metadata
Sign in to add a comment
|
Many BattOr benchmarks failing with crashed BattOr agent but without any serial logs or debug information |
||||||||||||||||||||||
Issue descriptionCharlie, feel free to dupe this. battor.steady_state failing on chromium.perf/Mac Retina Perf Builders failed on: - Mac Retina Perf: https://build.chromium.org/p/chromium.perf/builders/Mac%20Retina%20Perf https://luci-logdog.appspot.com/v/?s=chrome%2Fbb%2Fchromium.perf%2FMac_Retina_Perf%2F996%2F%2B%2Frecipes%2Fsteps%2Fbattor.steady_state_on_Intel_GPU_on_Mac_on_Mac-10.11%2F0%2Fstdout [ RUN ] http://www.slideshare.net/patrickmeenan (INFO) 2017-08-10 23:12:38,723 chrome_tracing_agent._CreateTraceConfigFile:286 Trace config file string: {"trace_config": {"included_categories": ["toplevel", "rail", "blink.console"], "record_mode": "record-as-much-as-possible"}} (INFO) 2017-08-10 23:12:38,738 cloud_storage._GetLocked:339 Downloading gs://chromium-telemetry/binary_dependencies/battor/battor_agent_binary_2d1610374495130d8eefc731f3f38be63bb21f2e to /b/s/w/ir/third_party/catapult/common/battor/bin/darwin/x86_64/battor_agent (ERROR) 2017-08-10 23:13:19,311 battor_wrapper._FlashBattOr:175 Git hash returned from BattOr was not as expected: [0810/231319.310947:FATAL:battor_agent_bin.cc(91)] Fatal error when communicating with the BattOr: TOO MANY COMMAND RETRIES Traceback (most recent call last): File "/b/s/w/ir/third_party/catapult/common/battor/battor/battor_wrapper.py", line 162, in _FlashBattOr device_git_hash = self.GetFirmwareGitHash() File "/b/s/w/ir/third_party/catapult/common/battor/battor/battor_wrapper.py", line 393, in GetFirmwareGitHash int(self._git_hash, 16) ValueError: invalid literal for int() with base 16: '[0810/231319.310947:FATAL:battor_agent_bin.cc(91)] Fatal error when communicating with the BattOr: TOO MANY COMMAND RETRIES' (WARNING) 2017-08-10 23:13:19,313 shared_page_state.DumpStateUponFailure:140 Cannot dump browser state: No browser. Traceback (most recent call last): File "/b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/story_runner.py", line 97, in _RunStoryAndProcessErrorIfNeeded test.WillRunStory(state.platform) File "/b/s/w/ir/third_party/catapult/telemetry/telemetry/web_perf/timeline_based_measurement.py", line 280, in WillRunStory platform.tracing_controller.StartTracing(self._tbm_options.config) File "/b/s/w/ir/third_party/catapult/telemetry/telemetry/core/tracing_controller.py", line 43, in StartTracing self._tracing_controller_backend.StartTracing(tracing_config, timeout) File "/b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/platform/tracing_controller_backend.py", line 91, in StartTracing started = agent.StartAgentTracing(config, timeout) File "/b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/platform/tracing_agent/battor_tracing_agent.py", line 73, in StartAgentTracing self._battor.StartTracing() File "/b/s/w/ir/third_party/catapult/common/battor/battor/battor_wrapper.py", line 234, in StartTracing self._FlashBattOr() File "/b/s/w/ir/third_party/catapult/common/battor/battor/battor_wrapper.py", line 176, in _FlashBattOr self.StopShell() File "/b/s/w/ir/third_party/catapult/common/battor/battor/battor_wrapper.py", line 221, in StopShell self._SendBattOrCommand(self._EXIT_CMD, check_return=False) File "/b/s/w/ir/third_party/catapult/common/battor/battor/battor_wrapper.py", line 346, in _SendBattOrCommand status = self._SendBattOrCommandImpl(cmd) File "/b/s/w/ir/third_party/catapult/common/battor/battor/battor_wrapper.py", line 341, in _SendBattOrCommandImpl self._battor_shell.stdin.write('%s\n' % cmd) IOError: [Errno 32] Broken pipe (INFO) 2017-08-10 23:13:19,491 exception_formatter.PrintFormattedException:19 Try printing formatted exception: None None None Exception raised when cleaning story run: Traceback (most recent call last): _RunStoryAndProcessErrorIfNeeded at /b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/story_runner.py:130 test.DidRunStory(state.platform, results) DidRunStory at /b/s/w/ir/third_party/catapult/telemetry/telemetry/web_perf/timeline_based_measurement.py:310 trace_result = platform.tracing_controller.StopTracing() StopTracing at /b/s/w/ir/third_party/catapult/telemetry/telemetry/core/tracing_controller.py:47 return self._tracing_controller_backend.StopTracing() StopTracing at /b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/platform/tracing_controller_backend.py:109 self._IssueClockSyncMarker() _IssueClockSyncMarker at /b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/platform/tracing_controller_backend.py:213 self._RecordIssuerClockSyncMarker) RecordClockSyncMarker at /b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/platform/tracing_agent/chrome_tracing_agent.py:178 raise ChromeClockSyncError('Cannot issue clock sync. No devtools clients') ChromeClockSyncError: Cannot issue clock sync. No devtools clients Locals: devtools_clients : [] record_controller_clock_sync_marker_callback : <bound method TracingControllerBackend._RecordIssuerClockSyncMarker of <telemetry.internal.platform.tracing_controller_backend.TracingControllerBackend object at 0x10688b710>> sync_id : '7d5ac5fc-52df-4d16-a3c7-1a514ba05b19' [ FAILED ] http://www.slideshare.net/patrickmeenan (40769 ms) (INFO) 2017-08-10 23:13:20,411 exception_formatter.PrintFormattedException:19 Try printing formatted exception: None None None Traceback (most recent call last): RunBenchmark at /b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/story_runner.py:340 expectations=expectations, metadata=benchmark.GetMetadata()) Run at /b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/story_runner.py:203 _RunStoryAndProcessErrorIfNeeded(story, results, state, test) _RunStoryAndProcessErrorIfNeeded at /b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/story_runner.py:97 test.WillRunStory(state.platform) WillRunStory at /b/s/w/ir/third_party/catapult/telemetry/telemetry/web_perf/timeline_based_measurement.py:280 platform.tracing_controller.StartTracing(self._tbm_options.config) StartTracing at /b/s/w/ir/third_party/catapult/telemetry/telemetry/core/tracing_controller.py:43 self._tracing_controller_backend.StartTracing(tracing_config, timeout) StartTracing at /b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/platform/tracing_controller_backend.py:91 started = agent.StartAgentTracing(config, timeout) StartAgentTracing at /b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/platform/tracing_agent/battor_tracing_agent.py:73 self._battor.StartTracing() StartTracing at /b/s/w/ir/third_party/catapult/common/battor/battor/battor_wrapper.py:234 self._FlashBattOr() _FlashBattOr at /b/s/w/ir/third_party/catapult/common/battor/battor/battor_wrapper.py:176 self.StopShell() StopShell at /b/s/w/ir/third_party/catapult/common/battor/battor/battor_wrapper.py:221 self._SendBattOrCommand(self._EXIT_CMD, check_return=False) _SendBattOrCommand at /b/s/w/ir/third_party/catapult/common/battor/battor/battor_wrapper.py:346 status = self._SendBattOrCommandImpl(cmd) _SendBattOrCommandImpl at /b/s/w/ir/third_party/catapult/common/battor/battor/battor_wrapper.py:341 self._battor_shell.stdin.write('%s\n' % cmd) IOError: [Errno 32] Broken pipe Locals: cmd : 'Exit' (INFO) 2017-08-10 23:13:20,412 cloud_storage.Insert:377 Uploading /b/s/w/itZwyXBu/tmpr_zM1h.png to gs://chrome-telemetry-output/profiler-file-id_0-2017-08-10_23-13-2062972.png View generated profiler files online at https://console.developers.google.com/m/cloudstorage/b/chrome-telemetry-output/o/profiler-file-id_0-2017-08-10_23-13-2062972.png for page http://www.slideshare.net/patrickmeenan [ PASSED ] 1 test. [ FAILED ] 1 test, listed below: [ FAILED ] http://www.slideshare.net/patrickmeenan 1 FAILED TEST View result at file:///b/s/w/itZwyXBu/tmpIizmkBtelemetry/results-chart.json View result at file:///b/s/w/itZwyXBu/tmpIizmkBtelemetry/test-results.json +------------------------------------------------------------------------------+ | End of shard 0 Pending: 1.3s Duration: 3650.3s Bot: build30-b4 Exit: -15 | +------------------------------------------------------------------------------+ Total duration: 3650.3s WARNING:root:collect_cmd had non-zero return code: 241 Traceback (most recent call last): File "/b/rr/tmp6VnB9N/rw/checkout/scripts/slave/recipe_modules/swarming/resources/standard_isolated_script_merge.py", line 45, in <module> sys.exit(main()) File "/b/rr/tmp6VnB9N/rw/checkout/scripts/slave/recipe_modules/swarming/resources/standard_isolated_script_merge.py", line 41, in main return StandardIsolatedScriptMerge(args.output_json, args.jsons_to_merge) File "/b/rr/tmp6VnB9N/rw/checkout/scripts/slave/recipe_modules/swarming/resources/standard_isolated_script_merge.py", line 24, in StandardIsolatedScriptMerge shard_results_list.append(json.load(f)) File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 290, in load **kw) File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 338, in loads return _default_decoder.decode(s) File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 366, in decode obj, end = self.raw_decode(s, idx=_w(s, 0).end()) File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 384, in raw_decode raise ValueError("No JSON object could be decoded") ValueError: No JSON object could be decoded
,
Aug 14 2017
,
Aug 15 2017
It looks like when this happens, it may take a _really_ long time (~1h) to manifest, at which point these tests are timing out. I have no idea why this might be the case.
,
Aug 15 2017
Other instances: https://luci-logdog.appspot.com/v/?s=chrome%2Fbb%2Ftryserver.chromium.perf%2Fmac_retina_perf_bisect%2F2387%2F%2B%2Frecipes%2Fsteps%2FGathering_reference_values%2F0%2Fsteps%2FPerformance_Test_5_of_6__2_%2F0%2Fstdout https://luci-logdog.appspot.com/v/?s=chrome%2Fbb%2Ftryserver.chromium.perf%2Fmac_retina_perf_bisect%2F2471%2F%2B%2Frecipes%2Fsteps%2FBisecting_revision__4_%2F0%2Fsteps%2FBisecting_revision.Performance_Test_2_of_3__2_%2F0%2Fstdout
,
Aug 16 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/d7a7e1777dfe861049b97ec606594202ea31ef0b commit d7a7e1777dfe861049b97ec606594202ea31ef0b Author: catapult-deps-roller@chromium.org <catapult-deps-roller@chromium.org> Date: Wed Aug 16 15:02:40 2017 Roll src/third_party/catapult/ b153f210e..b520c7627 (1 commit) https://chromium.googlesource.com/external/github.com/catapult-project/catapult.git/+log/b153f210e275..b520c76271ee $ git log b153f210e..b520c7627 --date=short --no-merges --format='%ad %ae %s' 2017-08-16 charliea Make the BattOr wrapper gracefully handle broken pipe during shutdown Created with: roll-dep src/third_party/catapult BUG= 754746 Documentation for the AutoRoller is here: https://skia.googlesource.com/buildbot/+/master/autoroll/README.md If the roll is causing failures, see: http://www.chromium.org/developers/tree-sheriffs/sheriff-details-chromium#TOC-Failures-due-to-DEPS-rolls CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.android:android_optional_gpu_tests_rel TBR=sullivan@chromium.org Change-Id: I697a3a29cd990af7e2edca358256e60e5068b88f Reviewed-on: https://chromium-review.googlesource.com/616800 Reviewed-by: <catapult-deps-roller@chromium.org> Commit-Queue: <catapult-deps-roller@chromium.org> Cr-Commit-Position: refs/heads/master@{#494781} [modify] https://crrev.com/d7a7e1777dfe861049b97ec606594202ea31ef0b/DEPS
,
Aug 30 2017
Going to mark this as a duplicate now that the serial logs confirm such. |
|||||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||||
Comment 1 by charliea@chromium.org
, Aug 14 2017