"connection was forcibly closed" requesting memory dump on Windows bots
Issue description
The following error has started appearing frequently on Windows bots:
TracingUnrecoverableException: Exception raised while sending a Tracing.requestMemoryDump request:
Traceback (most recent call last):
  File "c:\b\s\w\ir\third_party\catapult\telemetry\telemetry\internal\backends\chrome_inspector\tracing_backend.py", line 204, in DumpMemory
    response = self._inspector_websocket.SyncRequest(request, timeout)
  File "c:\b\s\w\ir\third_party\catapult\telemetry\telemetry\internal\backends\chrome_inspector\inspector_websocket.py", line 110, in SyncRequest
    res = self._Receive(timeout)
  File "c:\b\s\w\ir\third_party\catapult\telemetry\telemetry\internal\backends\chrome_inspector\inspector_websocket.py", line 149, in _Receive
    data = self._socket.recv()
  File "c:\b\s\w\ir\third_party\catapult\telemetry\third_party\websocket-client\websocket\_core.py", line 293, in recv
    opcode, data = self.recv_data()
  File "c:\b\s\w\ir\third_party\catapult\telemetry\third_party\websocket-client\websocket\_core.py", line 310, in recv_data
    opcode, frame = self.recv_data_frame(control_frame)
  File "c:\b\s\w\ir\third_party\catapult\telemetry\third_party\websocket-client\websocket\_core.py", line 323, in recv_data_frame
    frame = self.recv_frame()
  File "c:\b\s\w\ir\third_party\catapult\telemetry\third_party\websocket-client\websocket\_core.py", line 357, in recv_frame
    return self.frame_buffer.recv_frame()
  File "c:\b\s\w\ir\third_party\catapult\telemetry\third_party\websocket-client\websocket\_abnf.py", line 336, in recv_frame
    self.recv_header()
  File "c:\b\s\w\ir\third_party\catapult\telemetry\third_party\websocket-client\websocket\_abnf.py", line 286, in recv_header
    header = self.recv_strict(2)
  File "c:\b\s\w\ir\third_party\catapult\telemetry\third_party\websocket-client\websocket\_abnf.py", line 371, in recv_strict
    bytes_ = self.recv(min(16384, shortage))
  File "c:\b\s\w\ir\third_party\catapult\telemetry\third_party\websocket-client\websocket\_core.py", line 427, in _recv
    return recv(self.sock, bufsize)
  File "c:\b\s\w\ir\third_party\catapult\telemetry\third_party\websocket-client\websocket\_socket.py", line 80, in recv
    bytes_ = sock.recv(bufsize)
error: [Errno 10054] An existing connection was forcibly closed by the remote host
https://luci-logdog.appspot.com/v/?s=chrome%2Fbb%2Fchromium.perf%2FWin_10_Perf%2F1005%2F%2B%2Frecipes%2Fsteps%2Fsystem_health.memory_desktop_on__102b__GPU_on_Windows_on_Windows-10-10240%2F0%2Fstdout
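Errno 10054 is the Winsock code WSAECONNRESET: the remote end reset the connection, which the log reports differently from a local receive timeout. A minimal sketch, assuming Python 3, of telling the two apart when triaging logs like this one; `classify_recv_error` is a hypothetical helper, not part of Telemetry or websocket-client.

```python
import errno
import socket

# Winsock's WSAECONNRESET; on POSIX the same condition is errno.ECONNRESET.
WSAECONNRESET = 10054


def classify_recv_error(exc):
    """Return a short label for an exception raised by a socket recv() call.

    Hypothetical triage helper for illustration only.
    """
    # socket.timeout is raised when a recv() with a timeout set gets no data.
    if isinstance(exc, socket.timeout):
        return "local timeout"
    # A reset from the peer surfaces as OSError with ECONNRESET / WSAECONNRESET.
    if isinstance(exc, OSError) and exc.errno in (errno.ECONNRESET, WSAECONNRESET):
        return "connection reset by peer"
    return "other"
```

With this split, the traceback above falls in the "connection reset by peer" bucket even though a later comment in this thread suspects the underlying cause is a timeout.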
The error is most frequent on win-10 (15 of the 20 latest builds), often but not always on "load:search:yahoo", and has also been seen a few times on win-7, win-7-x64, and win-8.
Furthermore, the error is treated as fatal, so it interrupts the rest of the benchmark run.
I'll see if I can find the build where the error started and kick off a bisect from there.
Jun 28 2017
Started bisect job https://chromeperf.appspot.com/buildbucket_job_status/8975536285219265696
Jun 28 2017
=== BISECT JOB RESULTS ===
NO Test failure found

Bisect Details
  Configuration: winx64_10_perf_bisect
  Benchmark    : system_health.memory_desktop
  Metric       : memory:chrome:all_processes:reported_by_chrome:effective_size_avg/load_search/load_search_yahoo

Revision             Exit Code      N
chromium@479744      0 +- N/A       20      good
chromium@479932      0 +- N/A       20      bad

Please refer to the following doc on diagnosing memory regressions:
  https://chromium.googlesource.com/chromium/src/+/master/docs/memory-infra/memory_benchmarks.md

To Run This Test
  src/tools/perf/run_benchmark -v --browser=release_x64 --output-format=chartjson --upload-results --pageset-repeat=1 --also-run-disabled-tests --story-filter=load.search.yahoo system_health.memory_desktop

Debug Info
  https://chromeperf.appspot.com/buildbucket_job_status/8975536285219265696

Is this bisect wrong?
  https://chromeperf.appspot.com/bad_bisect?try_job_id=4521967824142336

| O O | Visit http://www.chromium.org/developers/speed-infra/perf-bug-faq
|  X  | for more information addressing perf regression bugs. For feedback,
| / \ | file a bug with component Speed>Bisection. Thank you!
Jun 28 2017
Started bisect job https://chromeperf.appspot.com/buildbucket_job_status/8975532312681599456
Jun 28 2017
=== BISECT JOB RESULTS ===
NO Test failure found

Bisect Details
  Configuration: winx64_10_perf_bisect
  Benchmark    : system_health.memory_desktop
  Metric       : memory:chrome:all_processes:reported_by_chrome:effective_size_avg/load_search/load_search_yahoo

Revision             Exit Code      N
chromium@479744      0 +- N/A       20      good
chromium@480744      0 +- N/A       20      bad

Please refer to the following doc on diagnosing memory regressions:
  https://chromium.googlesource.com/chromium/src/+/master/docs/memory-infra/memory_benchmarks.md

To Run This Test
  src/tools/perf/run_benchmark -v --browser=release_x64 --output-format=chartjson --upload-results --pageset-repeat=1 --also-run-disabled-tests --story-filter=load.search.yahoo system_health.memory_desktop

Debug Info
  https://chromeperf.appspot.com/buildbucket_job_status/8975532312681599456

Is this bisect wrong?
  https://chromeperf.appspot.com/bad_bisect?try_job_id=4584612874944512

| O O | Visit http://www.chromium.org/developers/speed-infra/perf-bug-faq
|  X  | for more information addressing perf regression bugs. For feedback,
| / \ | file a bug with component Speed>Bisection. Thank you!
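Both bisects reported exit code 0 at the "good" and the "bad" revision, so there was no failing endpoint to search from. A minimal sketch of the underlying idea, binary search for the first failing revision, assuming the failure is deterministic and persists once introduced; `first_bad_revision` and the `fails` predicate are hypothetical and are not the bisect bot's actual implementation.

```python
def first_bad_revision(good, bad, fails):
    """Return the first revision in (good, bad] for which fails(rev) is True.

    Assumes fails(good) is False and that a failure, once introduced, stays.
    Returns None when the 'bad' end does not fail either, which is roughly the
    "NO Test failure found" situation: there is nothing to bisect.
    """
    if not fails(bad):
        return None
    lo, hi = good, bad  # invariant: lo passes, hi fails
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fails(mid):
            hi = mid
        else:
            lo = mid
    return hi
```

Each probe halves the range, so a range like chromium@479744..479932 needs only about log2(188), i.e. 8, test runs. A flaky failure that reproduces only on some runs breaks the "failure persists" assumption, which is one reason an intermittent error is hard to bisect this way.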
Jun 28 2017
I suspect something is wrong with the device in the lab. Do we see other benchmark failures on this bot as well?
Jun 28 2017
Note that the error has shown up on all of win-10, win-7, win-7-x64, and win-8. I'm now wondering whether the error only appears when the benchmark runs for longer.
Jun 29 2017
I think we may need to try to reproduce this locally. I am very swamped at the moment, so cc Erik & Etienne in case they are interested in helping out with this.
Jun 29 2017
Oh, I bet this is hitting a local timeout:
"""
error: [Errno 10054] An existing connection was forcibly closed by the remote host
Locals:
request : {'method': 'Tracing.requestMemoryDump', 'id': 0}
timeout : 90
"""
Memory dumps can require over 90 seconds.
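To illustrate how a fixed limit turns a slow but otherwise healthy dump into a failure, here is a minimal, scaled-down sketch (Python 3, delays in fractions of a second rather than minutes). A simulated "browser" on one end of a socket pair sleeps before replying, and the client gives up if its receive timeout expires first. `sync_request`, `slow_browser`, and `run_dump` are hypothetical names; the real path goes through Telemetry's inspector websocket, which this does not reproduce.

```python
import json
import socket
import threading
import time


def sync_request(sock, request, timeout):
    """Send one newline-delimited JSON request and read one JSON reply.

    Hypothetical helper. Each recv() call waits at most `timeout` seconds and
    raises socket.timeout if nothing arrives in that window.
    """
    sock.sendall(json.dumps(request).encode("utf-8") + b"\n")
    sock.settimeout(timeout)
    data = b""
    while not data.endswith(b"\n"):
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionError("peer closed the connection")
        data += chunk
    return json.loads(data.decode("utf-8"))


def slow_browser(sock, delay):
    """Simulated browser: read one request, 'dump memory' for `delay` s, reply."""
    with sock.makefile("rb") as reader:
        request = json.loads(reader.readline().decode("utf-8"))
    time.sleep(delay)
    reply = {"id": request["id"], "result": {}}
    sock.sendall(json.dumps(reply).encode("utf-8") + b"\n")


def run_dump(delay, timeout):
    """Return the reply, or None if the client timed out first."""
    client, server = socket.socketpair()
    worker = threading.Thread(target=slow_browser, args=(server, delay))
    worker.start()
    try:
        request = {"method": "Tracing.requestMemoryDump", "id": 0}
        return sync_request(client, request, timeout)
    except socket.timeout:
        return None
    finally:
        worker.join()  # let the simulated browser finish before closing
        client.close()
        server.close()


print(run_dump(delay=0.3, timeout=0.05))  # too short: prints None
print(run_dump(delay=0.3, timeout=2.0))   # long enough: prints {'id': 0, 'result': {}}
```

The same request succeeds or fails depending only on whether the timeout exceeds the time the other side needs, which is the situation suspected here with a 90-second limit.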
Jun 29 2017
Jun 30 2017
Ah, got it. Will try to increase that timeout.
Jul 3 2017
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/b3121dcc73abfee3cd11148cd9c43cf02db3d1a3

commit b3121dcc73abfee3cd11148cd9c43cf02db3d1a3
Author: catapult-deps-roller@chromium.org <catapult-deps-roller@chromium.org>
Date: Mon Jul 03 11:46:34 2017

Roll src/third_party/catapult/ 3b0c0e04d..68c788088 (1 commit)

https://chromium.googlesource.com/external/github.com/catapult-project/catapult.git/+log/3b0c0e04db0b..68c788088273

$ git log 3b0c0e04d..68c788088 --date=short --no-merges --format='%ad %ae %s'
2017-07-03 perezju [Telemetry] Default DumpMemory timeout to 20 minutes

Created with:
  roll-dep src/third_party/catapult
BUG=737565

Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+/master/autoroll/README.md

If the roll is causing failures, see:
http://www.chromium.org/developers/tree-sheriffs/sheriff-details-chromium#TOC-Failures-due-to-DEPS-rolls

CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.android:android_optional_gpu_tests_rel
TBR=sullivan@chromium.org

Change-Id: I8309233ab2e3e893f1ab2346423a06e6433dacd8
Reviewed-on: https://chromium-review.googlesource.com/558674
Reviewed-by: <catapult-deps-roller@chromium.org>
Commit-Queue: <catapult-deps-roller@chromium.org>
Cr-Commit-Position: refs/heads/master@{#483990}
[modify] https://crrev.com/b3121dcc73abfee3cd11148cd9c43cf02db3d1a3/DEPS
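The rolled catapult change is titled "[Telemetry] Default DumpMemory timeout to 20 minutes". Below is a minimal sketch of a request helper whose default timeout is 20 minutes and whose timeout acts as an overall deadline across every message received while waiting, skipping messages that are not the response (for example, event notifications). `DEFAULT_DUMP_TIMEOUT`, `RequestTimeout`, `wait_for_response`, and the `receive` callable are hypothetical, and this thread does not say whether Telemetry's SyncRequest applies its timeout this way.

```python
import time

# 20 minutes, matching the commit title quoted above (hypothetical constant name).
DEFAULT_DUMP_TIMEOUT = 20 * 60  # seconds


class RequestTimeout(Exception):
    """Raised when no matching response arrives before the deadline."""


def wait_for_response(receive, request_id, timeout=DEFAULT_DUMP_TIMEOUT,
                      clock=time.monotonic):
    """Return the first message whose 'id' equals request_id.

    `receive(remaining)` is a hypothetical callable returning the next decoded
    message and waiting at most `remaining` seconds. Messages with another id
    (or none) are skipped; the timeout bounds the whole wait, not each message.
    """
    deadline = clock() + timeout
    while True:
        remaining = deadline - clock()
        if remaining <= 0:
            raise RequestTimeout(
                "no response to request %d within %s s" % (request_id, timeout))
        message = receive(remaining)
        if message.get("id") == request_id:
            return message
```

Passing `clock` as a parameter keeps the deadline logic testable without actually waiting 20 minutes.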
Jul 3 2017
Comment 1 by perezju@chromium.org, Jun 28 2017