Issue metadata
Sign in to add a comment
|
Random Telemetry test timeouts, mostly on Mac Minis with Intel GPUs |
||||||||||||||||||||||
Issue descriptionhttps://ci.chromium.org/buildbot/tryserver.chromium.mac/mac_optional_gpu_tests_rel/19321 WebglConformance_deqp_functional_gles3_texturefiltering_2d_array_formats_00 https://ci.chromium.org/buildbot/tryserver.chromium.mac/mac_optional_gpu_tests_rel/19320 WebglConformance_conformance2_textures_canvas_sub_rectangle_tex_2d_rgb9_e5_rgb_float WebglConformance_conformance_glsl_implicit_add_ivec4_vec4_vert https://ci.chromium.org/buildbot/tryserver.chromium.mac/mac_optional_gpu_tests_rel/19298 WebglConformance_conformance2_textures_canvas_tex_3d_rgb5_a1_rgba_unsigned_byte https://ci.chromium.org/buildbot/chromium.gpu.fyi/Mac%20Release%20%28Intel%29/10359 WebglConformance_conformance_glsl_implicit_add_ivec4_vec4_vert Timing out tests seem pretty random to me, not sure if marking them Flaky will help.
,
Dec 23 2017
I don't see anything in common between these failures, could be just various reasons of bot being busy with something else during the test. (Also 19334 got in this list by mistake, updating the description).
,
Dec 23 2017
,
Dec 23 2017
Another timeout: https://ci.chromium.org/buildbot/chromium.gpu.fyi/Mac%20GPU%20ASAN%20Release/6809 WebglConformance_conformance2_glsl3_compound_assignment_type_combination
,
Dec 31 2017
Mo, Kai, James, can any of you please help triage this issue and see if there's any obvious commonality or root cause? It seems like a recent regression and looks like it's only affecting the Mac Intel bots. Thanks.
,
Jan 2 2018
Removing V8 label for now. Feel free to readd when you think V8 is the culprit.
,
Jan 3 2018
I'm not finding any hints looking at these logs and tests. It doesn't seem to be a performance regression: WebglConformance_conformance2_glsl3_compound_assignment_type_combination, for example, took about 11 seconds on build 6810, but timed out after 604 seconds on build 6809 (linked in #4). But I also can't spot any issues that would prevent finishTest from getting called in the test itself - especially for the deqp failures, which use the same code as every other deqp test.
,
Jan 4 2018
Seeing a couple of other failures currently on mac_optional_gpu_tests_rel: WebglConformance_conformance2_textures_image_data_tex_2d_srgb8_alpha8_rgba_unsigned_byte https://ci.chromium.org/buildbot/tryserver.chromium.mac/mac_optional_gpu_tests_rel/19406 WebglConformance_conformance2_textures_canvas_sub_rectangle_tex_2d_rgb9_e5_rgb_float https://ci.chromium.org/buildbot/tryserver.chromium.mac/mac_optional_gpu_tests_rel/19381 I tried injecting a failure into the test, commenting out the call to finishTest() in src/third_party/webgl/src/sdk/tests/js/tests/tex-image-and-sub-image-2d-with-image-data.js and running: ./content/test/gpu/run_gpu_integration_test.py webgl_conformance --webgl-conformance-version=2.0.1 --test-filter=WebglConformance_conformance2_textures_image_data_tex_2d_srgb8_alpha8_rgba_unsigned_byte --browser=canary Here was the resulting stack trace: Traceback (most recent call last): _RunGpuTest at content/test/gpu/gpu_tests/gpu_integration_test.py:132 self.RunActualGpuTest(url, *args) RunActualGpuTest at content/test/gpu/gpu_tests/webgl_conformance_integration_test.py:203 getattr(self, test_name)(test_path, *args[1:]) _RunConformanceTest at content/test/gpu/gpu_tests/webgl_conformance_integration_test.py:217 self._CheckTestCompletion() _CheckTestCompletion at content/test/gpu/gpu_tests/webgl_conformance_integration_test.py:211 'webglTestHarness._finished', timeout=self._GetTestTimeout()) traced_function at third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py:52 return func(*args, **kwargs) WaitForJavaScriptCondition at third_party/catapult/telemetry/telemetry/internal/actions/action_runner.py:261 return self._tab.WaitForJavaScriptCondition(*args, **kwargs) traced_function at third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py:52 return func(*args, **kwargs) WaitForJavaScriptCondition at third_party/catapult/telemetry/telemetry/internal/browser/web_contents.py:239 return self._inspector_backend.WaitForJavaScriptCondition(*args, **kwargs) traced_function at third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py:52 return func(*args, **kwargs) WaitForJavaScriptCondition at third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py:302 e.message + '\n' + debug_message) TimeoutException: Timed out while waiting 15s for IsJavaScriptExpressionTrue. Compare this to the failures on the bots: Traceback (most recent call last): File "/b/s/w/ir/third_party/catapult/telemetry/telemetry/testing/serially_executed_browser_test_case.py", line 192, in <lambda> return lambda self: based_method(self, *args) File "/b/s/w/ir/content/test/gpu/gpu_tests/gpu_integration_test.py", line 132, in _RunGpuTest self.RunActualGpuTest(url, *args) File "/b/s/w/ir/content/test/gpu/gpu_tests/webgl_conformance_integration_test.py", line 203, in RunActualGpuTest getattr(self, test_name)(test_path, *args[1:]) File "/b/s/w/ir/content/test/gpu/gpu_tests/webgl_conformance_integration_test.py", line 217, in _RunConformanceTest self._CheckTestCompletion() File "/b/s/w/ir/content/test/gpu/gpu_tests/webgl_conformance_integration_test.py", line 211, in _CheckTestCompletion 'webglTestHarness._finished', timeout=self._GetTestTimeout()) File "/b/s/w/ir/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 52, in traced_function return func(*args, **kwargs) File "/b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/actions/action_runner.py", line 261, in WaitForJavaScriptCondition return self._tab.WaitForJavaScriptCondition(*args, **kwargs) File "/b/s/w/ir/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 52, in traced_function return func(*args, **kwargs) File "/b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/browser/web_contents.py", line 239, in WaitForJavaScriptCondition return self._inspector_backend.WaitForJavaScriptCondition(*args, **kwargs) File "/b/s/w/ir/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 52, in traced_function return func(*args, **kwargs) File "/b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py", line 289, in WaitForJavaScriptCondition return py_utils.WaitFor(IsJavaScriptExpressionTrue, timeout) File "/b/s/w/ir/third_party/catapult/common/py_utils/py_utils/__init__.py", line 129, in WaitFor res = condition() File "/b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py", line 286, in IsJavaScriptExpressionTrue return self._EvaluateJavaScript(condition, context_id, timeout) File "/b/s/w/ir/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 52, in traced_function return func(*args, **kwargs) File "/b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py", line 41, in Inner inspector_backend._ConvertExceptionFromInspectorWebsocket(e) File "/b/s/w/ir/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 52, in traced_function return func(*args, **kwargs) File "/b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py", line 38, in Inner return func(inspector_backend, *args, **kwargs) File "/b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py", line 518, in _EvaluateJavaScript return self._runtime.Evaluate(expression, context_id, timeout) File "/b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_runtime.py", line 44, in Evaluate res = self._inspector_websocket.SyncRequest(request, timeout) File "/b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_websocket.py", line 116, in SyncRequest res = self._Receive(timeout) File "/b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_websocket.py", line 155, in _Receive data = self._socket.recv() File "/b/s/w/ir/third_party/catapult/telemetry/third_party/websocket-client/websocket/_core.py", line 293, in recv opcode, data = self.recv_data() File "/b/s/w/ir/third_party/catapult/telemetry/third_party/websocket-client/websocket/_core.py", line 310, in recv_data opcode, frame = self.recv_data_frame(control_frame) File "/b/s/w/ir/third_party/catapult/telemetry/third_party/websocket-client/websocket/_core.py", line 323, in recv_data_frame frame = self.recv_frame() File "/b/s/w/ir/third_party/catapult/telemetry/third_party/websocket-client/websocket/_core.py", line 357, in recv_frame return self.frame_buffer.recv_frame() File "/b/s/w/ir/third_party/catapult/telemetry/third_party/websocket-client/websocket/_abnf.py", line 336, in recv_frame self.recv_header() File "/b/s/w/ir/third_party/catapult/telemetry/third_party/websocket-client/websocket/_abnf.py", line 286, in recv_header header = self.recv_strict(2) File "/b/s/w/ir/third_party/catapult/telemetry/third_party/websocket-client/websocket/_abnf.py", line 371, in recv_strict bytes_ = self.recv(min(16384, shortage)) File "/b/s/w/ir/third_party/catapult/telemetry/third_party/websocket-client/websocket/_core.py", line 427, in _recv return recv(self.sock, bufsize) File "/b/s/w/ir/third_party/catapult/telemetry/third_party/websocket-client/websocket/_socket.py", line 83, in recv raise WebSocketTimeoutException(message) TimeoutException: ******************************************************************************** (/b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py:487 _ConvertExceptionFromInspectorWebsocket) The app is probably crashed: It looks to me like the renderer process is hanging while Telemetry is making a call to DevTools over its websocket connection. It doesn't look like the test itself is failing to complete for any reason. DevTools and Telemetry folks, do you agree with this assessment? What can we do to get better logging out of this and figure out what's going wrong? Should I simulate a renderer process hang and figure out how to get Telemetry to crash the target process so we get a minidump?
,
Jan 4 2018
,
Jan 4 2018
Per Issue 797919 , WebglConformance_conformance_extensions_oes_texture_float_with_image was seen to fail in the same way in this tryjob: https://ci.chromium.org/buildbot/tryserver.chromium.mac/mac_chromium_rel_ng/618826
,
Jan 4 2018
I suspect that this flakiness is due to flaky websocket connection between telemetry & devtool (https://github.com/catapult-project/catapult/issues/3218) You can also try to see if Telemetry fails to achieve minidump upon a renderer process hang. If that is the case, would be great to expand test in https://cs.chromium.org/chromium/src/tools/perf/core/stacktrace_unittest.py
,
Jan 8 2018
,
Jan 9 2018
->nednguyen for triage
,
Jan 9 2018
,
Jan 9 2018
Issue 795950 has been merged into this issue.
,
Jan 9 2018
Manually injecting a hang of the renderer process has the same symptom as these tryjob failures. In this case, Telemetry *does not* cause the target process to produce a minidump. Please apply these patches: hang-renderer-in-get-max-viewport-dims.txt to chromium/src third-party-webgl-src-diff.txt to third_party/webgl/src Then run (I tested on macOS): ./content/test/gpu/run_gpu_integration_test.py webgl_conformance --browser=release --webgl-conformance-version=2.0.1 --test-filter=WebglConformance_conformance2_textures_image_data_tex_2d_srgb8_alpha8_rgba_unsigned_byte > output.txt 2>&1 It will time out in about 10 seconds with the same stack trace as above. We really need Telemetry to try to crash the renderer process and produce a minidump that it would then symbolize. Telemetry team: how feasible is this? Does Telemetry know the PID, at least on Unix platforms?
,
Jan 9 2018
Note: currently working with DevTools team to try adding a way to crash a renderer process on the IO thread, assuming the main thread is wedged. Will try wiring this into Telemetry to see if we can get a minidump in this scenario.
,
Jan 10 2018
a quick hack that crashes renderer on IO: https://chromium-review.googlesource.com/c/chromium/src/+/858379
,
Jan 10 2018
Thanks Pavel. With that patch as well as the attached one (catapult-diff.txt) applied to Catapult, my test case that hangs the renderer produces now produces symbolized stack traces for all threads in the renderer. Could we formalize the DevTools and catapult changes? If we named the new DevTools method something unlikely to collide with any future desired method, and were able to instrument Catapult, then we could probably get a stack trace off the bots in a day or so. Then we could decide whether to keep the new functionality or not.
,
Jan 10 2018
*kbr@ what does it mean for the renderer thread to be hang but not crashed? What would cause it to hang?
,
Jan 10 2018
I don't know. Maybe going into an infinite or long-running loop -- which of course should not happen. We would need to get a minidump for the renderer process in order to know. Also, if it turns out that the renderer wasn't actually hung, the minidump will help us understand that as well. If both DevTools and Telemetry are instrumented then we should have an answer within about a day. Here's another instance of this failure: https://ci.chromium.org/buildbot/tryserver.chromium.mac/mac_optional_gpu_tests_rel/19636 WebglConformance_conformance_glsl_implicit_add_ivec4_vec4_vert failed in this run: https://chromium-swarm.appspot.com/task?id=3ae182b60a2c7a10&refresh=10&show_raw=1 with a WebSocket timeout while waiting for the results of a JavaScript expression evaluation to be returned.
,
Jan 10 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/f25c80c34df79e4a95a39a9e4dfc56a03e8ea4e4 commit f25c80c34df79e4a95a39a9e4dfc56a03e8ea4e4 Author: Pavel Feldman <pfeldman@chromium.org> Date: Wed Jan 10 03:43:54 2018 DevTools: crash renderer on IO thread upon Page.crash command. Bug: 797368 Change-Id: Ib4ce90ac6b64278b12a96923ce990d5d0f0b97f0 Reviewed-on: https://chromium-review.googlesource.com/858379 Commit-Queue: Pavel Feldman <pfeldman@chromium.org> Reviewed-by: Dmitry Gozman <dgozman@chromium.org> Cr-Commit-Position: refs/heads/master@{#528233} [modify] https://crrev.com/f25c80c34df79e4a95a39a9e4dfc56a03e8ea4e4/content/browser/devtools/devtools_session.cc [modify] https://crrev.com/f25c80c34df79e4a95a39a9e4dfc56a03e8ea4e4/third_party/WebKit/Source/core/exported/WebDevToolsAgentImpl.cpp [modify] https://crrev.com/f25c80c34df79e4a95a39a9e4dfc56a03e8ea4e4/third_party/WebKit/Source/core/inspector/browser_protocol.json [modify] https://crrev.com/f25c80c34df79e4a95a39a9e4dfc56a03e8ea4e4/third_party/WebKit/Source/core/inspector/browser_protocol.pdl [modify] https://crrev.com/f25c80c34df79e4a95a39a9e4dfc56a03e8ea4e4/third_party/WebKit/Source/core/inspector/inspector_protocol_config.json
,
Jan 10 2018
The following revision refers to this bug: https://chromium.googlesource.com/catapult/+/ad6685396e2015c8a593846043ea63944dbf1764 commit ad6685396e2015c8a593846043ea63944dbf1764 Author: Kenneth Russell <kbr@chromium.org> Date: Wed Jan 10 19:22:28 2018 If a sync request to DevTools times out, try to crash the renderer process. This uses a new DevTools API being introduced here: https://chromium-review.googlesource.com/858379 It's not clear whether this is the cause of the random WebGL conformance test hangs. However, if the main thread of the renderer process is hung, this change will provide symbolized stack traces for all threads, by virtue of producing a minidump for the hung renderer process. BUG= chromium:797368 Change-Id: I079a1d3343002b7f2dc37419aeb30f07e9f7a1a3 Reviewed-on: https://chromium-review.googlesource.com/857959 Commit-Queue: Kenneth Russell <kbr@chromium.org> Reviewed-by: Ned Nguyen <nednguyen@google.com> [modify] https://crrev.com/ad6685396e2015c8a593846043ea63944dbf1764/telemetry/telemetry/internal/backends/chrome_inspector/inspector_runtime.py [modify] https://crrev.com/ad6685396e2015c8a593846043ea63944dbf1764/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py
,
Jan 11 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/4ea723aec59a81c7a96aa6182bbe363be0bf0dc9 commit 4ea723aec59a81c7a96aa6182bbe363be0bf0dc9 Author: catapult-deps-roller@chromium.org <catapult-deps-roller@chromium.org> Date: Thu Jan 11 00:05:33 2018 Roll src/third_party/catapult/ 142e757de..6385d5b1d (4 commits) https://chromium.googlesource.com/catapult.git/+log/142e757de9e3..6385d5b1d413 $ git log 142e757de..6385d5b1d --date=short --no-merges --format='%ad %ae %s' 2018-01-10 dtu [pinpoint] Add /api/config endpoint and configuration dropdown. 2018-01-09 dtu [pinpoint] Make "browser" argument optional. 2018-01-10 ssid Fix trace symbolization to work for local directory symbols 2018-01-09 kbr If a sync request to DevTools times out, try to crash the renderer process. Created with: roll-dep src/third_party/catapult BUG= 797368 The AutoRoll server is located here: https://catapult-roll.skia.org Documentation for the AutoRoller is here: https://skia.googlesource.com/buildbot/+/master/autoroll/README.md If the roll is causing failures, please contact the current sheriff, who should be CC'd on the roll, and stop the roller if necessary. CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.android:android_optional_gpu_tests_rel TBR=sullivan@chromium.org Change-Id: I548820a64050a7ce3df070639ff484cafe7df7df Reviewed-on: https://chromium-review.googlesource.com/861052 Reviewed-by: <catapult-deps-roller@chromium.org> Commit-Queue: <catapult-deps-roller@chromium.org> Cr-Commit-Position: refs/heads/master@{#528497} [modify] https://crrev.com/4ea723aec59a81c7a96aa6182bbe363be0bf0dc9/DEPS
,
Jan 11 2018
The new code was run on this build: https://ci.chromium.org/buildbot/tryserver.chromium.mac/mac_optional_gpu_tests_rel/19868 but it didn't catch a hung renderer. Instead, the new Crash command itself timed out. [8/139] gpu_tests.webgl_conformance_integration_test.WebGLConformanceIntegrationTest.WebglConformance_conformance2_rendering_instanced_arrays failed unexpectedly 601.0767s: Found crashpad_database_util No minidump found via crashpad_database_util Found crashpad_database_util No minidump found via crashpad_database_util Can't get standard output with --show-stdout Traceback (most recent call last): _RunGpuTest at content/test/gpu/gpu_tests/gpu_integration_test.py:132 self.RunActualGpuTest(url, *args) RunActualGpuTest at content/test/gpu/gpu_tests/webgl_conformance_integration_test.py:203 getattr(self, test_name)(test_path, *args[1:]) _RunConformanceTest at content/test/gpu/gpu_tests/webgl_conformance_integration_test.py:217 self._CheckTestCompletion() _CheckTestCompletion at content/test/gpu/gpu_tests/webgl_conformance_integration_test.py:211 'webglTestHarness._finished', timeout=self._GetTestTimeout()) traced_function at third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py:52 return func(*args, **kwargs) WaitForJavaScriptCondition at third_party/catapult/telemetry/telemetry/internal/actions/action_runner.py:261 return self._tab.WaitForJavaScriptCondition(*args, **kwargs) traced_function at third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py:52 return func(*args, **kwargs) WaitForJavaScriptCondition at third_party/catapult/telemetry/telemetry/internal/browser/web_contents.py:239 return self._inspector_backend.WaitForJavaScriptCondition(*args, **kwargs) traced_function at third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py:52 return func(*args, **kwargs) WaitForJavaScriptCondition at third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py:289 return py_utils.WaitFor(IsJavaScriptExpressionTrue, timeout) WaitFor at third_party/catapult/common/py_utils/py_utils/__init__.py:129 res = condition() IsJavaScriptExpressionTrue at third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py:286 return self._EvaluateJavaScript(condition, context_id, timeout) traced_function at third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py:52 return func(*args, **kwargs) Inner at third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py:41 inspector_backend._ConvertExceptionFromInspectorWebsocket(e) traced_function at third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py:52 return func(*args, **kwargs) Inner at third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py:38 return func(inspector_backend, *args, **kwargs) _EvaluateJavaScript at third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py:524 self._runtime.Crash(context_id, timeout) Crash at third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_runtime.py:81 res = self._inspector_websocket.SyncRequest(request, timeout) SyncRequest at third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_websocket.py:116 res = self._Receive(timeout) _Receive at third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_websocket.py:155 data = self._socket.recv() recv at third_party/catapult/telemetry/third_party/websocket-client/websocket/_core.py:293 opcode, data = self.recv_data() recv_data at third_party/catapult/telemetry/third_party/websocket-client/websocket/_core.py:310 opcode, frame = self.recv_data_frame(control_frame) recv_data_frame at third_party/catapult/telemetry/third_party/websocket-client/websocket/_core.py:323 frame = self.recv_frame() recv_frame at third_party/catapult/telemetry/third_party/websocket-client/websocket/_core.py:357 return self.frame_buffer.recv_frame() recv_frame at third_party/catapult/telemetry/third_party/websocket-client/websocket/_abnf.py:336 self.recv_header() recv_header at third_party/catapult/telemetry/third_party/websocket-client/websocket/_abnf.py:286 header = self.recv_strict(2) recv_strict at third_party/catapult/telemetry/third_party/websocket-client/websocket/_abnf.py:371 bytes_ = self.recv(min(16384, shortage)) _recv at third_party/catapult/telemetry/third_party/websocket-client/websocket/_core.py:427 return recv(self.sock, bufsize) recv at third_party/catapult/telemetry/third_party/websocket-client/websocket/_socket.py:83 raise WebSocketTimeoutException(message) At this point there are three hypotheses: 1) The browser is hung. 2) The renderer process is alive, but the navigation hasn't committed yet. In this case DevTools buffers up commands to be sent to the target renderer. The new Crash command would be one of those buffered up. 3) The renderer process's IO thread is hung. Option (2) sounds most likely at this point. caseq@ is going to revise the DevTools Page.crash command to return a string that indicates what DevTools thinks the state of the renderer process is (gone, not yet committed, alive). Then we will modify Telemetry to print out that string in the logs.
,
Jan 13 2018
Another random test timeout on these Mac Intel bots, this time in the pixel_test suite: https://ci.chromium.org/buildbot/tryserver.chromium.mac/mac_chromium_rel_ng/627412 Pixel_WebGLTransparentGreenTriangle_NoAlpha_ImplicitClear failed. The browser was restarted immediately before running this test, because the command line arguments for this test differed from those needed by the previous test. We really need to verify that caseq's https://chromium-review.googlesource.com/862904 can provide more context of why we failed to terminate the target renderer, and get that in. Within a day we should have some evidence of what the problem is. At this point we think the browser's first navigation was dropped on the floor due to a race condition.
,
Jan 17 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/7be978ffe3f6071d973eeae17aa05bd593d03777 commit 7be978ffe3f6071d973eeae17aa05bd593d03777 Author: Andrey Kosyakov <caseq@chromium.org> Date: Wed Jan 17 05:44:27 2018 DevTools: implement Page.crash() in browser This lets us provide a meaningful diagnostics in case the renderer process is already dead, or the navigation is in progress. Bug: 797368 Change-Id: Iac9f74a3a47fd9af5a95a7fa9e8a6b5a9935a82e Reviewed-on: https://chromium-review.googlesource.com/862904 Commit-Queue: Andrey Kosyakov <caseq@chromium.org> Reviewed-by: Kenneth Russell <kbr@chromium.org> Reviewed-by: Dmitry Gozman <dgozman@chromium.org> Cr-Commit-Position: refs/heads/master@{#529637} [modify] https://crrev.com/7be978ffe3f6071d973eeae17aa05bd593d03777/content/browser/devtools/protocol/page_handler.cc [modify] https://crrev.com/7be978ffe3f6071d973eeae17aa05bd593d03777/content/browser/devtools/protocol/page_handler.h [modify] https://crrev.com/7be978ffe3f6071d973eeae17aa05bd593d03777/content/browser/devtools/protocol_config.json
,
Jan 18 2018
Can't find any examples of this sort of renderer hang since the above CL landed. Closest failure is: https://ci.chromium.org/buildbot/chromium.gpu/Mac%20Release%20%28Intel%29/80356 which happened before it landed. An incidence of Issue 774682 on mac_chromium_rel_ng today, but none of this. Will continue to monitor.
,
Jan 18 2018
I take that back: https://ci.chromium.org/buildbot/chromium.gpu/Mac%20Release%20%28Intel%29/80356 was built at commit position 529730, and the CL above landed at commit position 529637. Also, this tryjob: https://ci.chromium.org/buildbot/tryserver.chromium.mac/mac_chromium_rel_ng/630154 ran at commit position 529927, which is well after the above CL landed. The symptom is exactly the same. See: https://chromium-swarm.appspot.com/task?id=3b1f22c2d39da210&refresh=10&show_raw=1 Looking for WebglConformance_conformance_extensions_ext_shader_texture_lod : Traceback (most recent call last): _RunGpuTest at content/test/gpu/gpu_tests/gpu_integration_test.py:132 self.RunActualGpuTest(url, *args) RunActualGpuTest at content/test/gpu/gpu_tests/webgl_conformance_integration_test.py:203 getattr(self, test_name)(test_path, *args[1:]) _RunConformanceTest at content/test/gpu/gpu_tests/webgl_conformance_integration_test.py:217 self._CheckTestCompletion() _CheckTestCompletion at content/test/gpu/gpu_tests/webgl_conformance_integration_test.py:211 'webglTestHarness._finished', timeout=self._GetTestTimeout()) traced_function at third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py:52 return func(*args, **kwargs) WaitForJavaScriptCondition at third_party/catapult/telemetry/telemetry/internal/actions/action_runner.py:261 return self._tab.WaitForJavaScriptCondition(*args, **kwargs) traced_function at third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py:52 return func(*args, **kwargs) WaitForJavaScriptCondition at third_party/catapult/telemetry/telemetry/internal/browser/web_contents.py:239 return self._inspector_backend.WaitForJavaScriptCondition(*args, **kwargs) traced_function at third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py:52 return func(*args, **kwargs) WaitForJavaScriptCondition at third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py:289 return py_utils.WaitFor(IsJavaScriptExpressionTrue, timeout) WaitFor at third_party/catapult/common/py_utils/py_utils/__init__.py:129 res = condition() IsJavaScriptExpressionTrue at third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py:286 return self._EvaluateJavaScript(condition, context_id, timeout) traced_function at third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py:52 return func(*args, **kwargs) Inner at third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py:41 inspector_backend._ConvertExceptionFromInspectorWebsocket(e) traced_function at third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py:52 return func(*args, **kwargs) Inner at third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py:38 return func(inspector_backend, *args, **kwargs) _EvaluateJavaScript at third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py:524 self._runtime.Crash(context_id, timeout) Crash at third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_runtime.py:81 res = self._inspector_websocket.SyncRequest(request, timeout) SyncRequest at third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_websocket.py:116 res = self._Receive(timeout) _Receive at third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_websocket.py:155 data = self._socket.recv() recv at third_party/catapult/telemetry/third_party/websocket-client/websocket/_core.py:293 opcode, data = self.recv_data() recv_data at third_party/catapult/telemetry/third_party/websocket-client/websocket/_core.py:310 opcode, frame = self.recv_data_frame(control_frame) recv_data_frame at third_party/catapult/telemetry/third_party/websocket-client/websocket/_core.py:323 frame = self.recv_frame() recv_frame at third_party/catapult/telemetry/third_party/websocket-client/websocket/_core.py:357 return self.frame_buffer.recv_frame() recv_frame at third_party/catapult/telemetry/third_party/websocket-client/websocket/_abnf.py:336 self.recv_header() recv_header at third_party/catapult/telemetry/third_party/websocket-client/websocket/_abnf.py:286 header = self.recv_strict(2) recv_strict at third_party/catapult/telemetry/third_party/websocket-client/websocket/_abnf.py:371 bytes_ = self.recv(min(16384, shortage)) _recv at third_party/catapult/telemetry/third_party/websocket-client/websocket/_core.py:427 return recv(self.sock, bufsize) recv at third_party/catapult/telemetry/third_party/websocket-client/websocket/_socket.py:83 raise WebSocketTimeoutException(message) TimeoutException: The new Crash() call to DevTools still hung. caseq and DevTools team: how do we make further progress on this? Does this indicate that the browser is hung?
,
Jan 23 2018
Taking back from caseq. We discussed an idea to log the most recent request and response in Telemetry's inspector_websocket.py, to find out what query Telemetry made of the browser just before it (or the renderer process) hung. While considering logging that information while using the new Crash() call, we found that hundreds of calls to the browser's GetSystemInfo method were being made. Eliminating them in https://chromium-review.googlesource.com/879752 , because that might allow us to just pass the arguments "-v -v" to Telemetry, logging all DevTools calls.
,
Jan 23 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/d91068f2f0757122b23f48a0ff77c89e63efbb97 commit d91068f2f0757122b23f48a0ff77c89e63efbb97 Author: Kenneth Russell <kbr@chromium.org> Date: Tue Jan 23 03:41:59 2018 Reduce the number of calls to Browser.GetSystemInfo. While adding some debug logging, it was discovered that hundreds of calls to this API were being made upon browser restart while building the TestExpectations cache. Cache the result. TBR=caseq@chromium.org Bug: 797368 Cq-Include-Trybots: master.tryserver.chromium.android:android_optional_gpu_tests_rel;master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel Change-Id: Ic3ae33928e90759ce933b93a74abb897e4a1fb51 Reviewed-on: https://chromium-review.googlesource.com/879752 Commit-Queue: Kenneth Russell <kbr@chromium.org> Reviewed-by: Kenneth Russell <kbr@chromium.org> Cr-Commit-Position: refs/heads/master@{#531147} [modify] https://crrev.com/d91068f2f0757122b23f48a0ff77c89e63efbb97/content/test/gpu/gpu_tests/gpu_test_expectations.py
,
Jan 24 2018
Another instance here: https://ci.chromium.org/buildbot/tryserver.chromium.mac/mac_chromium_rel_ng/634538 TraceTest_WebGLGreenTriangle_AA_Alpha this time
,
Jan 31 2018
Another instance here, this time with maps_pixel_test: https://ci.chromium.org/buildbot/chromium.gpu.fyi/Mac%20Experimental%20Release%20%28Intel%29/3779
,
Jan 31 2018
This just happened on Linux too, in trace_test, TraceTest_WebGLGreenTriangle_NoAA_Alpha: https://ci.chromium.org/buildbot/chromium.gpu/Linux%20Debug%20%28NVIDIA%29/89457 Same symptom; attempt to crash the target renderer process timed out, no response from DevTools. Traceback (most recent call last): File "/b/s/w/ir/third_party/catapult/telemetry/telemetry/testing/serially_executed_browser_test_case.py", line 206, in <lambda> return lambda self: based_method(self, *args) File "/b/s/w/ir/content/test/gpu/gpu_tests/gpu_integration_test.py", line 132, in _RunGpuTest self.RunActualGpuTest(url, *args) File "/b/s/w/ir/content/test/gpu/gpu_tests/trace_integration_test.py", line 93, in RunActualGpuTest 'domAutomationController._finished', timeout=30) File "/b/s/w/ir/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 75, in traced_function return func(*args, **kwargs) File "/b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/actions/action_runner.py", line 261, in WaitForJavaScriptCondition return self._tab.WaitForJavaScriptCondition(*args, **kwargs) File "/b/s/w/ir/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 75, in traced_function return func(*args, **kwargs) File "/b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/browser/web_contents.py", line 239, in WaitForJavaScriptCondition return self._inspector_backend.WaitForJavaScriptCondition(*args, **kwargs) File "/b/s/w/ir/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 75, in traced_function return func(*args, **kwargs) File "/b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py", line 289, in WaitForJavaScriptCondition return py_utils.WaitFor(IsJavaScriptExpressionTrue, timeout) File "/b/s/w/ir/third_party/catapult/common/py_utils/py_utils/__init__.py", line 129, in WaitFor res = condition() File "/b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py", line 286, in IsJavaScriptExpressionTrue return self._EvaluateJavaScript(condition, context_id, timeout) File "/b/s/w/ir/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 75, in traced_function return func(*args, **kwargs) File "/b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py", line 41, in Inner inspector_backend._ConvertExceptionFromInspectorWebsocket(e) File "/b/s/w/ir/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 75, in traced_function return func(*args, **kwargs) File "/b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py", line 38, in Inner return func(inspector_backend, *args, **kwargs) File "/b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py", line 524, in _EvaluateJavaScript self._runtime.Crash(context_id, timeout) File "/b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_runtime.py", line 81, in Crash res = self._inspector_websocket.SyncRequest(request, timeout) File "/b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_websocket.py", line 116, in SyncRequest res = self._Receive(timeout) File "/b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_websocket.py", line 155, in _Receive data = self._socket.recv() File "/b/s/w/ir/third_party/catapult/telemetry/third_party/websocket-client/websocket/_core.py", line 293, in recv opcode, data = self.recv_data() File "/b/s/w/ir/third_party/catapult/telemetry/third_party/websocket-client/websocket/_core.py", line 310, in recv_data opcode, frame = self.recv_data_frame(control_frame) File "/b/s/w/ir/third_party/catapult/telemetry/third_party/websocket-client/websocket/_core.py", line 323, in recv_data_frame frame = self.recv_frame() File "/b/s/w/ir/third_party/catapult/telemetry/third_party/websocket-client/websocket/_core.py", line 357, in recv_frame return self.frame_buffer.recv_frame() File "/b/s/w/ir/third_party/catapult/telemetry/third_party/websocket-client/websocket/_abnf.py", line 336, in recv_frame self.recv_header() File "/b/s/w/ir/third_party/catapult/telemetry/third_party/websocket-client/websocket/_abnf.py", line 286, in recv_header header = self.recv_strict(2) File "/b/s/w/ir/third_party/catapult/telemetry/third_party/websocket-client/websocket/_abnf.py", line 371, in recv_strict bytes_ = self.recv(min(16384, shortage)) File "/b/s/w/ir/third_party/catapult/telemetry/third_party/websocket-client/websocket/_core.py", line 427, in _recv return recv(self.sock, bufsize) File "/b/s/w/ir/third_party/catapult/telemetry/third_party/websocket-client/websocket/_socket.py", line 83, in recv raise WebSocketTimeoutException(message) TimeoutException: ******************************************************************************** (/b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py:487 _ConvertExceptionFromInspectorWebsocket) The app is probably crashed: Found Minidump: False Stack Trace: ******************************************************************************** No crash dump found.
,
Jan 31 2018
,
Feb 5 2018
,
Mar 7 2018
,
Mar 8 2018
I've been having a problem on my Mac where when I plug in my 4K monitor, Chrome freezes and I have to force quit. Is there a chance that it's related to this? I see "mac" and "gpu" which makes me suspicious.
,
Mar 8 2018
Probably not related since these machines never change their monitors' configurations. Any chance you could instead kill Chrome with kill -SEGV [pid] or similar so that a minidump is generated? Not sure exactly whether that will produce a minidump though -- I'll be testing this.
,
Mar 10 2018
#38: or, run "sample [pid] -file output.txt" and feed the resulting text file into crsym/ . That should show the stack trace which is wedged. (See Issue 820727 , where the category for that bug was found in this way.)
,
Jul 20
I had been planning for a while to do more failure injection to cause the renderer process to hang on the IO thread (so it doesn't receive the IPC to crash itself) and see whether we could change the browser's child process management code to crash it in a way that would produce a minidump (sending the right kind of signal on POSIX platforms, at least). It's been painful to revisit this issue and since leaving it alone for a while it's gone away. There's no evidence of this failure mode on the Mac Intel bot: https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Mac%20FYI%20Release%20%28Intel%29 nor on a couple of other bots that were previously affected: https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Linux%20FYI%20Debug%20%28NVIDIA%29?limit=200 https://ci.chromium.org/p/chromium/builders/luci.chromium.try/mac_chromium_rel_ng?limit=200 https://ci.chromium.org/p/chromium/builders/luci.chromium.try/linux_chromium_rel_ng?limit=200 In investigating the current test failures on these bots I filed Issue 866194 . At this point I'm closing this as WontFix since it doesn't reproduce any more. I've seen at least one situation where the ability to crash the renderer using DevTools did produce an actionable symbolized minidump.
,
Jul 20
,
Jul 20
|
|||||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||||
Comment 1 by kbr@chromium.org
, Dec 22 2017