New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 797368 link

Starred by 4 users

Issue metadata

Status: WontFix
Owner:
OOO until 2019-01-24
Closed: Jul 20
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux , Mac
Pri: 1
Type: Bug-Regression

Blocking:
issue 795942



Sign in to add a comment

Random Telemetry test timeouts, mostly on Mac Minis with Intel GPUs

Project Member Reported by ynovikov@chromium.org, Dec 22 2017

Issue description

https://ci.chromium.org/buildbot/tryserver.chromium.mac/mac_optional_gpu_tests_rel/19321
WebglConformance_deqp_functional_gles3_texturefiltering_2d_array_formats_00

https://ci.chromium.org/buildbot/tryserver.chromium.mac/mac_optional_gpu_tests_rel/19320
WebglConformance_conformance2_textures_canvas_sub_rectangle_tex_2d_rgb9_e5_rgb_float
WebglConformance_conformance_glsl_implicit_add_ivec4_vec4_vert

https://ci.chromium.org/buildbot/tryserver.chromium.mac/mac_optional_gpu_tests_rel/19298
WebglConformance_conformance2_textures_canvas_tex_3d_rgb5_a1_rgba_unsigned_byte

https://ci.chromium.org/buildbot/chromium.gpu.fyi/Mac%20Release%20%28Intel%29/10359
WebglConformance_conformance_glsl_implicit_add_ivec4_vec4_vert

Timing out tests seem pretty random to me, not sure if marking them Flaky will help.
 

Comment 1 by kbr@chromium.org, Dec 22 2017

Components: Blink>JavaScript
This is discouraging. Are there any hints of process crashes, like renderer or GPU, in any of these logs?

Any patterns regarding the tests that ran just before these? Sometimes if something went wrong in a previous test, the next one that runs might be affected.

Is this happening on the Mac AMD bots?

I don't see anything in common between these failures, could be just various reasons of bot being busy with something else during the test.
(Also 19334 got in this list by mistake, updating the description).
Description: Show this description
Another timeout:
https://ci.chromium.org/buildbot/chromium.gpu.fyi/Mac%20GPU%20ASAN%20Release/6809
WebglConformance_conformance2_glsl3_compound_assignment_type_combination

Comment 5 by kbr@chromium.org, Dec 31 2017

Cc: kainino@chromium.org jdarpinian@chromium.org
Mo, Kai, James, can any of you please help triage this issue and see if there's any obvious commonality or root cause? It seems like a recent regression and looks like it's only affecting the Mac Intel bots. Thanks.

Components: -Blink>JavaScript
Removing V8 label for now. Feel free to readd when you think V8 is the culprit.
I'm not finding any hints looking at these logs and tests. It doesn't seem to be a performance regression: WebglConformance_conformance2_glsl3_compound_assignment_type_combination, for example, took about 11 seconds on build 6810, but timed out after 604 seconds on build 6809 (linked in #4). But I also can't spot any issues that would prevent finishTest from getting called in the test itself - especially for the deqp failures, which use the same code as every other deqp test.

Comment 8 by kbr@chromium.org, Jan 4 2018

Cc: perezju@chromium.org nedngu...@google.com caseq@chromium.org pfeldman@chromium.org
Components: Platform>DevTools>Platform Tests>Telemetry
Seeing a couple of other failures currently on mac_optional_gpu_tests_rel:

WebglConformance_conformance2_textures_image_data_tex_2d_srgb8_alpha8_rgba_unsigned_byte 
https://ci.chromium.org/buildbot/tryserver.chromium.mac/mac_optional_gpu_tests_rel/19406

WebglConformance_conformance2_textures_canvas_sub_rectangle_tex_2d_rgb9_e5_rgb_float
https://ci.chromium.org/buildbot/tryserver.chromium.mac/mac_optional_gpu_tests_rel/19381

I tried injecting a failure into the test, commenting out the call to finishTest() in src/third_party/webgl/src/sdk/tests/js/tests/tex-image-and-sub-image-2d-with-image-data.js and running:

./content/test/gpu/run_gpu_integration_test.py webgl_conformance --webgl-conformance-version=2.0.1 --test-filter=WebglConformance_conformance2_textures_image_data_tex_2d_srgb8_alpha8_rgba_unsigned_byte --browser=canary

Here was the resulting stack trace:

  Traceback (most recent call last):
    _RunGpuTest at content/test/gpu/gpu_tests/gpu_integration_test.py:132
      self.RunActualGpuTest(url, *args)
    RunActualGpuTest at content/test/gpu/gpu_tests/webgl_conformance_integration_test.py:203
      getattr(self, test_name)(test_path, *args[1:])
    _RunConformanceTest at content/test/gpu/gpu_tests/webgl_conformance_integration_test.py:217
      self._CheckTestCompletion()
    _CheckTestCompletion at content/test/gpu/gpu_tests/webgl_conformance_integration_test.py:211
      'webglTestHarness._finished', timeout=self._GetTestTimeout())
    traced_function at third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py:52
      return func(*args, **kwargs)
    WaitForJavaScriptCondition at third_party/catapult/telemetry/telemetry/internal/actions/action_runner.py:261
      return self._tab.WaitForJavaScriptCondition(*args, **kwargs)
    traced_function at third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py:52
      return func(*args, **kwargs)
    WaitForJavaScriptCondition at third_party/catapult/telemetry/telemetry/internal/browser/web_contents.py:239
      return self._inspector_backend.WaitForJavaScriptCondition(*args, **kwargs)
    traced_function at third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py:52
      return func(*args, **kwargs)
    WaitForJavaScriptCondition at third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py:302
      e.message + '\n' + debug_message)
  TimeoutException: Timed out while waiting 15s for IsJavaScriptExpressionTrue.


Compare this to the failures on the bots:

  Traceback (most recent call last):
    File "/b/s/w/ir/third_party/catapult/telemetry/telemetry/testing/serially_executed_browser_test_case.py", line 192, in <lambda>
      return lambda self: based_method(self, *args)
    File "/b/s/w/ir/content/test/gpu/gpu_tests/gpu_integration_test.py", line 132, in _RunGpuTest
      self.RunActualGpuTest(url, *args)
    File "/b/s/w/ir/content/test/gpu/gpu_tests/webgl_conformance_integration_test.py", line 203, in RunActualGpuTest
      getattr(self, test_name)(test_path, *args[1:])
    File "/b/s/w/ir/content/test/gpu/gpu_tests/webgl_conformance_integration_test.py", line 217, in _RunConformanceTest
      self._CheckTestCompletion()
    File "/b/s/w/ir/content/test/gpu/gpu_tests/webgl_conformance_integration_test.py", line 211, in _CheckTestCompletion
      'webglTestHarness._finished', timeout=self._GetTestTimeout())
    File "/b/s/w/ir/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 52, in traced_function
      return func(*args, **kwargs)
    File "/b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/actions/action_runner.py", line 261, in WaitForJavaScriptCondition
      return self._tab.WaitForJavaScriptCondition(*args, **kwargs)
    File "/b/s/w/ir/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 52, in traced_function
      return func(*args, **kwargs)
    File "/b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/browser/web_contents.py", line 239, in WaitForJavaScriptCondition
      return self._inspector_backend.WaitForJavaScriptCondition(*args, **kwargs)
    File "/b/s/w/ir/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 52, in traced_function
      return func(*args, **kwargs)
    File "/b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py", line 289, in WaitForJavaScriptCondition
      return py_utils.WaitFor(IsJavaScriptExpressionTrue, timeout)
    File "/b/s/w/ir/third_party/catapult/common/py_utils/py_utils/__init__.py", line 129, in WaitFor
      res = condition()
    File "/b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py", line 286, in IsJavaScriptExpressionTrue
      return self._EvaluateJavaScript(condition, context_id, timeout)
    File "/b/s/w/ir/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 52, in traced_function
      return func(*args, **kwargs)
    File "/b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py", line 41, in Inner
      inspector_backend._ConvertExceptionFromInspectorWebsocket(e)
    File "/b/s/w/ir/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 52, in traced_function
      return func(*args, **kwargs)
    File "/b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py", line 38, in Inner
      return func(inspector_backend, *args, **kwargs)
    File "/b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py", line 518, in _EvaluateJavaScript
      return self._runtime.Evaluate(expression, context_id, timeout)
    File "/b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_runtime.py", line 44, in Evaluate
      res = self._inspector_websocket.SyncRequest(request, timeout)
    File "/b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_websocket.py", line 116, in SyncRequest
      res = self._Receive(timeout)
    File "/b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_websocket.py", line 155, in _Receive
      data = self._socket.recv()
    File "/b/s/w/ir/third_party/catapult/telemetry/third_party/websocket-client/websocket/_core.py", line 293, in recv
      opcode, data = self.recv_data()
    File "/b/s/w/ir/third_party/catapult/telemetry/third_party/websocket-client/websocket/_core.py", line 310, in recv_data
      opcode, frame = self.recv_data_frame(control_frame)
    File "/b/s/w/ir/third_party/catapult/telemetry/third_party/websocket-client/websocket/_core.py", line 323, in recv_data_frame
      frame = self.recv_frame()
    File "/b/s/w/ir/third_party/catapult/telemetry/third_party/websocket-client/websocket/_core.py", line 357, in recv_frame
      return self.frame_buffer.recv_frame()
    File "/b/s/w/ir/third_party/catapult/telemetry/third_party/websocket-client/websocket/_abnf.py", line 336, in recv_frame
      self.recv_header()
    File "/b/s/w/ir/third_party/catapult/telemetry/third_party/websocket-client/websocket/_abnf.py", line 286, in recv_header
      header = self.recv_strict(2)
    File "/b/s/w/ir/third_party/catapult/telemetry/third_party/websocket-client/websocket/_abnf.py", line 371, in recv_strict
      bytes_ = self.recv(min(16384, shortage))
    File "/b/s/w/ir/third_party/catapult/telemetry/third_party/websocket-client/websocket/_core.py", line 427, in _recv
      return recv(self.sock, bufsize)
    File "/b/s/w/ir/third_party/catapult/telemetry/third_party/websocket-client/websocket/_socket.py", line 83, in recv
      raise WebSocketTimeoutException(message)
  TimeoutException: 
  ********************************************************************************
  (/b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py:487 _ConvertExceptionFromInspectorWebsocket) The app is probably crashed:



It looks to me like the renderer process is hanging while Telemetry is making a call to DevTools over its websocket connection. It doesn't look like the test itself is failing to complete for any reason.

DevTools and Telemetry folks, do you agree with this assessment?

What can we do to get better logging out of this and figure out what's going wrong? Should I simulate a renderer process hang and figure out how to get Telemetry to crash the target process so we get a minidump?

Comment 9 by kbr@chromium.org, Jan 4 2018

Cc: haraken@chromium.org
 Issue 797919  has been merged into this issue.

Comment 10 by kbr@chromium.org, Jan 4 2018

Per  Issue 797919 , WebglConformance_conformance_extensions_oes_texture_float_with_image was seen to fail in the same way in this tryjob:

https://ci.chromium.org/buildbot/tryserver.chromium.mac/mac_chromium_rel_ng/618826

I suspect that this flakiness is due to flaky websocket connection between telemetry & devtool (https://github.com/catapult-project/catapult/issues/3218)


You can also try to see if Telemetry fails to achieve minidump upon a renderer process hang. If that is the case, would be great to expand test in https://cs.chromium.org/chromium/src/tools/perf/core/stacktrace_unittest.py

Components: -Platform>DevTools>Platform
Owner: nednguyen@chromium.org
Status: Available (was: Untriaged)
->nednguyen for triage
Owner: kbr@chromium.org
 Issue 795950  has been merged into this issue.

Comment 16 by kbr@chromium.org, Jan 9 2018

Cc: ashleymarie@chromium.org eyaich@chromium.org
Status: Assigned (was: Available)
Manually injecting a hang of the renderer process has the same symptom as these tryjob failures. In this case, Telemetry *does not* cause the target process to produce a minidump.

Please apply these patches:

hang-renderer-in-get-max-viewport-dims.txt to chromium/src
third-party-webgl-src-diff.txt to third_party/webgl/src

Then run (I tested on macOS):

./content/test/gpu/run_gpu_integration_test.py webgl_conformance --browser=release --webgl-conformance-version=2.0.1 --test-filter=WebglConformance_conformance2_textures_image_data_tex_2d_srgb8_alpha8_rgba_unsigned_byte > output.txt 2>&1

It will time out in about 10 seconds with the same stack trace as above.

We really need Telemetry to try to crash the renderer process and produce a minidump that it would then symbolize. Telemetry team: how feasible is this? Does Telemetry know the PID, at least on Unix platforms?

hang-renderer-in-get-max-viewport-dims.txt
1.7 KB View Download
third-party-webgl-src-diff.txt
798 bytes View Download

Comment 17 by kbr@chromium.org, Jan 9 2018

Note: currently working with DevTools team to try adding a way to crash a renderer process on the IO thread, assuming the main thread is wedged. Will try wiring this into Telemetry to see if we can get a minidump in this scenario.

a quick hack that crashes renderer on IO: https://chromium-review.googlesource.com/c/chromium/src/+/858379

Comment 19 by kbr@chromium.org, Jan 10 2018

Thanks Pavel. With that patch as well as the attached one (catapult-diff.txt) applied to Catapult, my test case that hangs the renderer produces now produces symbolized stack traces for all threads in the renderer.

Could we formalize the DevTools and catapult changes? If we named the new DevTools method something unlikely to collide with any future desired method, and were able to instrument Catapult, then we could probably get a stack trace off the bots in a day or so. Then we could decide whether to keep the new functionality or not.

catapult-diff.txt
1.9 KB View Download
*kbr@ what does it mean for the renderer thread to be hang but not crashed? What would cause it to hang?

Comment 21 by kbr@chromium.org, Jan 10 2018

Labels: Hotlist-PixelWrangler
I don't know. Maybe going into an infinite or long-running loop -- which of course should not happen. We would need to get a minidump for the renderer process in order to know. Also, if it turns out that the renderer wasn't actually hung, the minidump will help us understand that as well.

If both DevTools and Telemetry are instrumented then we should have an answer within about a day. Here's another instance of this failure:

https://ci.chromium.org/buildbot/tryserver.chromium.mac/mac_optional_gpu_tests_rel/19636

WebglConformance_conformance_glsl_implicit_add_ivec4_vec4_vert failed in this run:

https://chromium-swarm.appspot.com/task?id=3ae182b60a2c7a10&refresh=10&show_raw=1

with a WebSocket timeout while waiting for the results of a JavaScript expression evaluation to be returned.

Project Member

Comment 23 by bugdroid1@chromium.org, Jan 10 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/catapult/+/ad6685396e2015c8a593846043ea63944dbf1764

commit ad6685396e2015c8a593846043ea63944dbf1764
Author: Kenneth Russell <kbr@chromium.org>
Date: Wed Jan 10 19:22:28 2018

If a sync request to DevTools times out, try to crash the renderer process.

This uses a new DevTools API being introduced here:
  https://chromium-review.googlesource.com/858379

It's not clear whether this is the cause of the random WebGL conformance
test hangs. However, if the main thread of the renderer process is hung,
this change will provide symbolized stack traces for all threads, by virtue
of producing a minidump for the hung renderer process.

BUG= chromium:797368 

Change-Id: I079a1d3343002b7f2dc37419aeb30f07e9f7a1a3
Reviewed-on: https://chromium-review.googlesource.com/857959
Commit-Queue: Kenneth Russell <kbr@chromium.org>
Reviewed-by: Ned Nguyen <nednguyen@google.com>

[modify] https://crrev.com/ad6685396e2015c8a593846043ea63944dbf1764/telemetry/telemetry/internal/backends/chrome_inspector/inspector_runtime.py
[modify] https://crrev.com/ad6685396e2015c8a593846043ea63944dbf1764/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py

Project Member

Comment 24 by bugdroid1@chromium.org, Jan 11 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/4ea723aec59a81c7a96aa6182bbe363be0bf0dc9

commit 4ea723aec59a81c7a96aa6182bbe363be0bf0dc9
Author: catapult-deps-roller@chromium.org <catapult-deps-roller@chromium.org>
Date: Thu Jan 11 00:05:33 2018

Roll src/third_party/catapult/ 142e757de..6385d5b1d (4 commits)

https://chromium.googlesource.com/catapult.git/+log/142e757de9e3..6385d5b1d413

$ git log 142e757de..6385d5b1d --date=short --no-merges --format='%ad %ae %s'
2018-01-10 dtu [pinpoint] Add /api/config endpoint and configuration dropdown.
2018-01-09 dtu [pinpoint] Make "browser" argument optional.
2018-01-10 ssid Fix trace symbolization to work for local directory symbols
2018-01-09 kbr If a sync request to DevTools times out, try to crash the renderer process.

Created with:
  roll-dep src/third_party/catapult
BUG= 797368 


The AutoRoll server is located here: https://catapult-roll.skia.org

Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+/master/autoroll/README.md

If the roll is causing failures, please contact the current sheriff, who should
be CC'd on the roll, and stop the roller if necessary.


CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.android:android_optional_gpu_tests_rel
TBR=sullivan@chromium.org

Change-Id: I548820a64050a7ce3df070639ff484cafe7df7df
Reviewed-on: https://chromium-review.googlesource.com/861052
Reviewed-by: <catapult-deps-roller@chromium.org>
Commit-Queue: <catapult-deps-roller@chromium.org>
Cr-Commit-Position: refs/heads/master@{#528497}
[modify] https://crrev.com/4ea723aec59a81c7a96aa6182bbe363be0bf0dc9/DEPS

Comment 25 by kbr@chromium.org, Jan 11 2018

The new code was run on this build:

https://ci.chromium.org/buildbot/tryserver.chromium.mac/mac_optional_gpu_tests_rel/19868

but it didn't catch a hung renderer. Instead, the new Crash command itself timed out.

[8/139] gpu_tests.webgl_conformance_integration_test.WebGLConformanceIntegrationTest.WebglConformance_conformance2_rendering_instanced_arrays failed unexpectedly 601.0767s:
  Found crashpad_database_util
  No minidump found via crashpad_database_util
  Found crashpad_database_util
  No minidump found via crashpad_database_util
  Can't get standard output with --show-stdout
  
  Traceback (most recent call last):
    _RunGpuTest at content/test/gpu/gpu_tests/gpu_integration_test.py:132
      self.RunActualGpuTest(url, *args)
    RunActualGpuTest at content/test/gpu/gpu_tests/webgl_conformance_integration_test.py:203
      getattr(self, test_name)(test_path, *args[1:])
    _RunConformanceTest at content/test/gpu/gpu_tests/webgl_conformance_integration_test.py:217
      self._CheckTestCompletion()
    _CheckTestCompletion at content/test/gpu/gpu_tests/webgl_conformance_integration_test.py:211
      'webglTestHarness._finished', timeout=self._GetTestTimeout())
    traced_function at third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py:52
      return func(*args, **kwargs)
    WaitForJavaScriptCondition at third_party/catapult/telemetry/telemetry/internal/actions/action_runner.py:261
      return self._tab.WaitForJavaScriptCondition(*args, **kwargs)
    traced_function at third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py:52
      return func(*args, **kwargs)
    WaitForJavaScriptCondition at third_party/catapult/telemetry/telemetry/internal/browser/web_contents.py:239
      return self._inspector_backend.WaitForJavaScriptCondition(*args, **kwargs)
    traced_function at third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py:52
      return func(*args, **kwargs)
    WaitForJavaScriptCondition at third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py:289
      return py_utils.WaitFor(IsJavaScriptExpressionTrue, timeout)
    WaitFor at third_party/catapult/common/py_utils/py_utils/__init__.py:129
      res = condition()
    IsJavaScriptExpressionTrue at third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py:286
      return self._EvaluateJavaScript(condition, context_id, timeout)
    traced_function at third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py:52
      return func(*args, **kwargs)
    Inner at third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py:41
      inspector_backend._ConvertExceptionFromInspectorWebsocket(e)
    traced_function at third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py:52
      return func(*args, **kwargs)
    Inner at third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py:38
      return func(inspector_backend, *args, **kwargs)
    _EvaluateJavaScript at third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py:524
      self._runtime.Crash(context_id, timeout)
    Crash at third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_runtime.py:81
      res = self._inspector_websocket.SyncRequest(request, timeout)
    SyncRequest at third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_websocket.py:116
      res = self._Receive(timeout)
    _Receive at third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_websocket.py:155
      data = self._socket.recv()
    recv at third_party/catapult/telemetry/third_party/websocket-client/websocket/_core.py:293
      opcode, data = self.recv_data()
    recv_data at third_party/catapult/telemetry/third_party/websocket-client/websocket/_core.py:310
      opcode, frame = self.recv_data_frame(control_frame)
    recv_data_frame at third_party/catapult/telemetry/third_party/websocket-client/websocket/_core.py:323
      frame = self.recv_frame()
    recv_frame at third_party/catapult/telemetry/third_party/websocket-client/websocket/_core.py:357
      return self.frame_buffer.recv_frame()
    recv_frame at third_party/catapult/telemetry/third_party/websocket-client/websocket/_abnf.py:336
      self.recv_header()
    recv_header at third_party/catapult/telemetry/third_party/websocket-client/websocket/_abnf.py:286
      header = self.recv_strict(2)
    recv_strict at third_party/catapult/telemetry/third_party/websocket-client/websocket/_abnf.py:371
      bytes_ = self.recv(min(16384, shortage))
    _recv at third_party/catapult/telemetry/third_party/websocket-client/websocket/_core.py:427
      return recv(self.sock, bufsize)
    recv at third_party/catapult/telemetry/third_party/websocket-client/websocket/_socket.py:83
      raise WebSocketTimeoutException(message)



At this point there are three hypotheses:

1) The browser is hung.

2) The renderer process is alive, but the navigation hasn't committed yet. In this case DevTools buffers up commands to be sent to the target renderer. The new Crash command would be one of those buffered up.

3) The renderer process's IO thread is hung.


Option (2) sounds most likely at this point. caseq@ is going to revise the DevTools Page.crash command to return a string that indicates what DevTools thinks the state of the renderer process is (gone, not yet committed, alive). Then we will modify Telemetry to print out that string in the logs.

Comment 26 by kbr@chromium.org, Jan 13 2018

Another random test timeout on these Mac Intel bots, this time in the pixel_test suite:
https://ci.chromium.org/buildbot/tryserver.chromium.mac/mac_chromium_rel_ng/627412

Pixel_WebGLTransparentGreenTriangle_NoAlpha_ImplicitClear failed. The browser was restarted immediately before running this test, because the command line arguments for this test differed from those needed by the previous test.

We really need to verify that caseq's https://chromium-review.googlesource.com/862904 can provide more context of why we failed to terminate the target renderer, and get that in. Within a day we should have some evidence of what the problem is. At this point we think the browser's first navigation was dropped on the floor due to a race condition.

Project Member

Comment 27 by bugdroid1@chromium.org, Jan 17 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/7be978ffe3f6071d973eeae17aa05bd593d03777

commit 7be978ffe3f6071d973eeae17aa05bd593d03777
Author: Andrey Kosyakov <caseq@chromium.org>
Date: Wed Jan 17 05:44:27 2018

DevTools: implement Page.crash() in browser

This lets us provide a meaningful diagnostics in case the renderer process is already dead, or the navigation is in progress.

Bug:  797368 
Change-Id: Iac9f74a3a47fd9af5a95a7fa9e8a6b5a9935a82e
Reviewed-on: https://chromium-review.googlesource.com/862904
Commit-Queue: Andrey Kosyakov <caseq@chromium.org>
Reviewed-by: Kenneth Russell <kbr@chromium.org>
Reviewed-by: Dmitry Gozman <dgozman@chromium.org>
Cr-Commit-Position: refs/heads/master@{#529637}
[modify] https://crrev.com/7be978ffe3f6071d973eeae17aa05bd593d03777/content/browser/devtools/protocol/page_handler.cc
[modify] https://crrev.com/7be978ffe3f6071d973eeae17aa05bd593d03777/content/browser/devtools/protocol/page_handler.h
[modify] https://crrev.com/7be978ffe3f6071d973eeae17aa05bd593d03777/content/browser/devtools/protocol_config.json

Comment 28 by kbr@chromium.org, Jan 18 2018

Can't find any examples of this sort of renderer hang since the above CL landed.

Closest failure is:
https://ci.chromium.org/buildbot/chromium.gpu/Mac%20Release%20%28Intel%29/80356

which happened before it landed.

An incidence of  Issue 774682  on mac_chromium_rel_ng today, but none of this. Will continue to monitor.

Comment 29 by kbr@chromium.org, Jan 18 2018

Owner: caseq@chromium.org
I take that back:
https://ci.chromium.org/buildbot/chromium.gpu/Mac%20Release%20%28Intel%29/80356

was built at commit position 529730, and the CL above landed at commit position 529637.

Also, this tryjob:
https://ci.chromium.org/buildbot/tryserver.chromium.mac/mac_chromium_rel_ng/630154

ran at commit position 529927, which is well after the above CL landed.

The symptom is exactly the same. See:
https://chromium-swarm.appspot.com/task?id=3b1f22c2d39da210&refresh=10&show_raw=1

Looking for WebglConformance_conformance_extensions_ext_shader_texture_lod :

  Traceback (most recent call last):
    _RunGpuTest at content/test/gpu/gpu_tests/gpu_integration_test.py:132
      self.RunActualGpuTest(url, *args)
    RunActualGpuTest at content/test/gpu/gpu_tests/webgl_conformance_integration_test.py:203
      getattr(self, test_name)(test_path, *args[1:])
    _RunConformanceTest at content/test/gpu/gpu_tests/webgl_conformance_integration_test.py:217
      self._CheckTestCompletion()
    _CheckTestCompletion at content/test/gpu/gpu_tests/webgl_conformance_integration_test.py:211
      'webglTestHarness._finished', timeout=self._GetTestTimeout())
    traced_function at third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py:52
      return func(*args, **kwargs)
    WaitForJavaScriptCondition at third_party/catapult/telemetry/telemetry/internal/actions/action_runner.py:261
      return self._tab.WaitForJavaScriptCondition(*args, **kwargs)
    traced_function at third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py:52
      return func(*args, **kwargs)
    WaitForJavaScriptCondition at third_party/catapult/telemetry/telemetry/internal/browser/web_contents.py:239
      return self._inspector_backend.WaitForJavaScriptCondition(*args, **kwargs)
    traced_function at third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py:52
      return func(*args, **kwargs)
    WaitForJavaScriptCondition at third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py:289
      return py_utils.WaitFor(IsJavaScriptExpressionTrue, timeout)
    WaitFor at third_party/catapult/common/py_utils/py_utils/__init__.py:129
      res = condition()
    IsJavaScriptExpressionTrue at third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py:286
      return self._EvaluateJavaScript(condition, context_id, timeout)
    traced_function at third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py:52
      return func(*args, **kwargs)
    Inner at third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py:41
      inspector_backend._ConvertExceptionFromInspectorWebsocket(e)
    traced_function at third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py:52
      return func(*args, **kwargs)
    Inner at third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py:38
      return func(inspector_backend, *args, **kwargs)
    _EvaluateJavaScript at third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py:524
      self._runtime.Crash(context_id, timeout)
    Crash at third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_runtime.py:81
      res = self._inspector_websocket.SyncRequest(request, timeout)
    SyncRequest at third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_websocket.py:116
      res = self._Receive(timeout)
    _Receive at third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_websocket.py:155
      data = self._socket.recv()
    recv at third_party/catapult/telemetry/third_party/websocket-client/websocket/_core.py:293
      opcode, data = self.recv_data()
    recv_data at third_party/catapult/telemetry/third_party/websocket-client/websocket/_core.py:310
      opcode, frame = self.recv_data_frame(control_frame)
    recv_data_frame at third_party/catapult/telemetry/third_party/websocket-client/websocket/_core.py:323
      frame = self.recv_frame()
    recv_frame at third_party/catapult/telemetry/third_party/websocket-client/websocket/_core.py:357
      return self.frame_buffer.recv_frame()
    recv_frame at third_party/catapult/telemetry/third_party/websocket-client/websocket/_abnf.py:336
      self.recv_header()
    recv_header at third_party/catapult/telemetry/third_party/websocket-client/websocket/_abnf.py:286
      header = self.recv_strict(2)
    recv_strict at third_party/catapult/telemetry/third_party/websocket-client/websocket/_abnf.py:371
      bytes_ = self.recv(min(16384, shortage))
    _recv at third_party/catapult/telemetry/third_party/websocket-client/websocket/_core.py:427
      return recv(self.sock, bufsize)
    recv at third_party/catapult/telemetry/third_party/websocket-client/websocket/_socket.py:83
      raise WebSocketTimeoutException(message)
  TimeoutException: 


The new Crash() call to DevTools still hung.

caseq and DevTools team: how do we make further progress on this? Does this indicate that the browser is hung?

Comment 30 by kbr@chromium.org, Jan 23 2018

Owner: kbr@chromium.org
Taking back from caseq. We discussed an idea to log the most recent request and response in Telemetry's inspector_websocket.py, to find out what query Telemetry made of the browser just before it (or the renderer process) hung. While considering logging that information while using the new Crash() call, we found that hundreds of calls to the browser's GetSystemInfo method were being made. Eliminating them in https://chromium-review.googlesource.com/879752 , because that might allow us to just pass the arguments "-v -v" to Telemetry, logging all DevTools calls.

Project Member

Comment 31 by bugdroid1@chromium.org, Jan 23 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/d91068f2f0757122b23f48a0ff77c89e63efbb97

commit d91068f2f0757122b23f48a0ff77c89e63efbb97
Author: Kenneth Russell <kbr@chromium.org>
Date: Tue Jan 23 03:41:59 2018

Reduce the number of calls to Browser.GetSystemInfo.

While adding some debug logging, it was discovered that hundreds of
calls to this API were being made upon browser restart while building
the TestExpectations cache. Cache the result.

TBR=caseq@chromium.org

Bug:  797368 
Cq-Include-Trybots: master.tryserver.chromium.android:android_optional_gpu_tests_rel;master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel
Change-Id: Ic3ae33928e90759ce933b93a74abb897e4a1fb51
Reviewed-on: https://chromium-review.googlesource.com/879752
Commit-Queue: Kenneth Russell <kbr@chromium.org>
Reviewed-by: Kenneth Russell <kbr@chromium.org>
Cr-Commit-Position: refs/heads/master@{#531147}
[modify] https://crrev.com/d91068f2f0757122b23f48a0ff77c89e63efbb97/content/test/gpu/gpu_tests/gpu_test_expectations.py

Another instance here: https://ci.chromium.org/buildbot/tryserver.chromium.mac/mac_chromium_rel_ng/634538

TraceTest_WebGLGreenTriangle_AA_Alpha this time

Comment 33 by kbr@chromium.org, Jan 31 2018

Another instance here, this time with maps_pixel_test:
https://ci.chromium.org/buildbot/chromium.gpu.fyi/Mac%20Experimental%20Release%20%28Intel%29/3779

Comment 34 by kbr@chromium.org, Jan 31 2018

This just happened on Linux too, in trace_test, TraceTest_WebGLGreenTriangle_NoAA_Alpha:

https://ci.chromium.org/buildbot/chromium.gpu/Linux%20Debug%20%28NVIDIA%29/89457

Same symptom; attempt to crash the target renderer process timed out, no response from DevTools.

  Traceback (most recent call last):
    File "/b/s/w/ir/third_party/catapult/telemetry/telemetry/testing/serially_executed_browser_test_case.py", line 206, in <lambda>
      return lambda self: based_method(self, *args)
    File "/b/s/w/ir/content/test/gpu/gpu_tests/gpu_integration_test.py", line 132, in _RunGpuTest
      self.RunActualGpuTest(url, *args)
    File "/b/s/w/ir/content/test/gpu/gpu_tests/trace_integration_test.py", line 93, in RunActualGpuTest
      'domAutomationController._finished', timeout=30)
    File "/b/s/w/ir/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 75, in traced_function
      return func(*args, **kwargs)
    File "/b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/actions/action_runner.py", line 261, in WaitForJavaScriptCondition
      return self._tab.WaitForJavaScriptCondition(*args, **kwargs)
    File "/b/s/w/ir/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 75, in traced_function
      return func(*args, **kwargs)
    File "/b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/browser/web_contents.py", line 239, in WaitForJavaScriptCondition
      return self._inspector_backend.WaitForJavaScriptCondition(*args, **kwargs)
    File "/b/s/w/ir/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 75, in traced_function
      return func(*args, **kwargs)
    File "/b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py", line 289, in WaitForJavaScriptCondition
      return py_utils.WaitFor(IsJavaScriptExpressionTrue, timeout)
    File "/b/s/w/ir/third_party/catapult/common/py_utils/py_utils/__init__.py", line 129, in WaitFor
      res = condition()
    File "/b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py", line 286, in IsJavaScriptExpressionTrue
      return self._EvaluateJavaScript(condition, context_id, timeout)
    File "/b/s/w/ir/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 75, in traced_function
      return func(*args, **kwargs)
    File "/b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py", line 41, in Inner
      inspector_backend._ConvertExceptionFromInspectorWebsocket(e)
    File "/b/s/w/ir/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 75, in traced_function
      return func(*args, **kwargs)
    File "/b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py", line 38, in Inner
      return func(inspector_backend, *args, **kwargs)
    File "/b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py", line 524, in _EvaluateJavaScript
      self._runtime.Crash(context_id, timeout)
    File "/b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_runtime.py", line 81, in Crash
      res = self._inspector_websocket.SyncRequest(request, timeout)
    File "/b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_websocket.py", line 116, in SyncRequest
      res = self._Receive(timeout)
    File "/b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_websocket.py", line 155, in _Receive
      data = self._socket.recv()
    File "/b/s/w/ir/third_party/catapult/telemetry/third_party/websocket-client/websocket/_core.py", line 293, in recv
      opcode, data = self.recv_data()
    File "/b/s/w/ir/third_party/catapult/telemetry/third_party/websocket-client/websocket/_core.py", line 310, in recv_data
      opcode, frame = self.recv_data_frame(control_frame)
    File "/b/s/w/ir/third_party/catapult/telemetry/third_party/websocket-client/websocket/_core.py", line 323, in recv_data_frame
      frame = self.recv_frame()
    File "/b/s/w/ir/third_party/catapult/telemetry/third_party/websocket-client/websocket/_core.py", line 357, in recv_frame
      return self.frame_buffer.recv_frame()
    File "/b/s/w/ir/third_party/catapult/telemetry/third_party/websocket-client/websocket/_abnf.py", line 336, in recv_frame
      self.recv_header()
    File "/b/s/w/ir/third_party/catapult/telemetry/third_party/websocket-client/websocket/_abnf.py", line 286, in recv_header
      header = self.recv_strict(2)
    File "/b/s/w/ir/third_party/catapult/telemetry/third_party/websocket-client/websocket/_abnf.py", line 371, in recv_strict
      bytes_ = self.recv(min(16384, shortage))
    File "/b/s/w/ir/third_party/catapult/telemetry/third_party/websocket-client/websocket/_core.py", line 427, in _recv
      return recv(self.sock, bufsize)
    File "/b/s/w/ir/third_party/catapult/telemetry/third_party/websocket-client/websocket/_socket.py", line 83, in recv
      raise WebSocketTimeoutException(message)
  TimeoutException: 
  ********************************************************************************
  (/b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py:487 _ConvertExceptionFromInspectorWebsocket) The app is probably crashed:
  
  Found Minidump: False
  Stack Trace:
  ********************************************************************************
  	No crash dump found.

Comment 35 by kbr@chromium.org, Jan 31 2018

Labels: OS-Linux
Summary: Random Telemetry test timeouts, mostly on Mac Minis with Intel GPUs (was: webgl2_conformance_tests timeouts on mac_optional_gpu_tests_rel Intel GPU)

Comment 36 by kbr@chromium.org, Feb 5 2018

Blocking: 795942
Cc: robliao@chromium.org

Comment 38 by f...@chromium.org, Mar 8 2018

I've been having a problem on my Mac where when I plug in my 4K monitor, Chrome freezes and I have to force quit. Is there a chance that it's related to this? I see "mac" and "gpu" which makes me suspicious.

Comment 39 by kbr@chromium.org, Mar 8 2018

Probably not related since these machines never change their monitors' configurations. Any chance you could instead kill Chrome with kill -SEGV [pid] or similar so that a minidump is generated? Not sure exactly whether that will produce a minidump though -- I'll be testing this.

Comment 40 by kbr@chromium.org, Mar 10 2018

#38: or, run "sample [pid] -file output.txt" and feed the resulting text file into crsym/ . That should show the stack trace which is wedged. (See  Issue 820727 , where the category for that bug was found in this way.)

Status: WontFix (was: Assigned)
I had been planning for a while to do more failure injection to cause the renderer process to hang on the IO thread (so it doesn't receive the IPC to crash itself) and see whether we could change the browser's child process management code to crash it in a way that would produce a minidump (sending the right kind of signal on POSIX platforms, at least).

It's been painful to revisit this issue and since leaving it alone for a while it's gone away. There's no evidence of this failure mode on the Mac Intel bot:

https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Mac%20FYI%20Release%20%28Intel%29

nor on a couple of other bots that were previously affected:

https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Linux%20FYI%20Debug%20%28NVIDIA%29?limit=200

https://ci.chromium.org/p/chromium/builders/luci.chromium.try/mac_chromium_rel_ng?limit=200

https://ci.chromium.org/p/chromium/builders/luci.chromium.try/linux_chromium_rel_ng?limit=200

In investigating the current test failures on these bots I filed  Issue 866194 .

At this point I'm closing this as WontFix since it doesn't reproduce any more. I've seen at least one situation where the ability to crash the renderer using DevTools did produce an actionable symbolized minidump.

Blocking: -795942
Blocking: 795942
 Issue 795942  has been merged into this issue.

Sign in to add a comment