
Issue 851213

Starred by 5 users

Issue metadata

Status: Fixed
Owner:
OOO until 2019-01-24
Closed: Sep 4
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Windows , Mac
Pri: 2
Type: Bug

Blocked on:
issue 833902
issue 840394
issue 858907

Blocking:
issue 609629
issue 880078




Crash seen in pixel_test, Pixel_WorkerRAF_OOPD test

Project Member Reported by kbr@chromium.org, Jun 9 2018

Issue description

Crash seen here:
https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Mac%20FYI%20Retina%20Release%20%28AMD%29/4297

Only seen once:
https://test-results.appspot.com/dashboards/flakiness_dashboard.html#testType=pixel_test&tests=gpu_tests.pixel_integration_test.PixelIntegrationTest.Pixel_WorkerRAF_OOPD

An excerpt of the test output is attached; it is still fairly large. It shows a crash in viz and a crash in Ganesh, and I'm not sure whether these are expected. The test deliberately crashes the GPU process, and unfortunately we don't suppress that crash, so the minidump symbolization may be picking it up. I think the real problem is the renderer process crashing in response to the Viz process crashing, and we should work on making that more robust.

Linking this to related bugs. Issue 840394 added some suppressions for this test, but it seems a separate bug should be filed about this test failure.

Attachment: output.txt (95.6 KB)
 Issue 851594  has been merged into this issue.
Labels: Sheriff-Chromium
The merged-in bug shows three flakes in the last day. I will disable the test.

Comment 5 by bugdroid1@chromium.org, Jun 11 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/6f6ff154b8492e36cc9e9d2bf37d44c523205202

commit 6f6ff154b8492e36cc9e9d2bf37d44c523205202
Author: Mark Pearson <mpearson@chromium.org>
Date: Mon Jun 11 22:08:41 2018

Disable Pixel_WorkerRAF_OOPD on Mac because it's Flaky

TBR=fserb

Bug:  851213 
Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel
Change-Id: Id70514f8fba87098163792a64288c47d51fa193f
Reviewed-on: https://chromium-review.googlesource.com/1095559
Reviewed-by: Mark Pearson <mpearson@chromium.org>
Commit-Queue: Mark Pearson <mpearson@chromium.org>
Cr-Commit-Position: refs/heads/master@{#566167}
[modify] https://crrev.com/6f6ff154b8492e36cc9e9d2bf37d44c523205202/content/test/gpu/gpu_tests/pixel_expectations.py

Labels: -Sheriff-Chromium
Optimistically removing from sheriff queue.
Cc: fs...@chromium.org
Owner: ----
Status: Available (was: Assigned)
Owner: fsam...@chromium.org
Fady, could you please own this bug?

Labels: Sheriff-Chromium
Detected 3 new flakes for test/step "gpu_tests.pixel_integration_test.PixelIntegrationTest.Pixel_WorkerRAF_OOPD". To see the actual flakes, please visit https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNyVQsSBUZsYWtlIkpncHVfdGVzdHMucGl4ZWxfaW50ZWdyYXRpb25fdGVzdC5QaXhlbEludGVncmF0aW9uVGVzdC5QaXhlbF9Xb3JrZXJSQUZfT09QRAw. This message was posted automatically by the chromium-try-flakes app. Since flakiness is ongoing, the issue was moved back into Sheriff Bug Queue (unless already there).
Owner: fs...@chromium.org
Reassigning to fserb@ who introduced this test.
Owner: kbr@chromium.org
Status: Assigned (was: Available)
Taking this bug from fserb who's going on vacation.

Blockedon: 858907

Comment 13 by bugdroid1@chromium.org, Aug 30

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/321c771ac3aa7c106969b17dfa7425b73eca138d

commit 321c771ac3aa7c106969b17dfa7425b73eca138d
Author: Kenneth Russell <kbr@chromium.org>
Date: Thu Aug 30 01:46:41 2018

Use gpubenchmarking's crashGpuProcess() in pixel tests.

For those pixel tests which need to crash the GPU process, do so with
this new, more reliable primitive.

Optimistically remove the Mac failure expectation for the
Pixel_WorkerRAF_OOPD test, which crashes the GPU process while running
the out-of-process display compositor and expects things to recover.

Tbr: fserb@chromium.org
Bug:  851213 
Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel
Change-Id: If4f5f08b4ffcc8d508e3eb97245ab7a31643d123
Reviewed-on: https://chromium-review.googlesource.com/1195899
Reviewed-by: Kenneth Russell <kbr@chromium.org>
Commit-Queue: Kenneth Russell <kbr@chromium.org>
Cr-Commit-Position: refs/heads/master@{#587405}
[modify] https://crrev.com/321c771ac3aa7c106969b17dfa7425b73eca138d/content/test/gpu/gpu_tests/pixel_expectations.py
[modify] https://crrev.com/321c771ac3aa7c106969b17dfa7425b73eca138d/content/test/gpu/gpu_tests/pixel_integration_test.py

Detected 3 new flakes for test/step "gpu_tests.pixel_integration_test.PixelIntegrationTest.Pixel_WorkerRAF_OOPD". To see the actual flakes, please visit https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNyVQsSBUZsYWtlIkpncHVfdGVzdHMucGl4ZWxfaW50ZWdyYXRpb25fdGVzdC5QaXhlbEludGVncmF0aW9uVGVzdC5QaXhlbF9Xb3JrZXJSQUZfT09QRAw. This message was posted automatically by the chromium-try-flakes app.
Status: Started (was: Assigned)
Still flaky after https://chromium-review.googlesource.com/1195899 .

https://ci.chromium.org/p/chromium/builders/luci.chromium.try/win7_chromium_rel_ng/75456
https://ci.chromium.org/p/chromium/builders/luci.chromium.try/win7_chromium_rel_ng/75454
https://ci.chromium.org/p/chromium/builders/luci.chromium.try/win7_chromium_rel_ng/75246
https://ci.chromium.org/p/chromium/builders/luci.chromium.try/win7_chromium_rel_ng/74859

The symptom is that, after the GPU process is killed and restarted and the test page reports that it's done, the test times out trying to grab a screenshot.

[53/53] gpu_tests.pixel_integration_test.PixelIntegrationTest.Pixel_WorkerRAF_OOPD failed unexpectedly 19.0010s:
...
[2272:2476:0830/104509.694:INFO:CONSOLE(30)] "undefined", source: http://127.0.0.1:61875/content/test/data/gpu/pixel_worker_requestAnimationFrame.html (30)
Received fatal exception EXCEPTION_ACCESS_VIOLATION
Backtrace:
[2272:2476:0830/104512.981:INFO:CONSOLE(30)] "Test complete", source: http://127.0.0.1:61875/content/test/data/gpu/pixel_worker_requestAnimationFrame.html (30)
	gl::Crash [0x6ACB2E4D+108]
	??$Dispatch@VGpuChannel@gpu@@V12@XP812@AEXXZ@?$MessageT@UGpuChannelMsg_CrashForTesting_Meta@@V?$tuple@$$V@std@@X@IPC@@SA_NPBVMessage@1@PAVGpuChannel@gpu@@1PAXP834@AEXXZ@Z [0x6B3A0B00+80]
	gpu::GpuChannel::OnControlMessageReceived [0x6B39FEBB+151]
...
Traceback (most recent call last):
  _RunGpuTest at content\test\gpu\gpu_tests\gpu_integration_test.py:138
    self.RunActualGpuTest(url, *args)
  RunActualGpuTest at content\test\gpu\gpu_tests\pixel_integration_test.py:142
    screenshot = tab.Screenshot(5)
  traced_function at third_party\catapult\common\py_trace_event\py_trace_event\trace_event_impl\decorators.py:52
    return func(*args, **kwargs)
  Screenshot at third_party\catapult\telemetry\telemetry\internal\browser\tab.py:123
    return self._inspector_backend.Screenshot(timeout)
  traced_function at third_party\catapult\common\py_trace_event\py_trace_event\trace_event_impl\decorators.py:52
    return func(*args, **kwargs)
  Inner at third_party\catapult\telemetry\telemetry\internal\backends\chrome_inspector\inspector_backend.py:40
    inspector_backend._ConvertExceptionFromInspectorWebsocket(e)
  traced_function at third_party\catapult\common\py_trace_event\py_trace_event\trace_event_impl\decorators.py:52
    return func(*args, **kwargs)
  Inner at third_party\catapult\telemetry\telemetry\internal\backends\chrome_inspector\inspector_backend.py:37
    return func(inspector_backend, *args, **kwargs)
  Screenshot at third_party\catapult\telemetry\telemetry\internal\backends\chrome_inspector\inspector_backend.py:152
    return self._page.CaptureScreenshot(timeout)
  CaptureScreenshot at third_party\catapult\telemetry\telemetry\internal\backends\chrome_inspector\inspector_page.py:153
    res = self._inspector_websocket.SyncRequest(request, timeout)
  SyncRequest at third_party\catapult\telemetry\telemetry\internal\backends\chrome_inspector\inspector_websocket.py:130
    res = self._Receive(timeout)
  _Receive at third_party\catapult\telemetry\telemetry\internal\backends\chrome_inspector\inspector_websocket.py:179
    raise WebSocketException(err)
TimeoutException: 
...


Since the only important thing about this test is that rAF resumes in workers, I'm going to rewrite this test as a context_lost test.
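As a rough illustration of what such a test has to check, here is a minimal Python 3 sketch of polling a page-side "finished" flag with a timeout, instead of taking a screenshot. It is not the actual test code: `FakeWorkerPage`, `is_finished` and `wait_for` are hypothetical names made up for this example, and the fake page simply flips its flag after a fixed number of polls to stand in for rAF resuming in the worker.

```python
import time


class FakeWorkerPage:
    """Hypothetical stand-in for a page whose worker rAF callback sets a flag."""

    def __init__(self, polls_until_finished):
        # None means the flag is never set (rAF never resumes).
        self._remaining = polls_until_finished

    def is_finished(self):
        if self._remaining is None:
            return False
        if self._remaining > 0:
            self._remaining -= 1
            return False
        return True


def wait_for(condition, timeout, poll_interval=0.01):
    """Poll condition() until it is truthy or the timeout (seconds) expires."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.2fs" % timeout)
        time.sleep(poll_interval)
```

With this shape, a page where rAF resumes makes `wait_for` return normally, and a page where it never resumes surfaces as a `TimeoutError` rather than as a screenshot failure.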

Labels: OS-Windows

Comment 17 by bugdroid1@chromium.org, Aug 30

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/1e243609ddcb634dc2e98902f8c31bd7a76192cd

commit 1e243609ddcb634dc2e98902f8c31bd7a76192cd
Author: Kenneth Russell <kbr@chromium.org>
Date: Thu Aug 30 23:08:14 2018

Rewrite Pixel_WorkerRAF_OOPD as context_lost test.

The test seems to be timing out while taking a page screenshot after
the test completes, and this isn't the important part of the test.
Instead, rewrite it as a context_lost test, and include two variants:

  ContextLost_WorkerRAFAfterGPUCrash
  ContextLost_WorkerRAFAfterGPUCrash_OOPD

Tbr: fserb@chromium.org
Bug:  851213 
Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel
Change-Id: Ib1556ad9f693003b6b216085a0b1665cad017314
Reviewed-on: https://chromium-review.googlesource.com/1197942
Reviewed-by: Kenneth Russell <kbr@chromium.org>
Commit-Queue: Kenneth Russell <kbr@chromium.org>
Cr-Commit-Position: refs/heads/master@{#587829}
[rename] https://crrev.com/1e243609ddcb634dc2e98902f8c31bd7a76192cd/content/test/data/gpu/worker-raf-after-gpu-crash.html
[modify] https://crrev.com/1e243609ddcb634dc2e98902f8c31bd7a76192cd/content/test/gpu/gpu_tests/context_lost_expectations.py
[modify] https://crrev.com/1e243609ddcb634dc2e98902f8c31bd7a76192cd/content/test/gpu/gpu_tests/context_lost_integration_test.py
[modify] https://crrev.com/1e243609ddcb634dc2e98902f8c31bd7a76192cd/content/test/gpu/gpu_tests/pixel_expectations.py
[modify] https://crrev.com/1e243609ddcb634dc2e98902f8c31bd7a76192cd/content/test/gpu/gpu_tests/pixel_test_pages.py

Labels: -Sheriff-Chromium
Flakiness seems to have stopped for now, removing Sheriff label.
Blockedon: 880078
Blocking: 609629
The newly rewritten test is still flaky. chromium-try-flakes identified gpu_tests.context_lost_integration_test.ContextLostIntegrationTest.ContextLost_WorkerRAFAfterGPUCrash_OOPD as flaky on Android in Issue 880078:

https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNydQsSBUZsYWtlImpncHVfdGVzdHMuY29udGV4dF9sb3N0X2ludGVncmF0aW9uX3Rlc3QuQ29udGV4dExvc3RJbnRlZ3JhdGlvblRlc3QuQ29udGV4dExvc3RfV29ya2VyUkFGQWZ0ZXJHUFVDcmFzaF9PT1BEDA

The non-OOPD version of the test, gpu_tests.context_lost_integration_test.ContextLostIntegrationTest.ContextLost_WorkerRAFAfterGPUCrash, is also flaky on Android, though less so:

https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNycAsSBUZsYWtlImVncHVfdGVzdHMuY29udGV4dF9sb3N0X2ludGVncmF0aW9uX3Rlc3QuQ29udGV4dExvc3RJbnRlZ3JhdGlvblRlc3QuQ29udGV4dExvc3RfV29ya2VyUkFGQWZ0ZXJHUFVDcmFzaAw

In all cases, it looks like the browser may be silently crashing after the GPU process crashes. Looking at one failure:

https://ci.chromium.org/p/chromium/builders/luci.chromium.try/android-marshmallow-arm64-rel/77193
https://chromium-swarm.appspot.com/task?id=3fbc44353baf6e10&refresh=10&show_raw=1

the WebSocket request that checks whether the test passed or failed timed out. Telemetry then attempted to crash the tab via DevTools, and that attempt failed with a report that the connection had already been dropped.

It looks like Chrome on Android may not handle GPU process crashes well. A few of the context loss tests have been suppressed on Android for a long time in Issue 609629. Now that the tests have been changed to run in full Chrome rather than content_shell (and also use the GPU benchmarking extension to crash the GPU process rather than a new tab) they could plausibly be re-enabled.

For the time being I'm going to mark these tests as failing and close this bug in favor of Issue 880078, which is more targeted.

-----

  Traceback (most recent call last):
    File "/b/swarming/w/ir/third_party/catapult/telemetry/telemetry/testing/serially_executed_browser_test_case.py", line 214, in <lambda>
      return lambda self: based_method(self, *args)
    File "/b/swarming/w/ir/content/test/gpu/gpu_tests/gpu_integration_test.py", line 138, in _RunGpuTest
      self.RunActualGpuTest(url, *args)
    File "/b/swarming/w/ir/content/test/gpu/gpu_tests/context_lost_integration_test.py", line 106, in RunActualGpuTest
      getattr(self, test_name)(test_path)
    File "/b/swarming/w/ir/content/test/gpu/gpu_tests/context_lost_integration_test.py", line 347, in _ContextLost_WorkerRAFAfterGPUCrash_OOPD
      self._KillGPUProcess(1, False)
    File "/b/swarming/w/ir/content/test/gpu/gpu_tests/context_lost_integration_test.py", line 161, in _KillGPUProcess
      completed = self._WaitForPageToFinish(tab)
    File "/b/swarming/w/ir/content/test/gpu/gpu_tests/context_lost_integration_test.py", line 127, in _WaitForPageToFinish
      'window.domAutomationController._finished', timeout=wait_timeout)
    File "/b/swarming/w/ir/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 52, in traced_function
      return func(*args, **kwargs)
    File "/b/swarming/w/ir/third_party/catapult/telemetry/telemetry/internal/browser/web_contents.py", line 239, in WaitForJavaScriptCondition
      return self._inspector_backend.WaitForJavaScriptCondition(*args, **kwargs)
    File "/b/swarming/w/ir/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 52, in traced_function
      return func(*args, **kwargs)
    File "/b/swarming/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py", line 288, in WaitForJavaScriptCondition
      return py_utils.WaitFor(IsJavaScriptExpressionTrue, timeout)
    File "/b/swarming/w/ir/third_party/catapult/common/py_utils/py_utils/__init__.py", line 136, in WaitFor
      res = condition()
    File "/b/swarming/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py", line 285, in IsJavaScriptExpressionTrue
      return self._EvaluateJavaScript(condition, context_id, timeout)
    File "/b/swarming/w/ir/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 52, in traced_function
      return func(*args, **kwargs)
    File "/b/swarming/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py", line 37, in Inner
      return func(inspector_backend, *args, **kwargs)
    File "/b/swarming/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py", line 533, in _EvaluateJavaScript
      self._runtime.Crash(context_id, timeout)
    File "/b/swarming/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_runtime.py", line 81, in Crash
      res = self._inspector_websocket.SyncRequest(request, timeout)
    File "/b/swarming/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_websocket.py", line 127, in SyncRequest
      self._SendRequest(req)
    File "/b/swarming/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_websocket.py", line 115, in _SendRequest
      self._socket.send(data)
    File "/b/swarming/w/ir/third_party/catapult/telemetry/third_party/websocket-client/websocket/_core.py", line 234, in send
      return self.send_frame(frame)
    File "/b/swarming/w/ir/third_party/catapult/telemetry/third_party/websocket-client/websocket/_core.py", line 259, in send_frame
      l = self._send(data)
    File "/b/swarming/w/ir/third_party/catapult/telemetry/third_party/websocket-client/websocket/_core.py", line 423, in _send
      return send(self.sock, data)
    File "/b/swarming/w/ir/third_party/catapult/telemetry/third_party/websocket-client/websocket/_socket.py", line 113, in send
      raise WebSocketConnectionClosedException("socket is already closed.")
  WebSocketConnectionClosedException: socket is already closed.
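
The traceback above ends in a "socket is already closed" error rather than in the timeout that came first. The following Python 3 sketch shows how that can happen under a simplified model; it does not use Telemetry's real classes. `FakeDevToolsSocket`, `ConnectionClosedError` and `evaluate_with_crash_on_timeout` are hypothetical names, and the assumption that the connection drops while the first request is waiting is only a guess at the sequence of events.

```python
class ConnectionClosedError(Exception):
    """Hypothetical stand-in for a 'socket is already closed' error."""


class FakeDevToolsSocket:
    """Simulated connection to a browser that silently goes away."""

    def __init__(self):
        self.closed = False

    def send(self, data):
        if self.closed:
            raise ConnectionClosedError("socket is already closed.")
        return len(data)

    def receive(self, timeout):
        # Assumption: the browser dies while we wait, so no reply ever
        # arrives and the connection is dropped in the meantime.
        self.closed = True
        raise TimeoutError("no reply within %ss" % timeout)


def evaluate_with_crash_on_timeout(sock, expression, timeout):
    """Send a request; on timeout, try to crash the tab before re-raising."""
    sock.send(expression)
    try:
        return sock.receive(timeout)
    except TimeoutError:
        # Fallback: ask the tab to crash so that a dump is produced.
        # If the connection is already gone, this send raises, and the
        # new exception is what the caller sees instead of the timeout.
        sock.send("crash-request")
        raise
```

In this model the caller sees `ConnectionClosedError`, with the original `TimeoutError` available only as the exception's `__context__`, which matches the shape of the failure reported above.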

Blockedon: -880078
Blocking: 880078
Status: Fixed (was: Started)
Turning around the blocked on/blocking relationship for one bug and closing this as fixed.

Note: bugdroid is currently down, but https://chromium-review.googlesource.com/1204598 expanded the suppression for these new context_lost tests to all Android devices.

Comment 22 by bugdroid1@chromium.org, Sep 5

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/d9878f42b7788b02a711e88a9df5e14e67ad1839

commit d9878f42b7788b02a711e88a9df5e14e67ad1839
Author: Kenneth Russell <kbr@chromium.org>
Date: Tue Sep 04 21:44:43 2018

Suppress WorkerRAFAfterGPUCrash failures on Android.

Mark the following two tests as failing:
  ContextLost_WorkerRAFAfterGPUCrash
  ContextLost_WorkerRAFAfterGPUCrash_OOPD

Tbr: fserb@chromium.org
No-Try: True
Bug:  851213 , 880078
Change-Id: Ia3baefd5b7bfc1f565626925de28befd3e3f4163
Reviewed-on: https://chromium-review.googlesource.com/1204598
Reviewed-by: Kenneth Russell <kbr@chromium.org>
Commit-Queue: Kenneth Russell <kbr@chromium.org>
Cr-Commit-Position: refs/heads/master@{#588654}
[modify] https://crrev.com/d9878f42b7788b02a711e88a9df5e14e67ad1839/content/test/gpu/gpu_tests/context_lost_expectations.py
