Crash seen in pixel_test, Pixel_WorkerRAF_OOPD test
Issue description

Crash seen here: https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Mac%20FYI%20Retina%20Release%20%28AMD%29/4297

Only seen once: https://test-results.appspot.com/dashboards/flakiness_dashboard.html#testType=pixel_test&tests=gpu_tests.pixel_integration_test.PixelIntegrationTest.Pixel_WorkerRAF_OOPD

An excerpt of the test output is attached. It's still pretty large. There's a crash in viz and a crash in Ganesh. Not sure whether these are expected. The test deliberately crashes the GPU process and unfortunately we don't suppress that, so the minidump symbolization may be seeing that.

I think the renderer process crashing in response to the Viz process crashing is the problem and that we should work on making that more robust. Linking this to related bugs.

Issue 840394 added some suppressions for this test but it seems a separate bug should be filed about this test failure.
,
Jun 11 2018
,
Jun 11 2018
Merged-in bug shows three flakes (in the last day). I will disable the test.
,
Jun 11 2018
Change going through CQ: https://chromium-review.googlesource.com/c/chromium/src/+/1095559
,
Jun 11 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/6f6ff154b8492e36cc9e9d2bf37d44c523205202
commit 6f6ff154b8492e36cc9e9d2bf37d44c523205202
Author: Mark Pearson <mpearson@chromium.org>
Date: Mon Jun 11 22:08:41 2018

Disable Pixel_WorkerRAF_OOPD on Mac because it's Flaky

TBR=fserb
Bug: 851213
Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel
Change-Id: Id70514f8fba87098163792a64288c47d51fa193f
Reviewed-on: https://chromium-review.googlesource.com/1095559
Reviewed-by: Mark Pearson <mpearson@chromium.org>
Commit-Queue: Mark Pearson <mpearson@chromium.org>
Cr-Commit-Position: refs/heads/master@{#566167}

[modify] https://crrev.com/6f6ff154b8492e36cc9e9d2bf37d44c523205202/content/test/gpu/gpu_tests/pixel_expectations.py
,
Jun 11 2018
Optimistically removing from sheriff queue.
,
Aug 13
,
Aug 24
Fady, could you please own this bug?
,
Aug 29
Detected 3 new flakes for test/step "gpu_tests.pixel_integration_test.PixelIntegrationTest.Pixel_WorkerRAF_OOPD". To see the actual flakes, please visit https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNyVQsSBUZsYWtlIkpncHVfdGVzdHMucGl4ZWxfaW50ZWdyYXRpb25fdGVzdC5QaXhlbEludGVncmF0aW9uVGVzdC5QaXhlbF9Xb3JrZXJSQUZfT09QRAw. This message was posted automatically by the chromium-try-flakes app. Since flakiness is ongoing, the issue was moved back into Sheriff Bug Queue (unless already there).
,
Aug 29
Reassigning to fserb@ who introduced this test.
,
Aug 29
Taking this bug from fserb who's going on vacation.
,
Aug 29
,
Aug 30
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/321c771ac3aa7c106969b17dfa7425b73eca138d
commit 321c771ac3aa7c106969b17dfa7425b73eca138d
Author: Kenneth Russell <kbr@chromium.org>
Date: Thu Aug 30 01:46:41 2018

Use gpubenchmarking's crashGpuProcess() in pixel tests.

For those pixel tests which need to crash the GPU process, do so with this new, more reliable primitive.

Optimistically remove the Mac failure expectation for the Pixel_WorkerRAF_OOPD test, which crashes the GPU process while running the out-of-process display compositor and expects things to recover.

Tbr: fserb@chromium.org
Bug: 851213
Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel
Change-Id: If4f5f08b4ffcc8d508e3eb97245ab7a31643d123
Reviewed-on: https://chromium-review.googlesource.com/1195899
Reviewed-by: Kenneth Russell <kbr@chromium.org>
Commit-Queue: Kenneth Russell <kbr@chromium.org>
Cr-Commit-Position: refs/heads/master@{#587405}

[modify] https://crrev.com/321c771ac3aa7c106969b17dfa7425b73eca138d/content/test/gpu/gpu_tests/pixel_expectations.py
[modify] https://crrev.com/321c771ac3aa7c106969b17dfa7425b73eca138d/content/test/gpu/gpu_tests/pixel_integration_test.py
,
Aug 30
Detected 3 new flakes for test/step "gpu_tests.pixel_integration_test.PixelIntegrationTest.Pixel_WorkerRAF_OOPD". To see the actual flakes, please visit https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNyVQsSBUZsYWtlIkpncHVfdGVzdHMucGl4ZWxfaW50ZWdyYXRpb25fdGVzdC5QaXhlbEludGVncmF0aW9uVGVzdC5QaXhlbF9Xb3JrZXJSQUZfT09QRAw. This message was posted automatically by the chromium-try-flakes app.
,
Aug 30
Still flaky after https://chromium-review.googlesource.com/1195899 .
https://ci.chromium.org/p/chromium/builders/luci.chromium.try/win7_chromium_rel_ng/75456
https://ci.chromium.org/p/chromium/builders/luci.chromium.try/win7_chromium_rel_ng/75454
https://ci.chromium.org/p/chromium/builders/luci.chromium.try/win7_chromium_rel_ng/75246
https://ci.chromium.org/p/chromium/builders/luci.chromium.try/win7_chromium_rel_ng/74859

The symptom is that the test times out while trying to grab a screenshot, after the GPU process has been killed and restarted and the page has reported that the test is done.

[53/53] gpu_tests.pixel_integration_test.PixelIntegrationTest.Pixel_WorkerRAF_OOPD failed unexpectedly 19.0010s:
...
[2272:2476:0830/104509.694:INFO:CONSOLE(30)] "undefined", source: http://127.0.0.1:61875/content/test/data/gpu/pixel_worker_requestAnimationFrame.html (30)
Received fatal exception EXCEPTION_ACCESS_VIOLATION
Backtrace:
[2272:2476:0830/104512.981:INFO:CONSOLE(30)] "Test complete", source: http://127.0.0.1:61875/content/test/data/gpu/pixel_worker_requestAnimationFrame.html (30)
gl::Crash [0x6ACB2E4D+108]
??$Dispatch@VGpuChannel@gpu@@V12@XP812@AEXXZ@?$MessageT@UGpuChannelMsg_CrashForTesting_Meta@@V?$tuple@$$V@std@@X@IPC@@SA_NPBVMessage@1@PAVGpuChannel@gpu@@1PAXP834@AEXXZ@Z [0x6B3A0B00+80]
gpu::GpuChannel::OnControlMessageReceived [0x6B39FEBB+151]
...
Traceback (most recent call last):
  _RunGpuTest at content\test\gpu\gpu_tests\gpu_integration_test.py:138
    self.RunActualGpuTest(url, *args)
  RunActualGpuTest at content\test\gpu\gpu_tests\pixel_integration_test.py:142
    screenshot = tab.Screenshot(5)
  traced_function at third_party\catapult\common\py_trace_event\py_trace_event\trace_event_impl\decorators.py:52
    return func(*args, **kwargs)
  Screenshot at third_party\catapult\telemetry\telemetry\internal\browser\tab.py:123
    return self._inspector_backend.Screenshot(timeout)
  traced_function at third_party\catapult\common\py_trace_event\py_trace_event\trace_event_impl\decorators.py:52
    return func(*args, **kwargs)
  Inner at third_party\catapult\telemetry\telemetry\internal\backends\chrome_inspector\inspector_backend.py:40
    inspector_backend._ConvertExceptionFromInspectorWebsocket(e)
  traced_function at third_party\catapult\common\py_trace_event\py_trace_event\trace_event_impl\decorators.py:52
    return func(*args, **kwargs)
  Inner at third_party\catapult\telemetry\telemetry\internal\backends\chrome_inspector\inspector_backend.py:37
    return func(inspector_backend, *args, **kwargs)
  Screenshot at third_party\catapult\telemetry\telemetry\internal\backends\chrome_inspector\inspector_backend.py:152
    return self._page.CaptureScreenshot(timeout)
  CaptureScreenshot at third_party\catapult\telemetry\telemetry\internal\backends\chrome_inspector\inspector_page.py:153
    res = self._inspector_websocket.SyncRequest(request, timeout)
  SyncRequest at third_party\catapult\telemetry\telemetry\internal\backends\chrome_inspector\inspector_websocket.py:130
    res = self._Receive(timeout)
  _Receive at third_party\catapult\telemetry\telemetry\internal\backends\chrome_inspector\inspector_websocket.py:179
    raise WebSocketException(err)
TimeoutException: ...

Since the only important thing about this test is that rAF resumes in workers, I'm going to rewrite this test as a context_lost test.
,
Aug 30
,
Aug 30
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/1e243609ddcb634dc2e98902f8c31bd7a76192cd
commit 1e243609ddcb634dc2e98902f8c31bd7a76192cd
Author: Kenneth Russell <kbr@chromium.org>
Date: Thu Aug 30 23:08:14 2018

Rewrite Pixel_WorkerRAF_OOPD as context_lost test.

The test seems to be timing out while taking a page screenshot after the test completes, and this isn't the important part of the test. Instead, rewrite it as a context_lost test, and include two variants:
ContextLost_WorkerRAFAfterGPUCrash
ContextLost_WorkerRAFAfterGPUCrash_OOPD

Tbr: fserb@chromium.org
Bug: 851213
Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel
Change-Id: Ib1556ad9f693003b6b216085a0b1665cad017314
Reviewed-on: https://chromium-review.googlesource.com/1197942
Reviewed-by: Kenneth Russell <kbr@chromium.org>
Commit-Queue: Kenneth Russell <kbr@chromium.org>
Cr-Commit-Position: refs/heads/master@{#587829}

[rename] https://crrev.com/1e243609ddcb634dc2e98902f8c31bd7a76192cd/content/test/data/gpu/worker-raf-after-gpu-crash.html
[modify] https://crrev.com/1e243609ddcb634dc2e98902f8c31bd7a76192cd/content/test/gpu/gpu_tests/context_lost_expectations.py
[modify] https://crrev.com/1e243609ddcb634dc2e98902f8c31bd7a76192cd/content/test/gpu/gpu_tests/context_lost_integration_test.py
[modify] https://crrev.com/1e243609ddcb634dc2e98902f8c31bd7a76192cd/content/test/gpu/gpu_tests/pixel_expectations.py
[modify] https://crrev.com/1e243609ddcb634dc2e98902f8c31bd7a76192cd/content/test/gpu/gpu_tests/pixel_test_pages.py
,
Sep 3
Flakiness seems to have stopped for now, removing Sheriff label.
,
Sep 4
The newly rewritten test is still flaky. chromium-try-flakes identified gpu_tests.context_lost_integration_test.ContextLostIntegrationTest.ContextLost_WorkerRAFAfterGPUCrash_OOPD as flaky on Android in Issue 880078:
https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNydQsSBUZsYWtlImpncHVfdGVzdHMuY29udGV4dF9sb3N0X2ludGVncmF0aW9uX3Rlc3QuQ29udGV4dExvc3RJbnRlZ3JhdGlvblRlc3QuQ29udGV4dExvc3RfV29ya2VyUkFGQWZ0ZXJHUFVDcmFzaF9PT1BEDA

The non-OOPD version of the test, gpu_tests.context_lost_integration_test.ContextLostIntegrationTest.ContextLost_WorkerRAFAfterGPUCrash, is also flaky on Android, though less so:
https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNycAsSBUZsYWtlImVncHVfdGVzdHMuY29udGV4dF9sb3N0X2ludGVncmF0aW9uX3Rlc3QuQ29udGV4dExvc3RJbnRlZ3JhdGlvblRlc3QuQ29udGV4dExvc3RfV29ya2VyUkFGQWZ0ZXJHUFVDcmFzaAw

In all cases, it looks like the browser may be silently crashing after the GPU process crashes. Looking at one failure:
https://ci.chromium.org/p/chromium/builders/luci.chromium.try/android-marshmallow-arm64-rel/77193
https://chromium-swarm.appspot.com/task?id=3fbc44353baf6e10&refresh=10&show_raw=1
the WebSocket request for seeing whether the test passed or failed timed out, leading to Telemetry attempting to crash the tab via DevTools, causing DevTools to report that the connection has already been dropped.

It looks like Chrome on Android may not handle GPU process crashes well. A few of the context loss tests have been suppressed on Android for a long time in Issue 609629. Now that the tests have been changed to run in full Chrome rather than content_shell (and also use the GPU benchmarking extension to crash the GPU process rather than a new tab) they could plausibly be re-enabled.

For the time being I'm going to mark these tests failing and close this bug in favor of Issue 880078 which is more targeted.
-----
Traceback (most recent call last):
  File "/b/swarming/w/ir/third_party/catapult/telemetry/telemetry/testing/serially_executed_browser_test_case.py", line 214, in <lambda>
    return lambda self: based_method(self, *args)
  File "/b/swarming/w/ir/content/test/gpu/gpu_tests/gpu_integration_test.py", line 138, in _RunGpuTest
    self.RunActualGpuTest(url, *args)
  File "/b/swarming/w/ir/content/test/gpu/gpu_tests/context_lost_integration_test.py", line 106, in RunActualGpuTest
    getattr(self, test_name)(test_path)
  File "/b/swarming/w/ir/content/test/gpu/gpu_tests/context_lost_integration_test.py", line 347, in _ContextLost_WorkerRAFAfterGPUCrash_OOPD
    self._KillGPUProcess(1, False)
  File "/b/swarming/w/ir/content/test/gpu/gpu_tests/context_lost_integration_test.py", line 161, in _KillGPUProcess
    completed = self._WaitForPageToFinish(tab)
  File "/b/swarming/w/ir/content/test/gpu/gpu_tests/context_lost_integration_test.py", line 127, in _WaitForPageToFinish
    'window.domAutomationController._finished', timeout=wait_timeout)
  File "/b/swarming/w/ir/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 52, in traced_function
    return func(*args, **kwargs)
  File "/b/swarming/w/ir/third_party/catapult/telemetry/telemetry/internal/browser/web_contents.py", line 239, in WaitForJavaScriptCondition
    return self._inspector_backend.WaitForJavaScriptCondition(*args, **kwargs)
  File "/b/swarming/w/ir/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 52, in traced_function
    return func(*args, **kwargs)
  File "/b/swarming/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py", line 288, in WaitForJavaScriptCondition
    return py_utils.WaitFor(IsJavaScriptExpressionTrue, timeout)
  File "/b/swarming/w/ir/third_party/catapult/common/py_utils/py_utils/__init__.py", line 136, in WaitFor
    res = condition()
  File "/b/swarming/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py", line 285, in IsJavaScriptExpressionTrue
    return self._EvaluateJavaScript(condition, context_id, timeout)
  File "/b/swarming/w/ir/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 52, in traced_function
    return func(*args, **kwargs)
  File "/b/swarming/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py", line 37, in Inner
    return func(inspector_backend, *args, **kwargs)
  File "/b/swarming/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py", line 533, in _EvaluateJavaScript
    self._runtime.Crash(context_id, timeout)
  File "/b/swarming/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_runtime.py", line 81, in Crash
    res = self._inspector_websocket.SyncRequest(request, timeout)
  File "/b/swarming/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_websocket.py", line 127, in SyncRequest
    self._SendRequest(req)
  File "/b/swarming/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_websocket.py", line 115, in _SendRequest
    self._socket.send(data)
  File "/b/swarming/w/ir/third_party/catapult/telemetry/third_party/websocket-client/websocket/_core.py", line 234, in send
    return self.send_frame(frame)
  File "/b/swarming/w/ir/third_party/catapult/telemetry/third_party/websocket-client/websocket/_core.py", line 259, in send_frame
    l = self._send(data)
  File "/b/swarming/w/ir/third_party/catapult/telemetry/third_party/websocket-client/websocket/_core.py", line 423, in _send
    return send(self.sock, data)
  File "/b/swarming/w/ir/third_party/catapult/telemetry/third_party/websocket-client/websocket/_socket.py", line 113, in send
    raise WebSocketConnectionClosedException("socket is already closed.")
WebSocketConnectionClosedException: socket is already closed.
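
For triage it may help to tell a plain timeout apart from a dropped connection, since the latter hints that the browser itself went away. Below is a minimal Python sketch of that idea. It is not Telemetry's implementation: `wait_for`, `classify_wait`, `WaitTimeout`, and `ConnectionClosed` are hypothetical names made up for illustration, and the only thing borrowed from the traceback is the general poll-until-true shape of `py_utils.WaitFor`.

```python
import time


class ConnectionClosed(Exception):
    """Hypothetical stand-in for a dropped DevTools connection."""


class WaitTimeout(Exception):
    """Hypothetical error: the condition never became true in time."""


def wait_for(condition, timeout, poll_interval=0.01):
    """Poll condition() until it returns a truthy value or timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()  # may raise ConnectionClosed; let it propagate
        if result:
            return result
        if time.monotonic() >= deadline:
            raise WaitTimeout("condition not met within %.2fs" % timeout)
        time.sleep(poll_interval)


def classify_wait(condition, timeout):
    """Map the outcome of a wait onto a coarse label for triage."""
    try:
        wait_for(condition, timeout)
        return "finished"
    except WaitTimeout:
        # The page is still reachable but never reported completion.
        return "timed_out"
    except ConnectionClosed:
        # The peer went away, which suggests the browser may have died.
        return "connection_closed"
```

A caller could pass in any zero-argument callable, for example one that evaluates a page-side "finished" flag. How the real harness would surface the two cases is left open here.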
,
Sep 5
Note: bugdroid is currently down, but https://chromium-review.googlesource.com/1204598 expanded the suppression for these new context_lost tests to all Android devices.
,
Sep 5
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/d9878f42b7788b02a711e88a9df5e14e67ad1839
commit d9878f42b7788b02a711e88a9df5e14e67ad1839
Author: Kenneth Russell <kbr@chromium.org>
Date: Tue Sep 04 21:44:43 2018

Suppress WorkerRAFAfterGPUCrash failures on Android.

Mark the following two tests as failing:
ContextLost_WorkerRAFAfterGPUCrash
ContextLost_WorkerRAFAfterGPUCrash_OOPD

Tbr: fserb@chromium.org
No-Try: True
Bug: 851213, 880078
Change-Id: Ia3baefd5b7bfc1f565626925de28befd3e3f4163
Reviewed-on: https://chromium-review.googlesource.com/1204598
Reviewed-by: Kenneth Russell <kbr@chromium.org>
Commit-Queue: Kenneth Russell <kbr@chromium.org>
Cr-Commit-Position: refs/heads/master@{#588654}

[modify] https://crrev.com/d9878f42b7788b02a711e88a9df5e14e67ad1839/content/test/gpu/gpu_tests/context_lost_expectations.py
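
To make the effect of such a suppression concrete, here is a toy Python model of a per-platform failure expectation. It is a sketch under loud assumptions: the `ToyExpectations` class and its methods are hypothetical and are not the API used in context_lost_expectations.py. Only the two test names and the bug number are taken from this thread.

```python
class ToyExpectations:
    """Hypothetical, simplified model of per-platform test expectations."""

    def __init__(self):
        # Maps test name -> (set of platforms expected to fail, bug number).
        self._failures = {}

    def fail(self, test_name, platforms, bug):
        """Record that test_name is expected to fail on the given platforms."""
        self._failures[test_name] = (frozenset(platforms), bug)

    def expected_result(self, test_name, platform):
        """Return "fail" if a suppression applies, otherwise "pass"."""
        entry = self._failures.get(test_name)
        if entry is not None and platform in entry[0]:
            return "fail"
        return "pass"

    def bug_for(self, test_name):
        """Return the tracking bug for a suppressed test, or None."""
        entry = self._failures.get(test_name)
        return entry[1] if entry is not None else None


expectations = ToyExpectations()
for name in ("ContextLost_WorkerRAFAfterGPUCrash",
             "ContextLost_WorkerRAFAfterGPUCrash_OOPD"):
    # Platform label "android" is an illustrative string, not a real tag.
    expectations.fail(name, ["android"], bug=880078)
```

In this model the suppressed tests still count as passing on every other platform, which matches the intent of limiting the suppression to Android.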
Comment 1 by mpear...@chromium.org
, Jun 11 2018