gpu_tests.screenshot_sync_integration_test.ScreenshotSyncIntegrationTest.ScreenshotSync_GPURasterWithCanvas Flaky on Windows Intel
Issue description

Failing builds:
https://ci.chromium.org/p/chromium/builders/luci.chromium.try/win_angle_rel_ng/1484
https://ci.chromium.org/p/chromium/builders/luci.chromium.try/win_angle_rel_ng/1457
https://ci.chromium.org/p/chromium/builders/luci.chromium.try/win_angle_rel_ng/1447
https://ci.chromium.org/p/chromium/builders/luci.chromium.try/win_angle_rel_ng/1420
https://ci.chromium.org/p/chromium/builders/luci.chromium.try/win_angle_rel_ng/1324
https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Win10%20FYI%20Release%20%28Intel%20HD%20630%29/812

I'm unable to determine what the relevant error text is from the output log. The sheriff/oncall box seems broken right now. Not sure who GPU wrangler is. I'll see if I can suppress this test. Can lower priority once this is stable again.
,
Jun 18 2018
,
Jun 18 2018
cc'ing ppl on viz team who might know more.
,
Jun 18 2018
,
Jun 18 2018
it's a viz party now :)
,
Jun 18 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/3c86695d6b1d57a47873f37dc678e1230e68d03f

commit 3c86695d6b1d57a47873f37dc678e1230e68d03f
Author: Jamie Madill <jmadill@chromium.org>
Date: Mon Jun 18 18:52:30 2018

Suppress flaky screenshot test.

This is flaking on Viz/Windows/Intel. Not clear how to select viz as a config in the GPU expectations module. Suppress this on the Windows Intel config for now.

Bug: 853762
Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel
Change-Id: I83bc2699d7b7a904eb68b97886d1c91025307d31
Tbr: kbr@chromium.org
Reviewed-on: https://chromium-review.googlesource.com/1104783
Reviewed-by: Jamie Madill <jmadill@chromium.org>
Commit-Queue: Jamie Madill <jmadill@chromium.org>
Cr-Commit-Position: refs/heads/master@{#568094}

[modify] https://crrev.com/3c86695d6b1d57a47873f37dc678e1230e68d03f/content/test/gpu/gpu_tests/screenshot_sync_expectations.py
,
Jun 18 2018
CQ stability should be improved now. Marking as available for someone from the viz team to take.
,
Jun 18 2018
(fwiw) The failure doesn't seem to be related to viz: for example, the first link in the OP has failure in non-viz runs.
,
Jun 18 2018
And the failure seems to be a timeout (vs. an expectation failure)?
,
Jun 18 2018
sadrul: This seems like it could be related to https://crbug.com/851504 since screenshot_sync_tests are using Page.captureScreenshot?
,
Jun 18 2018
It's possible it's not viz related. You're right that one is for the non-viz version. The "viz" mode does seem to fail more often.
,
Jun 18 2018
re #10: yea, it looks related. I am trying to repro the failure locally.
,
Jun 18 2018
I think the test is doing a lot of work, and simply timing out because of that (the change for crbug.com/851504 introduces an extra IPC for each request, and it looks like each test issues approximately 1200 copy-requests). I will put up a speculative CL that reduces the number of repetitions in the test, and see if that is acceptable. I assume tries on the win_optional_gpu_tests_rel bots would tell me whether it works or not?
,
Jun 18 2018
I am running https://chromium-review.googlesource.com/c/chromium/src/+/1105063 through the trybots.
,
Jun 19 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/dd80dd3a955ccfd85294631c0f10d560d0480b56

commit dd80dd3a955ccfd85294631c0f10d560d0480b56
Author: Sadrul Habib Chowdhury <sadrul@chromium.org>
Date: Tue Jun 19 02:02:57 2018

gpu tests: Speculative fix for screenshot test.

Reduce the number of repititions to avoid a timeout in the test.

BUG= 853762
Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel
Change-Id: I08f70316da5c28d92ccb142e71ecd6c903bca612
Reviewed-on: https://chromium-review.googlesource.com/1105063
Reviewed-by: Zhenyao Mo <zmo@chromium.org>
Commit-Queue: Sadrul Chowdhury <sadrul@chromium.org>
Cr-Commit-Position: refs/heads/master@{#568296}

[modify] https://crrev.com/dd80dd3a955ccfd85294631c0f10d560d0480b56/content/test/gpu/gpu_tests/screenshot_sync_expectations.py
[modify] https://crrev.com/dd80dd3a955ccfd85294631c0f10d560d0480b56/content/test/gpu/gpu_tests/screenshot_sync_integration_test.py
,
Jun 19 2018
The test has not flaked since the fix landed. So marking this as fixed.
,
Jun 19 2018
Thanks sadrul. I'll keep an eye out.
,
Jun 19 2018
sadrul@: turning down the number of iterations of this test to reduce flakiness is unsatisfying. The test is a regression test for previously flaky behavior of the browser, so reducing the number of iterations reduces the likelihood that real bugs will be caught.

Could you please pick up this bug again, increase the number of iterations back to 20 and change the 5 second timeout in tab.Screenshot() in the test to 10 seconds? This should eliminate the chance that slow bots were the reason that the screenshot call timed out. The bots are all running vpython and incorporating the numpy and cv2 packages, which are important to get acceptable performance of Telemetry's screenshot mechanism.

I think it is more likely that there is a real bug where these screenshot requests are intermittently getting lost, possibly related to recent changes in how these requests are transmitted down the rendering pipeline. Linking to possibly related bug Issue 851504.
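To make the reasoning behind the 10-second request concrete, here is a minimal simulation. It is only a sketch: `FakeTab`, `ScreenshotTimeout` and `run_repetitions` are hypothetical names invented for illustration, not Telemetry's API, and no real waiting happens. A reply that is merely slow passes once the timeout is raised, while a dropped request fails at any timeout.

```python
class ScreenshotTimeout(Exception):
    """Raised when a simulated screenshot reply does not arrive in time."""


class FakeTab:
    """Hypothetical stand-in for a Telemetry tab; not the real API."""

    def __init__(self, reply_delays_s):
        # One entry per request: seconds until the reply arrives, or None
        # if the request is dropped and no reply ever arrives.
        self._reply_delays_s = list(reply_delays_s)

    def Screenshot(self, timeout_s):
        delay = self._reply_delays_s.pop(0)
        if delay is None or delay > timeout_s:
            raise ScreenshotTimeout("no reply within %ss" % timeout_s)
        return "pixels"


def run_repetitions(tab, repetitions, timeout_s):
    """Return the index of the first timed-out request, or None if all pass."""
    for i in range(repetitions):
        try:
            tab.Screenshot(timeout_s)
        except ScreenshotTimeout:
            return i
    return None


# A slow bot: the last of 20 replies takes 7 seconds.
SLOW_BOT = [1.0] * 19 + [7.0]
# A lost request: the 11th of 20 requests never gets a reply.
DROPPED = [1.0] * 10 + [None] + [1.0] * 9
```

If the test keeps timing out with the longer timeout, the slow-bot explanation is ruled out and a lost request becomes the more likely cause.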
,
Jun 19 2018
Sure. Is there a timeout for the individual test? Because I suspect that will need to be increased too.
,
Jun 19 2018
I searched through the Telemetry code base and don't see a per-test timeout. Any such timeout must be being enforced deeper, at the typ test harness level (dpranke@ is owner of that).

Locally, if I change the test to do 50 invocations, each test still completes in about 15 seconds and the tests don't time out. Again, this is using the "#!/usr/bin/env vpython" shebang that's at the top of src/content/test/gpu/run_gpu_integration_test.py in order to pick up the numpy and cv2 packages. If those aren't being used then the test will run a *lot* slower. All of the bots should be picking up these packages.

The timeout in this test run, for example:
https://ci.chromium.org/p/chromium/builders/luci.chromium.try/win_angle_rel_ng/1484
and this shard:
https://chromium-swarm.appspot.com/task?id=3e2be56d68f71810&refresh=10&show_raw=1
is caused because the browser didn't respond to Telemetry's screenshot request within 5 seconds, not because the test took too long to run:

[1/4] gpu_tests.screenshot_sync_integration_test.ScreenshotSyncIntegrationTest.ScreenshotSync_GPURasterWithCanvas failed unexpectedly 12.2230s:
...
Traceback (most recent call last):
  _RunGpuTest at content\test\gpu\gpu_tests\gpu_integration_test.py:132
    self.RunActualGpuTest(url, *args)
  RunActualGpuTest at content\test\gpu\gpu_tests\screenshot_sync_integration_test.py:136
    self._CheckScreenshot()
  _CheckScreenshot at content\test\gpu\gpu_tests\screenshot_sync_integration_test.py:121
    screenshot = tab.Screenshot(5)
  traced_function at third_party\catapult\common\py_trace_event\py_trace_event\trace_event_impl\decorators.py:52
    return func(*args, **kwargs)
  Screenshot at third_party\catapult\telemetry\telemetry\internal\browser\tab.py:116
    return self._inspector_backend.Screenshot(timeout)
  traced_function at third_party\catapult\common\py_trace_event\py_trace_event\trace_event_impl\decorators.py:52
    return func(*args, **kwargs)
  Inner at third_party\catapult\telemetry\telemetry\internal\backends\chrome_inspector\inspector_backend.py:41
    inspector_backend._ConvertExceptionFromInspectorWebsocket(e)
  traced_function at third_party\catapult\common\py_trace_event\py_trace_event\trace_event_impl\decorators.py:52
    return func(*args, **kwargs)
  Inner at third_party\catapult\telemetry\telemetry\internal\backends\chrome_inspector\inspector_backend.py:38
    return func(inspector_backend, *args, **kwargs)
  Screenshot at third_party\catapult\telemetry\telemetry\internal\backends\chrome_inspector\inspector_backend.py:153
    return self._page.CaptureScreenshot(timeout)
  CaptureScreenshot at third_party\catapult\telemetry\telemetry\internal\backends\chrome_inspector\inspector_page.py:153
    res = self._inspector_websocket.SyncRequest(request, timeout)
  SyncRequest at third_party\catapult\telemetry\telemetry\internal\backends\chrome_inspector\inspector_websocket.py:116
    res = self._Receive(timeout)
  _Receive at third_party\catapult\telemetry\telemetry\internal\backends\chrome_inspector\inspector_websocket.py:155
    data = self._socket.recv()
  recv at third_party\catapult\telemetry\third_party\websocket-client\websocket\_core.py:293
    opcode, data = self.recv_data()
  recv_data at third_party\catapult\telemetry\third_party\websocket-client\websocket\_core.py:310
    opcode, frame = self.recv_data_frame(control_frame)
  recv_data_frame at third_party\catapult\telemetry\third_party\websocket-client\websocket\_core.py:323
    frame = self.recv_frame()
  recv_frame at third_party\catapult\telemetry\third_party\websocket-client\websocket\_core.py:357
    return self.frame_buffer.recv_frame()
  recv_frame at third_party\catapult\telemetry\third_party\websocket-client\websocket\_abnf.py:336
    self.recv_header()
  recv_header at third_party\catapult\telemetry\third_party\websocket-client\websocket\_abnf.py:286
    header = self.recv_strict(2)
  recv_strict at third_party\catapult\telemetry\third_party\websocket-client\websocket\_abnf.py:371
    bytes_ = self.recv(min(16384, shortage))
  _recv at third_party\catapult\telemetry\third_party\websocket-client\websocket\_core.py:427
    return recv(self.sock, bufsize)
  recv at third_party\catapult\telemetry\third_party\websocket-client\websocket\_socket.py:83
    raise WebSocketTimeoutException(message)

This failure mode is consistent with snapshot requests / CopyOutputRequests being accidentally dropped in some situations.
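For readers unfamiliar with how a dropped request surfaces as a receive timeout, the following is a rough sketch using only Python's standard `socket` module. It is an assumption-laden illustration, not Telemetry or websocket-client code: `sync_request` is a hypothetical helper, and a connected socket pair stands in for the DevTools websocket. The peer receives the request but never replies, so the blocking receive gives up after the timeout.

```python
import socket


def sync_request(sock, payload, timeout_s):
    """Send a request and wait for one reply; return None on timeout.

    Hypothetical helper for illustration only.
    """
    sock.settimeout(timeout_s)
    sock.sendall(payload)
    try:
        return sock.recv(4096)
    except socket.timeout:
        # No reply arrived within timeout_s. A websocket library would
        # typically surface this as its own timeout exception.
        return None


if __name__ == "__main__":
    client, server = socket.socketpair()
    try:
        # The peer never answers, as if the request had been dropped.
        print(sync_request(client, b"capture", 0.2))
    finally:
        client.close()
        server.close()
```

Note that the total running time of the caller plays no role here: only the wait for this single reply is bounded, which matches the traceback ending inside the receive call.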
,
Jun 21 2018
It's likely that this is related to Issue 853651 which ccameron@ has a good diagnosis for.
,
Jun 22 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/27efdf3346545126ad4ea94ea4a722250c0bad6a

commit 27efdf3346545126ad4ea94ea4a722250c0bad6a
Author: Sadrul Habib Chowdhury <sadrul@chromium.org>
Date: Fri Jun 22 05:37:59 2018

gpu tests: Update the screenshot test.

Bump the repetition back to 20, but increase the timeout for taking the screenshots from 5 seconds to 10 seconds instead.

BUG= 853762
Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel
Change-Id: Id6914cf7c1791f654d35725644bad7d8fec22073
Reviewed-on: https://chromium-review.googlesource.com/1107201
Commit-Queue: Kenneth Russell <kbr@chromium.org>
Reviewed-by: Kenneth Russell <kbr@chromium.org>
Cr-Commit-Position: refs/heads/master@{#569535}

[modify] https://crrev.com/27efdf3346545126ad4ea94ea4a722250c0bad6a/content/test/gpu/gpu_tests/screenshot_sync_integration_test.py
,
Jun 22 2018
,
Sep 12
Is there any known flakiness in Chrome screenshotting now, or is it consistent? I'm dealing with https://buganizer.corp.google.com/issues/115528744 and wondering whether recent M69 changes account for that flakiness, and also whether we can repro it at the Chrome level without all the webdriver complexity. In other words, is this bug completely fixed?
,
Sep 14
Taking snapshots via DevTools should be reliable again at this point. Please file a new public bug about that internal one if you find it isn't, and block it on this one so there's some history behind the changes in this area.
,
Sep 14
Is this an instance of this bug?
https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Mac%20FYI%20Retina%20Release%20(AMD)/7586

Traceback (most recent call last):
  <module> at /b/s/w/ir/third_party/catapult/telemetry/telemetry/testing/run_browser_tests.py:359
    ret_code = RunTests(sys.argv[1:])
  RunTests at /b/s/w/ir/third_party/catapult/telemetry/telemetry/testing/run_browser_tests.py:328
    ret, _, _ = runner.run()
  run at /b/s/w/ir/third_party/catapult/third_party/typ/typ/runner.py:179
    ret, full_results = self._run_tests(result_set, test_set)
  _run_tests at /b/s/w/ir/third_party/catapult/third_party/typ/typ/runner.py:466
    self._run_one_set(self.stats, result_set, test_set)
  _run_one_set at /b/s/w/ir/third_party/catapult/third_party/typ/typ/runner.py:511
    test_set.isolated_tests, 1)
  _run_list at /b/s/w/ir/third_party/catapult/third_party/typ/typ/runner.py:537
    _setup_process, _teardown_process)
  make_pool at /b/s/w/ir/third_party/catapult/third_party/typ/typ/pool.py:28
    return _AsyncPool(host, jobs, callback, context, pre_fn, post_fn)
  __init__ at /b/s/w/ir/third_party/catapult/third_party/typ/typ/pool.py:188
    self.context_after_pre = pre_fn(self.host, 1, self.context)
  _setup_process at /b/s/w/ir/third_party/catapult/third_party/typ/typ/runner.py:807
    child.context_after_setup = child.setup_fn(child, child.context)
  _SetUpProcess at /b/s/w/ir/third_party/catapult/telemetry/telemetry/testing/run_browser_tests.py:349
    context.test_class.SetUpProcess()
  SetUpProcess at /b/s/w/ir/content/test/gpu/gpu_tests/webgl_conformance_integration_test.py:297
    cls.StartBrowser()
  StartBrowser at /b/s/w/ir/content/test/gpu/gpu_tests/gpu_integration_test.py:96
    cls.tab = cls.browser.tabs[0]
  __getitem__ at /b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/browser/tab_list.py:18
    return self._tab_list_backend.__getitem__(index)
  __getitem__ at /b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend_list.py:64
    return self.GetBackendFromContextId(context_id)
  GetBackendFromContextId at /b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend_list.py:75
    context_id)
  GetInspectorBackend at /b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/devtools_client_backend.py:594
    self._app_backend.app, self._devtools_client, context)
  traced_function at /b/s/w/ir/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py:52
    return func(*args, **kwargs)
  __init__ at /b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py:70
    self._websocket.Connect(self.debugger_url, timeout)
  Connect at /b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_websocket.py:84
    skip_utf8_validation=True)
  CreateConnection at /b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/websocket.py:25
    return _create_connection(*args, **kwargs)
  create_connection at /b/s/w/ir/third_party/catapult/telemetry/third_party/websocket-client/websocket/_core.py:487
    websock.connect(url, **options)
  connect at /b/s/w/ir/third_party/catapult/telemetry/third_party/websocket-client/websocket/_core.py:214
    self.handshake_response = handshake(self.sock, *addrs, **options)
  handshake at /b/s/w/ir/third_party/catapult/telemetry/third_party/websocket-client/websocket/_handshake.py:65
    status, resp = _get_resp_headers(sock)
  _get_resp_headers at /b/s/w/ir/third_party/catapult/telemetry/third_party/websocket-client/websocket/_handshake.py:120
    status, resp_headers = read_headers(sock)
  read_headers at /b/s/w/ir/third_party/catapult/telemetry/third_party/websocket-client/websocket/_http.py:223
    line = recv_line(sock)
  recv_line at /b/s/w/ir/third_party/catapult/telemetry/third_party/websocket-client/websocket/_socket.py:101
    c = recv(sock, 1)
  recv at /b/s/w/ir/third_party/catapult/telemetry/third_party/websocket-client/websocket/_socket.py:83
    raise WebSocketTimeoutException(message)
WebSocketTimeoutException: timed out
,
Sep 14
No. In this case the harness is failing to bring up the browser. This looks like a hardware issue per Issue 884064.
Comment 1 by jmad...@chromium.org
, Jun 18 2018