
Issue 853762


Issue metadata

Status: Fixed
Owner:
Closed: Jun 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Windows
Pri: 2
Type: Bug

Blocked on:
issue 853651

Blocking:
issue 851504




gpu_tests.screenshot_sync_integration_test.ScreenshotSyncIntegrationTest.ScreenshotSync_GPURasterWithCanvas Flaky on Windows Intel

Reported by jmad...@chromium.org, Jun 18 2018

Issue description

Summary: viz: gpu_tests.screenshot_sync_integration_test.ScreenshotSyncIntegrationTest.ScreenshotSync_GPURasterWithCanvas Flaky on Windows Intel (was: gpu_tests.screenshot_sync_integration_test.ScreenshotSyncIntegrationTest.ScreenshotSync_GPURasterWithCanvas Flaky on Windows Intel)
Cc: weiliangc@chromium.org samans@chromium.org kylec...@chromium.org
cc'ing ppl on viz team who might know more.

Comment 4 by samans@chromium.org, Jun 18 2018

Cc: sadrul@chromium.org
Cc: jonr...@chromium.org fsam...@chromium.org
it's a viz party now :)

Comment 6 by bugdroid1@chromium.org, Jun 18 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/3c86695d6b1d57a47873f37dc678e1230e68d03f

commit 3c86695d6b1d57a47873f37dc678e1230e68d03f
Author: Jamie Madill <jmadill@chromium.org>
Date: Mon Jun 18 18:52:30 2018

Suppress flaky screenshot test.

This is flaking on Viz/Windows/Intel. Not clear how to select viz
as a config in the GPU expectations module. Suppress this on the
Windows Intel config for now.

Bug:  853762 
Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel
Change-Id: I83bc2699d7b7a904eb68b97886d1c91025307d31
Tbr: kbr@chromium.org
Reviewed-on: https://chromium-review.googlesource.com/1104783
Reviewed-by: Jamie Madill <jmadill@chromium.org>
Commit-Queue: Jamie Madill <jmadill@chromium.org>
Cr-Commit-Position: refs/heads/master@{#568094}
[modify] https://crrev.com/3c86695d6b1d57a47873f37dc678e1230e68d03f/content/test/gpu/gpu_tests/screenshot_sync_expectations.py
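Roughly, a GPU expectations entry suppresses a test only on bots whose configuration matches the entry's conditions. The sketch below is a hypothetical model of that matching, not the real screenshot_sync_expectations.py API: `Expectation`, `lookup`, and the tag strings are invented for illustration. It shows why, if the harness exposes no "viz" tag, the narrowest available suppression is the Windows + Intel configuration.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Expectation:
    """Hypothetical suppression rule: a test name plus the config tags it applies to."""
    test_name: str
    tags: FrozenSet[str]  # every tag must be present on the bot for the rule to apply
    outcome: str          # e.g. "skip" or "flaky"
    bug: int


def lookup(expectations: List[Expectation], test_name: str,
           bot_tags: FrozenSet[str]) -> str:
    """Return the outcome of the first rule whose tags are a subset of the bot's tags."""
    for exp in expectations:
        if exp.test_name == test_name and exp.tags <= bot_tags:
            return exp.outcome
    return "pass"  # no matching rule: the test runs normally


rules = [Expectation("ScreenshotSync_GPURasterWithCanvas",
                     frozenset({"win", "intel"}), "skip", 853762)]
```

With these rules, a bot tagged `{"win", "intel"}` skips the test whether or not viz is enabled, while a `{"win", "nvidia"}` bot still runs it.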

Cc: jmad...@chromium.org
Labels: -Pri-1 Pri-2
Owner: ----
Status: Available (was: Assigned)
CQ stability should be improved now. Marking as available for someone from the viz team to take.

Comment 8 by sadrul@chromium.org, Jun 18 2018

(fwiw) The failure doesn't seem to be related to viz: for example, the first link in the OP shows failures in non-viz runs.

Comment 9 by sadrul@chromium.org, Jun 18 2018

And the failure seems to be a timeout (vs. an expectation failure)?
sadrul: This seems like it could be related to  https://crbug.com/851504  since screenshot_sync_tests are using Page.captureScreenshot?
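For context, Page.captureScreenshot is a DevTools Protocol command. The client sends a JSON message with an `id`, and the browser answers with a message carrying the same `id` and a base64-encoded image in `result.data`. The sketch below only illustrates that message shape; it is not Telemetry's implementation, and the websocket transport is omitted.

```python
import base64
import itertools
import json

# Monotonically increasing request ids, as DevTools clients typically use.
_ids = itertools.count(1)


def make_capture_request(fmt: str = "png") -> str:
    """Serialize a Page.captureScreenshot command."""
    return json.dumps({"id": next(_ids),
                       "method": "Page.captureScreenshot",
                       "params": {"format": fmt}})


def decode_capture_response(raw: str, expected_id: int) -> bytes:
    """Check that the reply matches the request id and return the decoded image bytes."""
    msg = json.loads(raw)
    if msg.get("id") != expected_id:
        raise ValueError("response does not match request id")
    if "error" in msg:
        raise RuntimeError(msg["error"])
    return base64.b64decode(msg["result"]["data"])
```

If the browser never produces the matching reply, nothing in this exchange fails on its own; the client can only notice by waiting and giving up after a timeout.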
Summary: gpu_tests.screenshot_sync_integration_test.ScreenshotSyncIntegrationTest.ScreenshotSync_GPURasterWithCanvas Flaky on Windows Intel (was: viz: gpu_tests.screenshot_sync_integration_test.ScreenshotSyncIntegrationTest.ScreenshotSync_GPURasterWithCanvas Flaky on Windows Intel)
It's possible it's not viz-related. You're right that that one is for the non-viz version. The "viz" mode does seem to fail more often.
Owner: sadrul@chromium.org
Status: Started (was: Available)
re #10: yea, it looks related. I am trying to repro the failure locally.
I think the test is doing a lot of work and simply timing out because of that (the change for crbug.com/851504 does introduce an extra IPC per request, and it looks like each test issues approximately 1200 copy requests). I will put up a speculative CL that reduces the number of repetitions in the test and see if that is acceptable.

I assume try runs on the win_optional_gpu_tests_rel bots would tell me whether it works or not?
Cc: zmo@chromium.org
I am running https://chromium-review.googlesource.com/c/chromium/src/+/1105063 through the trybots.

Comment 15 by bugdroid1@chromium.org, Jun 19 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/dd80dd3a955ccfd85294631c0f10d560d0480b56

commit dd80dd3a955ccfd85294631c0f10d560d0480b56
Author: Sadrul Habib Chowdhury <sadrul@chromium.org>
Date: Tue Jun 19 02:02:57 2018

gpu tests: Speculative fix for screenshot test.

Reduce the number of repetitions to avoid a timeout in the test.

BUG= 853762 

Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel
Change-Id: I08f70316da5c28d92ccb142e71ecd6c903bca612
Reviewed-on: https://chromium-review.googlesource.com/1105063
Reviewed-by: Zhenyao Mo <zmo@chromium.org>
Commit-Queue: Sadrul Chowdhury <sadrul@chromium.org>
Cr-Commit-Position: refs/heads/master@{#568296}
[modify] https://crrev.com/dd80dd3a955ccfd85294631c0f10d560d0480b56/content/test/gpu/gpu_tests/screenshot_sync_expectations.py
[modify] https://crrev.com/dd80dd3a955ccfd85294631c0f10d560d0480b56/content/test/gpu/gpu_tests/screenshot_sync_integration_test.py

Status: Fixed (was: Started)
The test has not flaked since the fix landed. So marking this as fixed.
Thanks sadrul. I'll keep an eye out.

Comment 18 by kbr@chromium.org, Jun 19 2018

Blocking: 851504
Status: Started (was: Fixed)
sadrul@: turning down the number of iterations of this test to reduce flakiness is unsatisfying. The test is a regression test for previously flaky behavior of the browser, so reducing the number of iterations reduces the likelihood that real bugs will be caught.
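The cost of fewer iterations can be made concrete. Assuming, purely for illustration, that each screenshot request is dropped independently with probability p, a run of n iterations catches at least one drop with probability 1 - (1 - p)^n. The value p = 0.01 and the comparison against 5 iterations below are hypothetical numbers, not measurements from the bots.

```python
def detection_probability(p_drop: float, iterations: int) -> float:
    """Chance that at least one of `iterations` independent requests is dropped."""
    return 1.0 - (1.0 - p_drop) ** iterations


for n in (20, 5):
    # Hypothetical per-request drop rate of 1%.
    print(f"{n} iterations: {detection_probability(0.01, n):.3f}")
```

This prints `20 iterations: 0.182` and `5 iterations: 0.049`, so under this model cutting the iterations makes the regression test roughly 3.7 times less likely to catch a lost request on a given run.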

Could you please pick up this bug again, increase the number of iterations back to 20 and change the 5 second timeout in tab.Screenshot() in the test to 10 seconds? This should eliminate the chance that slow bots were the reason that the screenshot call timed out.
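The requested change keeps the iteration count but relaxes the deadline applied to each screenshot call. A minimal sketch of that structure, assuming a hypothetical `capture` callable that stands in for `tab.Screenshot(timeout)`:

```python
import concurrent.futures
import time
from typing import Callable, List


def run_screenshot_iterations(capture: Callable[[], bytes], iterations: int = 20,
                              per_call_timeout_s: float = 10.0) -> List[float]:
    """Invoke `capture` `iterations` times, failing if any single call exceeds
    `per_call_timeout_s`. Returns the latency of each call in seconds.

    The deadline applies to each call separately, so raising it tolerates slow
    bots without reducing how many captures are checked.
    """
    latencies = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        for _ in range(iterations):
            start = time.monotonic()
            # result() raises concurrent.futures.TimeoutError past the deadline.
            pool.submit(capture).result(timeout=per_call_timeout_s)
            latencies.append(time.monotonic() - start)
    return latencies
```

One design caveat: on a timeout, leaving the `with` block still waits for the stuck worker to finish, so this sketch reports the failure only after the slow call returns. A real harness would also need to tear down the browser connection.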

The bots are all running vpython and incorporating the numpy and cv2 packages, which are important to get acceptable performance of Telemetry's screenshot mechanism.

I think it is more likely that there is a real bug where these screenshot requests are intermittently getting lost, possibly related to recent changes in how these requests are transmitted down the rendering pipeline. Linking to possibly related bug  Issue 851504 .

Sure. Is there a timeout for the individual test? I suspect that will need to be increased too.

Comment 20 by kbr@chromium.org, Jun 19 2018

I searched through the Telemetry code base and don't see a per-test timeout. Any such timeout must be enforced at a deeper level, in the typ test harness (dpranke@ is the owner of that). Locally, if I change the test to do 50 invocations, each test still completes in about 15 seconds and the tests don't time out. Again, this is using the "#!/usr/bin/env vpython" shebang at the top of src/content/test/gpu/run_gpu_integration_test.py in order to pick up the numpy and cv2 packages. If those aren't being used, the test will run a *lot* slower. All of the bots should be picking up these packages. The timeout in this test run, for example:

https://ci.chromium.org/p/chromium/builders/luci.chromium.try/win_angle_rel_ng/1484

and this shard:

https://chromium-swarm.appspot.com/task?id=3e2be56d68f71810&refresh=10&show_raw=1

is caused because the browser didn't respond to Telemetry's screenshot request within 5 seconds, not because the test took too long to run:

[1/4] gpu_tests.screenshot_sync_integration_test.ScreenshotSyncIntegrationTest.ScreenshotSync_GPURasterWithCanvas failed unexpectedly 12.2230s:
 ...

  Traceback (most recent call last):
    _RunGpuTest at content\test\gpu\gpu_tests\gpu_integration_test.py:132
      self.RunActualGpuTest(url, *args)
    RunActualGpuTest at content\test\gpu\gpu_tests\screenshot_sync_integration_test.py:136
      self._CheckScreenshot()
    _CheckScreenshot at content\test\gpu\gpu_tests\screenshot_sync_integration_test.py:121
      screenshot = tab.Screenshot(5)
    traced_function at third_party\catapult\common\py_trace_event\py_trace_event\trace_event_impl\decorators.py:52
      return func(*args, **kwargs)
    Screenshot at third_party\catapult\telemetry\telemetry\internal\browser\tab.py:116
      return self._inspector_backend.Screenshot(timeout)
    traced_function at third_party\catapult\common\py_trace_event\py_trace_event\trace_event_impl\decorators.py:52
      return func(*args, **kwargs)
    Inner at third_party\catapult\telemetry\telemetry\internal\backends\chrome_inspector\inspector_backend.py:41
      inspector_backend._ConvertExceptionFromInspectorWebsocket(e)
    traced_function at third_party\catapult\common\py_trace_event\py_trace_event\trace_event_impl\decorators.py:52
      return func(*args, **kwargs)
    Inner at third_party\catapult\telemetry\telemetry\internal\backends\chrome_inspector\inspector_backend.py:38
      return func(inspector_backend, *args, **kwargs)
    Screenshot at third_party\catapult\telemetry\telemetry\internal\backends\chrome_inspector\inspector_backend.py:153
      return self._page.CaptureScreenshot(timeout)
    CaptureScreenshot at third_party\catapult\telemetry\telemetry\internal\backends\chrome_inspector\inspector_page.py:153
      res = self._inspector_websocket.SyncRequest(request, timeout)
    SyncRequest at third_party\catapult\telemetry\telemetry\internal\backends\chrome_inspector\inspector_websocket.py:116
      res = self._Receive(timeout)
    _Receive at third_party\catapult\telemetry\telemetry\internal\backends\chrome_inspector\inspector_websocket.py:155
      data = self._socket.recv()
    recv at third_party\catapult\telemetry\third_party\websocket-client\websocket\_core.py:293
      opcode, data = self.recv_data()
    recv_data at third_party\catapult\telemetry\third_party\websocket-client\websocket\_core.py:310
      opcode, frame = self.recv_data_frame(control_frame)
    recv_data_frame at third_party\catapult\telemetry\third_party\websocket-client\websocket\_core.py:323
      frame = self.recv_frame()
    recv_frame at third_party\catapult\telemetry\third_party\websocket-client\websocket\_core.py:357
      return self.frame_buffer.recv_frame()
    recv_frame at third_party\catapult\telemetry\third_party\websocket-client\websocket\_abnf.py:336
      self.recv_header()
    recv_header at third_party\catapult\telemetry\third_party\websocket-client\websocket\_abnf.py:286
      header = self.recv_strict(2)
    recv_strict at third_party\catapult\telemetry\third_party\websocket-client\websocket\_abnf.py:371
      bytes_ = self.recv(min(16384, shortage))
    _recv at third_party\catapult\telemetry\third_party\websocket-client\websocket\_core.py:427
      return recv(self.sock, bufsize)
    recv at third_party\catapult\telemetry\third_party\websocket-client\websocket\_socket.py:83
      raise WebSocketTimeoutException(message)
 

This failure mode is consistent with snapshot requests / CopyOutputRequests being accidentally dropped in some situations.
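To see why a longer timeout separates "slow" from "dropped", consider a synchronous request that waits for the reply with a matching id. The sketch below is a stand-alone model, not Telemetry's inspector_websocket.py: a `queue.Queue` replaces the websocket, and `respond_later` is a hypothetical simulated browser. A slow reply succeeds once the deadline is long enough, but a dropped request times out no matter how long the client waits.

```python
import queue
import threading
import time


def sync_request(inbox: "queue.Queue[dict]", request_id: int, timeout_s: float) -> dict:
    """Wait for the response whose id matches `request_id`.

    Replies for other ids are discarded. If no matching reply arrives before the
    deadline, raise TimeoutError (analogous to WebSocketTimeoutException above).
    """
    deadline = time.monotonic() + timeout_s
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(f"no response to request {request_id}")
        try:
            msg = inbox.get(timeout=remaining)
        except queue.Empty:
            raise TimeoutError(f"no response to request {request_id}") from None
        if msg.get("id") == request_id:
            return msg


def respond_later(inbox: "queue.Queue[dict]", request_id: int,
                  delay_s: float, dropped: bool = False) -> None:
    """Simulated browser: reply after `delay_s` seconds, or never if `dropped`."""
    def worker() -> None:
        time.sleep(delay_s)
        if not dropped:
            inbox.put({"id": request_id, "result": {}})
    threading.Thread(target=worker, daemon=True).start()
```

Under this model, doubling the timeout from 5 to 10 seconds works as a diagnostic: if failures persist, the replies are most likely missing rather than late.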

Comment 21 by kbr@chromium.org, Jun 21 2018

Blockedon: 853651
It's likely that this is related to Issue 853651, for which ccameron@ has a good diagnosis.


Comment 22 by bugdroid1@chromium.org, Jun 22 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/27efdf3346545126ad4ea94ea4a722250c0bad6a

commit 27efdf3346545126ad4ea94ea4a722250c0bad6a
Author: Sadrul Habib Chowdhury <sadrul@chromium.org>
Date: Fri Jun 22 05:37:59 2018

gpu tests: Update the screenshot test.

Bump the repetition back to 20, but increase the timeout for taking the
screenshots from 5 seconds to 10 seconds instead.

BUG= 853762 

Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel
Change-Id: Id6914cf7c1791f654d35725644bad7d8fec22073
Reviewed-on: https://chromium-review.googlesource.com/1107201
Commit-Queue: Kenneth Russell <kbr@chromium.org>
Reviewed-by: Kenneth Russell <kbr@chromium.org>
Cr-Commit-Position: refs/heads/master@{#569535}
[modify] https://crrev.com/27efdf3346545126ad4ea94ea4a722250c0bad6a/content/test/gpu/gpu_tests/screenshot_sync_integration_test.py

Status: Fixed (was: Started)
Is there any known flakiness in Chrome screenshotting now, or is it reliable now? I'm dealing with https://buganizer.corp.google.com/issues/115528744 and wondering whether recent M69 changes account for this flakiness, and also whether we can reproduce the flakiness at the Chrome level without all the WebDriver complexity.

i.e. is this bug completely fixed?
Taking snapshots via DevTools should be reliable again at this point. Please file a new public bug about that internal one if you find it isn't, and block it on this one so there's some history behind the changes in this area.

Is this an instance of this bug?

https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Mac%20FYI%20Retina%20Release%20(AMD)/7586

Traceback (most recent call last):
  <module> at /b/s/w/ir/third_party/catapult/telemetry/telemetry/testing/run_browser_tests.py:359
    ret_code = RunTests(sys.argv[1:])
  RunTests at /b/s/w/ir/third_party/catapult/telemetry/telemetry/testing/run_browser_tests.py:328
    ret, _, _ = runner.run()
  run at /b/s/w/ir/third_party/catapult/third_party/typ/typ/runner.py:179
    ret, full_results = self._run_tests(result_set, test_set)
  _run_tests at /b/s/w/ir/third_party/catapult/third_party/typ/typ/runner.py:466
    self._run_one_set(self.stats, result_set, test_set)
  _run_one_set at /b/s/w/ir/third_party/catapult/third_party/typ/typ/runner.py:511
    test_set.isolated_tests, 1)
  _run_list at /b/s/w/ir/third_party/catapult/third_party/typ/typ/runner.py:537
    _setup_process, _teardown_process)
  make_pool at /b/s/w/ir/third_party/catapult/third_party/typ/typ/pool.py:28
    return _AsyncPool(host, jobs, callback, context, pre_fn, post_fn)
  __init__ at /b/s/w/ir/third_party/catapult/third_party/typ/typ/pool.py:188
    self.context_after_pre = pre_fn(self.host, 1, self.context)
  _setup_process at /b/s/w/ir/third_party/catapult/third_party/typ/typ/runner.py:807
    child.context_after_setup = child.setup_fn(child, child.context)
  _SetUpProcess at /b/s/w/ir/third_party/catapult/telemetry/telemetry/testing/run_browser_tests.py:349
    context.test_class.SetUpProcess()
  SetUpProcess at /b/s/w/ir/content/test/gpu/gpu_tests/webgl_conformance_integration_test.py:297
    cls.StartBrowser()
  StartBrowser at /b/s/w/ir/content/test/gpu/gpu_tests/gpu_integration_test.py:96
    cls.tab = cls.browser.tabs[0]
  __getitem__ at /b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/browser/tab_list.py:18
    return self._tab_list_backend.__getitem__(index)
  __getitem__ at /b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend_list.py:64
    return self.GetBackendFromContextId(context_id)
  GetBackendFromContextId at /b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend_list.py:75
    context_id)
  GetInspectorBackend at /b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/devtools_client_backend.py:594
    self._app_backend.app, self._devtools_client, context)
  traced_function at /b/s/w/ir/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py:52
    return func(*args, **kwargs)
  __init__ at /b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py:70
    self._websocket.Connect(self.debugger_url, timeout)
  Connect at /b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_websocket.py:84
    skip_utf8_validation=True)
  CreateConnection at /b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/websocket.py:25
    return _create_connection(*args, **kwargs)
  create_connection at /b/s/w/ir/third_party/catapult/telemetry/third_party/websocket-client/websocket/_core.py:487
    websock.connect(url, **options)
  connect at /b/s/w/ir/third_party/catapult/telemetry/third_party/websocket-client/websocket/_core.py:214
    self.handshake_response = handshake(self.sock, *addrs, **options)
  handshake at /b/s/w/ir/third_party/catapult/telemetry/third_party/websocket-client/websocket/_handshake.py:65
    status, resp = _get_resp_headers(sock)
  _get_resp_headers at /b/s/w/ir/third_party/catapult/telemetry/third_party/websocket-client/websocket/_handshake.py:120
    status, resp_headers = read_headers(sock)
  read_headers at /b/s/w/ir/third_party/catapult/telemetry/third_party/websocket-client/websocket/_http.py:223
    line = recv_line(sock)
  recv_line at /b/s/w/ir/third_party/catapult/telemetry/third_party/websocket-client/websocket/_socket.py:101
    c = recv(sock, 1)
  recv at /b/s/w/ir/third_party/catapult/telemetry/third_party/websocket-client/websocket/_socket.py:83
    raise WebSocketTimeoutException(message)
WebSocketTimeoutException: timed out

No. In this case the harness is failing to bring up the browser. This looks like a hardware issue per  Issue 884064 .
