
Issue 671416

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Closed: Dec 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug

Blocking:
issue 672999




"gpu_rasterization_tests (with patch)" is flaky

Project Member Reported by chromium...@appspot.gserviceaccount.com, Dec 6 2016

Issue description

"gpu_rasterization_tests (with patch)" is flaky.

This issue was created automatically by the chromium-try-flakes app. Please find the right owner to fix the respective test/step and assign this issue to them. If the step/test is infrastructure-related, please add Infra-Troopers label and change issue status to Untriaged. When done, please remove the issue from Sheriff Bug Queue by removing the Sheriff-Chromium label.

We have detected 3 recent flakes. List of all flakes can be found at https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNyLwsSBUZsYWtlIiRncHVfcmFzdGVyaXphdGlvbl90ZXN0cyAod2l0aCBwYXRjaCkM.



This flaky test/step was previously tracked in issue 653365.
 
Labels: Infra-Troopers
The problem seems to be a timeout while uploading the screenshot after the test; see the stack trace below.

Troopers: is this a problem with the server for uploading test results?


(WARNING) 2016-12-05 11:28:07,993 shared_page_state.DumpStateUponFailure:140  Taking screenshots upon failures disabled.
Traceback (most recent call last):
  File "/b/s/w/ir02REGx/third_party/catapult/telemetry/telemetry/internal/story_runner.py", line 87, in _RunStoryAndProcessErrorIfNeeded
    state.RunStory(results)
  File "/b/s/w/ir02REGx/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 52, in traced_function
    return func(*args, **kwargs)
  File "/b/s/w/ir02REGx/content/test/gpu/gpu_tests/gpu_test_base.py", line 111, in RunStory
    RunStoryWithRetries(GpuSharedPageState, self, results)
  File "/b/s/w/ir02REGx/content/test/gpu/gpu_tests/gpu_test_base.py", line 72, in RunStoryWithRetries
    super(cls, shared_page_state).RunStory(results)
  File "/b/s/w/ir02REGx/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 52, in traced_function
    return func(*args, **kwargs)
  File "/b/s/w/ir02REGx/third_party/catapult/telemetry/telemetry/page/shared_page_state.py", line 301, in RunStory
    self._current_page, self._current_tab, results)
  File "/b/s/w/ir02REGx/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 52, in traced_function
    return func(*args, **kwargs)
  File "/b/s/w/ir02REGx/content/test/gpu/gpu_tests/gpu_rasterization.py", line 51, in ValidateAndMeasurePage
    screenshot = tab.Screenshot()
  File "/b/s/w/ir02REGx/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 52, in traced_function
    return func(*args, **kwargs)
  File "/b/s/w/ir02REGx/third_party/catapult/telemetry/telemetry/internal/browser/tab.py", line 117, in Screenshot
    return self._inspector_backend.Screenshot(timeout)
  File "/b/s/w/ir02REGx/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 52, in traced_function
    return func(*args, **kwargs)
  File "/b/s/w/ir02REGx/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py", line 39, in inner
    inspector_backend._ConvertExceptionFromInspectorWebsocket(e)
  File "/b/s/w/ir02REGx/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 52, in traced_function
    return func(*args, **kwargs)
  File "/b/s/w/ir02REGx/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py", line 36, in inner
    return func(inspector_backend, *args, **kwargs)
  File "/b/s/w/ir02REGx/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py", line 147, in Screenshot
    return self._page.CaptureScreenshot(timeout)
  File "/b/s/w/ir02REGx/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_page.py", line 146, in CaptureScreenshot
    res = self._inspector_websocket.SyncRequest(request, timeout)
  File "/b/s/w/ir02REGx/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_websocket.py", line 110, in SyncRequest
    res = self._Receive(timeout)
  File "/b/s/w/ir02REGx/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_websocket.py", line 149, in _Receive
    data = self._socket.recv()
  File "/b/s/w/ir02REGx/third_party/catapult/telemetry/third_party/websocket-client/websocket.py", line 596, in recv
    opcode, data = self.recv_data()
  File "/b/s/w/ir02REGx/third_party/catapult/telemetry/third_party/websocket-client/websocket.py", line 606, in recv_data
    frame = self.recv_frame()
  File "/b/s/w/ir02REGx/third_party/catapult/telemetry/third_party/websocket-client/websocket.py", line 637, in recv_frame
    self._frame_header = self._recv_strict(2)
  File "/b/s/w/ir02REGx/third_party/catapult/telemetry/third_party/websocket-client/websocket.py", line 746, in _recv_strict
    bytes = self._recv(shortage)
  File "/b/s/w/ir02REGx/third_party/catapult/telemetry/third_party/websocket-client/websocket.py", line 732, in _recv
    raise WebSocketTimeoutException(e.message)
TimeoutException: <unprintable TimeoutException object>

Owner: petrcermak@chromium.org
Status: Assigned (was: Untriaged)
It appears from the first line that taking screenshots is disabled:
“(WARNING) 2016-12-05 11:28:07,993 shared_page_state.DumpStateUponFailure:140  Taking screenshots upon failures disabled.”

Petr, what would cause this particular condition?
Cc: eyaich@chromium.org
That is odd. Not being able to take a screenshot shouldn't be a failure case.

This is potentially something we can add a timeout to. We haven't seen this behavior in Telemetry before.
Cc: nednguyen@chromium.org
Labels: -Sheriff-Chromium
Maybe the screenshot is not happening on failure, but during the normal running of the test?  So disabled-on-failure has no effect?
Owner: nedngu...@google.com
Sorry, I don't work on Chrome anymore. Ned should be able to help you here :-)
Owner: kbr@chromium.org

Comment 9 by kbr@chromium.org, Dec 8 2016

Cc: -nednguyen@chromium.org nedngu...@google.com jo...@chromium.org caseq@chromium.org pschmidt@chromium.org pfeldman@chromium.org
Components: Platform>DevTools Tests>Telemetry Infra>Labs
Owner: d...@chromium.org
The bug is in the capturing of page screenshots via DevTools. The report "shared_page_state.DumpStateUponFailure:140  Taking screenshots upon failures disabled." is a red herring. The actual bug is here:

  File "/b/s/w/ir7TLRzc/content/test/gpu/gpu_tests/gpu_rasterization.py", line 51, in ValidateAndMeasurePage
    screenshot = tab.Screenshot()
  File "/b/s/w/ir7TLRzc/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 52, in traced_function
    return func(*args, **kwargs)
  File "/b/s/w/ir7TLRzc/third_party/catapult/telemetry/telemetry/internal/browser/tab.py", line 117, in Screenshot
    return self._inspector_backend.Screenshot(timeout)
  File "/b/s/w/ir7TLRzc/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 52, in traced_function
    return func(*args, **kwargs)
  File "/b/s/w/ir7TLRzc/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py", line 39, in inner
    inspector_backend._ConvertExceptionFromInspectorWebsocket(e)
  File "/b/s/w/ir7TLRzc/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 52, in traced_function
    return func(*args, **kwargs)
  File "/b/s/w/ir7TLRzc/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py", line 36, in inner
    return func(inspector_backend, *args, **kwargs)
  File "/b/s/w/ir7TLRzc/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py", line 147, in Screenshot
    return self._page.CaptureScreenshot(timeout)
  File "/b/s/w/ir7TLRzc/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_page.py", line 146, in CaptureScreenshot
    res = self._inspector_websocket.SyncRequest(request, timeout)
  File "/b/s/w/ir7TLRzc/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_websocket.py", line 110, in SyncRequest
    res = self._Receive(timeout)
  File "/b/s/w/ir7TLRzc/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_websocket.py", line 149, in _Receive
    data = self._socket.recv()
  File "/b/s/w/ir7TLRzc/third_party/catapult/telemetry/third_party/websocket-client/websocket.py", line 596, in recv
    opcode, data = self.recv_data()
  File "/b/s/w/ir7TLRzc/third_party/catapult/telemetry/third_party/websocket-client/websocket.py", line 606, in recv_data
    frame = self.recv_frame()
  File "/b/s/w/ir7TLRzc/third_party/catapult/telemetry/third_party/websocket-client/websocket.py", line 637, in recv_frame
    self._frame_header = self._recv_strict(2)
  File "/b/s/w/ir7TLRzc/third_party/catapult/telemetry/third_party/websocket-client/websocket.py", line 746, in _recv_strict
    bytes = self._recv(shortage)
  File "/b/s/w/ir7TLRzc/third_party/catapult/telemetry/third_party/websocket-client/websocket.py", line 732, in _recv
    raise WebSocketTimeoutException(e.message)


The test uses Telemetry's tab.Screenshot() API, which is implemented on top of the DevTools protocol. That call is intermittently timing out, causing both of the tests in this step to fail.
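
As a rough illustration of the "add a timeout" idea raised earlier in the thread, here is a minimal sketch of bounding a screenshot call with a per-attempt timeout and a small number of retries. Nothing here is Telemetry's actual API: `ScreenshotTimeout`, `screenshot_with_retries`, and the `take_screenshot` callable are hypothetical names introduced only for this example. A wrapper like this would only smooth over transient timeouts and would not help if the cause is persistent on a given machine.

```python
import time


class ScreenshotTimeout(Exception):
    """Hypothetical stand-in for the timeout raised by a screenshot call."""


def screenshot_with_retries(take_screenshot, attempts=3, timeout_s=10.0,
                            backoff_s=0.0):
    """Call take_screenshot(timeout_s) up to `attempts` times.

    Returns the first successful result. If every attempt times out,
    the last ScreenshotTimeout is re-raised so the caller still sees
    the failure.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return take_screenshot(timeout_s)
        except ScreenshotTimeout as error:
            last_error = error
            # Optionally wait before the next attempt, but not after the last.
            if attempt + 1 < attempts and backoff_s > 0:
                time.sleep(backoff_s)
    raise last_error
```

A caller would pass in whatever function actually captures the screenshot, for example `screenshot_with_retries(lambda t: capture(t), attempts=2)`, where `capture` is again a placeholder.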

We've seen this behavior before. It was related to the screensaver becoming active, even though the screensaver is supposed to be disabled on all of these machines.

Here are the failing jobs:

https://chromium-swarm.appspot.com/task?id=32ee9d0a949abb10&refresh=10&show_raw=1
https://chromium-swarm.appspot.com/task?id=32e9d07b588dd010&refresh=10&show_raw=1
https://chromium-swarm.appspot.com/task?id=32e944db3a220010&refresh=10&show_raw=1

They all happened on this bot:
build704-m4

Labs team, could you please investigate that bot? Thanks.
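
The observation that every failing job ran on one bot can be checked mechanically. The sketch below is only an illustration: the task IDs and bot name are the ones listed in this comment, while `FAILURES`, `failures_per_bot`, and `single_culprit` are hypothetical names, and the (task, bot) pairs are written out by hand rather than fetched from Swarming.

```python
from collections import Counter

# (task id, bot) pairs, copied by hand from the failing jobs listed above.
FAILURES = [
    ("32ee9d0a949abb10", "build704-m4"),
    ("32e9d07b588dd010", "build704-m4"),
    ("32e944db3a220010", "build704-m4"),
]


def failures_per_bot(failures):
    """Count how many failing tasks ran on each bot."""
    return Counter(bot for _task_id, bot in failures)


def single_culprit(failures):
    """Return the bot name if all failures share one bot, else None."""
    counts = failures_per_bot(failures)
    if len(counts) == 1:
        return next(iter(counts))
    return None


if __name__ == "__main__":
    print(failures_per_bot(FAILURES))
    print(single_culprit(FAILURES))
```

When the flakes cluster on a single machine like this, a machine-level cause is more likely than a bug in the test itself.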

Comment 10 by d...@chromium.org, Dec 8 2016

Status: Fixed (was: Assigned)
The bot was resurrected recently, and the timeline of when this started lines up with the requests for HwOps to revive the machine. I'm assuming the machine kept dying because someone bumped the power adapter and didn't re-seat it fully, as the machine had a locked screen. I've now verified via VNC that the screen is not locked, so it's good to go. Typically, after HwOps revives one of these machines, we're supposed to verify that the screen is not locked as a result of the machine going into hibernation mode. I'm guessing this one got missed.

In the future, if this is being disruptive, please click the "shutdown bot gracefully" button on the bot's swarming page to take it out of the pool. This will stop the swarming daemon until it's relaunched either manually or by the system rebooting (the machine will not auto-reboot when not connected to swarming).

Comment 11 by kbr@chromium.org, Dec 12 2016

Thanks very much Bryce. I added instructions about using the "shutdown bot gracefully" button and following up with an Infra>Labs bug to https://sites.google.com/a/google.com/client3d/documents/chrome-internal-gpu-pixel-wrangling-instructions#TOC-Taking-a-machine-out-of-the-Swarming-pool .

The bot's running well again.

Comment 12 by kbr@chromium.org, Dec 13 2016

Blocking: 672999
