
Issue 593710

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Closed: Mar 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Windows
Pri: 2
Type: Bug

Blocking:
issue 440500




Windows 7 x64 Debug GPU FYI bot failing WebglConformance.conformance_uniforms_gl_uniform_arrays

Project Member Reported by jmad...@chromium.org, Mar 10 2016

Issue description

First failing build:

https://build.chromium.org/p/chromium.gpu.fyi/builders/Win7%20x64%20Debug%20%28NVIDIA%29/builds/843

ANGLE revision stayed the same so it's unlikely to be ANGLE.

Regression range:

https://chromium.googlesource.com/chromium/src/+log/624c53d136d76327615167859e5e015515286d99..d4dcbd342dd54f55383daf8bc44b2c9d97fe0d0b

Possibly related to the VS2015 change? There seems to be a timeout in this test, possibly an ASSERT dialog. I have not tried to repro yet.
 
Blocking: 440500
Owner: brucedaw...@chromium.org
Status: Assigned (was: Untriaged)
Looks like the failure cleared up with the revert of the VS2015 change. Leaving this open, since the regression may return if we re-land the compiler change.
I will test.

If you want to test locally, you can run:

set GYP_MSVS_VERSION=2015
gclient runhooks

ninja ...

Any advice on how to build/run this test locally?

Also, I see a lot of GL_ERROR output - I assume that is normal?
Found it:

python ../content/test/gpu/run_gpu_test.py webgl_conformance --show-stdout --browser=debug_x64 -v "--extra-browser-args=--enable-logging=stderr --js-flags=--expose-gc" --output-dir c:\users\brucedawson\documents\tmpdkcls2telemetry --output-format=json
Running webgl_conformance seems to take tens of minutes. Is there a way to just run WebglConformance.conformance_uniforms_gl_uniform_arrays? Without that it's pretty unwieldy trying to understand the problem. 
The reason I couldn't find gl_uniform_arrays in a code search is that it is listed as gl-uniform-arrays. Once I found it, I modified three files in third_party\webgl\src. In the first two I deleted all listed files except the reference to the next file; in the last one I deleted all listed tests except gl-uniform-arrays.html.

sdk/tests/00_test_list.txt
sdk/tests/conformance/00_test_list.txt
sdk/tests/conformance/uniforms/00_test_list.txt

Now I can repro the failure in a timely manner. It would be helpful to have (and document) a less clunky way of running a single sub-test.
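For illustration, after trimming, the innermost list file would contain just the single test entry (this is a sketch; the exact surviving directive lines, if any, depend on the file's original contents):

```
gl-uniform-arrays.html
```

The two parent lists would similarly keep only the one line referencing the next 00_test_list.txt down the chain.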

The test seems to be running normally (alternately printing the vertex shader and pixel shader on the web page) and then times out before it completes - no assert dialog:

1) Does anybody know where the timeout is set so I can try increasing it? It appears to be a network timeout.
2) Does anybody know how long the VS 2013 version takes to run this test? I'm currently building a VS 2013 version to compare.

Comment 8 by kbr@chromium.org, Mar 10 2016

Cc: nedngu...@google.com aiolos@chromium.org eakuefner@chromium.org
Components: Tests>Telemetry
Apologies for the difficulty running individual tests. The way to do this is to pass the --story-filter argument to the harness:

run_gpu_test.py webgl_conformance --story-filter=gl_uniform_arrays

The timeouts are probably deep inside Telemetry; if you can provide a stack trace we can figure out how to hardwire them to some larger value.
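For context, the failing wait is essentially a poll-until-true loop with a deadline. A minimal standalone sketch of that pattern (this is not Telemetry's actual implementation; the names are illustrative):

```python
import time

def wait_for_condition(poll, timeout_in_seconds, interval_s=0.05):
    """Repeatedly call poll() until it returns truthy or the deadline passes."""
    deadline = time.monotonic() + timeout_in_seconds
    while time.monotonic() < deadline:
        if poll():
            return
        time.sleep(interval_s)
    # Analogous to the TimeoutException seen in the harness when the page's
    # completion flag never becomes true within the allotted time.
    raise TimeoutError('condition not met within %s s' % timeout_in_seconds)
```

A slow debug build that needs longer than `timeout_in_seconds` to finish the page will hit the raise even though the test itself would eventually pass.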

The VS 2013 version takes about 20 seconds to run the test.

I'm not sure whether the VS 2015 version is running much slower, or if it is iterating more times than it should. Making progress...

The timeout callstack is shown below. Temporarily hardwiring the timeout to a larger value for test purposes could be handy.

  File "third_party\catapult\telemetry\telemetry\internal\story_runner.py", line 84, in _RunStoryAndProcessErrorIfNeeded
    state.RunStory(results)
  File "content\test\gpu\gpu_tests\gpu_test_base.py", line 122, in RunStory
    RunStoryWithRetries(DesktopGpuSharedPageState, self, results)
  File "content\test\gpu\gpu_tests\gpu_test_base.py", line 72, in RunStoryWithRetries
    super(cls, shared_page_state).RunStory(results)
  File "third_party\catapult\telemetry\telemetry\page\shared_page_state.py", line 323, in RunStory
    self._current_page.Run(self)
  File "third_party\catapult\telemetry\telemetry\page\__init__.py", line 84, in Run
    shared_state.page_test.RunNavigateSteps(self, current_tab)
  File "third_party\catapult\telemetry\telemetry\page\page_test.py", line 182, in RunNavigateSteps
    page.RunNavigateSteps(action_runner)
  File "content\test\gpu\gpu_tests\webgl_conformance.py", line 143, in RunNavigateSteps
    'webglTestHarness._finished', timeout_in_seconds=180)
  File "third_party\catapult\telemetry\telemetry\page\action_runner.py", line 186, in WaitForJavaScriptCondition
    self._tab.WaitForJavaScriptExpression(condition, timeout_in_seconds)
  File "third_party\catapult\telemetry\telemetry\internal\browser\web_contents.py", line 103, in WaitForJavaScriptExpression
    self._inspector_backend.GetCurrentConsoleOutputBuffer())
  File "third_party\catapult\telemetry\telemetry\internal\backends\chrome_inspector\inspector_backend.py", line 35, in inner
    inspector_backend._ConvertExceptionFromInspectorWebsocket(e)
  File "third_party\catapult\telemetry\telemetry\internal\backends\chrome_inspector\inspector_backend.py", line 32, in inner
    return func(inspector_backend, *args, **kwargs)
  File "third_party\catapult\telemetry\telemetry\internal\backends\chrome_inspector\inspector_backend.py", line 177, in GetCurrentConsoleOutputBuffer
    return self._console.GetCurrentConsoleOutputBuffer(timeout)
  File "third_party\catapult\telemetry\telemetry\internal\backends\chrome_inspector\inspector_console.py", line 37, in GetCurrentConsoleOutputBuffer
    self._EnableConsoleOutputStream()
  File "third_party\catapult\telemetry\telemetry\internal\backends\chrome_inspector\inspector_console.py", line 50, in _EnableConsoleOutputStream
    self._inspector_websocket.SyncRequest({'method': 'Console.enable'})
  File "third_party\catapult\telemetry\telemetry\internal\backends\chrome_inspector\inspector_websocket.py", line 110, in SyncRequest
    res = self._Receive(timeout)
  File "third_party\catapult\telemetry\telemetry\internal\backends\chrome_inspector\inspector_websocket.py", line 149, in _Receive
    data = self._socket.recv()
  File "third_party\catapult\telemetry\third_party\websocket-client\websocket.py", line 596, in recv
    opcode, data = self.recv_data()
  File "third_party\catapult\telemetry\third_party\websocket-client\websocket.py", line 606, in recv_data
    frame = self.recv_frame()
  File "third_party\catapult\telemetry\third_party\websocket-client\websocket.py", line 637, in recv_frame
    self._frame_header = self._recv_strict(2)
  File "third_party\catapult\telemetry\third_party\websocket-client\websocket.py", line 746, in _recv_strict
    bytes = self._recv(shortage)
  File "third_party\catapult\telemetry\third_party\websocket-client\websocket.py", line 732, in _recv
    raise WebSocketTimeoutException(e.message)
TimeoutException:
Hmm, this seems really messed up. Somehow we got a first timeout while waiting for the js condition to become true, and then a WebSocketTimeoutException when Telemetry tried to contact DevTools to fetch the console log.

Comment 11 by kbr@chromium.org, Mar 10 2016

Look up the stack: you can hardwire it probably at this stack frame or above:

File "third_party\catapult\telemetry\telemetry\internal\backends\chrome_inspector\inspector_websocket.py", line 110, in SyncRequest
    res = self._Receive(timeout)
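One hypothetical way to hardwire it for a local debugging session (a throwaway hack, not Telemetry's real code) is to clamp whatever timeout the caller passed to a generous floor before the receive:

```python
# Throwaway debugging hack (illustrative, not a real patch): clamp the
# requested timeout to a floor so short defaults can't fire while
# investigating a slow debug build.
def clamped_timeout(requested_s, floor_s=600):
    # Honor timeouts already longer than the floor; raise the short ones.
    return max(requested_s, floor_s)

# e.g. inside SyncRequest, one could temporarily replace
#   res = self._Receive(timeout)
# with
#   res = self._Receive(clamped_timeout(timeout))
```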

Ken: the first timeout when waiting for the js condition is here:

  File "content\test\gpu\gpu_tests\webgl_conformance.py", line 143, in RunNavigateSteps
    'webglTestHarness._finished', timeout_in_seconds=180)

Comment 13 by kbr@chromium.org, Mar 10 2016

Thanks. Bruce, please try increasing that timeout.

The VS 2015 debug version is slightly slower than the VS 2013 debug version.

On my Z840 machine the VS 2013 64-bit debug version completes the 18 (I think) iterations of this test in about 200 s.

When I raised the timeout (thanks!) the VS 2015 64-bit debug test passed in 268.7 s.

A 34% slowdown is worth investigating, but since it is in debug builds it is not a blocker. I opened crbug.com/593983 to track this.

It's curious that all of the runtimes are higher than the timeout, which makes it hard to know what to set the timeout to. The VS 2015 test passed with the timeout set to 240, but 300 would give more breathing room.

crrev.com/1785713003 raises the timeout to 300. Let me know what you think.

Project Member Comment 15 by bugdroid1@chromium.org, Mar 11 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/87908413585f8cc838fe2eae1581dbcc767ca49f

commit 87908413585f8cc838fe2eae1581dbcc767ca49f
Author: brucedawson <brucedawson@chromium.org>
Date: Fri Mar 11 19:48:05 2016

Increase GPU test timeout to avoid fragility

VS 2015 64-bit debug builds were hitting tests timeouts on
WebglConformance.conformance_uniforms_gl_uniform_arrays. It turns out
that VS 2015 64-bit debug builds are, in this scenario at least, about
34% slower than VS 2013's debug builds. This means that VS 2013 is also
close to hitting this timeout, at least on slower machines.

This change increases the timeout from 180 to 300.

Bug 593983 covers investigating why VS 2015's builds are slower, but
since this issue only occurs in debug builds it is not an immediate
blocker.

TBR=kbr@chromium.org
BUG=593710,440500,593983

Review URL: https://codereview.chromium.org/1785713003

Cr-Commit-Position: refs/heads/master@{#380708}

[modify] https://crrev.com/87908413585f8cc838fe2eae1581dbcc767ca49f/content/test/gpu/gpu_tests/webgl_conformance.py

Status: Fixed (was: Assigned)
Fixed. Investigating *why* VS 2015 debug builds are slower is tracked in crbug.com/593983.
I lowered the test iteration count to two and grabbed ETW traces (tinyurl.com/etwcentral) of the VS 2013 and VS 2015 builds (both 64-bit debug) running the tests. When grouped by Randomascii Exclusive (module and function), I noticed that gles2_utils.dll used almost twice as much CPU time in the VS 2015 case - see the attached screenshot of the two profiles, with VS 2013 at the top. Looking at the top functions, the four top VS 2013 functions map neatly to four identical functions in the VS 2015 profile.

However, the VS 2015 profile also shows nine other functions using equal or larger amounts of CPU time - functions that simply do not show up in the VS 2013 profile. Either these functions are new, or they were previously inlined; the latter seems most likely. Since most of them are in the std::string class, it's not clear whether there is anything we can do to improve them.
Attachment: VS 2015 debug slowdown.PNG (43.3 KB)
Thanks for figuring this out Bruce. Sorry I couldn't be of any help!
One thought: the --story-filter argument as a way to filter down to specific sub-tests is not very discoverable. Early on I ran this:

python ../content/test/gpu/run_gpu_test.py

and it gave me a list of the top-level tests. It seems like this would be an ideal place to mention --story-filter. Thoughts? Should I implement this or file a bug or ???

Comment 20 by kbr@chromium.org, Mar 14 2016

Bruce: sorry for the difficulty -- but the --story-filter option is listed under

run_gpu_test.py help run

This harness is likely going to change soon, so I don't think it's worth implementing a more specialized help function.
