
Issue 823440

Starred by 3 users

Issue metadata

Status: Assigned
Owner: kbr@chromium.org
Cc: sunn...@chromium.org kainino@chromium.org jdarpinian@chromium.org
Components: Blink>WebGL
EstimatedDays: ----
NextAction: ----
OS: Linux, Windows, Mac
Pri: 2
Type: Bug




WebGL Performance regression, inconsistent speed

Reported by da...@gskinner.com, Mar 19 2018

Issue description

UserAgent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.162 Safari/537.36

Example URL:

Steps to reproduce the problem:
1. Load https://codepen.io/createjs/pen/OORdMw
2. Set object count to 50,000 (max)
3. Observe Framerate in top left

What is the expected behavior?
Decent framerate of around 40-60 fps, depending on hardware.

What went wrong?
Unworkable framerate of around 20 fps.

Does it occur on multiple sites: N/A

Is it a problem with a plugin? N/A 

Did this work before? Yes, in Chrome 64

Does this work in other browsers? N/A

Chrome version: 65.0.3325.162  Channel: stable
OS Version: 10.0
Flash Version: Shockwave Flash 29.0 r0

Confirmed on other similar systems, debugging information attached.

One of the more intriguing symptoms is that the performance recording shows wildly inconsistent durations for bufferSubData (see the attached image). As this is a performance benchmark, I'm simply updating the vertex positions and feeding them in, very simply and reliably. It's the same operation within a tick and between ticks, yet it is shown taking anywhere from no time at all to 30ms+, both within a single tick and across ticks.
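
For reference, the upload pattern is roughly the following (a minimal sketch, not the actual CodePen source; canvas, objectCount, and updatePositions are stand-ins):

const gl = canvas.getContext('webgl');
const positions = new Float32Array(objectCount * 2); // x, y per object

const buffer = gl.createBuffer();
gl.bindBuffer(gl.ARRAY_BUFFER, buffer);
gl.bufferData(gl.ARRAY_BUFFER, positions.byteLength, gl.DYNAMIC_DRAW);

function tick() {
  updatePositions(positions); // stand-in: advance every object's position

  // Same call, same payload size, every tick...
  const t0 = performance.now();
  gl.bufferSubData(gl.ARRAY_BUFFER, 0, positions);
  const t1 = performance.now();
  // ...yet t1 - t0 swings from ~0 ms to 30 ms+ in the recordings.

  gl.drawArrays(gl.POINTS, 0, objectCount);
  requestAnimationFrame(tick);
}
requestAnimationFrame(tick);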

The problem persists in Chrome Canary 67, although the FPS drop is not as significant.
 
65Performance.png
52.7 KB View Download
65Performance_Profile-20180319T140208.zip
1.1 MB Download
trace_Tracing_65Performance.json.gz
1.4 MB Download
65Performance_DESKTOP-8RM1V0E.03-19-2018.14-46-46.etl.7z
7.8 MB Download
Labels: Needs-Bisect Needs-Triage-M65
Components: Blink>WebGL
Labels: -Type-Compat -Needs-Bisect Triaged-ET M-67 Target-67 FoundIn-67 OS-Linux OS-Mac Type-Bug
Status: Untriaged (was: Unconfirmed)
Able to reproduce the issue on the reported Chrome version 65.0.3325.162 and on the latest Chrome version 67.0.3375.0 using Windows 10, Mac 10.13.1, and Ubuntu 14.04. As the issue is seen from M60 (60.0.3112.0), considering it a non-regression and marking it as Untriaged.
Note: Adding GPU details of my Windows 10 system.

Thanks!
Windows-10 GPU.pdf
137 KB Download

Comment 3 by piman@chromium.org, Mar 20 2018

Cc: sunn...@chromium.org kainino@chromium.org
Owner: kbr@chromium.org
Status: Assigned (was: Untriaged)
In the attached trace, we see a lot of CommandBufferProxyImpl::WaitForToken, which happens when the transfer buffer is full, typically as a result of back-pressure (if the page submits work faster than the GPU can consume it). In this case two things are odd:
1- the GPU process seems mostly idle (except in some cases where we're blocked in GLES2DecoderImpl::HandlePostSubBufferCHROMIUM, which sometimes takes surprisingly long; that is possible if the GPU workload is high, and explains the spikes). It seems to only be doing work during the WaitForToken.
2- WaitForToken waits for a very recent token (typically last-known + 1 or 2).

This is somewhat unusual and warrants investigation. It suggests that we're filling up the transfer buffer in spikes, probably on a single BufferSubData call, which otherwise is quick to execute on the service side - we're not actually waiting on GPU back-pressure. Essentially, it sounds like the transfer buffer is much smaller than it should be.

I'm not sure if the transfer buffer size for WebGL contexts has changed in 65, which would explain the regression. But it sounds like it could use some tuning. Assigning to WebGL folks to investigate further.
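
To illustrate the mechanism (a sketch of the client-side behavior, not Chrome code): a single large bufferSubData payload is copied through the client/GPU-process transfer buffer, and when it doesn't fit in the free space, the client blocks in WaitForToken until the service drains the buffer. Chunking the upload keeps any one call from filling the buffer in a spike, though it wouldn't address the sizing heuristic itself:

// Sketch only: split one large upload into transfer-buffer-friendly chunks.
// CHUNK_BYTES is an assumed value, not a documented Chrome limit.
const CHUNK_BYTES = 512 * 1024;

function uploadChunked(gl, target, dstOffset, data) {
  const bytes = new Uint8Array(data.buffer, data.byteOffset, data.byteLength);
  for (let offset = 0; offset < bytes.byteLength; offset += CHUNK_BYTES) {
    const end = Math.min(offset + CHUNK_BYTES, bytes.byteLength);
    // Each call's payload now fits in the transfer buffer without
    // forcing a client-side WaitForToken stall.
    gl.bufferSubData(target, dstOffset + offset, bytes.subarray(offset, end));
  }
}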

Comment 4 by kbr@chromium.org, Mar 21 2018

Submitter: could you provide about:gpu (copy/paste is fine) from your machine?

I can see the repeated WaitForToken calls in the middle of frames, at least on macOS; I don't remember any changes to the transfer buffer size in M65 but will see if increasing it helps eliminate them.

Here's an about:tracing trace gathered from a MacBook Air with an Intel HD 6000 GPU, with "JavaScript and Rendering" enabled. In the GPU process trace, all the time is being spent in ImageTransportSurfaceOverlayMac::ApplyBackpressure. Both Firefox and Safari seem to be capped at 30 FPS on this hardware, like Chrome, so maybe the GPU is actually being swamped. I will still try to eliminate the mid-frame stalls and see if that helps, on Windows in particular.

trace_trace-createjs-50k-mac-intel-hd-6000.json.gz
2.3 MB Download

Comment 5 by piman@chromium.org, Mar 21 2018

@#4: in your trace, while there is a bit of back-pressure (similar to the Windows trace, actually, which shows up in HandlePostSubBufferCHROMIUM), the GPU main thread is 50-70% idle, and we observe the same pattern of synchronous JS/GPU execution, which I suspect comes from a too-small transfer buffer.

Comment 6 by kbr@chromium.org, Mar 21 2018

Increasing both the minimum and maximum transfer buffer size to 64 MB in https://cs.chromium.org/chromium/src/gpu/command_buffer/client/shared_memory_limits.h increased performance from 50 FPS to 60 FPS on my Windows workstation. The WaitForToken calls disappear from the trace, though there are still a few flushes per frame which don't go away even if the command buffer is increased to 16 MB. They might come from the HTML UI and multiplexing between contexts.

I need to study the heuristics involved in automatically increasing the transfer buffer size and see whether it's actually growing as expected to the maximum size when the initial size is 1 MB and the max size is 64 MB.

trace_createjs-xfer-buffer-1mb-min-64mb-max-cmd-buffer-1mb.json.gz
2.0 MB Download
trace_createjs-xfer-buffer-64mb-min-64mb-max-cmd-buffer-1mb.json.gz
2.0 MB Download

Comment 7 by da...@gskinner.com, Mar 21 2018

@#4 My GPU is a hybrid system. I've never been exactly sure how it works, but the Intel card offloads a lot to the 970M, so that might be why the reported GPU status is confused compared to its actual utilization.
GPU.png
2.8 KB View Download

Comment 8 by kbr@chromium.org, May 18 2018

Cc: jdarpinian@chromium.org
See also https://twitter.com/gfxprogrammer/status/997222952666939392 and specifically this test case:
http://floooh.github.io/oryol-samples/wasm/Dragons.html

Crank the number of dragons up to 1024.

It looks like texture upload is the rate-limiting factor. Need to try this test case with the increased transfer buffer size as well.
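
The suspected pattern there is per-frame texture streaming along these lines (an assumed shape, not the demo's actual code; W, H, and pixels are stand-ins), which pushes its payloads through the same transfer buffer as bufferSubData:

const tex = gl.createTexture();
gl.bindTexture(gl.TEXTURE_2D, tex);
// Allocate storage once; update the contents every frame below.
gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, W, H, 0, gl.RGBA, gl.UNSIGNED_BYTE, null);

function frame(pixels) { // pixels: Uint8Array of W * H * 4 bytes, rewritten per frame
  // The payload is copied through the command buffer's transfer buffer,
  // so a too-small buffer throttles this path as well.
  gl.texSubImage2D(gl.TEXTURE_2D, 0, 0, 0, W, H, gl.RGBA, gl.UNSIGNED_BYTE, pixels);
  // ...draw the scene...
}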
