Issue 735483

Starred by 5 users

Issue metadata

Status: ExternalDependency
Owner:
OOO until 2019-01-24
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Mac
Pri: 2
Type: Bug

Blocking:
issue 662644


Hotlists containing this issue:
webgl-conformance-all


Poor performance rapidly updating sampler uniforms in WebGL on AMD Macbooks

Reported by m...@leemcdermott.co.uk, Jun 21 2017

Issue description

UserAgent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.68 Safari/537.36

Example URL:
https://tsherif.github.io/picogl.js/examples/cloth.html

Steps to reproduce the problem:
1. Visit given URL
2. Observe poor performance

What is the expected behavior?
60fps

What went wrong?
~1fps on 2016/17 AMD MacBook Pros

Contents of about:gpu attached.

Does it occur on multiple sites: Yes

Is it a problem with a plugin? N/A 

Did this work before? N/A 

Does this work in other browsers? N/A

Chrome version: 58.0.3029.68  Channel: stable
OS Version: OS X 10.12.5
Flash Version: 

I'm seeing this in my own WebGL work, but I'm not able to post that work/code to a public bug tracker. In one particular example, I'm seeing a smooth and consistent 60fps with some fairly straightforward 2D compositing work on a 2013 MacBook Pro. On a brand new, high-spec'd 2017 MacBook Pro I'm seeing about 4-5fps.

If I can reduce this issue to whatever the specific bottleneck is, I'll update this bug.
 
gpu.htm
99 KB View Download

Comment 1 by bokan@chromium.org, Jun 22 2017

Components: -Blink Blink>WebGL

Comment 2 by bokan@chromium.org, Jun 22 2017

Could you update to M59 and see if the performance is still bad? You could also give it a try from Canary channel (M61). 
Tried on M60 and M61. Same issue.
Also the Chrome GPU process on a newer Mac is constantly maxed out at 100% with memory constantly increasing.

On an older Mac with Intel graphics, this demo runs at 60fps and GPU process at 20-30% when interacting with the cloth. Memory doesn't change at all.

Comment 4 by zmo@chromium.org, Jun 22 2017

Please post about:gpu page and a link to demo you use to reproduce this

Comment 5 Deleted

Comment 6 Deleted

Comment 7 by tshe...@gmail.com, Jun 22 2017

This issue appears to be specific to the AMD graphics driver on macOS. See discussion here: https://news.ycombinator.com/item?id=14603306

I get good performance on a macOS 10.12.5 2014 Mac Mini with an Intel Iris GPU, a macOS 10.12.5 2013 Macbook Pro with Nvidia GT 750m, Ubuntu 16.04 with Quadro M1000, Windows 10 with an Nvidia GT 750m, and Windows 10 with Intel HD 4000.

On a macOS 10.12.5 2015 iMac with an AMD Radeon M290, performance tanks to around 1FPS.


All the information is above. When I try to repost, my comment gets auto-deleted :S

Comment 9 by zmo@chromium.org, Jun 22 2017

Cc: kbr@chromium.org zmo@chromium.org kainino@chromium.org
Status: Available (was: Unconfirmed)
Thanks. I missed the original info.  I can reproduce the slow FPS on my Mac with AMD gpu (2015 MacBook Pro). I didn't try on other Macs though.

Comment 10 by tshe...@gmail.com, Jun 22 2017

A few things about the demo that might help track the issue down: The key bottleneck seems to be in the simulation loop which uses EXT_color_buffer_float to ping-pong between RGBA32F immutable float texture targets that use sampler objects to store sampling state. Not sure if some combination of those things might be leading to the slowdown. 

Comment 11 by kbr@chromium.org, Jun 23 2017

I tried eliminating the sampler object, and switching back to mutable textures. See attached diffs. (There's at least one bug fix in there, where "options" was incorrectly being destroyed in the HTMLImageElement upload path.) It's still slow on Mac/AMD.

The slow operations are definitely the updates to the floating-point textures. Commenting those out and still rendering the non-updating cloth mesh is fast.

I think that the problem's caused by replacing textures on the framebuffer objects, rather than just allocating different framebuffer objects and calling bindFramebuffer. However, it's not trivial for me to change the sample to allocate multiple FBOs.
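
A minimal sketch of the alternative suggested here, assuming a plain WebGL context `gl`. The helper names `createPingPongTargets` and `simulationPass` are hypothetical and are not part of the demo or PicoGL. Each texture is attached to its own framebuffer once, and each pass only calls bindFramebuffer and bindTexture:

```javascript
// Sketch only: `gl` is assumed to be a WebGL context; both helpers are
// hypothetical names used for illustration.
function createPingPongTargets(gl, width, height) {
  return [0, 1].map(() => {
    const texture = gl.createTexture();
    gl.bindTexture(gl.TEXTURE_2D, texture);
    // RGBA8 keeps the sketch valid in WebGL 1. The demo's RGBA32F targets
    // would additionally need WebGL 2 and EXT_color_buffer_float.
    gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, width, height, 0,
                  gl.RGBA, gl.UNSIGNED_BYTE, null);
    gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.NEAREST);
    gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, gl.NEAREST);

    const framebuffer = gl.createFramebuffer();
    gl.bindFramebuffer(gl.FRAMEBUFFER, framebuffer);
    // The attachment is made once here and never replaced afterwards.
    gl.framebufferTexture2D(gl.FRAMEBUFFER, gl.COLOR_ATTACHMENT0,
                            gl.TEXTURE_2D, texture, 0);
    return { texture, framebuffer };
  });
}

// Reads from one target and renders into the other, swapping roles on
// every call. `drawPass` is a caller-supplied function that issues the draw.
function simulationPass(gl, targets, frameIndex, drawPass) {
  const source = targets[frameIndex % 2];
  const destination = targets[(frameIndex + 1) % 2];
  gl.bindFramebuffer(gl.FRAMEBUFFER, destination.framebuffer);
  gl.bindTexture(gl.TEXTURE_2D, source.texture);
  drawPass();
}
```

Whether this helps on the affected driver is exactly what the requested patch would need to show; the sketch only illustrates the structure.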

Tarek, could you please do this and post a patch implementing this change? We can then test it here. Thanks.

diffs.patch
6.2 KB Download

Comment 12 by tshe...@gmail.com, Jun 24 2017

Sure thing. I'll also try putting together a more stripped down implementation of a similar pipeline without PicoGL. Something where it just ping-pongs drawing a triangle back and forth.

One thing I managed to pinpoint today is that it is specifically the call to drawArrays that's slow. If I run all the GL calls in the simulation loop, but skip the call to drawArrays, it animates smoothly.

Comment 13 by kbr@chromium.org, Jun 24 2017

Which call to drawArrays? There are multiple calls to app.drawCalls(...).draw() both inside and outside the simulation loop and it's not obvious to me which are indexed vs. not.

Comment 14 by tshe...@gmail.com, Jun 24 2017

I manipulated the code to narrow that down. I added some code to PicoGL to allow me to optionally do all the setup for a draw call, but skip the final call to drawArrays or drawElements. And when I did so for the calls in the simulation loop (which are all drawArrays), the frame rate returned to normal. 
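
This kind of bisection can be sketched without touching library code. The following is not the actual PicoGL change; `withDrawsSkipped` is a hypothetical helper that wraps a WebGL context in a Proxy so that all state setup still runs while drawArrays and drawElements can be turned into no-ops:

```javascript
// Sketch only: withDrawsSkipped is a hypothetical helper, not a PicoGL API.
// `shouldSkip` is a caller-supplied function returning true while draws
// should be suppressed.
function withDrawsSkipped(gl, shouldSkip) {
  const drawMethods = new Set(["drawArrays", "drawElements"]);
  return new Proxy(gl, {
    get(target, name) {
      const value = target[name];
      // Constants and other non-function properties pass straight through.
      if (typeof value !== "function") return value;
      if (drawMethods.has(name)) {
        // Forward the draw only when skipping is switched off.
        return (...args) => {
          if (!shouldSkip()) value.apply(target, args);
        };
      }
      // Every other method is forwarded, bound to the real context.
      return value.bind(target);
    },
  });
}
```

For example, `const debugGl = withDrawsSkipped(gl, () => inSimulationLoop)` would suppress only the draws issued while a caller-maintained `inSimulationLoop` flag is set.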

I don't think it's indexed vs non-indexed that's important necessarily, but that the slowdown might be in whatever checks the driver does before executing a draw command.

One other thing that might be of interest: If I remove all but two draw calls in the simulation loop, it's slow. If I reduce it to one, then the frame rate returns to normal. So it seems to be specifically when drawing to a texture that was just used as input.

Comment 15 by tshe...@gmail.com, Jun 24 2017

Kai, here's a version of the simulation that uses separate FBOs for each draw call: https://tsherif.github.io/picogl.js/examples/cloth-no-target-change.html

It now ping-pongs only once per frame to feed the new positions back into the sim. How does this one run for you?

Comment 16 by kbr@chromium.org, Jun 27 2017

Tarek: thanks for putting together that variant. It's still slow on the MacBook Retina with AMD GPU.

I've been tinkering with the example; see examples/cloth-no-target-change-2.html in the attached archive. I removed all but the first two draw calls in the constraint loop, reduced the number of iterations, and shrank the size of the data texture. It's still quite slow on the affected hardware. I then tried simplifying the update-constraint-fs. Simplifying it too much -- even leaving in both texelFetch calls and essentially making the update of the position a no-op -- made it significantly faster.

So it looks like the shader is actually running very slowly -- it's not simply the texelFetch operations. (I also tried replacing them with textureLod calls -- same basic speed.)

I'm going to file a Radar with Apple because this seems to be a problem specific to their Mac AMD driver and want to get this in the hands of their engineers sooner rather than later.

Comment 17 by kbr@chromium.org, Jun 27 2017

Owner: kbr@chromium.org
Status: ExternalDependency (was: Available)
Filed as Radar 32994227. We will need feedback from Apple and AMD on this one. In the meantime, if any more progress can be made to reduce the test case and/or figure out a workaround, that would help in creating a regression test.

Comment 18 by kbr@chromium.org, Jun 27 2017

Summary: Poor performance of RGBA32F render targets in WebGL on newer AMD Macbooks (was: Huge WebGL performance regression on newer AMD Macbooks)

Comment 19 by tshe...@gmail.com, Jun 29 2017

Have a minimal example for you: https://tsherif.github.io/webgl2bugs/mac/amd-texture-update.html

We were way off. That driver simply doesn't seem to like frequent updates to the texture unit a uniform is pointing to. They don't have to be RGBA32F or render targets.

If you comment out one of the uniform1i calls on line 166 or line 168, frame rate returns to normal. On the machine I was testing, with a Radeon R9 M290, frame rate went from 15fps to 60fps. Commenting out one of the calls to drawArrays similarly returns performance to normal.
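
For readers who don't want to open the test case, the pattern being described looks roughly like the sketch below. This is not the code from the linked page: `drawAlternatingSamplerUniform` is a hypothetical name, and `gl`, `samplerLocation` and the two textures are assumed to have been created already, with the program in use.

```javascript
// Sketch only: illustrates the slow pattern, not the linked test case.
// Assumes the program owning `samplerLocation` is current (gl.useProgram).
function drawAlternatingSamplerUniform(gl, samplerLocation, textures, drawCount) {
  // Bind each texture to its own texture unit once, up front.
  gl.activeTexture(gl.TEXTURE0);
  gl.bindTexture(gl.TEXTURE_2D, textures[0]);
  gl.activeTexture(gl.TEXTURE1);
  gl.bindTexture(gl.TEXTURE_2D, textures[1]);

  for (let i = 0; i < drawCount; i++) {
    // Repointing the sampler uniform at a different texture unit before
    // every draw is the step that appears to be slow on the affected driver.
    gl.uniform1i(samplerLocation, i % 2);
    gl.drawArrays(gl.TRIANGLES, 0, 3);
  }
}
```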

Does this problem not exist in WebGL 1? I'm surprised it wouldn't have come up before...

Comment 20 by a...@figma.com, Jul 2 2017

Figma is seeing a big slowdown specifically on 2015 MB Pro with AMD 370 and WebGL 1, but haven't isolated it since we just discovered it.  Doesn't happen on 2016 with AMD 455. We don't use 32F. Scene is simple 8 quads with clipping, and dragging one of these around is slower than Nvidia/Intel.

Comment 21 by kbr@chromium.org, Jul 6 2017

Summary: Poor performance rapidly switching textures in WebGL on AMD Macbooks (was: Poor performance of RGBA32F render targets in WebGL on newer AMD Macbooks)
Tarek, thanks very much for the test case. This is great.

It's not clear to me how long this has been happening. The problem certainly happens with WebGL 1.0 programs -- see the attached modified version of your test case. I'm surprised this hasn't been seen before. I tried some obvious things like setting the texture filtering modes but they had no effect on the performance.

Could I have your permission to incorporate a version of these programs into the WebGL conformance suite? This would require relicensing under Khronos' copyright. Or please feel free to put up a pull request. I'd like to make the tests more minimal, and eliminate the use of the PNG texture. Basically, the test should assert that it completes rendering a certain number of frames in a reasonable timeframe.

Note that the Radar I filed was closed as a duplicate, but I can't see the contents of the other Radar so don't have any details about the true root cause nor if a fix is imminent. Still, we should add a regression test for this.

amd-texture-update.html
6.0 KB View Download
amd-texture-update-webgl1.html
6.8 KB View Download
khronos_webgl.png
30.9 KB View Download

Comment 22 by tshe...@gmail.com, Jul 6 2017

Sure, I wouldn't mind writing this up as a conformance test. Would you just want the WebGL 1 version or both? Could you point me to another timing-based conformance test that I could use as a reference?

Comment 23 by kbr@chromium.org, Jul 6 2017

Let's incorporate both versions just in case.

Could you please put them in conformance/rendering/ and conformance2/rendering/ ?

You can use conformance/extensions/ext-disjoint-timer-query.html and conformance/glsl/misc/large-loop-compile.html for a couple of examples of tests which use Date.now(). You could also consider using window.performance.now().

Basically, I would say that with the more stressful situation of 400 draw calls per frame (the WebGL 1.0 case above), the test should complete something like 150 frames in 5 seconds. If the test hasn't rendered 150 frames within 5 seconds, fail. This would mean a target frame rate of 30 FPS. Take a look at other conformance/rendering tests to see how they set up the canvas, and try to use as many of the built-in utility functions as possible.
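
The pass/fail rule above can be sketched as follows. This is only an illustration and not the test that was eventually merged: `rendersWithinBudget` is a hypothetical helper, it runs frames synchronously, and it takes an injectable clock so the logic can be checked without a GPU. A real conformance test would more likely drive frames from requestAnimationFrame and report through the suite's own utility functions.

```javascript
// Sketch only: rendersWithinBudget is a hypothetical helper.
// `renderFrame` is a caller-supplied function that issues one frame's work;
// `now` defaults to performance.now() and returns milliseconds.
function rendersWithinBudget(renderFrame, frameCount, budgetMs,
                             now = () => performance.now()) {
  const start = now();
  for (let frame = 0; frame < frameCount; frame++) {
    renderFrame(frame);
    // Stop as soon as the budget is exceeded instead of finishing all frames.
    if (now() - start > budgetMs) return false;
  }
  return true;
}
```

With the numbers suggested here, the check would be `rendersWithinBudget(drawFrame, 150, 5000)`, where `drawFrame` stands for the test's own per-frame function.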

Please add the new test to 00_test_list.txt with --min-version=1.0.4 for the WebGL 1.0 case and --min-version=2.0.1 for the WebGL 2.0 case.

Thanks.

Comment 24 by tshe...@gmail.com, Jul 6 2017

Thanks for the tips, Ken. A PR for the tests has been made here: https://github.com/KhronosGroup/WebGL/pull/2448

Comment 25 by tshe...@gmail.com, Jul 7 2017

Also, do you think it would be worth opening another bug ticket with Apple since our understanding of the problem has changed so much?

Comment 26 by kbr@chromium.org, Jul 10 2017

Tarek, thanks for the tests. They look good; just merged your pull request.

It's not worth opening a new bug with Apple at this point. They informed us that the problem is that the driver is recompiling the shader every time the uniform location for the sampler is changed. The bug should be fixed in 10.13. The workaround is to leave the sampler uniforms as is, and call bindTexture() / bindFramebuffer() to switch which texture is being sampled from and rendered to.
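
A minimal sketch of that workaround, assuming a WebGL context `gl`, an already-linked program in use, and two existing textures. The name `drawAlternatingBoundTexture` is hypothetical. The sampler uniform is set once and only the texture binding changes between draws:

```javascript
// Sketch only: illustrates the workaround described above.
// Assumes the program owning `samplerLocation` is current (gl.useProgram).
function drawAlternatingBoundTexture(gl, samplerLocation, textures, drawCount) {
  // Point the sampler at texture unit 0 once and never touch it again.
  gl.uniform1i(samplerLocation, 0);
  gl.activeTexture(gl.TEXTURE0);

  for (let i = 0; i < drawCount; i++) {
    // Switch what unit 0 samples from by rebinding, not by changing the uniform.
    gl.bindTexture(gl.TEXTURE_2D, textures[i % 2]);
    gl.drawArrays(gl.TRIANGLES, 0, 3);
  }
}
```

For a ping-pong simulation the same idea extends to the render target: keep one framebuffer per texture and call bindFramebuffer to pick the destination for each pass.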

Project Member

Comment 27 by bugdroid1@chromium.org, Jul 14 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/d36daf872d9601f32503e3c1f101e8d1395a4aa5

commit d36daf872d9601f32503e3c1f101e8d1395a4aa5
Author: Kenneth Russell <kbr@chromium.org>
Date: Fri Jul 14 21:13:41 2017

Roll WebGL 5e57726..72eda82

https://chromium.googlesource.com/external/khronosgroup/webgl.git/+log/5e57726..72eda82

BUG=733599, 735483, angleproject:2103
TEST=bots
TBR=zmo@chromium.org

CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.win:win_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.android:android_optional_gpu_tests_rel

Change-Id: I4542a243a3b34141becea64e1f024872f64c3ce2
Reviewed-on: https://chromium-review.googlesource.com/570761
Commit-Queue: Kenneth Russell <kbr@chromium.org>
Reviewed-by: Kenneth Russell <kbr@chromium.org>
Cr-Commit-Position: refs/heads/master@{#486889}
[modify] https://crrev.com/d36daf872d9601f32503e3c1f101e8d1395a4aa5/DEPS
[modify] https://crrev.com/d36daf872d9601f32503e3c1f101e8d1395a4aa5/content/test/gpu/gpu_tests/webgl2_conformance_expectations.py
[modify] https://crrev.com/d36daf872d9601f32503e3c1f101e8d1395a4aa5/content/test/gpu/gpu_tests/webgl_conformance_expectations.py
[modify] https://crrev.com/d36daf872d9601f32503e3c1f101e8d1395a4aa5/content/test/gpu/gpu_tests/webgl_conformance_revision.txt

Comment 28 by a...@figma.com, Jul 19 2017

So we just performed a test of a simple file in Figma on an AMD Mac (2015), and Chrome WebGL is significantly slower than Firefox. This may not be an Apple bug after all, but something in the WebGL layer of Chrome specifically. We pre-bind sampler uniforms, so it's related, but not the same issue.

Try changing fill or dragging one of the frames in the file below (duplicate to your files first).

https://www.figma.com/file/EMhX3QjWKwSdIBSMdgHPLul7/Slow-Dragging-AMD---Chrome-WebGL


Comment 29 by kbr@chromium.org, Jul 19 2017

Alec, please help us by reducing this test. Thanks.

Project Member

Comment 30 by bugdroid1@chromium.org, Jul 20 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/e73f88b6ff12e7b610e62e23780ac08cbb441fc9

commit e73f88b6ff12e7b610e62e23780ac08cbb441fc9
Author: Geoff Lang <geofflang@chromium.org>
Date: Thu Jul 20 17:55:52 2017

Skip WebGL texture-switch-performance on debug_x64 as well as debug.

Add a 'debug_x64' and 'release_x64' build type for the Windows bots.

BUG=735483

Cq-Include-Trybots: master.tryserver.chromium.android:android_optional_gpu_tests_rel;master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel
Change-Id: I71adf860a39fb14600c044dca9aeb8bd3bfb33fd
Reviewed-on: https://chromium-review.googlesource.com/579574
Commit-Queue: Geoff Lang <geofflang@chromium.org>
Commit-Queue: Kenneth Russell <kbr@chromium.org>
Reviewed-by: Kenneth Russell <kbr@chromium.org>
Cr-Commit-Position: refs/heads/master@{#488302}
[modify] https://crrev.com/e73f88b6ff12e7b610e62e23780ac08cbb441fc9/content/test/gpu/gpu_tests/test_expectations.py
[modify] https://crrev.com/e73f88b6ff12e7b610e62e23780ac08cbb441fc9/content/test/gpu/gpu_tests/webgl_conformance_expectations.py

Project Member

Comment 32 by bugdroid1@chromium.org, Jul 24 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/65ef72c3a4b993acb333666b007ceea095b98aa4

commit 65ef72c3a4b993acb333666b007ceea095b98aa4
Author: Kenneth Russell <kbr@chromium.org>
Date: Mon Jul 24 21:51:57 2017

Skip texture-switch-performance test everywhere.

It needs to be rewritten to measure its expected performance. It's
currently too flaky.

BUG=735483
TBR=zmo@chromium.org
NOTRY=true

Change-Id: I23b5e7eca97a29700caeb9fe0dcffc6d664f4766
Reviewed-on: https://chromium-review.googlesource.com/583893
Commit-Queue: Kenneth Russell <kbr@chromium.org>
Reviewed-by: Kenneth Russell <kbr@chromium.org>
Cr-Commit-Position: refs/heads/master@{#489088}
[modify] https://crrev.com/65ef72c3a4b993acb333666b007ceea095b98aa4/content/test/gpu/gpu_tests/webgl2_conformance_expectations.py
[modify] https://crrev.com/65ef72c3a4b993acb333666b007ceea095b98aa4/content/test/gpu/gpu_tests/webgl_conformance_expectations.py

Comment 33 by tshe...@gmail.com, Aug 7 2017

I've updated the original demo (https://tsherif.github.io/picogl.js/examples/cloth.html) to rebind textures rather than update sampler uniforms, and am now getting 60FPS on the affected machine I'm testing. 

So should we consider that a best practice and add a note about it to the spec? I don't mind making the PR if someone can point me to the right file to update.

Comment 34 by kbr@chromium.org, Aug 8 2017

I think it's worth a non-normative note in the spec. Could you upload a pull request against https://github.com/KhronosGroup/WebGL/blob/master/specs/latest/1.0/index.html  adding a paragraph in section 5.14.10 "Uniforms and attributes", after the descriptions for the "uniform[1234][fi]" and related entry points?

Mention that performance issues have been seen when updating sampler uniforms, and that applications should prefer bindTexture over uniform1i* if they desire to change which texture is being drawn with a given shader.

Thanks.

Comment 35 by kbr@chromium.org, Aug 8 2017

Summary: Poor performance rapidly updating sampler uniforms in WebGL on AMD Macbooks (was: Poor performance rapidly switching textures in WebGL on AMD Macbooks)
Blocking: 662644
Labels: webgl-conformance
