bug GLSL: loop & smooth(0,0) & func vs macro
Reported by
fabrice....@gmail.com,
Jan 9
Issue description
UserAgent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36
Example URL: https://www.shadertoy.com/view/WdX3zX
Steps to reproduce the problem: See the acid test at the URL: a loop calling smoothstep, either as a macro or as a function, whose first iteration is smoothstep(0,0).
What is the expected behavior? On the acid test, either always a disk ( smoothstep(dist) ) or always a full fill, but consistently, regardless of whether the macro or the function is called and of the number of iterations.
What went wrong? On OpenGL (Windows with native OpenGL, or Linux) + NVIDIA:
- the macro (bottom) does not behave like the function (top)
- for different values of N, the loop calling the function behaves differently.
See the screenshots at the bottom of the link and below: img1: result on Linux for N=5. img2: result on Linux for N=3. The correct result should be a disk, half red/green, as it is with Windows/ANGLE.
Does it occur on multiple sites: Yes
Is it a problem with a plugin? No
Did this work before? N/A
Does this work in other browsers? No firefox
Chrome version: 71.0.3578.98 Channel: stable
OS Version: 16.04
Flash Version:
Could somebody please set the flags Blink>WebGL Internals>GPU>ANGLE ?
BTW it's likely to be an NVIDIA driver bug. I know smoothstep(0,0) is undefined by the spec, but inconsistent behavior between macro and function, or with N, probably hides a compilation bug. (Sorry, I couldn't make the acid test more minimal.)
,
Jan 9
Kimmo, do you think you could triage this report? Thanks.
,
Jan 9
WEBGL_debug_shaders can reveal the difference here (I paused in piLibs.js in me.CreateShader and used the devtools console to call getTranslatedShaderSource):
if ((webgl_2299af5c39f80f15.y > 0.0))
{
    for (float webgl_6fdd29f02130ae3a = 0.0; (webgl_6fdd29f02130ae3a < 5.0); (webgl_6fdd29f02130ae3a++))
    {
        (webgl_298e35cd02d72796.x += webgl_72ed5b26b72d146a(webgl_2299af5c39f80f15, (0.029999999 * webgl_6fdd29f02130ae3a), (0.003 * webgl_6fdd29f02130ae3a)));
    }
}
else
{
    for (float webgl_6fdd29f02130ae3a = 0.0; (webgl_6fdd29f02130ae3a < 5.0); (webgl_6fdd29f02130ae3a++))
    {
        (webgl_298e35cd02d72796.y += smoothstep((0.029999999 * webgl_6fdd29f02130ae3a), ((0.029999999 * webgl_6fdd29f02130ae3a) - (0.003 * webgl_6fdd29f02130ae3a)), length(webgl_2299af5c39f80f15)));
    }
}
In the function, r is evaluated once, while in the macro, r is evaluated twice:
r = .03 * i
smoothstep(r, r - 0.003 * i, length(U))
vs
smoothstep(.03 * i, .03 * i - 0.003 * i, length(U))
And specifically, the second arg to smoothstep changes form.
Although this doesn't mathematically make a difference, it probably affects precision or optimizations.
I think there are 2 optimizations being affected here: constant folding (since it depends on the form of the expression) and loop unrolling (since it depends on the size of the loop).
If the loop IS unrolled, there is an expression that looks like this:
smoothstep(.03 * 0., .03 * 0. - .003 * 0., length(U))
which const folds into
smoothstep(0, 0, length(U))
which might further const fold into a different value than it would get if the GPU actually executed smoothstep(0, 0, length(U)).
(And, by the way, I'm pretty sure that ALL of the smoothsteps in this program are undefined: the man page for smoothstep(edge0, edge1, x) says, "Results are undefined if edge0 ≥ edge1.")
I think that if either the loop is not unrolled, or if the const folding is defeated (by having an extra variable), different cases are hit.
Try this shadertoy: https://www.shadertoy.com/view/Wss3zf
---
Anyway, I don't think it's that feasible to detect this scenario in ANGLE's shader translator. We would need to somehow prevent undefined smoothstep cases from being resolved in different ways, which means either:
- having our own smoothstep implementation that is defined for all cases (not sure how bad of an idea this is; it could impact performance), or
- somehow preventing the driver from const folding smoothstep (definitely a bad idea because there's no guaranteed way to do this).
So I'm inclined to say this is a wontfix, especially since it's hitting documented undefined behavior.
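For reference, the first option could look roughly like the C sketch below. This is a hypothetical illustration, not ANGLE code: the name smoothstep_total and the choice to behave like a step function when edge0 == edge1 are assumptions made for this example.

```c
/* Hypothetical smoothstep that is defined when edge0 == edge1.
 * Assumption: in the degenerate case it behaves like step(edge0, x). */
float smoothstep_total(float edge0, float edge1, float x) {
    if (edge0 == edge1) {
        /* Arbitrary but deterministic choice for the degenerate case. */
        return x < edge0 ? 0.0f : 1.0f;
    }
    /* Usual formula; with edge0 > edge1 it yields a reversed ramp. */
    float t = (x - edge0) / (edge1 - edge0);
    t = t < 0.0f ? 0.0f : (t > 1.0f ? 1.0f : t);
    return t * t * (3.0f - 2.0f * t);
}
```

The extra comparison on every call is the kind of overhead the performance concern above refers to, and NaN inputs are still not handled.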
,
Jan 9
Great analysis! I always thought macros and functions would end up as exactly the same code. NB: on my systems (Linux / NVIDIA / Chrome) I have never seen a loop left rolled (except when it could not be unrolled), so it would be the other option. About your shadertoy, I get different results for the manually unrolled version vs the rolled one.
,
Jan 9
Yeah, those are the same results I had (sorry, forgot to include a screenshot).
A function may be inlined (its code pasted into the caller), but on some hardware it may actually result in a "stack push", while a macro will always be inlined. Also, GLSL macros behave like C macros: they substitute expressions textually rather than evaluating them and passing the results as data.
Macro substitution rules have significant consequences (especially in C) when a macro argument has side effects, e.g.:
#define twice(x) ((x) + (x))
twice(x++) expands to (x++) + (x++), and x gets incremented twice.
Order of operations can be messed up too, e.g.:
#define double(x) (2 * x)
double(1 + 3) expands to 2 * 1 + 3, not 2 * (1 + 3).
The macro makes a difference in this case probably because using the macro rather than the function results in a very easily constant-foldable expression (.03 * 0. - .003 * 0. can be reduced to 0.). I'm not 100% certain that unrolling has something to do with it, but I am pretty certain that const folding does. But that's totally valid: because this is undefined, the compiler and the hardware can each output any value for undefined smoothstep calls, so it makes sense that they may not match.
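The two macro pitfalls above can be checked with a small C sketch. All names here (calls, next_value, TWICE, DOUBLE_UNSAFE, DOUBLE_SAFE, twice_fn) are made up for the illustration. The side effect is a counting function call rather than x++, because modifying the same variable twice in one C expression is itself undefined.

```c
/* Counter that makes each evaluation of the argument observable. */
int calls = 0;
int next_value(void) { return calls++; }

/* The macro pastes its argument text twice, so the argument runs twice. */
#define TWICE(x) ((x) + (x))

/* Function version: the argument is evaluated once, then passed by value. */
int twice_fn(int x) { return x + x; }

/* Missing parentheses let operator precedence split the argument. */
#define DOUBLE_UNSAFE(x) (2 * x)
#define DOUBLE_SAFE(x) (2 * (x))
```

Starting from calls == 0, TWICE(next_value()) calls the function twice (the values 0 and 1 are summed), whereas twice_fn(next_value()) calls it once. DOUBLE_UNSAFE(1 + 3) expands to (2 * 1 + 3), while DOUBLE_SAFE(1 + 3) expands to (2 * (1 + 3)).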
,
Jan 10
I'll ask internally whether we could enable more consistent constant folding. I filed an ANGLE bug to ask whether this kind of issue could be caught with a UBSan-type tool in ANGLE that would have a Chrome inspector UI: https://bugs.chromium.org/p/angleproject/issues/detail?id=3064
Comment 1 by fabrice....@gmail.com
, Jan 9