Null-dereference READ in gl::GLFence::IsSupported
Issue description
Detailed report: https://clusterfuzz.com/testcase?key=5679417882574848
Fuzzer: phoglund_webrtc_peerconnection
Job Type: windows_asan_chrome_no_sandbox
Platform Id: windows
Crash Type: Null-dereference READ
Crash Address: 0x000000000e66
Crash State:
  gl::GLFence::IsSupported
  gpu::gles2::FeatureInfo::InitializeFeatures
  gpu::gles2::ContextGroup::Initialize
Sanitizer: address (ASAN)
Reproducer Testcase: https://clusterfuzz.com/download?testcase_id=5679417882574848
Additional requirements: Requires Gestures
Additional requirements: Requires HTTP
Issue filed automatically.
See https://github.com/google/clusterfuzz-tools for more information.
,
Oct 19
Automatically applying components based on crash stacktrace and information from OWNERS files. If this is incorrect, please apply the Test-Predator-Wrong-Components label.
,
Oct 19
backer@ could you take a look? It looks like an invalid initialization path for RasterDecoders.
,
Oct 22
,
Oct 22
Victor: I think that I'm going to need to repro locally. I glanced at the code and the stack and can't think of why we would be dereferencing a nullptr there. Is there a way to repro on linux?
,
Oct 22
I followed the command-line instructions to try to convert this to a different bot (Linux), but it doesn't repro there. https://clusterfuzz.com/v2/testcase-detail/5769176734760960
,
Oct 22
In RasterCommandBufferStub::Initialize, we check at line 172 whether we can make the context current. I would expect this to call BindGLApi() (https://cs.chromium.org/chromium/src/ui/gl/gl_context_wgl.cc?rcl=193fde1762f4f0d0d925c0f109f3f536aae0c7e9&l=108) and set the TLS pointers that are reported as nullptr in this stack trace. This happens before the RasterDecoder initialization on line 182.
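To make the expected ordering concrete, here is a minimal sketch of "make current installs a thread-local API pointer, then feature initialization reads it". Every name in it (FakeGLApi, FakeContext, g_current_api, InitializeStub, InitResult) is hypothetical and stands in for the real Chromium types; it only illustrates why a failed or skipped make-current step would leave a null pointer for the later initialization to trip over, and how a guard could turn that into an error return.

```cpp
#include <cassert>

// Hypothetical stand-in for the per-thread GL dispatch table that a
// BindGLApi()-style call would install. Not the real Chromium type.
struct FakeGLApi {
  bool fence_extension_available = true;
};

// Thread-local "current API" pointer; null until a context is made current.
thread_local FakeGLApi* g_current_api = nullptr;

class FakeContext {
 public:
  explicit FakeContext(bool healthy) : healthy_(healthy) {}

  // Installs the dispatch table only when the underlying context is usable;
  // otherwise clears the pointer and reports failure.
  bool MakeCurrent() {
    if (!healthy_) {
      g_current_api = nullptr;
      return false;
    }
    g_current_api = &api_;
    return true;
  }

 private:
  bool healthy_;
  FakeGLApi api_;
};

enum class InitResult { kSuccess, kNoCurrentContext };

// Feature detection that checks the TLS pointer before reading through it.
InitResult InitializeFeatures(bool* fence_supported) {
  assert(fence_supported != nullptr);
  if (g_current_api == nullptr) return InitResult::kNoCurrentContext;
  *fence_supported = g_current_api->fence_extension_available;
  return InitResult::kSuccess;
}

// Same ordering as described above: make current first, then initialize.
InitResult InitializeStub(FakeContext& context, bool* fence_supported) {
  if (!context.MakeCurrent()) return InitResult::kNoCurrentContext;
  return InitializeFeatures(fence_supported);
}
```

With a healthy FakeContext, InitializeStub returns kSuccess; with an unhealthy one it returns kNoCurrentContext instead of dereferencing a null pointer.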
,
Oct 22
Do you have a Windows machine to reproduce? If not, I could see if I am able to repro on my Windows bot.
,
Oct 22
Thanks Mo. I don't currently have a Windows machine. I procured one a while ago, but I haven't set it up yet. If this is urgent, I'd appreciate it if you could try reproing it.
,
Oct 22
OK, I'll take this bug and will try to repro. But if I can't repro on my Windows machine, I am sending it back to you. Fair?
,
Oct 22
SGTM. Thanks.
,
Nov 1
Friendly ping for an update on this.
,
Nov 1
Looking at the log:
[3712:1532:1101/011050.551:ERROR:angle_platform_impl.cc(47)] reset(614): Could not create additional swap chains or offscreen surfaces, HRESULT: 0x887A0022
[3712:1532:1101/011050.552:ERROR:gl_surface_egl.cc(537)] EGL Driver message (Critical) eglCreateWindowSurface: Context lost.
[3712:1532:1101/011050.552:ERROR:gl_surface_egl.cc(1057)] eglCreateWindowSurface failed with error EGL_CONTEXT_LOST
[3712:1532:1101/011050.553:ERROR:gles2_command_buffer_stub.cc(237)] ContextResult::kSurfaceFailure: Failed to create surface.
[3712:1532:1101/011050.668:ERROR:gl_surface_egl.cc(537)] EGL Driver message (Critical) eglMakeCurrent: Context lost.
[4768:2952:1101/011050.670:ERROR:gpu_process_transport_factory.cc(967)] Lost UI shared context.
So this is where we initially fail, likely because that bot doesn't have D3D drivers. This somehow triggers a chain reaction leading to the bug. Of course we won't be able to reproduce on a regular Windows desktop. Let me try to instruct the code to fail similarly and see if we could reproduce. Regardless, I don't think this should block Beta (I am still on it right now).
,
Nov 2
Further down the log, we see the following:
[8052:4088:1018/050525.703:ERROR:gl_bindings_autogen_gl.cc(13953)] Trying to call glGetIntegerv without current GL context
[8052:4088:1018/050525.703:ERROR:gl_bindings_autogen_gl.cc(13953)] Trying to call glGetString without current GL context
[8052:4088:1018/050525.703:ERROR:gl_bindings_autogen_gl.cc(13953)] Trying to call glGetString without current GL context
[8052:4088:1018/050525.703:ERROR:gl_bindings_autogen_gl.cc(13953)] Trying to call glGetString without current GL context
[8052:4088:1018/050525.703:ERROR:gl_bindings_autogen_gl.cc(13953)] Trying to call glGetString without current GL context
So of course when we try to initialize RasterDecoder, we are in bad shape. What confuses me is why MakeCurrent succeeded for RasterDecoder. Also, why didn't we fail more gracefully sooner and fall back to SwiftShader? Anyway, this bot should run with --use-gl=swiftshader instead.
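As a rough illustration of the "fail gracefully and fall back" behaviour asked about above, the sketch below tries a list of backends in order and picks the first one whose surface creation succeeds. Backend, create_surface and PickBackend are hypothetical names invented for this example; this is not how Chromium's GPU process actually selects a GL implementation.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical description of one GL backend to try; not a Chromium type.
struct Backend {
  std::string name;
  std::function<bool()> create_surface;  // Returns false on failure.
};

// Tries each backend in order and returns the name of the first one whose
// surface creation succeeds. Returns an empty string if every backend fails,
// so the caller can report an error instead of continuing without a context.
std::string PickBackend(const std::vector<Backend>& backends) {
  for (const Backend& backend : backends) {
    // A backend without a probe function is a programming error.
    assert(backend.create_surface && "every backend needs a probe");
    if (backend.create_surface()) return backend.name;
  }
  return std::string();
}
```

Given a hardware backend whose probe fails followed by a software one whose probe succeeds, PickBackend returns the software backend's name rather than letting initialization continue on the broken one.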
,
Nov 2
Adding a few folks.
inferno@: do you know what kind of Windows bots we run this fuzzer on? Are they VMs without D3D drivers?
phoglund@: we should pass in --use-gl=swiftshader for these fuzzer tests, so they don't depend on the graphics drivers on the bots, or the lack of them.
geofflang@: the initial failure seems to be that ANGLE fails to create a swap chain because the device is lost. What is the right action after such a failure? Right now the GPU process seems to just carry on, which resulted in later disasters.
,
Nov 2
piman@: this is the bug I am looking at
,
Nov 2
ClusterFuzz has detected this issue as fixed in range 604604:604613.
Detailed report: https://clusterfuzz.com/testcase?key=5679417882574848
Fuzzer: phoglund_webrtc_peerconnection
Job Type: windows_asan_chrome_no_sandbox
Platform Id: windows
Crash Type: Null-dereference READ
Crash Address: 0x000000000e66
Crash State:
  gl::GLFence::IsSupported
  gpu::gles2::FeatureInfo::InitializeFeatures
  gpu::gles2::ContextGroup::Initialize
Sanitizer: address (ASAN)
Fixed: https://clusterfuzz.com/revisions?job=windows_asan_chrome_no_sandbox&range=604604:604613
Reproducer Testcase: https://clusterfuzz.com/download?testcase_id=5679417882574848
Additional requirements: Requires Gestures
Additional requirements: Requires HTTP
See https://github.com/google/clusterfuzz-tools for more information.
If you suspect that the result above is incorrect, try re-doing that job on the test case report page.
,
Nov 2
ClusterFuzz testcase 5679417882574848 is verified as fixed, so closing issue as verified. If this is incorrect, please add ClusterFuzz-Wrong label and re-open the issue.
,
Nov 2
The error that we're seeing is DXGI_ERROR_NOT_CURRENTLY_AVAILABLE (0x887A0022) from swap chain creation. I'm not sure what to make of that; it has the very generic description of "A resource is not available at the time of the call, but may become available later". If we got this far, it means that we successfully created a D3D device, but it's pretty much a no-go if we can't create window surfaces. I've seen the D3D drivers on VMs throw weird errors like this when they don't implement parts of the driver, so that may be it.
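For reference, the HRESULT value itself can be taken apart with plain bit arithmetic. The sketch below follows the standard HRESULT layout (severity in bit 31, facility extracted with the same 0x1FFF mask as the HRESULT_FACILITY macro, code in the low 16 bits); DecodedHresult, DecodeHresult and the two constants are names made up for this example, written without Windows headers so it compiles anywhere.

```cpp
#include <cstdint>

// Fields of a decoded HRESULT. Hypothetical helper type for this example.
struct DecodedHresult {
  bool failure;            // Severity bit (bit 31): set means failure.
  std::uint32_t facility;  // Subsystem that produced the error.
  std::uint32_t code;      // Facility-specific error code.
};

// Splits an HRESULT into its fields using the same masks as the
// HRESULT_SEVERITY / HRESULT_FACILITY / HRESULT_CODE macros.
constexpr DecodedHresult DecodeHresult(std::uint32_t hr) {
  return DecodedHresult{(hr >> 31) != 0, (hr >> 16) & 0x1FFFu, hr & 0xFFFFu};
}

// DXGI facility number and the value reported in the log.
constexpr std::uint32_t kFacilityDxgi = 0x87Au;
constexpr std::uint32_t kNotCurrentlyAvailable = 0x887A0022u;

static_assert(DecodeHresult(kNotCurrentlyAvailable).failure,
              "severity bit is set");
static_assert(DecodeHresult(kNotCurrentlyAvailable).facility == kFacilityDxgi,
              "error comes from the DXGI facility");
static_assert(DecodeHresult(kNotCurrentlyAvailable).code == 0x22u,
              "facility-specific code is 0x22");
```

So 0x887A0022 decodes to a failure from the DXGI facility with code 0x22, which matches the DXGI_ERROR_NOT_CURRENTLY_AVAILABLE name quoted above.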
,
Nov 2
Why Chrome still works without successful swap chain creation is documented and explained at https://bugs.chromium.org/p/angleproject/issues/detail?id=2949. Regardless of this bug disappearing (which I don't really understand), I still think we should pass in --use-gl=swiftshader in the first place.
Comment 1 by ClusterFuzz
, Oct 19