
Issue 897076 link

Starred by 1 user

Issue metadata

Status: Verified
Owner:
Closed: Nov 2
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Windows
Pri: 1
Type: Bug




Null-dereference READ in gl::GLFence::IsSupported

Project Member Reported by ClusterFuzz, Oct 19

Issue description

Detailed report: https://clusterfuzz.com/testcase?key=5679417882574848

Fuzzer: phoglund_webrtc_peerconnection
Job Type: windows_asan_chrome_no_sandbox
Platform Id: windows

Crash Type: Null-dereference READ
Crash Address: 0x000000000e66
Crash State:
  gl::GLFence::IsSupported
  gpu::gles2::FeatureInfo::InitializeFeatures
  gpu::gles2::ContextGroup::Initialize
  
Sanitizer: address (ASAN)

Reproducer Testcase: https://clusterfuzz.com/download?testcase_id=5679417882574848

Additional requirements: Requires Gestures

Additional requirements: Requires HTTP

Issue filed automatically.

See https://github.com/google/clusterfuzz-tools for more information.
 
Project Member

Comment 1 by ClusterFuzz, Oct 19

Labels: Fuzz-Blocker ReleaseBlock-Beta M-72
This crash occurs very frequently on windows platform and is likely preventing the fuzzer phoglund_webrtc_peerconnection from making much progress. Fixing this will allow more bugs to be found.

Marking this bug as a blocker for next Beta release.

If this is incorrect, please add ClusterFuzz-Wrong label and remove the ReleaseBlock-Beta label.
Project Member

Comment 2 by ClusterFuzz, Oct 19

Components: Internals>GPU>Internals
Labels: Test-Predator-Auto-Components
Automatically applying components based on crash stacktrace and information from OWNERS files.

If this is incorrect, please apply the Test-Predator-Wrong-Components label.
Cc: zmo@chromium.org
Owner: backer@chromium.org
Status: Assigned (was: Untriaged)
backer@: could you take a look? It looks like an invalid initialization path for RasterDecoders.
Status: Started (was: Assigned)
Victor: I think that I'm going to need to repro locally. I glanced at the code and the stack and can't think of why we would be dereferencing a nullptr there.

Is there a way to repro on linux?
I followed the command-line instructions to try to move this test case to a different bot (Linux), but it doesn't repro there.

https://clusterfuzz.com/v2/testcase-detail/5769176734760960
In RasterCommandBufferStub::Initialize, we check at line 172 whether we can make the context current. I would expect this to call BindGLApi() (https://cs.chromium.org/chromium/src/ui/gl/gl_context_wgl.cc?rcl=193fde1762f4f0d0d925c0f109f3f536aae0c7e9&l=108) and set the TLS pointers that show up as nullptr in this stack trace.

This will happen before the RasterDecoder initialization on line 182.
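
To illustrate the pattern being discussed, here is a minimal sketch of a thread-local "current API" pointer that is only valid after a successful bind. Every name in it (FakeGLApi, BindApiForThread, ReleaseApiForThread, HasExtension) is hypothetical and is not Chromium's actual code; the substring match on the extension string is a simplification. It only shows why a query made without a current context has to check the pointer before dereferencing it.

```cpp
#include <string>

// Hypothetical stand-in for the per-thread GL API object (not a Chromium type).
struct FakeGLApi {
  std::string extensions;  // space-separated extension names
};

// Thread-local "current API" pointer; stays null until a context is bound.
thread_local FakeGLApi* g_current_api = nullptr;

// Models a successful MakeCurrent: binds the API for the calling thread.
void BindApiForThread(FakeGLApi* api) { g_current_api = api; }

// Models a lost or released context: the binding is cleared.
void ReleaseApiForThread() { g_current_api = nullptr; }

// Guarded query: reports "unsupported" instead of dereferencing a null pointer.
bool HasExtension(const std::string& name) {
  if (g_current_api == nullptr) {
    return false;
  }
  return g_current_api->extensions.find(name) != std::string::npos;
}
```

Without the null check, calling HasExtension before BindApiForThread would read through a null pointer, which is the kind of crash reported in the stack above.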
Do you have a Windows machine to reproduce? If not, I could see if I am able to repro on my Windows bot.
Thanks, Mo. I don't currently have a Windows machine. I procured one a while ago, but I haven't set it up yet. If this is urgent, I'd appreciate it if you could try to repro it.
Cc: -zmo@chromium.org backer@chromium.org
Owner: zmo@chromium.org
OK, I'll take this bug and will try to repro. But if I can't repro on my Windows machine, I am sending it back to you. Fair?
SGTM. Thanks.
Friendly ping for an update on this.
Labels: -ReleaseBlock-Beta
Looking at the log:

[3712:1532:1101/011050.551:ERROR:angle_platform_impl.cc(47)] reset(614): Could not create additional swap chains or offscreen surfaces, HRESULT: 0x887A0022
[3712:1532:1101/011050.552:ERROR:gl_surface_egl.cc(537)] EGL Driver message (Critical) eglCreateWindowSurface: Context lost.
[3712:1532:1101/011050.552:ERROR:gl_surface_egl.cc(1057)] eglCreateWindowSurface failed with error EGL_CONTEXT_LOST
[3712:1532:1101/011050.553:ERROR:gles2_command_buffer_stub.cc(237)] ContextResult::kSurfaceFailure: Failed to create surface.
[3712:1532:1101/011050.668:ERROR:gl_surface_egl.cc(537)] EGL Driver message (Critical) eglMakeCurrent: Context lost.
[4768:2952:1101/011050.670:ERROR:gpu_process_transport_factory.cc(967)] Lost UI shared context.

So this is where we initially fail, likely because that bot doesn't have D3D drivers. This somehow triggers a chain reaction leading to the bug. That also means we won't be able to reproduce it on a regular Windows desktop. Let me try to force the code to fail in the same way and see if we can reproduce.

Regardless, I don't think this should block Beta (I am still on it right now).
Cc: geoffl...@chromium.org phoglund@chromium.org infe...@chromium.org
Further down the log, we see the following:

[8052:4088:1018/050525.703:ERROR:gl_bindings_autogen_gl.cc(13953)] Trying to call glGetIntegerv without current GL context
[8052:4088:1018/050525.703:ERROR:gl_bindings_autogen_gl.cc(13953)] Trying to call glGetString without current GL context
[8052:4088:1018/050525.703:ERROR:gl_bindings_autogen_gl.cc(13953)] Trying to call glGetString without current GL context
[8052:4088:1018/050525.703:ERROR:gl_bindings_autogen_gl.cc(13953)] Trying to call glGetString without current GL context
[8052:4088:1018/050525.703:ERROR:gl_bindings_autogen_gl.cc(13953)] Trying to call glGetString without current GL context

So of course when we try to initialize RasterDecoder, we are in bad shape.

What confuses me is why MakeCurrent succeeded for RasterDecoder, and why we didn't fail more gracefully sooner and fall back to SwiftShader.

Anyway, this bot should run with --use-gl=swiftshader instead.
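
As a rough sketch of what forcing the software path on the bot could look like, the helper below appends --use-gl=swiftshader to a launch argument list unless the caller already chose a --use-gl value. WithSoftwareGL is a hypothetical name made up for this illustration; it is not part of ClusterFuzz or Chromium, and the real fuzzer job configuration may set flags differently.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: returns a copy of `args` with --use-gl=swiftshader
// appended, unless some --use-gl=... switch is already present.
std::vector<std::string> WithSoftwareGL(std::vector<std::string> args) {
  const std::string prefix = "--use-gl=";
  for (const std::string& arg : args) {
    // compare() clips to the argument length, so short arguments are safe.
    if (arg.compare(0, prefix.size(), prefix) == 0) {
      return args;  // respect an explicit choice made by the caller
    }
  }
  args.push_back(prefix + "swiftshader");
  return args;
}
```

Leaving an existing --use-gl value untouched is a design choice for the sketch: it keeps jobs that deliberately test a specific backend unaffected.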
Adding a few folks.

inferno@: do you know what kind of Windows bots we run this fuzzer on? Are they VMs without D3D drivers?

phoglund@: we should pass in --use-gl=swiftshader for these fuzzer tests, so they don't depend on whether the bots have graphics drivers.

geofflang@: the initial failure seems to be that ANGLE fails to create a swap chain because the device is lost. What is the right action after such a failure? Right now the GPU process seems to just carry on, which leads to the later crashes.
Cc: piman@chromium.org
piman@: this is the bug I am looking at
Project Member

Comment 17 by ClusterFuzz, Nov 2

ClusterFuzz has detected this issue as fixed in range 604604:604613.

Detailed report: https://clusterfuzz.com/testcase?key=5679417882574848

Fuzzer: phoglund_webrtc_peerconnection
Job Type: windows_asan_chrome_no_sandbox
Platform Id: windows

Crash Type: Null-dereference READ
Crash Address: 0x000000000e66
Crash State:
  gl::GLFence::IsSupported
  gpu::gles2::FeatureInfo::InitializeFeatures
  gpu::gles2::ContextGroup::Initialize
  
Sanitizer: address (ASAN)

Fixed: https://clusterfuzz.com/revisions?job=windows_asan_chrome_no_sandbox&range=604604:604613

Reproducer Testcase: https://clusterfuzz.com/download?testcase_id=5679417882574848

Additional requirements: Requires Gestures

Additional requirements: Requires HTTP

See https://github.com/google/clusterfuzz-tools for more information.

If you suspect that the result above is incorrect, try re-doing that job on the test case report page.
Project Member

Comment 18 by ClusterFuzz, Nov 2

Labels: ClusterFuzz-Verified
Status: Verified (was: Started)
ClusterFuzz testcase 5679417882574848 is verified as fixed, so closing issue as verified.

If this is incorrect, please add ClusterFuzz-Wrong label and re-open the issue.
The error that we're seeing is DXGI_ERROR_NOT_CURRENTLY_AVAILABLE (0x887A0022) from swap chain creation. I'm not sure what to make of that; it has the very generic description of "A resource is not available at the time of the call, but may become available later".

If we got this far, it means that we successfully created a D3D device, but it's pretty much a no-go if we can't create window surfaces. I've seen the D3D drivers on VMs throw weird errors like this when they don't implement parts of the driver, so that may be it.
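
For readers decoding the HRESULT from the log by hand, here is a small portable sketch that splits the value into its failure bit, facility, and code without needing the Windows headers. HresultParts and DecodeHresult are hypothetical names; the facility mask of 0x1FFF is chosen to mirror the HRESULT_FACILITY macro in winerror.h, under which the DXGI facility is 0x87A.

```cpp
#include <cstdint>

// Hypothetical helper type: the three fields of an HRESULT we care about.
struct HresultParts {
  bool failure;            // top bit set means the call failed
  std::uint32_t facility;  // which subsystem produced the error
  std::uint32_t code;      // subsystem-specific error code
};

// Splits a raw HRESULT value into its parts.
constexpr HresultParts DecodeHresult(std::uint32_t hr) {
  return HresultParts{(hr >> 31) != 0, (hr >> 16) & 0x1FFF, hr & 0xFFFF};
}
```

Applied to 0x887A0022, this gives a failure with facility 0x87A and code 0x22, which is consistent with the DXGI error named above.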
Why Chrome still works without successful swap chain creation is explained at https://bugs.chromium.org/p/angleproject/issues/detail?id=2949.

Regardless of this bug disappearing (which I don't really understand), I still think we should pass in --use-gl=swiftshader in the first place.
