New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 861882 link

Starred by 2 users

Issue metadata

Status: Verified
Owner:
Closed: Jul 13
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux
Pri: 0
Type: Bug

Blocked on:
issue 819481
issue 824522



Sign in to add a comment

Abrt in sw::FrameBufferX11::~FrameBufferX11

Project Member Reported by ClusterFuzz, Jul 9

Issue description

Detailed report: https://clusterfuzz.com/testcase?key=4671324799369216

Fuzzer: inferno_sampler
Job Type: linux_asan_chrome_media
Platform Id: linux

Crash Type: Abrt
Crash Address: 0x053900004193
Crash State:
  sw::FrameBufferX11::~FrameBufferX11
  egl::DestroySurface
  gl::NativeViewGLSurfaceEGLX11::~NativeViewGLSurfaceEGLX11
  
Sanitizer: address (ASAN)

Regressed: https://clusterfuzz.com/revisions?job=linux_asan_chrome_media&range=573289:573292

Reproducer Testcase: https://clusterfuzz.com/download?testcase_id=4671324799369216

Issue filed automatically.

See https://github.com/google/clusterfuzz-tools for more information.
 
Project Member

Comment 1 by ClusterFuzz, Jul 9

Labels: Fuzz-Blocker ReleaseBlock-Beta M-69
This crash occurs very frequently on linux platform and is likely preventing the fuzzer inferno_sampler from making much progress. Fixing this will allow more bugs to be found.

Marking this bug as a blocker for next Beta release.

If this is incorrect, please add ClusterFuzz-Wrong label and remove the ReleaseBlock-Beta label.
Project Member

Comment 2 by ClusterFuzz, Jul 9

Components: Internals>GPU>Internals Internals>GPU>SwiftShader
Labels: Test-Predator-Auto-Components
Automatically applying components based on crash stacktrace and information from OWNERS files.

If this is incorrect, please apply the Test-Predator-Wrong-Components label.
Project Member

Comment 3 by ClusterFuzz, Jul 9

Cc: capn@google.com
Labels: Test-Predator-Auto-CC
Automatically adding ccs based on suspected regression changelists:

Harden against X11 instability. by capn@google.com - https://swiftshader.googlesource.com/SwiftShader/+/fbba4900f68b12171277175980df2e52517cb6b6

If this is incorrect, please let us know why and apply the Test-Predator-Wrong-CLs label.
Blockedon: 824522
Cc: kylec...@chromium.org sadrul@chromium.org piman@chromium.org
Components: -Internals>GPU>SwiftShader
This is not a SwiftShader issue. It merely detects that the X11 window was destroyed before the EGL surface that uses it.
Cc: kkaluri@chromium.org
Labels: Test-Predator-Wrong-CLs CF-NeedsTriage
Unable to find actual suspect through code search and also observing no CL's under regression range, hence adding appropriate label and requesting someone from dev team to look in to this issue.

Thanks!
The root cause is probably very old. I've seen X11 related crashes in SwiftShader since it was first integrated. But again, it's not SwiftShader's fault; the X11 window lifetime is controlled by Chromium.
M69 branch is coming VERY soon on July 19th, Your bug is marked as ReleaseBlock-Beta for M69. Please try to land the fix ASAP to trunk in order to prevent many merges going after M69 branch. This will also help us to branch M69 from high quality trunk. Thank you.

Would someone be able to take a look at this issue? This is happening 500k+ times a day on ClusterFuzz (started about 2 days ago) and is blocking our fuzzing. 

A while back we had a similar issue which was fixed in  https://bugs.chromium.org/p/chromium/issues/detail?id=819481 
Labels: -Pri-1 Pri-0
Owner: capn@chromium.org
Status: Assigned (was: Untriaged)
capn@, can you please add a workaround so that it does not crash. this is a very bad blocker for our fuzzing.
Blockedon: 819481
Cc: capn@chromium.org
Owner: piman@chromium.org
Talked to capn@. This is a chrome side issue, swiftshader is just validating that it has a valid window.

piman@, can you please take a look, repro is reliable on clusterfuzz (see command line in stacktrace).
inferno@, sorry for the trouble, but I don't think adding a workaround in SwiftShader is the right solution. The EGL spec states that:

"EGL normally checks the validity of objects passed into it, but detecting invalid native objects (pixmaps, windows, and displays) may not always be possible. Specifying such invalid handles may result in undefined behavior"

So as far as I can tell SwiftShader is spec compliant. This undefined behavior may also occur in GPU drivers, and strike us at any time or may already be the reason for some of Chromium's stability issues.

The check in SwiftShader is trying to be helpful. It already enabled me to fix  Issue 824522 . Let's continue to hunt down similar bugs.
Issue 819481 has been merged into this issue.
Cc: ifratric@google.com
 Issue 861895  has been merged into this issue.
 Issue 862497  has been merged into this issue.
 Issue 862481  has been merged into this issue.
 Issue 861986  has been merged into this issue.
Owner: sugoi@chromium.org
I'm ooo today, this sounds urgent.
->sugoi who wired up SwiftShader in Chrome
I should note, I don't think the SwiftShader behavior is correct, X11 failure should not cause undefined behavior. Several things outside of the control of Chrome can cause XGetWindowAttributes to fail, be it the X connection was closed (e.g. the user closed the session while Chrome is running) or the window manager destroyed the X window, e.g. closed by the user (it's a perfectly valid behavior).
If Chrome is doing something bad, we should fix it, but it very much sounds to me that we should remove this abort and gracefully fail in SwiftShader.
Cc: sugoi@chromium.org
Owner: capn@chromium.org
Reassigning to revert change in swiftshader based on c#20.
Cc: -sugoi@chromium.org
sugoi@ is on vacation, and probably doesn't have much knowledge about Chromium's X11 window management either.

"X11 failure should not cause undefined behavior"

I'm sorry but the EGL spec literally states the opposite (see #13). If Chromium doesn't want to be exposed to undefined behavior (clearly the case), then it has the responsibility to protect against it, just like it protects against other undefined behavior in the OpenGL and EGL specs.

This shouldn't be particularly hard for people familiar with the code base. Just check if the window is still valid before making any EGL calls that might use it (primarily eglSwapBuffers), and fail gracefully if it's not.

I might be able to take a stab at that if no one else takes ownership, but I don't have a working Chromium checkout on Linux at the moment. Maybe tomorrow.
It is meaningless to test for whether the window is still valid before making EGL calls, as the window can become invalid (destroyed by the X server / Window manager) between that time and when the EGL implementation accesses it.
However this is not even something we should do as a round-trip to X would make performance tank.
Other drivers don't have this problem. X errors may be raised. We deal with them gracefully.
Status: Started (was: Assigned)
Thanks for the feedback! I agree that Chrome testing if the window is valid would still leave a chance of a race condition if the window can become invalid outside of Chrome's control right after. However, note that in this particular ClusterFuzz report the abort is hit not during eglSwapBuffers but at eglDestroySurface. That points at another destruction order issue, like  Issue 824522 , where Chrome was quite obviously to blame.

It's not clear to me that an X window can even become invalid outside of Chrome's control. As far as I can tell from reading the libX documentation, a window is a 'client resource' and it's destroyed at XDestroyWindow. The docs also state that XDestroyWindow can generate a BadWindow error, iff the "Window argument does not name a defined Window". Can it become undefined through external events? Likewise, wouldn't XGetWindowAttributes just retrieve client-side information, and thus not require a round-trip to the server and hence be impervious to disconnects outside of Chrome's control?

Happy to be proven wrong about all this. I just want to make sure Chrome is not still experiencing robustness issues due to bugs similar to  Issue 824522 .

"Other drivers don't have this problem."

Just out of curiosity, which other drivers are we running on ClusterFuzz Linux bots with GPUs? I only noticed Windows bots with GPUs, but I might not be looking for the right names. For what it's worth the waterfall build bots on Linux with GPUs don't seem to have a problem with SwiftShader doing a window validation check. So maybe it's an issue specific to the ClusterFuzz setup?

"X errors may be raised. We deal with them gracefully."

Am I interpreting correctly that you're saying Chrome was already catching X errors and dealing with them? I'd like to know if that means destroying the window and then calling eglDestroySurface, or the reverse.

I'll try to look into that today.
XDestroyWindow can be called by any client (not only the one that created the window), including (and as a primary use case) the WM. Windows can also be destroyed (server-side) if the parent window is destroyed. The window destruction happens on the server, after which requests that reference the destroyed window will receive BadWindow errors. All XDestroyWindow does is send the DestroyWindow request to the server (see https://github.com/mirror/libX11/blob/master/src/DestWind.c#L33).

XGetWindowAttributes does round-trip to the server, where the attributes are looked up and sent back to the client. They are not stored client-side. See https://github.com/mirror/libX11/blob/master/src/GetWAttrs.c#L89. If the Window is invalid by the time the request is received by the X server, it will generate an X error (BadWindow), which will make the XGetWindowAttributes return failure. 

All libX11 does is essentially sending requests and receiving responses to/from the X server. XIDs (including Windows) are just references to server-side resources. You generally can't reason about their state just by tracking things client-side as any X client can operate on any resource asynchronously from that client.


Again, it's very possible that Chrome does something bad, and we should fix that, but even destroying a Window too early shouldn't cause anything beyond X errors, and our software must deal with that gracefully since it can happen independently of our control.
ClusterFuzz linux bots don't have gpus, they run with --use-gl=swiftshader. They are running in a docker image, so i suspect maybe xvfb/blackbox window manager acts sometime funky.
Got it, thanks! I'll remove the window validation check for now. It's unfortunate that X11 doesn't adhere to RAII principles and throws internal and external causes of errors onto the same pile. Anyway, SwiftShader shouldn't aggravate that by failing hard.
Project Member

Comment 28 by bugdroid1@chromium.org, Jul 13

The following revision refers to this bug:
  https://swiftshader.googlesource.com/SwiftShader.git/+/8fb6f6a129f5ed809db7ba3a8bc862dd5e7bc75d

commit 8fb6f6a129f5ed809db7ba3a8bc862dd5e7bc75d
Author: Nicolas Capens <capn@google.com>
Date: Fri Jul 13 18:18:05 2018

Remove X11 window validation.

Despite being a 'client resource', the window can become invalid due to
events outside of the client code's control, which causes
XGetWindowAttributes to fail because it retrieves servers-side data
that is no longer available. Hence it is something we should expect to
see happen, and not (always) an indication of a bug that needs fixing.
Also, we should be able to safely continue with an invalid window.

At this point it's up to the client code to catch the X error and
handle it appropriately. The EGL spec does not indicate that it should
catch it instead and generate an error (eglSwapBuffers can generate
EGL_CONTEXT_LOST but that's reserved for power management events).

 Bug chromium:861882 
 Bug chromium:824522 

Change-Id: I78a364516b9466f652c94de68553369935590bde
Reviewed-on: https://swiftshader-review.googlesource.com/19868
Reviewed-by: Antoine Labour <piman@chromium.org>
Reviewed-by: Nicolas Capens <nicolascapens@google.com>
Tested-by: Nicolas Capens <nicolascapens@google.com>

[modify] https://crrev.com/8fb6f6a129f5ed809db7ba3a8bc862dd5e7bc75d/src/Main/FrameBufferX11.cpp
[modify] https://crrev.com/8fb6f6a129f5ed809db7ba3a8bc862dd5e7bc75d/src/Main/FrameBufferX11.hpp

inferno@, thanks for the info. Yes I do suspect there's still something going on that isn't quite supposed to happen. Despite the window lifetime not being entirely under Chrome's control, in theory we should have enough control over the test environment to ensure things run without losing the window. Anyway, it's clear to me now that SwiftShader is supposed to be able to merrily go on without a valid window.

Sorry again for the trouble this caused. Should be fixed soon.
Project Member

Comment 30 by bugdroid1@chromium.org, Jul 13

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/ad06a642551a4913c4d8a7e7c8d0f0a47edb7d07

commit ad06a642551a4913c4d8a7e7c8d0f0a47edb7d07
Author: Nicolas Capens <capn@chromium.org>
Date: Fri Jul 13 21:21:16 2018

Roll SwiftShader 6a990f8..8fb6f6a

https://swiftshader.googlesource.com/SwiftShader.git/+log/6a990f8..8fb6f6a

BUG= 861882 

TEST=bots

CQ_INCLUDE_TRYBOTS=luci.chromium.try:win_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;master.tryserver.chromium.linux:linux_chromium_cfi_rel_ng;luci.chromium.try:android_optional_gpu_tests_rel

Change-Id: I5a89accaec0bd01697ca9c4f0e4a6ac62de23e54
Reviewed-on: https://chromium-review.googlesource.com/1136867
Reviewed-by: Antoine Labour <piman@chromium.org>
Reviewed-by: Abhishek Arya <inferno@chromium.org>
Commit-Queue: Nicolas Capens <capn@chromium.org>
Cr-Commit-Position: refs/heads/master@{#575073}
[modify] https://crrev.com/ad06a642551a4913c4d8a7e7c8d0f0a47edb7d07/DEPS

Project Member

Comment 31 by ClusterFuzz, Jul 13

ClusterFuzz has detected this issue as fixed in range 575068:575074.

Detailed report: https://clusterfuzz.com/testcase?key=4671324799369216

Fuzzer: inferno_sampler
Job Type: linux_asan_chrome_media
Platform Id: linux

Crash Type: Abrt
Crash Address: 0x053900004193
Crash State:
  sw::FrameBufferX11::~FrameBufferX11
  egl::DestroySurface
  gl::NativeViewGLSurfaceEGLX11::~NativeViewGLSurfaceEGLX11
  
Sanitizer: address (ASAN)

Regressed: https://clusterfuzz.com/revisions?job=linux_asan_chrome_media&range=573289:573292
Fixed: https://clusterfuzz.com/revisions?job=linux_asan_chrome_media&range=575068:575074

Reproducer Testcase: https://clusterfuzz.com/download?testcase_id=4671324799369216

See https://github.com/google/clusterfuzz-tools for more information.

If you suspect that the result above is incorrect, try re-doing that job on the test case report page.
Project Member

Comment 32 by ClusterFuzz, Jul 13

Labels: ClusterFuzz-Verified
Status: Verified (was: Started)
ClusterFuzz testcase 4671324799369216 is verified as fixed, so closing issue as verified.

If this is incorrect, please add ClusterFuzz-Wrong label and re-open the issue.

Sign in to add a comment