Abrt in sw::FrameBufferX11::~FrameBufferX11 |
|||||||||||||
Issue descriptionDetailed report: https://clusterfuzz.com/testcase?key=4671324799369216 Fuzzer: inferno_sampler Job Type: linux_asan_chrome_media Platform Id: linux Crash Type: Abrt Crash Address: 0x053900004193 Crash State: sw::FrameBufferX11::~FrameBufferX11 egl::DestroySurface gl::NativeViewGLSurfaceEGLX11::~NativeViewGLSurfaceEGLX11 Sanitizer: address (ASAN) Regressed: https://clusterfuzz.com/revisions?job=linux_asan_chrome_media&range=573289:573292 Reproducer Testcase: https://clusterfuzz.com/download?testcase_id=4671324799369216 Issue filed automatically. See https://github.com/google/clusterfuzz-tools for more information.
,
Jul 9
Automatically applying components based on crash stacktrace and information from OWNERS files. If this is incorrect, please apply the Test-Predator-Wrong-Components label.
,
Jul 9
Automatically adding ccs based on suspected regression changelists: Harden against X11 instability. by capn@google.com - https://swiftshader.googlesource.com/SwiftShader/+/fbba4900f68b12171277175980df2e52517cb6b6 If this is incorrect, please let us know why and apply the Test-Predator-Wrong-CLs label.
,
Jul 9
This is not a SwiftShader issue. It merely detects that the X11 window was destroyed before the EGL surface that uses it.
,
Jul 10
Unable to find actual suspect through code search and also observing no CL's under regression range, hence adding appropriate label and requesting someone from dev team to look in to this issue. Thanks!
,
Jul 10
The root cause is probably very old. I've seen X11 related crashes in SwiftShader since it was first integrated. But again, it's not SwiftShader's fault; the X11 window lifetime is controlled by Chromium.
,
Jul 10
M69 branch is coming VERY soon on July 19th, Your bug is marked as ReleaseBlock-Beta for M69. Please try to land the fix ASAP to trunk in order to prevent many merges going after M69 branch. This will also help us to branch M69 from high quality trunk. Thank you.
,
Jul 12
Would someone be able to take a look at this issue? This is happening 500k+ times a day on ClusterFuzz (started about 2 days ago) and is blocking our fuzzing. A while back we had a similar issue which was fixed in https://bugs.chromium.org/p/chromium/issues/detail?id=819481
,
Jul 12
capn@, can you please add a workaround so that it does not crash. this is a very bad blocker for our fuzzing.
,
Jul 12
,
Jul 12
Looks like regression from https://swiftshader.googlesource.com/SwiftShader.git/+/fbba4900f68b12171277175980df2e52517cb6b6 based on range.
,
Jul 12
Talked to capn@. This is a chrome side issue, swiftshader is just validating that it has a valid window. piman@, can you please take a look, repro is reliable on clusterfuzz (see command line in stacktrace).
,
Jul 12
inferno@, sorry for the trouble, but I don't think adding a workaround in SwiftShader is the right solution. The EGL spec states that: "EGL normally checks the validity of objects passed into it, but detecting invalid native objects (pixmaps, windows, and displays) may not always be possible. Specifying such invalid handles may result in undefined behavior" So as far as I can tell SwiftShader is spec compliant. This undefined behavior may also occur in GPU drivers, and strike us at any time or may already be the reason for some of Chromium's stability issues. The check in SwiftShader is trying to be helpful. It already enabled me to fix Issue 824522 . Let's continue to hunt down similar bugs.
,
Jul 12
Issue 819481 has been merged into this issue.
,
Jul 12
,
Jul 12
Issue 862497 has been merged into this issue.
,
Jul 12
Issue 862481 has been merged into this issue.
,
Jul 12
Issue 861986 has been merged into this issue.
,
Jul 12
I'm ooo today, this sounds urgent. ->sugoi who wired up SwiftShader in Chrome
,
Jul 12
I should note, I don't think the SwiftShader behavior is correct, X11 failure should not cause undefined behavior. Several things outside of the control of Chrome can cause XGetWindowAttributes to fail, be it the X connection was closed (e.g. the user closed the session while Chrome is running) or the window manager destroyed the X window, e.g. closed by the user (it's a perfectly valid behavior). If Chrome is doing something bad, we should fix it, but it very much sounds to me that we should remove this abort and gracefully fail in SwiftShader.
,
Jul 12
Reassigning to revert change in swiftshader based on c#20.
,
Jul 12
sugoi@ is on vacation, and probably doesn't have much knowledge about Chromium's X11 window management either. "X11 failure should not cause undefined behavior" I'm sorry but the EGL spec literally states the opposite (see #13). If Chromium doesn't want to be exposed to undefined behavior (clearly the case), then it has the responsibility to protect against it, just like it protects against other undefined behavior in the OpenGL and EGL specs. This shouldn't be particularly hard for people familiar with the code base. Just check if the window is still valid before making any EGL calls that might use it (primarily eglSwapBuffers), and fail gracefully if it's not. I might be able to take a stab at that if no one else takes ownership, but I don't have a working Chromium checkout on Linux at the moment. Maybe tomorrow.
,
Jul 12
It is meaningless to test for whether the window is still valid before making EGL calls, as the window can become invalid (destroyed by the X server / Window manager) between that time and when the EGL implementation accesses it. However this is not even something we should do as a round-trip to X would make performance tank. Other drivers don't have this problem. X errors may be raised. We deal with them gracefully.
,
Jul 13
Thanks for the feedback! I agree that Chrome testing if the window is valid would still leave a chance of a race condition if the window can become invalid outside of Chrome's control right after. However, note that in this particular ClusterFuzz report the abort is hit not during eglSwapBuffers but at eglDestroySurface. That points at another destruction order issue, like Issue 824522 , where Chrome was quite obviously to blame. It's not clear to me that an X window can even become invalid outside of Chrome's control. As far as I can tell from reading the libX documentation, a window is a 'client resource' and it's destroyed at XDestroyWindow. The docs also state that XDestroyWindow can generate a BadWindow error, iff the "Window argument does not name a defined Window". Can it become undefined through external events? Likewise, wouldn't XGetWindowAttributes just retrieve client-side information, and thus not require a round-trip to the server and hence be impervious to disconnects outside of Chrome's control? Happy to be proven wrong about all this. I just want to make sure Chrome is not still experiencing robustness issues due to bugs similar to Issue 824522 . "Other drivers don't have this problem." Just out of curiosity, which other drivers are we running on ClusterFuzz Linux bots with GPUs? I only noticed Windows bots with GPUs, but I might not be looking for the right names. For what it's worth the waterfall build bots on Linux with GPUs don't seem to have a problem with SwiftShader doing a window validation check. So maybe it's an issue specific to the ClusterFuzz setup? "X errors may be raised. We deal with them gracefully." Am I interpreting correctly that you're saying Chrome was already catching X errors and dealing with them? I'd like to know if that means destroying the window and then calling eglDestroySurface, or the reverse. I'll try to look into that today.
,
Jul 13
XDestroyWindow can be called by any client (not only the one that created the window), including (and as a primary use case) the WM. Windows can also be destroyed (server-side) if the parent window is destroyed. The window destruction happens on the server, after which requests that reference the destroyed window will receive BadWindow errors. All XDestroyWindow does is send the DestroyWindow request to the server (see https://github.com/mirror/libX11/blob/master/src/DestWind.c#L33). XGetWindowAttributes does round-trip to the server, where the attributes are looked up and sent back to the client. They are not stored client-side. See https://github.com/mirror/libX11/blob/master/src/GetWAttrs.c#L89. If the Window is invalid by the time the request is received by the X server, it will generate an X error (BadWindow), which will make the XGetWindowAttributes return failure. All libX11 does is essentially sending requests and receiving responses to/from the X server. XIDs (including Windows) are just references to server-side resources. You generally can't reason about their state just by tracking things client-side as any X client can operate on any resource asynchronously from that client. Again, it's very possible that Chrome does something bad, and we should fix that, but even destroying a Window too early shouldn't cause anything beyond X errors, and our software must deal with that gracefully since it can happen independently of our control.
,
Jul 13
ClusterFuzz linux bots don't have gpus, they run with --use-gl=swiftshader. They are running in a docker image, so i suspect maybe xvfb/blackbox window manager acts sometime funky.
,
Jul 13
Got it, thanks! I'll remove the window validation check for now. It's unfortunate that X11 doesn't adhere to RAII principles and throws internal and external causes of errors onto the same pile. Anyway, SwiftShader shouldn't aggravate that by failing hard.
,
Jul 13
The following revision refers to this bug: https://swiftshader.googlesource.com/SwiftShader.git/+/8fb6f6a129f5ed809db7ba3a8bc862dd5e7bc75d commit 8fb6f6a129f5ed809db7ba3a8bc862dd5e7bc75d Author: Nicolas Capens <capn@google.com> Date: Fri Jul 13 18:18:05 2018 Remove X11 window validation. Despite being a 'client resource', the window can become invalid due to events outside of the client code's control, which causes XGetWindowAttributes to fail because it retrieves servers-side data that is no longer available. Hence it is something we should expect to see happen, and not (always) an indication of a bug that needs fixing. Also, we should be able to safely continue with an invalid window. At this point it's up to the client code to catch the X error and handle it appropriately. The EGL spec does not indicate that it should catch it instead and generate an error (eglSwapBuffers can generate EGL_CONTEXT_LOST but that's reserved for power management events). Bug chromium:861882 Bug chromium:824522 Change-Id: I78a364516b9466f652c94de68553369935590bde Reviewed-on: https://swiftshader-review.googlesource.com/19868 Reviewed-by: Antoine Labour <piman@chromium.org> Reviewed-by: Nicolas Capens <nicolascapens@google.com> Tested-by: Nicolas Capens <nicolascapens@google.com> [modify] https://crrev.com/8fb6f6a129f5ed809db7ba3a8bc862dd5e7bc75d/src/Main/FrameBufferX11.cpp [modify] https://crrev.com/8fb6f6a129f5ed809db7ba3a8bc862dd5e7bc75d/src/Main/FrameBufferX11.hpp
,
Jul 13
inferno@, thanks for the info. Yes I do suspect there's still something going on that isn't quite supposed to happen. Despite the window lifetime not being entirely under Chrome's control, in theory we should have enough control over the test environment to ensure things run without losing the window. Anyway, it's clear to me now that SwiftShader is supposed to be able to merrily go on without a valid window. Sorry again for the trouble this caused. Should be fixed soon.
,
Jul 13
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/ad06a642551a4913c4d8a7e7c8d0f0a47edb7d07 commit ad06a642551a4913c4d8a7e7c8d0f0a47edb7d07 Author: Nicolas Capens <capn@chromium.org> Date: Fri Jul 13 21:21:16 2018 Roll SwiftShader 6a990f8..8fb6f6a https://swiftshader.googlesource.com/SwiftShader.git/+log/6a990f8..8fb6f6a BUG= 861882 TEST=bots CQ_INCLUDE_TRYBOTS=luci.chromium.try:win_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;master.tryserver.chromium.linux:linux_chromium_cfi_rel_ng;luci.chromium.try:android_optional_gpu_tests_rel Change-Id: I5a89accaec0bd01697ca9c4f0e4a6ac62de23e54 Reviewed-on: https://chromium-review.googlesource.com/1136867 Reviewed-by: Antoine Labour <piman@chromium.org> Reviewed-by: Abhishek Arya <inferno@chromium.org> Commit-Queue: Nicolas Capens <capn@chromium.org> Cr-Commit-Position: refs/heads/master@{#575073} [modify] https://crrev.com/ad06a642551a4913c4d8a7e7c8d0f0a47edb7d07/DEPS
,
Jul 13
ClusterFuzz has detected this issue as fixed in range 575068:575074. Detailed report: https://clusterfuzz.com/testcase?key=4671324799369216 Fuzzer: inferno_sampler Job Type: linux_asan_chrome_media Platform Id: linux Crash Type: Abrt Crash Address: 0x053900004193 Crash State: sw::FrameBufferX11::~FrameBufferX11 egl::DestroySurface gl::NativeViewGLSurfaceEGLX11::~NativeViewGLSurfaceEGLX11 Sanitizer: address (ASAN) Regressed: https://clusterfuzz.com/revisions?job=linux_asan_chrome_media&range=573289:573292 Fixed: https://clusterfuzz.com/revisions?job=linux_asan_chrome_media&range=575068:575074 Reproducer Testcase: https://clusterfuzz.com/download?testcase_id=4671324799369216 See https://github.com/google/clusterfuzz-tools for more information. If you suspect that the result above is incorrect, try re-doing that job on the test case report page.
,
Jul 13
ClusterFuzz testcase 4671324799369216 is verified as fixed, so closing issue as verified. If this is incorrect, please add ClusterFuzz-Wrong label and re-open the issue. |
|||||||||||||
►
Sign in to add a comment |
|||||||||||||
Comment 1 by ClusterFuzz
, Jul 9