GPU process crashes when using SwiftShader in archive builds |
||||||||||||||||
Issue descriptionArchives of Beta builds of M59 on Linux are failing to run with SwiftShader (launch with --disable-gpu). These can be found here: https://pantheon.corp.google.com/storage/browser/chrome-unsigned/desktop-5c0tCh/?authuser=0&prefix=59.0.3071 Local builds of the 3071 branch work fine. I'm using is_debug=false and tried both a component build and non-component build. The binary sizes of the SwiftShader libraries under //swiftshader/ are different than those in the archives, which indicates there's still a significant difference in the build parameters and/or environment.
,
May 11 2017
,
May 11 2017
Well, there's your problem:
Dump of assembler code from 0x7ffff7fc7cac to 0x7ffff7fc7ce4:
0x00007ffff7fc7cac <egl::MakeCurrent(void*, void*, void*, void*)+195>: 48 89 df mov rdi,rbx
0x00007ffff7fc7caf <egl::MakeCurrent(void*, void*, void*, void*)+198>: e8 e4 06 00 00 call 0x7ffff7fc8398 <egl::setCurrentDrawSurface(egl::Surface*)>
0x00007ffff7fc7cb4 <egl::MakeCurrent(void*, void*, void*, void*)+203>: 4c 89 ff mov rdi,r15
0x00007ffff7fc7cb7 <egl::MakeCurrent(void*, void*, void*, void*)+206>: e8 1e 07 00 00 call 0x7ffff7fc83da <egl::setCurrentReadSurface(egl::Surface*)>
0x00007ffff7fc7cbc <egl::MakeCurrent(void*, void*, void*, void*)+211>: 4c 89 f7 mov rdi,r14
0x00007ffff7fc7cbf <egl::MakeCurrent(void*, void*, void*, void*)+214>: e8 92 06 00 00 call 0x7ffff7fc8356 <egl::setCurrentContext(egl::Context*)>
0x00007ffff7fc7cc4 <egl::MakeCurrent(void*, void*, void*, void*)+219>: 4d 85 f6 test r14,r14
=> 0x00007ffff7fc7cc7 <egl::MakeCurrent(void*, void*, void*, void*)+222>: 75 19 jne 0x7ffff7fc7ce2 <egl::MakeCurrent(void*, void*, void*, void*)+249>
0x00007ffff7fc7cc9 <egl::MakeCurrent(void*, void*, void*, void*)+224>: bf 00 30 00 00 mov edi,0x3000
0x00007ffff7fc7cce <egl::MakeCurrent(void*, void*, void*, void*)+229>: e8 e3 05 00 00 call 0x7ffff7fc82b6 <egl::error(int)>
0x00007ffff7fc7cd3 <egl::MakeCurrent(void*, void*, void*, void*)+234>: b8 01 00 00 00 mov eax,0x1
0x00007ffff7fc7cd8 <egl::MakeCurrent(void*, void*, void*, void*)+239>: 5b pop rbx
0x00007ffff7fc7cd9 <egl::MakeCurrent(void*, void*, void*, void*)+240>: 41 5c pop r12
0x00007ffff7fc7cdb <egl::MakeCurrent(void*, void*, void*, void*)+242>: 41 5d pop r13
0x00007ffff7fc7cdd <egl::MakeCurrent(void*, void*, void*, void*)+244>: 41 5e pop r14
0x00007ffff7fc7cdf <egl::MakeCurrent(void*, void*, void*, void*)+246>: 41 5f pop r15
0x00007ffff7fc7ce1 <egl::MakeCurrent(void*, void*, void*, void*)+248>: c3 ret
0x00007ffff7fc7ce2 <egl::MakeCurrent(void*, void*, void*, void*)+249>: 0f 0b ud2
This jumps to the undefined instruction at the end due to r14, which is a pointer to the EGL context, being non-null as expected. The corresponding C++ code is in MakeCurrent() in libEGL.cpp:
egl::setCurrentDrawSurface(drawSurface);
egl::setCurrentReadSurface(readSurface);
egl::setCurrentContext(context);
if(context)
{
context->makeCurrent(drawSurface);
}
return success(EGL_TRUE);
}
,
May 11 2017
It's related to Issue 686980 after all. What seems to happen is that Clang performs aggressive link time optimization and considers the call to the virtual makeCurrent() method unreachable due to not having a definition within the linkage unit. It can be avoided using the [[clang::lto_visibility_public]] attribute on any such class: https://clang.llvm.org/docs/LTOVisibility.html
,
May 11 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/c098c9960753fba76da77e7079cdfbc350ee9607 commit c098c9960753fba76da77e7079cdfbc350ee9607 Author: capn <capn@chromium.org> Date: Thu May 11 20:50:25 2017 Roll SwiftShader f34d1ac..9ed48ba https://swiftshader.googlesource.com/SwiftShader.git/+log/f34d1ac..9ed48ba Fixes LTO causing hard crash on Linux official builds. BUG= 720933 TBR=kbr@chromium.org TEST=bots CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.win:win_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.android:android_optional_gpu_tests_rel Review-Url: https://codereview.chromium.org/2877513004 Cr-Commit-Position: refs/heads/master@{#471065} [modify] https://crrev.com/c098c9960753fba76da77e7079cdfbc350ee9607/DEPS
,
May 11 2017
,
May 12 2017
This bug requires manual review: DEPS changes referenced in bugdroid comments. Please contact the milestone owner if you have questions. Owners: amineer@(Android), cmasso@(iOS), gkihumba@(ChromeOS), Abdul Syed@(Desktop) For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
May 15 2017
Before we approve merge to M59, could you pls confirm this change is well baked/verified in Canary, having enough unit tests coverage and will be a safe merge to M59?
,
May 15 2017
,
May 15 2017
It passes the WebGL conformance test suite with Canary build 60.0.3100.0 on Windows, and the corresponding "unstable" build on Linux. Previously the GPU process was crashing hard on Linux. So I'm really confident about merging this fix into M59. It wasn't caught by any automated tests because we don't have any GPU bots running official builds. I'll look into getting SwiftShader's unit tests to run on the general buildbots (Issue 722380).
,
May 15 2017
Thank you capn@. Approving merge to M59 branch 3071 based on comment #10. Please merge before 5:00 PM PT today if possible so we can pick it up for this week beta release. Thank you.
,
May 15 2017
,
May 15 2017
Issue 721710 has been merged into this issue.
,
May 15 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/b01b9acf5b1e54795099cdd7a62dc07ce01f06e7 commit b01b9acf5b1e54795099cdd7a62dc07ce01f06e7 Author: capn <capn@chromium.org> Date: Mon May 15 21:48:57 2017 Roll SwiftShader 30385f0..9ed48ba https://swiftshader.googlesource.com/SwiftShader.git/+log/30385f0..9ed48ba - Fixes LTO causing hard crash on Linux official builds. - Fixes buffer overflow. BUG= 720933 BUG= 719291 NOTRY=true NOPRESUBMIT=true TBR=kbr@chromium.org Review-Url: https://codereview.chromium.org/2888433002 Cr-Commit-Position: refs/branch-heads/3071@{#568} Cr-Branched-From: a106f0abbf69dad349d4aaf4bcc4f5d376dd2377-refs/heads/master@{#464641} [modify] https://crrev.com/b01b9acf5b1e54795099cdd7a62dc07ce01f06e7/DEPS
,
May 16 2017
Build 59.0.3071.56 includes the merged patch, but it still has a crashing GPU process and no WebGL support when launching with --disable-gpu. I've verified that the same revision 3071@{#573} built locally with is_official_build = true runs fine. So I'm back to trying to figure out what's different between the nightly builds and mine...
,
May 16 2017
Users experienced this crash on the following builds: Linux Dev 60.0.3095.5 - 1212.23 CPM, 3331 reports, 248 clients (signature gl::GLContextEGL::MakeCurrent) If this update was incorrect, please add "Fracas-Wrong" label to prevent future updates. - Go/Fracas
,
May 16 2017
This crash has high impact on Chrome's stability. Signature: gl::GLContextEGL::MakeCurrent. Channel: dev. Platform: linux. Labeling issue 720933 with ReleaseBlock-Dev. If this update was incorrect, please add "Fracas-Wrong" label to prevent future updates. - Go/Fracas
,
May 16 2017
,
May 17 2017
TL;DR: https://chromium.googlesource.com/chromium/src.git/+/59.0.3071.62/DEPS uses the wrong SwiftShader revision. I built with the exact same GN args as the bot used (https://uberchromegw.corp.google.com/i/official.desktop/builders/linux64/builds/231), and still couldn't reproduce the crashes. I then noticed the bot's use of a checked in version of eu-strip, and after using that locally I got a very similar file size but still no crash. Then I used objdump to try to understand how these builds could still be different, and found a difference in the relocation entries, in particular for pthreads, that didn't seem congruent with the pthread changes for Issue 710753 . Then it hit me that the bots might not have picked up the DEPS roll. Indeed https://uberchromegw.corp.google.com/i/official.desktop/builders/linux64/builds/231/steps/bot_update/logs/stdio used the fairly old 400667e revision instead of 9ed48ba from the CL in #5. https://chromium.googlesource.com/chromium/src.git/+/aceda52723a5d7a0ed87f8965f9473a70d394421%5E%21/#F0 looks suspicious to me. swiftshader_revision is 9ed48ba but src/third_party/swiftshader is using SwiftShader.git@400667e mmoss@, it looks like you're the main author of the buildsped/scripts/publish_deps.py that generated this commit. Could you tell us if it's working as expected and I need to do something differently to change a DEPS revision in a branch, or might there be a bug in the script?
,
May 17 2017
That would explain it. Branch DEPS need to be changed in the internal buildspec repo, so for 3071, you'd want to update https://chrome-internal.googlesource.com/a/chrome/tools/buildspec.git/+/master/branches/3071/DEPS
,
May 18 2017
The following revision refers to this bug: https://chrome-internal.googlesource.com/chrome/tools/buildspec/+/4c54802b8c32deb32093688a836a8c17cdfe8bc6 commit 4c54802b8c32deb32093688a836a8c17cdfe8bc6 Author: Nicolas Capens <capn@google.com> Date: Thu May 18 15:31:13 2017
,
May 19 2017
Fixed in Beta build 59.0.3071.64.
,
Jun 9 2017
FYI, Skia gets around the internal DEPS issue by creating a milestone-specific branch of Skia (e.g., chrome/m60) a day or so prior to the Chrome branch, and pointing the new internal DEPS at that Skia branch. Then any fixes merged to the milestone-specific branch are picked up automatically by the beta builders, without needing any updates to the internal DEPS.
,
Jun 9 2017
,
Jun 12 2017
senorblanco@, I believe that ANGLE does things similarly, but I'm hoping that a fix for Issue 627200 would make that kind of setup unnecessary.
,
Jul 6 2017
,
Jul 5
The following revision refers to this bug: https://swiftshader.googlesource.com/SwiftShader.git/+/6b21ba98f6e43eace87729b96a5443de50e1e0a4 commit 6b21ba98f6e43eace87729b96a5443de50e1e0a4 Author: Nicolas Capens <capn@google.com> Date: Thu Jul 05 01:51:15 2018 Prevent LTO from eliminating releaseTexImage() calls. Texture::releaseTexImage() is a virtual method called by libEGL but defined by libGLESv2, so prevent LTO from treating it as undefined. This fixes running Chromium swiftshader_unittests with an 'is_official' build. Bug chromium:720933 Change-Id: I58c4441f9bd32b96703a28267837cc79b6087659 Reviewed-on: https://swiftshader-review.googlesource.com/19708 Tested-by: Nicolas Capens <nicolascapens@google.com> Reviewed-by: Alexis Hétu <sugoi@google.com> [modify] https://crrev.com/6b21ba98f6e43eace87729b96a5443de50e1e0a4/src/OpenGL/libEGL/Texture.hpp |
||||||||||||||||
►
Sign in to add a comment |
||||||||||||||||
Comment 1 by capn@chromium.org
, May 11 2017Status: Started (was: Available)