New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 720933 link

Starred by 8 users

Issue metadata

Status: Fixed
Owner:
Closed: May 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux
Pri: 1
Type: Bug

Blocking:
issue 722380



Sign in to add a comment

GPU process crashes when using SwiftShader in archive builds

Project Member Reported by capn@chromium.org, May 10 2017

Issue description

Archives of Beta builds of M59 on Linux are failing to run with SwiftShader (launch with --disable-gpu). These can be found here: https://pantheon.corp.google.com/storage/browser/chrome-unsigned/desktop-5c0tCh/?authuser=0&prefix=59.0.3071

Local builds of the 3071 branch work fine. I'm using is_debug=false and tried both a component build and non-component build. The binary sizes of the SwiftShader libraries under //swiftshader/ are different than those in the archives, which indicates there's still a significant difference in the build parameters and/or environment.
 

Comment 1 by capn@chromium.org, May 11 2017

Owner: capn@chromium.org
Status: Started (was: Available)
I can replicate it by setting is_official_build = true in the gn args.

It appears to enable link time optimization. This reminds me of  Issue 686980 , except that one resulted in a link failure. Using the same build flags that were used as a workaround to fix that, doesn't help here.

Comment 2 by capn@chromium.org, May 11 2017

Components: Internals>GPU>SwiftShader Infra>Client>Chrome>Release

Comment 3 by capn@chromium.org, May 11 2017

Well, there's your problem:

Dump of assembler code from 0x7ffff7fc7cac to 0x7ffff7fc7ce4:
   0x00007ffff7fc7cac <egl::MakeCurrent(void*, void*, void*, void*)+195>:	48 89 df	mov    rdi,rbx
   0x00007ffff7fc7caf <egl::MakeCurrent(void*, void*, void*, void*)+198>:	e8 e4 06 00 00	call   0x7ffff7fc8398 <egl::setCurrentDrawSurface(egl::Surface*)>
   0x00007ffff7fc7cb4 <egl::MakeCurrent(void*, void*, void*, void*)+203>:	4c 89 ff	mov    rdi,r15
   0x00007ffff7fc7cb7 <egl::MakeCurrent(void*, void*, void*, void*)+206>:	e8 1e 07 00 00	call   0x7ffff7fc83da <egl::setCurrentReadSurface(egl::Surface*)>
   0x00007ffff7fc7cbc <egl::MakeCurrent(void*, void*, void*, void*)+211>:	4c 89 f7	mov    rdi,r14
   0x00007ffff7fc7cbf <egl::MakeCurrent(void*, void*, void*, void*)+214>:	e8 92 06 00 00	call   0x7ffff7fc8356 <egl::setCurrentContext(egl::Context*)>
   0x00007ffff7fc7cc4 <egl::MakeCurrent(void*, void*, void*, void*)+219>:	4d 85 f6	test   r14,r14
=> 0x00007ffff7fc7cc7 <egl::MakeCurrent(void*, void*, void*, void*)+222>:	75 19	jne    0x7ffff7fc7ce2 <egl::MakeCurrent(void*, void*, void*, void*)+249>
   0x00007ffff7fc7cc9 <egl::MakeCurrent(void*, void*, void*, void*)+224>:	bf 00 30 00 00	mov    edi,0x3000
   0x00007ffff7fc7cce <egl::MakeCurrent(void*, void*, void*, void*)+229>:	e8 e3 05 00 00	call   0x7ffff7fc82b6 <egl::error(int)>
   0x00007ffff7fc7cd3 <egl::MakeCurrent(void*, void*, void*, void*)+234>:	b8 01 00 00 00	mov    eax,0x1
   0x00007ffff7fc7cd8 <egl::MakeCurrent(void*, void*, void*, void*)+239>:	5b	pop    rbx
   0x00007ffff7fc7cd9 <egl::MakeCurrent(void*, void*, void*, void*)+240>:	41 5c	pop    r12
   0x00007ffff7fc7cdb <egl::MakeCurrent(void*, void*, void*, void*)+242>:	41 5d	pop    r13
   0x00007ffff7fc7cdd <egl::MakeCurrent(void*, void*, void*, void*)+244>:	41 5e	pop    r14
   0x00007ffff7fc7cdf <egl::MakeCurrent(void*, void*, void*, void*)+246>:	41 5f	pop    r15
   0x00007ffff7fc7ce1 <egl::MakeCurrent(void*, void*, void*, void*)+248>:	c3	ret    
   0x00007ffff7fc7ce2 <egl::MakeCurrent(void*, void*, void*, void*)+249>:	0f 0b	ud2  

This jumps to the undefined instruction at the end due to r14, which is a pointer to the EGL context, being non-null as expected. The corresponding C++ code is in MakeCurrent() in libEGL.cpp:

    egl::setCurrentDrawSurface(drawSurface);
    egl::setCurrentReadSurface(readSurface);
    egl::setCurrentContext(context);

    if(context)
    {
        context->makeCurrent(drawSurface);
    }

    return success(EGL_TRUE);
}

Comment 4 by capn@chromium.org, May 11 2017

It's related to  Issue 686980  after all. What seems to happen is that Clang performs aggressive link time optimization and considers the call to the virtual makeCurrent() method unreachable due to not having a definition within the linkage unit.

It can be avoided using the [[clang::lto_visibility_public]] attribute on any such class: https://clang.llvm.org/docs/LTOVisibility.html
Project Member

Comment 5 by bugdroid1@chromium.org, May 11 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/c098c9960753fba76da77e7079cdfbc350ee9607

commit c098c9960753fba76da77e7079cdfbc350ee9607
Author: capn <capn@chromium.org>
Date: Thu May 11 20:50:25 2017

Roll SwiftShader f34d1ac..9ed48ba

https://swiftshader.googlesource.com/SwiftShader.git/+log/f34d1ac..9ed48ba

Fixes LTO causing hard crash on Linux official builds.

BUG= 720933 

TBR=kbr@chromium.org

TEST=bots

CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.win:win_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.android:android_optional_gpu_tests_rel

Review-Url: https://codereview.chromium.org/2877513004
Cr-Commit-Position: refs/heads/master@{#471065}

[modify] https://crrev.com/c098c9960753fba76da77e7079cdfbc350ee9607/DEPS

Comment 6 by capn@chromium.org, May 11 2017

Labels: Merge-Request-59
Status: Fixed (was: Started)
Project Member

Comment 7 by sheriffbot@chromium.org, May 12 2017

Labels: -Merge-Request-59 Merge-Review-59 Hotlist-Merge-Review
This bug requires manual review: DEPS changes referenced in bugdroid comments.
Please contact the milestone owner if you have questions.
Owners: amineer@(Android), cmasso@(iOS), gkihumba@(ChromeOS), Abdul Syed@(Desktop)

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot

Comment 8 by gov...@chromium.org, May 15 2017

Before we approve merge to M59, could you pls confirm this change is well baked/verified in Canary, having enough unit tests coverage and will be a safe merge to M59?

Comment 9 by capn@chromium.org, May 15 2017

Blocking: 722380

Comment 10 by capn@chromium.org, May 15 2017

Cc: sugoi@chromium.org kbr@chromium.org
It passes the WebGL conformance test suite with Canary build 60.0.3100.0 on Windows, and the corresponding "unstable" build on Linux. Previously the GPU process was crashing hard on Linux. So I'm really confident about merging this fix into M59.

It wasn't caught by any automated tests because we don't have any GPU bots running official builds. I'll look into getting SwiftShader's unit tests to run on the general buildbots (Issue 722380).
Labels: -Merge-Review-59 Merge-Approved-59
Thank you capn@. Approving merge to M59 branch 3071 based on comment #10. Please merge before 5:00 PM PT today if possible so we can pick it up for this week beta release. Thank you.
Cc: abdulsyed@chromium.org

Comment 13 by capn@chromium.org, May 15 2017

Issue 721710 has been merged into this issue.
Project Member

Comment 14 by bugdroid1@chromium.org, May 15 2017

Labels: -merge-approved-59 merge-merged-3071
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/b01b9acf5b1e54795099cdd7a62dc07ce01f06e7

commit b01b9acf5b1e54795099cdd7a62dc07ce01f06e7
Author: capn <capn@chromium.org>
Date: Mon May 15 21:48:57 2017

Roll SwiftShader 30385f0..9ed48ba

https://swiftshader.googlesource.com/SwiftShader.git/+log/30385f0..9ed48ba

- Fixes LTO causing hard crash on Linux official builds.
- Fixes buffer overflow.

BUG= 720933 
BUG= 719291 

NOTRY=true
NOPRESUBMIT=true

TBR=kbr@chromium.org

Review-Url: https://codereview.chromium.org/2888433002
Cr-Commit-Position: refs/branch-heads/3071@{#568}
Cr-Branched-From: a106f0abbf69dad349d4aaf4bcc4f5d376dd2377-refs/heads/master@{#464641}

[modify] https://crrev.com/b01b9acf5b1e54795099cdd7a62dc07ce01f06e7/DEPS

Comment 15 by capn@chromium.org, May 16 2017

Labels: -Hotlist-Merge-Review -merge-merged-3071
Status: Started (was: Fixed)
Build 59.0.3071.56 includes the merged patch, but it still has a crashing GPU process and no WebGL support when launching with --disable-gpu.

I've verified that the same revision 3071@{#573} built locally with is_official_build = true runs fine. So I'm back to trying to figure out what's different between the nightly builds and mine...
Project Member

Comment 16 by sheriffbot@chromium.org, May 16 2017

Labels: Fracas FoundIn-M-60
Users experienced this crash on the following builds:

Linux Dev 60.0.3095.5 -  1212.23 CPM, 3331 reports, 248 clients (signature gl::GLContextEGL::MakeCurrent)

If this update was incorrect, please add "Fracas-Wrong" label to prevent future updates.

- Go/Fracas
Project Member

Comment 17 by sheriffbot@chromium.org, May 16 2017

Labels: ReleaseBlock-Dev
This crash has high impact on Chrome's stability.
Signature: gl::GLContextEGL::MakeCurrent.
Channel: dev. Platform: linux.
Labeling  issue 720933  with ReleaseBlock-Dev.


If this update was incorrect, please add "Fracas-Wrong" label to prevent future updates.

- Go/Fracas

Comment 18 by capn@chromium.org, May 16 2017

Labels: -ReleaseBlock-Dev Fracas-Wrong

Comment 19 by capn@chromium.org, May 17 2017

Cc: mmoss@chromium.org
Labels: -Fracas -FoundIn-M-60
TL;DR: https://chromium.googlesource.com/chromium/src.git/+/59.0.3071.62/DEPS uses the wrong SwiftShader revision.

I built with the exact same GN args as the bot used (https://uberchromegw.corp.google.com/i/official.desktop/builders/linux64/builds/231), and still couldn't reproduce the crashes. I then noticed the bot's use of a checked in version of eu-strip, and after using that locally I got a very similar file size but still no crash. Then I used objdump to try to understand how these builds could still be different, and found a difference in the relocation entries, in particular for pthreads, that didn't seem congruent with the pthread changes for  Issue 710753 .

Then it hit me that the bots might not have picked up the DEPS roll. Indeed https://uberchromegw.corp.google.com/i/official.desktop/builders/linux64/builds/231/steps/bot_update/logs/stdio used the fairly old 400667e revision instead of 9ed48ba from the CL in #5.

https://chromium.googlesource.com/chromium/src.git/+/aceda52723a5d7a0ed87f8965f9473a70d394421%5E%21/#F0 looks suspicious to me. swiftshader_revision is 9ed48ba but src/third_party/swiftshader is using SwiftShader.git@400667e

mmoss@, it looks like you're the main author of the buildsped/scripts/publish_deps.py that generated this commit. Could you tell us if it's working as expected and I need to do something differently to change a DEPS revision in a branch, or might there be a bug in the script?

Comment 20 by mmoss@chromium.org, May 17 2017

That would explain it. Branch DEPS need to be changed in the internal buildspec repo, so for 3071, you'd want to update https://chrome-internal.googlesource.com/a/chrome/tools/buildspec.git/+/master/branches/3071/DEPS
Project Member

Comment 21 by bugdroid1@chromium.org, May 18 2017

The following revision refers to this bug:
  https://chrome-internal.googlesource.com/chrome/tools/buildspec/+/4c54802b8c32deb32093688a836a8c17cdfe8bc6

commit 4c54802b8c32deb32093688a836a8c17cdfe8bc6
Author: Nicolas Capens <capn@google.com>
Date: Thu May 18 15:31:13 2017

Comment 22 by capn@chromium.org, May 19 2017

Status: Fixed (was: Started)
Fixed in Beta build 59.0.3071.64.
FYI, Skia gets around the internal DEPS issue by creating a milestone-specific branch of Skia (e.g., chrome/m60) a day or so prior to the Chrome branch, and pointing the new internal DEPS at that Skia branch. Then any fixes merged to the milestone-specific branch are picked up automatically by the beta builders, without needing any updates to the internal DEPS.
Labels: -M59 M-59

Comment 25 by capn@chromium.org, Jun 12 2017

senorblanco@, I believe that ANGLE does things similarly, but I'm hoping that a fix for Issue 627200 would make that kind of setup unnecessary.

Comment 26 by capn@chromium.org, Jul 6 2017

Cc: jbau...@chromium.org mer...@gmail.com
 Issue 739095  has been merged into this issue.
Project Member

Comment 27 by bugdroid1@chromium.org, Jul 5

The following revision refers to this bug:
  https://swiftshader.googlesource.com/SwiftShader.git/+/6b21ba98f6e43eace87729b96a5443de50e1e0a4

commit 6b21ba98f6e43eace87729b96a5443de50e1e0a4
Author: Nicolas Capens <capn@google.com>
Date: Thu Jul 05 01:51:15 2018

Prevent LTO from eliminating releaseTexImage() calls.

Texture::releaseTexImage() is a virtual method called by libEGL but
defined by libGLESv2, so prevent LTO from treating it as undefined.

This fixes running Chromium swiftshader_unittests with an 'is_official'
build.

 Bug chromium:720933 

Change-Id: I58c4441f9bd32b96703a28267837cc79b6087659
Reviewed-on: https://swiftshader-review.googlesource.com/19708
Tested-by: Nicolas Capens <nicolascapens@google.com>
Reviewed-by: Alexis Hétu <sugoi@google.com>

[modify] https://crrev.com/6b21ba98f6e43eace87729b96a5443de50e1e0a4/src/OpenGL/libEGL/Texture.hpp

Sign in to add a comment