
Issue 814460


Issue metadata

Status: Fixed
Owner:
Closed: Mar 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Android
Pri: 1
Type: Bug
Proj-XR




WebXR non-exclusive session causes renderer process segfault on L/N

Project Member Reported by bsheedy@chromium.org, Feb 21 2018

Issue description

Starting a WebXR non-exclusive session on Android L or N causes a segfault in the renderer process. This was the cause of Issue 814367, where all WebXR tests that used non-exclusive sessions were failing.

For whatever reason, this doesn't happen on the M bots, which is why it wasn't caught in the CQ.

Example log output:
02-21 03:38:35.625 23455 23471 E chromium: [ERROR:texture_manager.cc(2585)] [.Offscreen-For-WebGL-0x76b724e800]GL ERROR :GL_INVALID_VALUE : glTexImage2D: dimensions out of range
02-21 03:38:35.626 23424 23439 E chromium: [ERROR:XRWebGLDrawingBuffer.cpp(247)] Framebuffer incomplete
02-21 03:38:35.632   993  1315 I nanohub : osLog: [BMI160] accPower: on=1, state=3
02-21 03:38:35.633   993  1315 I nanohub : osLog: [BMI160] gyrSetRate: rate=409600, latency=2499584, state=4
02-21 03:38:35.642 23392 23392 I chromium: [INFO:CONSOLE(0)] "[.Offscreen-For-WebGL-0x76b724e800]GL ERROR :GL_INVALID_VALUE : glTexImage2D: dimensions out of range", source: file:///storage/emulated/0/chromium_tests_root/chrome/test/data/vr/e2e_test_files/html/generic_webxr_page.html (0)
02-21 03:38:35.649 23455 23471 E chromium: [ERROR:gles2_cmd_decoder.cc(4656)] [.Offscreen-For-WebGL-0x76b724e800]GL ERROR :GL_INVALID_FRAMEBUFFER_OPERATION : glClear: framebuffer incomplete
02-21 03:38:35.649 23455 23471 E chromium: [ERROR:texture_manager.cc(2585)] [.Offscreen-For-WebGL-0x76b724e800]GL ERROR :GL_INVALID_VALUE : glTexImage2D: dimensions out of range
--------- beginning of crash
02-21 03:38:35.659 23424 23439 F libc    : Fatal signal 11 (SIGSEGV), code 1, fault addr 0x0 in tid 23439 (CrRendererMain)
02-21 03:38:35.659   523   523 W         : debuggerd: handling request: pid=23424 uid=99052 gid=99052 tid=23439
02-21 03:38:35.683   993  1315 I nanohub : osLog: [BMI160] gyrSetRate: rate=409600, latency=2499584, state=3
02-21 03:38:35.685   993  1315 I nanohub : osLog: [BMI160] accSetRate: rate=409600, latency=2499584, state=12
02-21 03:38:35.687   993  1315 I nanohub : osLog: [BMI160] accSetRate: rate=409600, latency=2499584, state=3
02-21 03:38:35.727 23516 23516 F DEBUG   : *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
02-21 03:38:35.727 23516 23516 F DEBUG   : Build fingerprint: 'google/marlin/marlin:7.1.1/NMF26U/3562008:userdebug/dev-keys'
02-21 03:38:35.727 23516 23516 F DEBUG   : Revision: '0'
02-21 03:38:35.728 23516 23516 F DEBUG   : ABI: 'arm64'
02-21 03:38:35.728 23516 23516 F DEBUG   : pid: 23424, tid: 23439, name: CrRendererMain  >>> org.chromium.chrome:sandboxed_process0 <<<
02-21 03:38:35.728 23516 23516 F DEBUG   : signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x0
02-21 03:38:35.728 23516 23516 F DEBUG   :     x0   0000000000000000  x1   0000000000000001  x2   0000000000000000  x3   0000000000000007
02-21 03:38:35.728 23516 23516 F DEBUG   :     x4   0000000000000036  x5   00000076b5d96888  x6   000000769ba3b664  x7   725e4c51405e4b46
02-21 03:38:35.728 23516 23516 F DEBUG   :     x8   22e72869be838085  x9   22e72869be838085  x10  0000000000000036  x11  0000000000000000
02-21 03:38:35.728 23516 23516 F DEBUG   :     x12  0000000000000060  x13  00000076acd60a74  x14  0000000000000000  x15  2e8ba2e8ba2e8ba3
02-21 03:38:35.728 23516 23516 F DEBUG   :     x16  00000076ba9215b0  x17  00000076ba8c8600  x18  000000769710e5b8  x19  00000038e4573ff8
02-21 03:38:35.729 23516 23516 F DEBUG   :     x20  00000038e4574000  x21  000000769bbea5e0  x22  00000076b5d9a4e8  x23  0000000000000000
02-21 03:38:35.729 23516 23516 F DEBUG   :     x24  000000769af951d8  x25  0000000000000019  x26  00000076b5d9a4e8  x27  0000000000000001
02-21 03:38:35.729 23516 23516 F DEBUG   :     x28  000000769b511000  x29  00000076b5d96f00  x30  00000076988ed560
02-21 03:38:35.729 23516 23516 F DEBUG   :     sp   00000076b5d96e70  pc   00000076963d4684  pstate 0000000060000000
02-21 03:38:35.732 23516 23516 F DEBUG   : 
02-21 03:38:35.732 23516 23516 F DEBUG   : backtrace:
02-21 03:38:35.732 23516 23516 F DEBUG   :     #00 pc 0000000000373684  /data/app/org.chromium.chrome-1/base.apk (offset 0x321a000)
02-21 03:38:35.732 23516 23516 F DEBUG   :     #01 pc 000000000288c55c  /data/app/org.chromium.chrome-1/base.apk (offset 0x321a000)
 
Cc: klausw@chromium.org
As a note, this also seems to happen on our FYI bot's locally attached Pixel, which is running O, so I'm not sure why this didn't repro locally on my Pixel running O...
A few more things:

1. The FYI bot's local device actually got swapped recently to a Pixel XL

2. I'm unable to repro locally on a Pixel flashed to the same OS build as the FYI bot and provisioned using the same provisioning script.

3. Thanks to Issue 812428, I'm unable to repro the issue on the swarming bots

4. I was unable to repro on a Pixel XL with Canary and the WebXR magic window sample.
This is repro-able in a hacky way by modifying the bot test spec config files to run the VR tests on the linux_android_rel_ng trybot. See https://chromium-review.googlesource.com/c/chromium/src/+/929522.
I'm unable to get the tests to run properly on the debug K trybot. I ran the logcat output from the release K trybot through the Android stack tool, but since all the debug info is stripped out on that bot, it's not very useful:

signal 11 (SIGSEGV) at 0x00000000 (code=1), thread 25277 (CrRendererMain)
pid: 25263, tid: 25277, name: CrRendererMain  >>> org.chromium.chrome:sandboxed_process0 <<<
signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 00000000
     r0 00000000  r1 00000001  r2 00000000  r3 00000000
     r4 7bbcd3e8  r5 75da38c0  r6 772283e0  r7 75da3928
     r8 75da3b4c  r9 75da3aa4  sl 75da3aa0  fp 00000000
     ip 789baca9  sp 75da38c0  lr 7a106f27  pc 789baca8

Stack Trace:
  RELADDR   FUNCTION                                                                                                     FILE:LINE
  000ebca8  __aeabi_memset                                                                                               ??:0:0
  01837f25  SkTSect<SkDCubic, SkDQuad>::binarySearchCoin(SkTSect<SkDQuad, SkDCubic>*, double, double, double*, double*)  ??:0:0

Labels: VR-Caught-By-Test
I managed to get a better stack trace now that swarming works again:

signal 11 (SIGSEGV), code 1, fault addr 0x0 in tid 5712 (CrRendererMain)
pid: 5697, tid: 5712, name: CrRendererMain  >>> org.chromium.chrome:sandboxed_process0 <<<
signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x0
     r0 00000000  r1 00000001  r2 00000000  r3 80000000
     r4 4f64d3e8  r5 eceb8a90  r6 cd8e55b8  r7 eceb8af8
     r8 eceb8d1c  r9 eceb8c74  sl eceb8c70  fp 00000000
     ip cfdd004d  sp eceb8a90  lr d13f7de3  pc cfdd004c

Stack Trace:
  RELADDR   FUNCTION                                                                    FILE:LINE
  0168b04c  GrGpuRTCommandBuffer::clearStencilClip(GrFixedClip const&, bool)            ../../third_party/skia/src/image/SkImage.cpp:162:0
  02cb2de1  blink::ImageLayerBridge::SetImage(scoped_refptr<blink::StaticBitmapImage>)  ../../third_party/WebKit/Source/platform/graphics/gpu/ImageLayerBridge.cpp:53:55
That's a much better lead than anything else I've found on this so far. (I've also been failing to repro this on my device.) Thanks for digging deeper on this! I just encountered something possibly related today, so I'm going to pull on that thread and see where I get.
Apparently we're requesting a 2940 x 4173 frame buffer, which is larger than the max supported values.
Good find! I'm going to guess that a single allocation of a buffer that large isn't going to push it over the edge, but the fact that we allocate several of them for a swap chain (and keep the magic window ones allocated when exclusive mode is started) probably adds up to simply too much allocation for the system. I'll put some clamps on these values and try to be a bit more intelligent about cleaning them up.
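For a rough sense of scale, the allocation guess above can be put into numbers. This is only a back-of-the-envelope sketch: the 4 bytes per pixel (RGBA8) and the buffer count of 3 are assumptions, not values taken from this bug, and `SwapChainBytes` is a hypothetical helper rather than anything in Chromium.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: bytes needed for `buffer_count` color buffers of the
// given dimensions. Bytes per pixel and buffer count are assumptions supplied
// by the caller; nothing here is measured from the real swap chain.
std::uint64_t SwapChainBytes(int width, int height, int bytes_per_pixel,
                             int buffer_count) {
  assert(width >= 0 && height >= 0 && bytes_per_pixel >= 0 &&
         buffer_count >= 0);
  return static_cast<std::uint64_t>(width) *
         static_cast<std::uint64_t>(height) *
         static_cast<std::uint64_t>(bytes_per_pixel) *
         static_cast<std::uint64_t>(buffer_count);
}
```

With the 2940 x 4173 size from the previous comment, `SwapChainBytes(2940, 4173, 4, 1)` comes to about 49 MB per buffer, so three such buffers would be roughly 147 MB under these assumptions.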
What appears to be happening is that we try to run XRWebGLDrawingBuffer::Resize to go from a 0 x 0 buffer to 2940 x 4173. This ends up failing here since it's too large https://cs.chromium.org/chromium/src/gpu/command_buffer/service/gles2_cmd_decoder.cc?q=gles2_cmd_decoder.cc&sq=package:chromium&dr&l=9078.

We then find out that the frame buffer is incomplete here https://cs.chromium.org/chromium/src/third_party/WebKit/Source/platform/graphics/gpu/XRWebGLDrawingBuffer.cpp?q=xrwebgldrawingbuffer&sq=package:chromium&dr=CSs&l=247, but we don't do anything about it. My guess is that we then later try to use the incomplete frame buffer, which causes issues.
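The failure mode described above (allocation fails, the incomplete framebuffer is logged, and nothing else reacts) can be sketched in isolation. This is not the Chromium code: `Size`, `FramebufferStatus`, `SimulateAllocate`, and `DrawingBufferSketch` are hypothetical stand-ins, and the limit of 4096 used in the usage note is an assumed max texture size, not a value stated in this bug. The point is only that the result of the resize should be recorded so later users can check it.

```cpp
#include <cassert>

// Hypothetical stand-ins for the state involved; not Chromium types.
struct Size {
  int width;
  int height;
};

enum class FramebufferStatus { kComplete, kIncomplete };

// Simulates the texture allocation: it fails when either dimension is
// non-positive or exceeds the (assumed) max texture size.
FramebufferStatus SimulateAllocate(Size requested, int max_texture_size) {
  assert(max_texture_size > 0);
  const bool fits = requested.width > 0 && requested.height > 0 &&
                    requested.width <= max_texture_size &&
                    requested.height <= max_texture_size;
  return fits ? FramebufferStatus::kComplete : FramebufferStatus::kIncomplete;
}

struct DrawingBufferSketch {
  Size size{0, 0};
  bool usable = false;

  // Records whether the resize succeeded instead of only logging it, so
  // callers can skip an incomplete framebuffer rather than use it.
  bool Resize(Size requested, int max_texture_size) {
    const FramebufferStatus status =
        SimulateAllocate(requested, max_texture_size);
    usable = (status == FramebufferStatus::kComplete);
    size = usable ? requested : Size{0, 0};
    return usable;
  }
};
```

Under the assumed 4096 limit, `Resize({2940, 4173}, 4096)` returns false and leaves `usable` unset, while `Resize({300, 300}, 4096)` succeeds.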
To summarize some offline discussion, the root cause appears to be some really, really weird behavior of setting the canvas width/height to 100%.

When set to specific px values, e.g. 300 x 300, the offset width/height are correctly reported as such and the test passes.

When we set the width/height to 100%, offset width is reported as 980, which is then multiplied by the DPR of 3 to get 2940. What's really strange is that screen.width reports 360 (the correct value for a Nexus 5) at all points during the test, so 100% should be converted to 360px...
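The arithmetic in the comment above, written out as a small sketch. `CssToDevicePixels` is a hypothetical helper, not a Blink function, and rounding to the nearest integer is an assumption about how the conversion is done.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: converts a CSS pixel length to device pixels by
// multiplying by the device pixel ratio (DPR) and rounding to nearest.
int CssToDevicePixels(int css_pixels, double device_pixel_ratio) {
  assert(css_pixels >= 0 && device_pixel_ratio > 0.0);
  return static_cast<int>(std::lround(css_pixels * device_pixel_ratio));
}
```

With a DPR of 3, the reported offset width of 980 gives 2940 device pixels, whereas the expected 360 would give 1080.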
Project Member

Comment 13 by bugdroid1@chromium.org, Mar 15 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/69d3f0648a5962ea3a2a4d0e5244b2be57afbae1

commit 69d3f0648a5962ea3a2a4d0e5244b2be57afbae1
Author: Brandon Jones <bajones@chromium.org>
Date: Thu Mar 15 23:30:57 2018

Ensured XRWebGLLayers clamp their framebuffer size.

In some cases extreme output canvas sizes were causing failed
allocations and incomplete framebuffers, which made the ImageLayerBridge
choke. This patch both clamps the backbuffer size to the max texture
size and, if an incomplete framebuffer is detected, produces black 1x1
images for the ImageLayerBridge to consume instead of attempting to pass
the texture that failed to allocate.

Bug:  814460 
Cq-Include-Trybots: master.tryserver.blink:linux_trusty_blink_rel;master.tryserver.chromium.linux:linux_layout_tests_slimming_paint_v2
Change-Id: Idba50e08052767360423018e06bc65f1f87c4d14
Reviewed-on: https://chromium-review.googlesource.com/964796
Reviewed-by: Brian Sheedy <bsheedy@chromium.org>
Reviewed-by: Ian Vollick <vollick@chromium.org>
Commit-Queue: Brandon Jones <bajones@chromium.org>
Cr-Commit-Position: refs/heads/master@{#543549}
[add] https://crrev.com/69d3f0648a5962ea3a2a4d0e5244b2be57afbae1/third_party/WebKit/LayoutTests/xr/xrWebGLLayer_non_exclusive_adjust_size.html
[modify] https://crrev.com/69d3f0648a5962ea3a2a4d0e5244b2be57afbae1/third_party/WebKit/Source/platform/graphics/gpu/XRWebGLDrawingBuffer.cpp
[modify] https://crrev.com/69d3f0648a5962ea3a2a4d0e5244b2be57afbae1/third_party/WebKit/Source/platform/graphics/gpu/XRWebGLDrawingBuffer.h

Status: Fixed (was: Assigned)
Marking this as fixed since the clamping CL fixed the segfaults. I've filed Issue 823563 to track investigation/fixing of the root cause (incorrectly reported window size).
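For readers skimming the thread, the two behaviors the commit message describes (clamping the backbuffer to the max texture size, and handing over a 1x1 image when the framebuffer is incomplete) might look roughly like this. It is a sketch under assumptions, not the code from the CL: `Size`, `ClampToMaxTextureSize`, and `ImageSizeForCompositor` are hypothetical names, and the 4096 limit in the usage note is an assumed value.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in; not a Chromium type.
struct Size {
  int width;
  int height;
};

// Limits each dimension to the max texture size so the allocation request
// cannot exceed what the GPU reports as supported.
Size ClampToMaxTextureSize(Size requested, int max_texture_size) {
  assert(max_texture_size >= 1);
  return Size{std::min(requested.width, max_texture_size),
              std::min(requested.height, max_texture_size)};
}

// Chooses the size of the image handed to the compositor: the backbuffer
// size when the framebuffer is complete, otherwise a 1x1 placeholder.
Size ImageSizeForCompositor(Size backbuffer, bool framebuffer_complete) {
  return framebuffer_complete ? backbuffer : Size{1, 1};
}
```

With an assumed limit of 4096, the 2940 x 4173 request from this bug would be clamped to 2940 x 4096, and an incomplete framebuffer would yield a 1 x 1 image instead of the failed texture.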
Labels: Test-Complete
Components: Internals>XR
Labels: -VR-Caught-By-Test XR-Caught-By-Test
