New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 715997 link

Starred by 1 user

Issue metadata

Status: Verified
Owner:
Closed: May 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Android
Pri: 1
Type: Bug-Regression



Sign in to add a comment

Black patch is seen on privacy policy in "hotels.com" app.

Reported by pshi...@etouch.net, Apr 27 2017

Issue description

Device name: Pixel XL/N2G47E,Nexus 5X/N4F26T,Htc Desire 630/MMB29M,Samsung Galaxy J5 SM-J500F/MMB29M
WebView version: 60.0.3082.0
Application: Hotels.com,Amazon shopping
Application version:25.1.1.2.release-25_1,10.8.0.300

Bisect Information:
Per-Version bisect information:
Good build: 59.0.3062.0
Bad build:  59.0.3063.0

Regression range: https://chromium.googlesource.com/chromium/src/+log/59.0.3062.0..59.0.3063.0?pretty=fuller&n=10000

Steps to reproduce:
(1)Launch Hotels.com app,navigate to wrench menu> info >privacy policy >long tap to select all and scroll
(2)Tap on homescreen,launch Amazon app and navigate to sign in page 
(3)Fill in the details and tap on recent apps,continuously switch between the apps
(4)observe

Actual result: Black patch is seen on privacy policy in hotels.com

Expected result: No such black patch should be seen on privacy policy in hotels.com 

Additional comments: Issue is only reproducible on M & N android devices.



 

Comment 1 by battun@chromium.org, Apr 27 2017

Labels: -Pri-3 M-59 Pri-1 Type-Bug-Regression
Status: Available (was: Unconfirmed)
Please find logs and video @ http://go/chrome-androidlogs1/7/715997

Comment 2 by torne@chromium.org, Apr 27 2017

Labels: -Restrict-View-Google
I think I got this to happen once but it took a lot of switching and it disappeared before I could investigate further (on a nexus 6 running N using 59.0.3063.0). Does this happen reliably for you?

Also, some general bug-filing requests:

- Please don't mark the bug restricted if there's nothing confidential mentioned; everything here is about public builds/devices as far as I can see.

- Please try to minimise repro instructions if possible: for example in this bug does it matter what the other app you switch to is, or does it happen for any app? Does it matter if the text is selected or not? When the instructions are this elaborate it makes it had to tell which details are essential and which are just what you happened to be doing when the bug occurred. 

Comment 3 by aluo@chromium.org, Apr 28 2017

Labels: Needs-Bisect
Owner: battun@chromium.org
battun@, cl bisect please.

Comment 4 by battun@chromium.org, Apr 28 2017

Owner: aluo@chromium.org
aluo@ I am able to repro by manually, but the frequency was very low 10/3. So its very difficult to bisect this issues. 

Thanks!

Comment 5 by aluo@chromium.org, Apr 29 2017

Cc: torne@chromium.org
Labels: -Needs-Bisect hasbisect-per-revision ReleaseBlock-Stable
Owner: danakj@chromium.org
bisected to: https://chromium.googlesource.com/chromium/src/+/87fbafb0be4bf8ccc0cd5fa2578527c0aa88b704

Although bug was not consistently reproducing, I find if you scroll to bottom of the terms of service and bring up the context menu, then background, foreground the app repeatedly, it will happen about 10% of time.  There's no need to switch to a different app.

Observed many errors in log:

04-29 00:26:06.402 24670 24853 E chromium: [ERROR:gles2_cmd_decoder.cc(4390)] [.RenderWorker-0x9c6f0c00.GpuRasterization]GL ERROR :GL_INVALID_FRAMEBUFFER_OPERATION : glDrawElements: framebuffer incomplete
04-29 00:26:06.402 24670 24853 E chromium: [ERROR:gles2_cmd_decoder.cc(4390)] [.RenderWorker-0x9c6f0c00.GpuRasterization]GL ERROR :GL_INVALID_FRAMEBUFFER_OPERATION : glDrawElements: framebuffer incomplete
04-29 00:26:06.402 24670 24853 E chromium: [ERROR:gles2_cmd_decoder.cc(4390)] [.RenderWorker-0x9c6f0c00.GpuRasterization]GL ERROR :GL_INVALID_FRAMEBUFFER_OPERATION : glDrawElements: framebuffer incomplete
04-29 00:26:06.403 24670 24853 E chromium: [ERROR:gles2_cmd_decoder.cc(4390)] [.RenderWorker-0x9c6f0c00.GpuRasterization]GL ERROR :GL_INVALID_FRAMEBUFFER_OPERATION : glDrawElements: framebuffer incomplete
04-29 00:26:06.403 24670 24853 E chromium: [ERROR:gles2_cmd_decoder.cc(7766)] [.RenderWorker-0x9c6f0c00.GpuRasterization]GL ERROR :GL_INVALID_OPERATION : glFramebufferTexture2D: Attachment textarget doesn't match texture target
04-29 00:26:06.414   197   197 E Region  : Region::boolean_operation(op=7) invalid Rect={0,72,11,66}
04-29 00:26:06.414   197   197 E Region  : Region::boolean_operation(op=7) invalid Rect={132,72,132,66}
04-29 00:26:06.429   977  1639 E Surface : getSlotFromBufferLocked: unknown buffer: 0x9c348450
04-29 00:26:06.430   197   197 E Region  : Region::boolean_operation(op=7) invalid Rect={0,69,13,66}
04-29 00:26:06.430   197   197 E Region  : Region::boolean_operation(op=7) invalid Rect={132,69,132,66}
04-29 00:26:06.431 24670 24853 E chromium: [ERROR:gles2_cmd_decoder.cc(4390)] [.RenderWorker-0x9c6f0c00.GpuRasterization]GL ERROR :GL_INVALID_FRAMEBUFFER_OPERATION : glDrawElements: framebuffer incomplete
04-29 00:26:06.431 24670 24853 E chromium: [ERROR:gles2_cmd_decoder.cc(4390)] [.RenderWorker-0x9c6f0c00.GpuRasterization]GL ERROR :GL_INVALID_FRAMEBUFFER_OPERATION : glDrawElements: framebuffer incomplete
04-29 00:26:06.432 24670 24853 E chromium: [ERROR:gles2_cmd_decoder.cc(4390)] [.RenderWorker-0x9c6f0c00.GpuRasterization]GL ERROR :GL_INVALID_FRAMEBUFFER_OPERATION : glDrawElements: framebuffer incomplete
04-29 00:26:06.432 24670 24853 E chromium: [ERROR:gles2_cmd_decoder.cc(4390)] [.RenderWorker-0x9c6f0c00.GpuRasterization]GL ERROR :GL_INVALID_FRAMEBUFFER_OPERATION : glDrawElements: framebuffer incomplete
04-29 00:26:06.432 24670 24853 E chromium: [ERROR:gles2_cmd_decoder.cc(4390)] [.RenderWorker-0x9c6f0c00.GpuRasterization]GL ERROR :GL_INVALID_FRAMEBUFFER_OPERATION : glDrawElements: framebuffer incomplete


Comment 6 by aluo@chromium.org, Apr 29 2017

I was using aosp build MRA58Z on Nexus 5.
Cc: piman@chromium.org enne@chromium.org
What device were u reproducing on? Can I reproduce this on desktop? What do you mean by background/foreground the app? Which app - chrome? The tab? What context menu are you opening, and how does that relate?

Comment 9 by aluo@chromium.org, May 1 2017

Please see video in http://go/chrome-androidlogs1/7/715997.  We've only tried on Android devices, not sure about desktop.  Backgrounding means pressing the overview button on the device to bring up a list of recent apps, foregrounding is tapping on the app in the list of recent apps to make it active again.  I just checked and the context menu is actually not required to reproduce this, so don't have to worry about that step.  The device and os info is in comment 6 (Nexus 5, MRA58Z).  This bug does not appear to be Android device specific.
I'm trying to reproduce on n7 but the page looks different and im getting redirected to ca.hotels.com. I haven't seen the problem repro. I'll try on a phone instead (charging).
Oh, this is an app not the webpage I guess.
OK trying with webview installed on M from 87fbafb0be4b in the Hotels.com native app. Unable to reproduce tho so far
I am doing:

- Launch hotels app
- Go to privacy policy page
- Select all
- Switch to diff app
- Switch back to hotels app
- Scroll to bottom
- Scroll to top
- Switch to diff app
- Switch to hotels app

and then repeatedly switch back and forth, but no repro for me.
OK I spammed the app list button to background-foreground a lot and got a repro
Cc: boliu@chromium.org
Cc: vmp...@chromium.org
Notably what happens is that the webcontent in the 4 tiles in the viewport, and the 1 tile outside the viewport, are black/empty. Tiles below that are ok. If I scroll away far enough the tiles are evicted and re-rastered and are ok.
webview will drop the memory budget to 0 when app goes into background, and may or may not hold onto the last delegated frame depending on if android trims memory
Desktop tabs would do the same, but when they come back they shouldn't try to use tiles without content. It almost feels like the recording went away, I'm bisecting what in my patch exactly is the cause to try help.
Before: RenderWidget::OnWasShown called:
  compositor_->SetNeedsForcedRedraw();
    layer_tree_host_->SetNextCommitForcesRedraw();
      proxy_->SetNeedsUpdateLayers();
      force UpdateLayers to return true, causing commit
    layer_tree_host_->SetNeedsUpdateLayers();
 ScheduleComposite();
  compositor_->setNeedsCompositorUpdate();
    layer_tree_host_->SetNeedsUpdateLayers();

So this is roughly SetNeedsCommit.. lol.


The problem stops if RenderWidget::OnWasShown calls LTH::SetNeedsCommit.

It does not stop if calling LTH::SetNeedsUpdateLayers (without forcing the commit from UpdateLayers). My patch changed it to SetNeedsAnimate, which also does not prevent the racey black state as it does not force a commit.

So now question is why is there this racey state that a commit resolves.
The tiles have blue borders, so they aren't checkerboard or anything.

Most times when you background-foreground this doesn't happen but sometimes it animates a little faster than usual? That seems to correlate with when this happens.

If I create invalidation inside a tile with text selection, I can get part of the tile to reraster with content, while the rest stays black. So implies.. we have a black rastered tile that compositor thinks is accurate, and then we partial raster in the invalidation?  Im not sure how else you'd get that scenario.
I turned off partial raster in TileManager::CreateRasterTask and was unable to make a partial tile.

Turned it back on and noticed while dragging text selection over a tile boundary, I got a flash of the full tile (because the black tile is in use), then a 2nd frame that was black again with just a bit of content. This super confirms compositor has a black tile that it thinks has the content of the webpage, and no concept of any invalidation being present there.
Is it possible we become visible, schedule raster, become not visible, discard it, but think the tiles are valid or something?

This seems to happen when we become visible->not visible very quickly. I can stay not visible for a while and it will show at that point we become visible again.
Or maybe when we're backgrounded and rastering the context is no longer useful so it draws black but we don't know it?
Cc: ericrk@chromium.org
Few things:
- Verified that we're creating new textures from ResourcePool when this happens (not reusing textures somehow)
- Seeing the following error:
05-02 11:33:38.994 22882 23180 E chromium: [ERROR:gles2_cmd_decoder.cc(7766)] [.RenderWorker-0x9fa69000.GpuRasterization]GL ERROR :GL_INVALID_OPERATION : glFramebufferTexture2D: Attachment textarget doesn't match texture target

Full errors for one tile:

05-02 11:43:54.110 24740 25065 E chromium: [ERROR:gles2_cmd_decoder.cc(7764)] texture target 0
05-02 11:43:54.110 24740 25065 E chromium: [ERROR:gles2_cmd_decoder.cc(7765)] GLFaceTargetToTextureTarget 3553
05-02 11:43:54.110 24740 25065 E chromium: [ERROR:gles2_cmd_decoder.cc(7768)] [.RenderWorker-0xaec5ac00.GpuRasterization]GL ERROR :GL_INVALID_OPERATION : glFramebufferTexture2D: Attachment textarget doesn't match texture target
05-02 11:43:54.122 24740 25065 E chromium: [ERROR:gles2_cmd_decoder.cc(4390)] [.RenderWorker-0xaec5ac00.GpuRasterization]GL ERROR :GL_INVALID_FRAMEBUFFER_OPERATION : glDrawElements: framebuffer incomplete
05-02 11:43:54.123 24740 25065 E chromium: [ERROR:gles2_cmd_decoder.cc(4390)] [.RenderWorker-0xaec5ac00.GpuRasterization]GL ERROR :GL_INVALID_FRAMEBUFFER_OPERATION : glDrawElements: framebuffer incomplete
05-02 11:43:54.124 24740 25065 E chromium: [ERROR:gles2_cmd_decoder.cc(4390)] [.RenderWorker-0xaec5ac00.GpuRasterization]GL ERROR :GL_INVALID_FRAMEBUFFER_OPERATION : glDrawElements: framebuffer incomplete

Putting a glFinish after creating the textures seems to fix this, so it looks like a race between the compositor context and the worker context for the gpu raster task.
glFlush also appears to be sufficient.
FTR repro steps are much simpler than given above simply:

- Open the hotels app
- Go to menu -> info -> privacy policy (this is a webview page)
- Smash the appswitching button to background/foreground the app until it happens

You'll see the animation of the window be different sometimes, when it is very fast that's when the problem ends up happening, so there's some diff OS behaviour when this reproduces.
Cc: reve...@chromium.org sunn...@chromium.org
Owner: ericrk@chromium.org
This also occurs on TOT-ish and putting a glFlush in ResourceProvider::ScopedWriteLockGL::ScopedWriteLockGL fixes it there also.

=> ericrk and +cc other gpu scheduling/sychronization peoples.
Cc: danakj@chromium.org
FWIW these directions make it easy to replace system webview: https://docs.google.com/a/google.com/document/d/1RvorDadxWRNFDv7QyeG1WdwasMexJJvSaiAKyaImEbs/edit?usp=sharing
Owner: sunn...@chromium.org
Status: Assigned (was: Available)
Cc: kbr@chromium.org
+kbr@ fyi

Comment 34 by kbr@chromium.org, May 2 2017

I'm digging into a similar problem right now --  Issue 694359  -- where it looks like under some circumstances, destroying a GLES2CmdDecoder and making the next one current fails to properly reset the context state when using virtual contexts.

Had been hoping the bug would be obvious, but from reading the code it looks like everything is working properly, so I'm currently adding instrumentation to try to confirm this hypothesis.

Just in case, I'll point out for this bug the texture is invalid at the time we raster it, but later becomes valid, cuz we're able to raster into the same texture with partial-raster later and that works fine.
A bit more context - the issue being seen here appears to be that we raster into a texture before allocating it. The texture target for the texture (in texture manager) is 0, which should only be possible if we haven't allocated / added a level to the texture.

The texture eventually is allocated and can be rasterized into (we see partial raster content appear later), so it isn't that we are allocating into a deleted texture, it seems that we're rasterizing into a texture before allocation completes.

It seems like we may be failing to wait on the worker context for the allocation to complete on the compositor context (not sure how though).

Comment 37 by kbr@chromium.org, May 3 2017

FYI, a longstanding bug in handling of virtual GL contexts' state upon command buffer creation has been identified in https://bugs.chromium.org/p/chromium/issues/detail?id=694359#c23 , and https://codereview.chromium.org/2862443002 is up for review fixing it.

I need to run several tryjobs to confirm that this conclusively eliminates the flakiness seen in the WebGL 2.0 conformance tests. Perhaps others would like to test this fix to see if it fixes this too?

I patched kbr's fix locally but I can still reproduce the bug :(
I think kbr's patch fixes the underlying GL context state, but the errors in #24/#25 are coming from the client state (GLES2Decoder/TextureManager etc.), which means the client is doing the wrong thing - e.g. missing sync tokens?
It looks like tiles are being rastered before their textures are uploaded. See the attached trace at ~15s. One of the raster tasks at 15000.581 ms does a flush which runs on the gpu thread at 15001.003 ms (OnAsyncFlush with put offset 892) before the texture uploads on compositor context at 15042.251 ms (OnAsyncFlush with put offset 227). However, there's an OrderingBarrier with put offset 196 on the compositor context before raster tasks are scheduled at 14993.948 ms and that's not run on the GPU process before the raster tasks. The compositor's uploads seem to run because of an explicit flush with put offset 227 that it does at 15034.161 ms.

Maybe we're short circuiting in the compositor context's OrderingBarrier call and not recording the put offset?
trace_webview_tile_corruption.json.gz
302 KB Download
Cc: vmi...@chromium.org
Added some more tracing and found the bug :)

We call GLES2Impl::SetAggressivelyFreeResources when going background. This does a flush updating GpuChannelHost's put_offset cache. Then it does a finish and deletes the get buffer. When it does this it doesn't update last_barrier_put_offset_ in CommandBufferProxyImpl. After we come back to the foreground, CommandBufferProxyImpl::OrderingBarrier sees the old value of last_barrier_put_offset_ and if that happens to match the current put_offset it calls GpuChannelHost::OrderingBarrier with put_offset_changed_ = false and the entire OrederingBarrier call becomes a nop.

See the attached trace at 11,737.794 ms for the OrderingBarrier/Flush in SetAggressivelyFreeResources with put_offset = 196. Then later at 12,391.497 ms the OrderingBarrier in ScheduleTasks has the exact same put_offset even though we deleted the get buffer in between.

The fix is simple. In CommandBufferProxyImpl::SetGetBuffer set last_barrier_put_offset_ = -1 (in addition to last_put_offset_ = -1 which is already there).
oops forgot to attach trace
trace_moar_tracing.json.gz
353 KB Download
Nice investigation :D
Nice find, that is subtle!
Project Member

Comment 45 by bugdroid1@chromium.org, May 4 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/e4f4044d257b331c40a6743b88e3050d8c12bee3

commit e4f4044d257b331c40a6743b88e3050d8c12bee3
Author: Sunny Sachanandani <sunnyps@chromium.org>
Date: Thu May 04 00:27:19 2017

gpu: Reset OrderingBarrier state when recreating get buffer.

If the get buffer is destroyed say in SetAggressivelyFreeResources, the
next OrderingBarrier call after recreating the get buffer will see the
old value for last_barrier_put_offset_. CommandBufferProxyImpl uses this
to set the put_offset_changed parameter for OrderingBarrier. If the new
put_offset_ matches last_barrier_put_offset_ exactly the OrderingBarrier
becomes a nop. That can cause contexts to run out of order causing
all sorts of rendering glitches.

R=jbauman
BUG= 715997 

Cq-Include-Trybots: master.tryserver.chromium.android:android_optional_gpu_tests_rel;master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel
Change-Id: Ie1e511e22a3e2f3b0e71ea872bbd690925725be9
Reviewed-on: https://chromium-review.googlesource.com/495369
Reviewed-by: John Bauman <jbauman@chromium.org>
Commit-Queue: Sunny Sachanandani <sunnyps@chromium.org>
Cr-Commit-Position: refs/heads/master@{#469217}
[modify] https://crrev.com/e4f4044d257b331c40a6743b88e3050d8c12bee3/gpu/ipc/client/command_buffer_proxy_impl.cc

#0: Can you try the next Chrome Canary and see if the bug is fixed? You'll need a device with N and follow the instructions at https://www.chromium.org/developers/androidwebview/android-webview-beta to use Chrome Canary as the system webview.
Labels: Merge-Request-59
Project Member

Comment 48 by sheriffbot@chromium.org, May 5 2017

Labels: -Merge-Request-59 Hotlist-Merge-Approved Merge-Approved-59
Your change meets the bar and is auto-approved for M59. Please go ahead and merge the CL to branch 3071 manually. Please contact milestone owner if you have questions.
Owners: amineer@(Android), cmasso@(iOS), gkihumba@(ChromeOS), Abdul Syed@(Desktop)

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Project Member

Comment 49 by bugdroid1@chromium.org, May 5 2017

Labels: -merge-approved-59 merge-merged-3071
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/be610c7d9baf8796661ed2dc3326c9ca521b60fd

commit be610c7d9baf8796661ed2dc3326c9ca521b60fd
Author: Sunny Sachanandani <sunnyps@chromium.org>
Date: Fri May 05 22:11:13 2017

gpu: Reset OrderingBarrier state when recreating get buffer.

If the get buffer is destroyed say in SetAggressivelyFreeResources, the
next OrderingBarrier call after recreating the get buffer will see the
old value for last_barrier_put_offset_. CommandBufferProxyImpl uses this
to set the put_offset_changed parameter for OrderingBarrier. If the new
put_offset_ matches last_barrier_put_offset_ exactly the OrderingBarrier
becomes a nop. That can cause contexts to run out of order causing
all sorts of rendering glitches.

R=jbauman
TBR=sunnyps@chromium.org
BUG= 715997 

Cq-Include-Trybots: master.tryserver.chromium.android:android_optional_gpu_tests_rel;master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel
Change-Id: Ie1e511e22a3e2f3b0e71ea872bbd690925725be9
Reviewed-on: https://chromium-review.googlesource.com/495369
Reviewed-by: John Bauman <jbauman@chromium.org>
Commit-Queue: Sunny Sachanandani <sunnyps@chromium.org>
Cr-Commit-Position: refs/heads/master@{#469217}
(cherry picked from commit e4f4044d257b331c40a6743b88e3050d8c12bee3)

Change-Id: I8dc11c768304481db1edf06cbd4f91457497fe26
Reviewed-on: https://chromium-review.googlesource.com/498130
Reviewed-by: Sunny Sachanandani <sunnyps@chromium.org>
Cr-Commit-Position: refs/branch-heads/3071@{#425}
Cr-Branched-From: a106f0abbf69dad349d4aaf4bcc4f5d376dd2377-refs/heads/master@{#464641}
[modify] https://crrev.com/be610c7d9baf8796661ed2dc3326c9ca521b60fd/gpu/ipc/client/command_buffer_proxy_impl.cc

Status: Fixed (was: Assigned)
Status: Verified (was: Fixed)
Issue unable to repro on latest M59: 59.0.3071.41  & M60: 60.0.3093.0 
Tested devices:Pixel XL/N2G47E,Nexus 5X/N4F26T,Htc Desire 630/MMB29M,Samsung Galaxy J5 SM-J500F/MMB29M

Thanks!
Verified on latest M59:59.0.3071.49
Tested devices:Nexus 5X / N2G47W

Thanks!

Sign in to add a comment