Issue metadata
Sign in to add a comment
|
Flakiness when exiting WebVR presentation |
||||||||||||||||||||||||||||
Issue descriptionChrome Version: ToT OS: Android The following test is flaky VrBrowserTransitionTest#testWebVrReEntryFromVrBrowser Stacktrace: C 137.535s Main java.lang.AssertionError: Polling JavaScript boolean javascriptDone succeeded, but test failed. Testharness reported failure: "Harness failed due to timeout. FAIL rAFs continue to fire after presentation re-entry: null. " C 137.535s Main at org.junit.Assert.fail(Assert.java:88) C 137.535s Main at org.chromium.chrome.browser.vr.XrTestFramework.waitOnJavaScriptStep(XrTestFramework.java:282) C 137.535s Main at org.chromium.chrome.browser.vr.XrTestFramework.executeStepAndWait(XrTestFramework.java:241) C 137.536s Main at org.chromium.chrome.browser.vr.XrTestFramework.executeStepAndWait(XrTestFramework.java:475) C 137.536s Main at org.chromium.chrome.browser.vr.VrBrowserTransitionTest.reEntryFromVrBrowserImpl(VrBrowserTransitionTest.java:328) C 137.536s Main at org.chromium.chrome.browser.vr.VrBrowserTransitionTest.testWebVrReEntryFromVrBrowser(VrBrowserTransitionTest.java:297) C 137.536s Main at java.lang.reflect.Method.invoke(Native Method) C 137.536s Main at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) C 137.536s Main at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) C 137.536s Main at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:52) C 137.536s Main at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) C 137.536s Main at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) C 137.536s Main at org.chromium.base.test.ScreenshotOnFailureStatement.evaluate(ScreenshotOnFailureStatement.java:37) C 137.537s Main at org.chromium.chrome.browser.vr.util.VrTestRuleUtils.evaluateVrTestRuleImpl(VrTestRuleUtils.java:79) C 137.537s Main at org.chromium.chrome.browser.vr.rules.ChromeTabbedActivityVrTestRule$1.evaluate(ChromeTabbedActivityVrTestRule.java:32) C 137.537s Main at org.chromium.chrome.test.ChromeActivityTestRule$1.evaluate(ChromeActivityTestRule.java:116) C 137.537s Main at android.support.test.internal.statement.UiThreadStatement.evaluate(UiThreadStatement.java:55) C 137.537s Main at android.support.test.rule.ActivityTestRule$ActivityStatement.evaluate(ActivityTestRule.java:270) C 137.537s Main at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48) C 137.537s Main at org.junit.rules.RunRules.evaluate(RunRules.java:20) C 137.537s Main at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) C 137.537s Main at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) C 137.538s Main at org.chromium.base.test.BaseJUnit4ClassRunner.runChild(BaseJUnit4ClassRunner.java:237) C 137.538s Main at org.chromium.base.test.BaseJUnit4ClassRunner.runChild(BaseJUnit4ClassRunner.java:48) C 137.538s Main at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) C 137.538s Main at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) C 137.538s Main at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) C 137.538s Main at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) C 137.538s Main at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) C 137.538s Main at org.junit.runners.ParentRunner.run(ParentRunner.java:363) C 137.538s Main at org.chromium.base.test.BaseJUnit4ClassRunner.run(BaseJUnit4ClassRunner.java:220) C 137.538s Main at org.junit.runners.Suite.runChild(Suite.java:128) C 137.539s Main at org.junit.runners.Suite.runChild(Suite.java:27) C 137.539s Main at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) C 137.539s Main at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) C 137.539s Main at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) C 137.539s Main at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) C 137.539s Main at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) C 137.539s Main at org.junit.runners.ParentRunner.run(ParentRunner.java:363) C 137.539s Main at org.junit.runner.JUnitCore.run(JUnitCore.java:137) C 137.539s Main at org.junit.runner.JUnitCore.run(JUnitCore.java:115) C 137.539s Main at android.support.test.internal.runner.TestExecutor.execute(TestExecutor.java:59) C 137.540s Main at android.support.test.runner.AndroidJUnitRunner.onStart(AndroidJUnitRunner.java:262) C 137.540s Main at org.chromium.base.test.BaseChromiumAndroidJUnitRunner.onStart(BaseChromiumAndroidJUnitRunner.java:125) C 137.540s Main at android.app.Instrumentation$InstrumentationThread.run(Instrumentation.java:2145) Bisect found this CL as culprit https://crrev.com/c/1199699
,
Sep 10
,
Sep 10
Will investigate and revert CL if not quick fix is found. Thanks.
,
Sep 11
FWIW, I can't reproduce this in manual testing on a Pixel 2 XL on Android P. - enable WebVR in chrome://flags - open https://webvr.info/samples/03-vr-presentation.html - tap "Enter VR" - configure Daydream View headset and pair controller (if needed) - press App button on Daydream View controller to exit presentation and drop to VR Browsing mode. Observe if the magic window mode is animating. - start over at "Enter VR" For me, the frame counter continues updating as expected while in Magic Window mode in the VR Browser. Do you have more detailed repro steps aside from the test? What's the specific hardware and OS version it was running on? ericrk@: as discussed separately, it seems appropriate to disable the RAF blocking for WebVR canvases - what would be the cleanest way to do so? See for example VRDisplay::GetFrameImage that's called from VRDisplay::submitFrame and sets up the image_release_callback. That function is only used in VR presentation.
,
Sep 11
Here's a link for context: https://cs.chromium.org/chromium/src/third_party/blink/renderer/modules/vr/vr_display.cc?q=VRDisplay::GetFrameImage&sq=package:chromium&g=0&l=737 scoped_refptr<Image> VRDisplay::GetFrameImage( std::unique_ptr<viz::SingleReleaseCallback>* out_release_callback) { [...] scoped_refptr<Image> image_ref = rendering_context_->GetStaticBitmapImage(out_release_callback); Normal compositing and rAF processing is paused while in VR presentation, it's possible that something gets out of sync when transitioning from VR OnPresentingVSync back to normal rAF-style OnNonPresentingVSync. There's some complicated logic related to this in VRDisplay::RequestVSync and associated member attributes. Note that on ToT, the WebVR API is only available when manually enabling via chrome://flags, the origin trial for it has ended. (Pages are encouraged to upgrade to the newer WebXR API, but that's not entirely ready yet.)
,
Sep 11
The bot that was catching this was using Pixel 1 XLs with O (specifically OPR3.170623.008). I originally thought that it was not happening on N, but I'm seeing a couple of flakes on the N bot as well. That uses Pixel 1 XLs with N (NMF26U).
,
Sep 11
I also couldn't reproduce manually. I used a Pixel XL on Android P. The same device was used to bisect.
,
Sep 11
Got it reproduced finally on a Pixel XL using a debug build, it only fails occasionally. *** Passing run: 09-11 11:08:05.278 24557 24619 I chromium: [INFO:texture_layer.cc(442)] ReleaseAndUpdateWaiting::: this=0x7c806df000 texture_layer->resources_waiting_for_release_:=2 -> 1 09-11 11:08:05.302 24557 24619 I chromium: [INFO:drawing_buffer.cc(915)] CcLayer::: 09-11 11:08:05.302 24557 24619 I chatty : uid=99101(u0_i101) CrRendererMain identical 1 line 09-11 11:08:05.302 24557 24619 I chromium: [INFO:drawing_buffer.cc(915)] CcLayer::: 09-11 11:08:05.304 24557 24619 I chromium: [INFO:texture_layer.cc(131)] SetTransferableResourceInternal::: this=0x7c806df000 resources_waiting_for_release_:=1 -> 2 09-11 11:08:05.332 24590 24612 E FrameEvents: updateAcquireFence: Did not find frame. 09-11 11:08:05.332 24590 24612 E libEGL : eglGetFrameTimestampsANDROID:2418 error 3002 (EGL_BAD_ACCESS) 09-11 11:08:05.338 24557 24619 I chromium: [INFO:texture_layer.cc(442)] ReleaseAndUpdateWaiting::: this=0x7c806df000 texture_layer->resources_waiting_for_release_:=2 -> 1 09-11 11:08:05.341 24590 24618 E chromium: [ERROR:display_scheduler.cc(306)] Not implemented reached in virtual void viz::DisplayScheduler::OnBeginFrameSourcePausedChanged(bool) 09-11 11:08:05.353 24590 24612 E FrameEvents: updateAcquireFence: Did not find frame. 09-11 11:08:05.355 24590 24612 E libEGL : eglGetFrameTimestampsANDROID:2418 error 3002 (EGL_BAD_ACCESS) 09-11 11:08:05.576 24518 24518 I chromium: [INFO:CONSOLE(66)] "Test result: Pass", source: file:///storage/emulated/0/chromium_tests_root/chrome/test/data/xr/e2e_test_files/resources/webvr_e2e.js (66) *** Failed run: 09-11 11:08:33.022 25088 25144 I chromium: [INFO:texture_layer.cc(442)] ReleaseAndUpdateWaiting::: this=0x7c8875d000 texture_layer->resources_waiting_for_release_:=2 -> 1 09-11 11:08:33.061 25088 25144 I chromium: [INFO:drawing_buffer.cc(915)] CcLayer::: 09-11 11:08:33.075 25088 25144 I chatty : uid=99102(u0_i102) CrRendererMain identical 1 line 09-11 11:08:33.075 25088 25144 I chromium: [INFO:drawing_buffer.cc(915)] CcLayer::: 09-11 11:08:33.077 25118 25143 E FrameEvents: updateAcquireFence: Did not find frame. 09-11 11:08:33.077 25118 25143 E libEGL : eglGetFrameTimestampsANDROID:2418 error 3002 (EGL_BAD_ACCESS) 09-11 11:08:33.080 25118 25149 E chromium: [ERROR:display_scheduler.cc(306)] Not implemented reached in virtual void viz::DisplayScheduler::OnBeginFrameSourcePausedChanged(bool) 09-11 11:08:33.091 25088 25144 I chromium: [INFO:texture_layer.cc(131)] SetTransferableResourceInternal::: this=0x7c8875d000 resources_waiting_for_release_:=1 -> 2 09-11 11:08:33.910 25050 25433 I native : stationary_detector.cc:253 SensorFusion: Enter stationary state. Stability time: 1 09-11 11:08:34.327 25088 25144 I chromium: [INFO:drawing_buffer.cc(915)] CcLayer::: 09-11 11:08:39.607 963 1071 I ActivityManager: Start proc 25435:com.google.android.apps.dynamite/u0a136 for broadcast com.google.android.apps.dynamite/com.google.android.libraries.social.notifications.impl.gcm.receiver.GcmReceiver 09-11 11:08:34.328 25088 25144 I chatty : uid=99102(u0_i102) CrRendererMain identical 1 line 09-11 11:08:34.328 25088 25144 I chromium: [INFO:drawing_buffer.cc(915)] CcLayer::: 09-11 11:08:41.863 25050 25050 I chromium: [INFO:CONSOLE(66)] "Test result: Fail", source: file:///storage/emulated/0/chromium_tests_root/chrome/test/data/xr/e2e_test_files/resources/webvr_e2e.js (66) 09-11 11:08:41.869 25050 25050 I chromium: [INFO:CONSOLE(67)] "Test result string: Harness failed due to timeout. FAIL rAFs continue to fire after presentation re-entry: null. ", source: file:///storage/emulated/0/chromium_tests_root/chrome/test/data/xr/e2e_test_files/resources/webvr_e2e.js (67) The two failed runs all have the following last line from resource counting: [INFO:texture_layer.cc(131)] SetTransferableResourceInternal::: this=0x7c80763400 resources_waiting_for_release_:=1 -> 2 And the five successful runs have this: [INFO:texture_layer.cc(442)] ReleaseAndUpdateWaiting::: this=0x7ca296d800 texture_layer->resources_waiting_for_release_:=2 -> 1 This looks consistent with the hypothesis that it's unexpectedly throttling rAF, apparently due to losing a release callback? FYI, I'm also seeing this message both in the success and failure case: chromium: [ERROR:display_scheduler.cc(306)] Not implemented reached in virtual void viz::DisplayScheduler::OnBeginFrameSourcePausedChanged(bool)
,
Sep 11
More specifically, issue seems to be that the texture layer is in the resources_waiting_for_release_==2 state when entering VR presentation. During VR presentation, image release callbacks get called as expected, but this bypasses the texture_layer. When exiting VR presentation, apparently resources_waiting_for_release_ is still ==2.
,
Sep 12
ericrk@: I think I do need your help with this one, I'm kind of stuck. I've added various INFO logs to try to track down what's going on, see WIP CL https://chromium-review.googlesource.com/c/chromium/src/+/1220718 . The attached file shows logcat output. Issue seems to be that after entering presentation (search for "PresentChange"), the pending texture resource (id=9 in this case) doesn't get deleted. Normally they get cleaned up after drawing via ReceiveReturnsFromParent, but we're not calling PrepareToDraw for that last frame, and the resources don't get returned, and as a result we're not destroying the ScopedDeferCommits via ReleaseAndUpdateWaiting. I'm unsure what a clean way would be to fix this. A workaround would be to just have the VR code call a custom function on its cc::Layer to indicate that presentation has started and just clear the ScopedDeferCommits, but that seems hackish - it would be cleaner to ensure that the lost resource callback actually gets called.
,
Sep 12
,
Sep 12
I filed a separate issue 882618 "WebVR hits NOTIMPLEMENTED in viz::DisplayScheduler::OnBeginFrameSourcePausedChanged" for followup, boliu@ suggests that this should be implemented.
,
Sep 12
A crude workaround in https://chromium-review.googlesource.com/c/chromium/src/+/1222703 seems to be effective at avoiding the issue, so far all ~50 iterations have succeeded. Is there a better approach?
,
Sep 13
There have now been a number of reports that my original patch is causing issues. I'm going to revert for now and will work on a better solution (maybe incorporating klausw@'s patch)
,
Sep 13
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/0405474407dff675506e6895f267b3bb3495a388 commit 0405474407dff675506e6895f267b3bb3495a388 Author: Eric Karl <ericrk@chromium.org> Date: Thu Sep 13 22:18:06 2018 Revert "Canvas/WebGL: Defer commits to limit queued frames." This reverts commit 0dd63a3394566a94619974f91ce14f8d7680512c. Reason for revert: Causes flakiness for a large number of tests. See crbug.com/882951 Original change's description: > Canvas/WebGL: Defer commits to limit queued frames. > > This change adds more backpressure to Canvas/WebGL rendering in GPU > bound situations. > > Before, the renderer could have three or four frames of drawing commands > queued and waiting for execution by the GPU process. The GPU process > does not know which drawing commands are associated with which display > frame. Sometimes it would run multiple frames of queued drawing commands > in one display frame, causing frame rate hiccups. > > Now when the TextureLayer has more than one buffer in flight, it calls > DeferCommits to block the next requestAnimationFrame callback from > running until one of the buffers is returned, indicating that the GPU > process has executed the commands for that frame. The shorter pipeline > reduces latency and prevents the GPU process from trying to execute > multiple Canvas/WebGL frames in one display frame. > > Bug: 835353, 879658 > Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_layout_tests_slimming_paint_v2;luci.chromium.try:win_optional_gpu_tests_rel;master.tryserver.blink:linux_trusty_blink_rel > Change-Id: I7e5484add5fbd79757b205b1124ed7c2c5836dbb > Reviewed-on: https://chromium-review.googlesource.com/1199699 > Commit-Queue: Eric Karl <ericrk@chromium.org> > Reviewed-by: Sunny Sachanandani <sunnyps@chromium.org> > Reviewed-by: Chris Harrelson <chrishtr@chromium.org> > Cr-Commit-Position: refs/heads/master@{#589784} TBR=chrishtr@chromium.org,sunnyps@chromium.org,ericrk@chromium.org # Not skipping CQ checks because original CL landed > 1 day ago. Bug: 835353, 879658, 882951 , 883364 , 883358 , 883057 , 882618 Change-Id: I3bee10c8f43c2868966cedb638afbb4b0d9babab Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_layout_tests_slimming_paint_v2;luci.chromium.try:win_optional_gpu_tests_rel;master.tryserver.blink:linux_trusty_blink_rel Reviewed-on: https://chromium-review.googlesource.com/1224456 Commit-Queue: Eric Karl <ericrk@chromium.org> Reviewed-by: Eric Karl <ericrk@chromium.org> Cr-Commit-Position: refs/heads/master@{#591181} [modify] https://crrev.com/0405474407dff675506e6895f267b3bb3495a388/cc/layers/texture_layer.cc [modify] https://crrev.com/0405474407dff675506e6895f267b3bb3495a388/cc/layers/texture_layer.h [modify] https://crrev.com/0405474407dff675506e6895f267b3bb3495a388/cc/layers/texture_layer_unittest.cc [modify] https://crrev.com/0405474407dff675506e6895f267b3bb3495a388/third_party/blink/renderer/platform/graphics/canvas_2d_layer_bridge.cc [modify] https://crrev.com/0405474407dff675506e6895f267b3bb3495a388/third_party/blink/renderer/platform/graphics/gpu/drawing_buffer.cc [modify] https://crrev.com/0405474407dff675506e6895f267b3bb3495a388/third_party/blink/renderer/platform/graphics/gpu/image_layer_bridge.cc
,
Sep 14
Eric, it seems that it's not safe to assume that buffers are predictably returned via the release callback. In this case, it seemed to end up in a code path where it had resources ready to draw but decided not to do so, and the resource cleanup that would normally be done in the drawing path didn't happen. Marking this bug as fixed since the rollback should take care of the test failure, but please let me know if there's a followup bug for retrying this. |
|||||||||||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||||||||||
Comment 1 by acondor@chromium.org
, Sep 10