New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 630625 link

Starred by 3 users

Issue metadata

Status: Archived
Owner:
Last visit > 30 days ago
Closed: Aug 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug



Sign in to add a comment

GPU SyncPointManager Destructor DCHECK Causes Flaky Test Failures

Project Member Reported by chromium...@appspot.gserviceaccount.com, Jul 22 2016

Issue description

"DisplayTest.CreatePopupShellSurface" is flaky.

This issue was created automatically by the chromium-try-flakes app. Please find the right owner to fix the respective test/step and assign this issue to them. If the step/test is infrastructure-related, please add Infra-Troopers label and change issue status to Untriaged. When done, please remove the issue from Sheriff Bug Queue by removing the Sheriff-Chromium label.

We have detected 3 recent flakes. List of all flakes can be found at https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNyLgsSBUZsYWtlIiNEaXNwbGF5VGVzdC5DcmVhdGVQb3B1cFNoZWxsU3VyZmFjZQw.

Flaky tests should be disabled within 30 minutes unless culprit CL is found and reverted. Please see more details here: https://sites.google.com/a/chromium.org/dev/developers/tree-sheriffs/sheriffing-bug-queues#triaging-auto-filed-flakiness-bugs
 
Cc: zmo@chromium.org piman@chromium.org
Components: Internals>GPU
Labels: -Sheriff-Chromium
Owner: dyen@chromium.org
Status: Assigned (was: Untriaged)
All of these failures are coming from a DCHECK in SyncPointManager's destructer.
https://cs.chromium.org/chromium/src/gpu/command_buffer/service/sync_point_manager.cc?l=393

I think it's likely that something changed in the last few days about how these series of tests are run and exited that are triggering this DCHECK (in a flaky manner).

Removing from sheriff queue and CCing some GPU folks, assigning one arbitrarily.

Summary: GPU SyncPointManager Destructer DCHECK Causes Flaky Test Failures (was: "DisplayTest.CreatePopupShellSurface" is flaky)
Revising for clarity.  Tests include
DisplayTest.CreatePopupShellSurface
DisplayTest.CreateSubSurface
DisplayTest.CreateSurface
GamepadTest.OnButton
PointerTest.OnPointerEnter
KeyboardTest.OnKeyboardKey
BufferTest.IsLost
BufferTest.ReleaseCallback
GamepadTest.OnWindowFocused
KeyboardTest.OnKeyboardModifiers
DisplayTest.CreateShellSurface
PointerTest.OnPointerMotion
KeyboardTest.OnKeyboardLeave
PointerTest.OnPointerLeave
DisplayTest.CreateSharedMemory
DisplayTest.CreateRemoteShellSurface
GamepadTest.OnAxis
GamepadTest.OnStateChange
PointerTest.SetCursor
KeyboardTest.OnKeyboardEnter
 Issue 630596  has been merged into this issue.

Comment 4 by piman@chromium.org, Jul 22 2016

Cc: reve...@chromium.org
Owner: osh...@chromium.org
dyen@ doesn't work on this any more. This seems to be specific to exo_unittests. Maybe https://codereview.chromium.org/2162143006 is relevant?
 Issue 630783  has been merged into this issue.
 Issue 630794  has been merged into this issue.
 Issue 630803  has been merged into this issue.
 Issue 630820  has been merged into this issue.
 Issue 630810  has been merged into this issue.
Project Member

Comment 10 by chromium...@appspot.gserviceaccount.com, Jul 25 2016

Labels: Sheriff-Chromium
Detected 3 new flakes for test/step "BufferTest.ReleaseCallback". To see the actual flakes, please visit https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNyJQsSBUZsYWtlIhpCdWZmZXJUZXN0LlJlbGVhc2VDYWxsYmFjaww. This message was posted automatically by the chromium-try-flakes app. Since flakiness is ongoing, the issue was moved back into Sheriff Bug Queue (unless already there).
Project Member

Comment 11 by chromium...@appspot.gserviceaccount.com, Jul 25 2016

Detected 3 new flakes for test/step "DisplayTest.CreatePopupShellSurface". To see the actual flakes, please visit https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNyLgsSBUZsYWtlIiNEaXNwbGF5VGVzdC5DcmVhdGVQb3B1cFNoZWxsU3VyZmFjZQw. This message was posted automatically by the chromium-try-flakes app.
Project Member

Comment 12 by chromium...@appspot.gserviceaccount.com, Jul 25 2016

Detected 3 new flakes for test/step "DisplayTest.CreateRemoteShellSurface". To see the actual flakes, please visit https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNyLwsSBUZsYWtlIiREaXNwbGF5VGVzdC5DcmVhdGVSZW1vdGVTaGVsbFN1cmZhY2UM. This message was posted automatically by the chromium-try-flakes app.
Project Member

Comment 13 by chromium...@appspot.gserviceaccount.com, Jul 25 2016

Detected 3 new flakes for test/step "DisplayTest.CreateSharedMemory". To see the actual flakes, please visit https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNyKQsSBUZsYWtlIh5EaXNwbGF5VGVzdC5DcmVhdGVTaGFyZWRNZW1vcnkM. This message was posted automatically by the chromium-try-flakes app.
Project Member

Comment 14 by bugdroid1@chromium.org, Jul 25 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/94ed3a8d23c4136814d3a5a09dee7ac6b27cd1a9

commit 94ed3a8d23c4136814d3a5a09dee7ac6b27cd1a9
Author: oshima <oshima@chromium.org>
Date: Mon Jul 25 18:53:19 2016

Revert of Run exo tests in parallel (patchset #1 id:1 of https://codereview.chromium.org/2162143006/ )

Reason for revert:
Could be causing flaky-ness.

BUG= 630625 

Original issue's description:
> Run exo tests in parallel
>
> BUG=None
> R=reveman@chromium.org
>
> Committed: https://crrev.com/5fbe6a031d7be91fa3f276a726fb05ed4fb31c82
> Cr-Commit-Position: refs/heads/master@{#406564}

TBR=reveman@chromium.org
# Not skipping CQ checks because original CL landed more than 1 days ago.
BUG=None

Review-Url: https://codereview.chromium.org/2178913004
Cr-Commit-Position: refs/heads/master@{#407533}

[modify] https://crrev.com/94ed3a8d23c4136814d3a5a09dee7ac6b27cd1a9/components/exo/test/run_all_unittests.cc

 Issue 630843  has been merged into this issue.
 Issue 630853  has been merged into this issue.
Kept tagged as Sheriff-Chromium as an FYI for future sheriffs.

Future sheriffs, if you don't see any additional reports from the flakiness bot on this issue by this time tomorrow, please feel free to close it.

Comment 18 by vabr@chromium.org, Jul 26 2016

 Issue 631417  has been merged into this issue.

Comment 19 by vabr@chromium.org, Jul 26 2016

Labels: -Sheriff-Chromium
I spot-checked a couple of chromium-try-flakes logs linked from this bug or the ones duplicated into it. It seems that no further come in after #14, so removing the sheriff label.

oshima@ -- not sure whether you want this open of closed, please adjust as you see fit.

Comment 20 by vabr@chromium.org, Jul 26 2016

 Issue 631351  has been merged into this issue.

Comment 21 by vabr@chromium.org, Jul 26 2016

 Issue 631361  has been merged into this issue.
Status: Fixed (was: Assigned)
 Issue 633822  has been merged into this issue.
Status: Assigned (was: Fixed)
Seems like there have still been some flakes after the revert, could you have a look? Thanks!

https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNyLgsSBUZsYWtlIiNEaXNwbGF5VGVzdC5DcmVhdGVQb3B1cFNoZWxsU3VyZmFjZQw
 Issue 633808  has been merged into this issue.
 Issue 633831  has been merged into this issue.
 Issue 634081  has been merged into this issue.
Labels: hotlist-infra-opportunity
Cc: -reve...@chromium.org denniskempin@chromium.org
Owner: reve...@chromium.org
The failure looks the same, so it probably was flaky before my change.
Dave, can you look into it? +dennis who may also have some idea.
Labels: Sheriff-Chromium
Adding Sheriff-Chromium label since this continues to cause flakes and I didn't find this yesterday when it would have been helpful.
Cc: jbau...@chromium.org vmi...@chromium.org reve...@chromium.org dyen@chromium.org bajones@chromium.org osh...@chromium.org
 Issue 631468  has been merged into this issue.
 Issue 636287  has been merged into this issue.
The first reported flake was at "try run at 2016-07-19 22:33:36 UTC" so if the cause is a commit, it should be at or before https://crrev.com/149198074adf37cbda2f834901059728d3ad6342. The previously suspected https://crrev.com/5fbe6a031d7be91fa3f276a726fb05ed4fb31c82 was after that.
Labels: OS-Chrome
All failures have been on linux_chromium_chromeos_rel_ng.
Status: Started (was: Assigned)
I have not been able to reproduce this yet. Looks like a race condition when tearing down the SyncPointManager/Clients. I'll keep investigating and hopefully find a way to reproduce this. 
The best candidate I can find shortly before the first flake is "gpu: Take surface handle into account when importing buffers." (https://crrev.com/15e3d9a4516ad22cc56c1ff162f32f3d03e8de9f)

Does that seems plausible?
 Issue 636510  has been merged into this issue.
 Issue 636498  has been merged into this issue.
Summary: GPU SyncPointManager Destructor DCHECK Causes Flaky Test Failures (was: GPU SyncPointManager Destructer DCHECK Causes Flaky Test Failures)
I'm disabling the assert here:
https://codereview.chromium.org/2237653002
Cc: sunn...@chromium.org

Comment 41 by piman@chromium.org, Aug 10 2016

Please do not disable the assert. Please disable the tests instead until they are fixed.

It looks like the only tests that trigger this are exo-specific tests.
Project Member

Comment 42 by bugdroid1@chromium.org, Aug 11 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/c66cdfb6f61518ea8b3e9187ffc451fe3716c246

commit c66cdfb6f61518ea8b3e9187ffc451fe3716c246
Author: piman <piman@chromium.org>
Date: Thu Aug 11 00:44:43 2016

Disable exo unittests that are flaky

BUG= 630625 
TBR=reveman

Review-Url: https://codereview.chromium.org/2235043002
Cr-Commit-Position: refs/heads/master@{#411207}

[modify] https://crrev.com/c66cdfb6f61518ea8b3e9187ffc451fe3716c246/components/exo/BUILD.gn

Comment 43 by samli@chromium.org, Aug 11 2016

 Issue 636618  has been merged into this issue.

Comment 44 by samli@chromium.org, Aug 11 2016

 Issue 636620  has been merged into this issue.

Comment 45 by samli@chromium.org, Aug 11 2016

 Issue 636619  has been merged into this issue.

Comment 46 by piman@chromium.org, Aug 11 2016

Ok, root-caused this. I'm not sure there's a particular given CL that caused the flakiness, more like a timing difference made it start to happen.

To repro, patch in https://codereview.chromium.org/2237723002 (reactivates the tests and makes the timing for GamepadTest.OnAxis ever so slightly deterministic) and run exo_unittests --gtest_filter="GamepadTest.OnAxis"

The issue is an interference between various events that happen (racily) at test teardown, that causes an InProcessCommandBuffer leak, which is then caught by the assert.

Test teardown has multiple RunUntilIdle() loops, which, depending on timing, can cause different tasks to be executed. In particular, AshTestBase::TearDown calls RunAllPendingInMessageLoop before ash_test_helper_->TearDown(), which is what causes the window/ui::Compositor/cc::Display to be destroyed. It is possible that a frame gets drawn depending on timing (race #1). If it is, Buffer::Texture::ReleaseTexImage is called with a valid sync_token (right after the draw). Otherwise it is called with a null sync_token.

If ReleaseTexImage is called with a valid sync_token, the Texture is kept alive in a callback passed to ContextProvider::SignalQuery. This callback will be posted from the GPU thread, after the commands have been submitted (or completed). Normally, when InProcessCommandBuffer (that backs the ContextProvider) is destroyed, we guarantee that the callback is posted. However, the Texture keeps a reference to the ContextProvider, so we have essentially a circular reference, until it's resolved with the callback being called naturally. This may or may not happen within the various RunUntilIdle on test teardown (race #2). If it is not, the InProcessCommandBuffer is not destroyed by the time the SyncPointManager is destroyed, causing the assert.


I think what's needed is for Texture to not keep a reference to the ContextProvider, or at least provide a way to drop that reference at tear down, so that the InProcessCommandBuffer can be destroyed correctly.
Labels: -Sheriff-Chromium
Taking this off the sheriff queue, the tests are all disabled now.
piman@, thanks for the detailed analysis and the repro. I've uploaded a potential fix here: https://codereview.chromium.org/2240503002
Project Member

Comment 49 by bugdroid1@chromium.org, Aug 12 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/d0b75c385d2c1abf1ecff249d9ac4c79d157db73

commit d0b75c385d2c1abf1ecff249d9ac4c79d157db73
Author: reveman <reveman@chromium.org>
Date: Fri Aug 12 02:46:52 2016

exo: Release context provider references when main thread context is lost.

This allows use to break any circular reference to the
context provider instance during tear down.

BUG= 630625 
TEST=exo_unittests --gtest_filter=GamepadTest.OnAxis (with https://codereview.chromium.org/2237723002 patched in)

Review-Url: https://codereview.chromium.org/2240503002
Cr-Commit-Position: refs/heads/master@{#411529}

[modify] https://crrev.com/d0b75c385d2c1abf1ecff249d9ac4c79d157db73/components/exo/BUILD.gn
[modify] https://crrev.com/d0b75c385d2c1abf1ecff249d9ac4c79d157db73/components/exo/buffer.cc
[modify] https://crrev.com/d0b75c385d2c1abf1ecff249d9ac4c79d157db73/ui/compositor/test/context_factories_for_test.cc

Status: Fixed (was: Started)
Labels: VerifyIn-54
Labels: VerifyIn-55

Comment 53 by dchan@google.com, Nov 19 2016

Labels: VerifyIn-56

Comment 54 by dchan@google.com, Jan 21 2017

Labels: VerifyIn-57

Comment 55 by dchan@google.com, Mar 4 2017

Labels: VerifyIn-58

Comment 56 by dchan@google.com, Apr 17 2017

Labels: VerifyIn-59

Comment 57 by dchan@google.com, May 30 2017

Labels: VerifyIn-60
Labels: VerifyIn-61

Comment 59 by dchan@chromium.org, Oct 14 2017

Status: Archived (was: Fixed)

Sign in to add a comment