GPU SyncPointManager Destructor DCHECK Causes Flaky Test Failures |
||||||||||||||||||||||||||
Issue description"DisplayTest.CreatePopupShellSurface" is flaky. This issue was created automatically by the chromium-try-flakes app. Please find the right owner to fix the respective test/step and assign this issue to them. If the step/test is infrastructure-related, please add Infra-Troopers label and change issue status to Untriaged. When done, please remove the issue from Sheriff Bug Queue by removing the Sheriff-Chromium label. We have detected 3 recent flakes. List of all flakes can be found at https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNyLgsSBUZsYWtlIiNEaXNwbGF5VGVzdC5DcmVhdGVQb3B1cFNoZWxsU3VyZmFjZQw. Flaky tests should be disabled within 30 minutes unless culprit CL is found and reverted. Please see more details here: https://sites.google.com/a/chromium.org/dev/developers/tree-sheriffs/sheriffing-bug-queues#triaging-auto-filed-flakiness-bugs
,
Jul 22 2016
Revising for clarity. Tests include DisplayTest.CreatePopupShellSurface DisplayTest.CreateSubSurface DisplayTest.CreateSurface GamepadTest.OnButton PointerTest.OnPointerEnter KeyboardTest.OnKeyboardKey BufferTest.IsLost BufferTest.ReleaseCallback GamepadTest.OnWindowFocused KeyboardTest.OnKeyboardModifiers DisplayTest.CreateShellSurface PointerTest.OnPointerMotion KeyboardTest.OnKeyboardLeave PointerTest.OnPointerLeave DisplayTest.CreateSharedMemory DisplayTest.CreateRemoteShellSurface GamepadTest.OnAxis GamepadTest.OnStateChange PointerTest.SetCursor KeyboardTest.OnKeyboardEnter
,
Jul 22 2016
Issue 630596 has been merged into this issue.
,
Jul 22 2016
dyen@ doesn't work on this any more. This seems to be specific to exo_unittests. Maybe https://codereview.chromium.org/2162143006 is relevant?
,
Jul 22 2016
Issue 630783 has been merged into this issue.
,
Jul 22 2016
Issue 630794 has been merged into this issue.
,
Jul 23 2016
Issue 630803 has been merged into this issue.
,
Jul 23 2016
Issue 630820 has been merged into this issue.
,
Jul 23 2016
Issue 630810 has been merged into this issue.
,
Jul 25 2016
Detected 3 new flakes for test/step "BufferTest.ReleaseCallback". To see the actual flakes, please visit https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNyJQsSBUZsYWtlIhpCdWZmZXJUZXN0LlJlbGVhc2VDYWxsYmFjaww. This message was posted automatically by the chromium-try-flakes app. Since flakiness is ongoing, the issue was moved back into Sheriff Bug Queue (unless already there).
,
Jul 25 2016
Detected 3 new flakes for test/step "DisplayTest.CreatePopupShellSurface". To see the actual flakes, please visit https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNyLgsSBUZsYWtlIiNEaXNwbGF5VGVzdC5DcmVhdGVQb3B1cFNoZWxsU3VyZmFjZQw. This message was posted automatically by the chromium-try-flakes app.
,
Jul 25 2016
Detected 3 new flakes for test/step "DisplayTest.CreateRemoteShellSurface". To see the actual flakes, please visit https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNyLwsSBUZsYWtlIiREaXNwbGF5VGVzdC5DcmVhdGVSZW1vdGVTaGVsbFN1cmZhY2UM. This message was posted automatically by the chromium-try-flakes app.
,
Jul 25 2016
Detected 3 new flakes for test/step "DisplayTest.CreateSharedMemory". To see the actual flakes, please visit https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNyKQsSBUZsYWtlIh5EaXNwbGF5VGVzdC5DcmVhdGVTaGFyZWRNZW1vcnkM. This message was posted automatically by the chromium-try-flakes app.
,
Jul 25 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/94ed3a8d23c4136814d3a5a09dee7ac6b27cd1a9 commit 94ed3a8d23c4136814d3a5a09dee7ac6b27cd1a9 Author: oshima <oshima@chromium.org> Date: Mon Jul 25 18:53:19 2016 Revert of Run exo tests in parallel (patchset #1 id:1 of https://codereview.chromium.org/2162143006/ ) Reason for revert: Could be causing flaky-ness. BUG= 630625 Original issue's description: > Run exo tests in parallel > > BUG=None > R=reveman@chromium.org > > Committed: https://crrev.com/5fbe6a031d7be91fa3f276a726fb05ed4fb31c82 > Cr-Commit-Position: refs/heads/master@{#406564} TBR=reveman@chromium.org # Not skipping CQ checks because original CL landed more than 1 days ago. BUG=None Review-Url: https://codereview.chromium.org/2178913004 Cr-Commit-Position: refs/heads/master@{#407533} [modify] https://crrev.com/94ed3a8d23c4136814d3a5a09dee7ac6b27cd1a9/components/exo/test/run_all_unittests.cc
,
Jul 25 2016
Issue 630843 has been merged into this issue.
,
Jul 25 2016
Issue 630853 has been merged into this issue.
,
Jul 25 2016
Kept tagged as Sheriff-Chromium as an FYI for future sheriffs. Future sheriffs, if you don't see any additional reports from the flakiness bot on this issue by this time tomorrow, please feel free to close it.
,
Jul 26 2016
Issue 631417 has been merged into this issue.
,
Jul 26 2016
I spot-checked a couple of chromium-try-flakes logs linked from this bug or the ones duplicated into it. It seems that no further come in after #14, so removing the sheriff label. oshima@ -- not sure whether you want this open of closed, please adjust as you see fit.
,
Jul 26 2016
Issue 631351 has been merged into this issue.
,
Jul 26 2016
Issue 631361 has been merged into this issue.
,
Jul 26 2016
,
Aug 3 2016
Issue 633822 has been merged into this issue.
,
Aug 3 2016
Seems like there have still been some flakes after the revert, could you have a look? Thanks! https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNyLgsSBUZsYWtlIiNEaXNwbGF5VGVzdC5DcmVhdGVQb3B1cFNoZWxsU3VyZmFjZQw
,
Aug 3 2016
Issue 633808 has been merged into this issue.
,
Aug 3 2016
Issue 633831 has been merged into this issue.
,
Aug 3 2016
Issue 634081 has been merged into this issue.
,
Aug 4 2016
,
Aug 4 2016
The failure looks the same, so it probably was flaky before my change. Dave, can you look into it? +dennis who may also have some idea.
,
Aug 10 2016
Adding Sheriff-Chromium label since this continues to cause flakes and I didn't find this yesterday when it would have been helpful.
,
Aug 10 2016
Issue 631468 has been merged into this issue.
,
Aug 10 2016
Issue 636287 has been merged into this issue.
,
Aug 10 2016
The first reported flake was at "try run at 2016-07-19 22:33:36 UTC" so if the cause is a commit, it should be at or before https://crrev.com/149198074adf37cbda2f834901059728d3ad6342. The previously suspected https://crrev.com/5fbe6a031d7be91fa3f276a726fb05ed4fb31c82 was after that.
,
Aug 10 2016
All failures have been on linux_chromium_chromeos_rel_ng.
,
Aug 10 2016
I have not been able to reproduce this yet. Looks like a race condition when tearing down the SyncPointManager/Clients. I'll keep investigating and hopefully find a way to reproduce this.
,
Aug 10 2016
The best candidate I can find shortly before the first flake is "gpu: Take surface handle into account when importing buffers." (https://crrev.com/15e3d9a4516ad22cc56c1ff162f32f3d03e8de9f) Does that seems plausible?
,
Aug 10 2016
Issue 636510 has been merged into this issue.
,
Aug 10 2016
Issue 636498 has been merged into this issue.
,
Aug 10 2016
I'm disabling the assert here: https://codereview.chromium.org/2237653002
,
Aug 10 2016
,
Aug 10 2016
Please do not disable the assert. Please disable the tests instead until they are fixed. It looks like the only tests that trigger this are exo-specific tests.
,
Aug 11 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/c66cdfb6f61518ea8b3e9187ffc451fe3716c246 commit c66cdfb6f61518ea8b3e9187ffc451fe3716c246 Author: piman <piman@chromium.org> Date: Thu Aug 11 00:44:43 2016 Disable exo unittests that are flaky BUG= 630625 TBR=reveman Review-Url: https://codereview.chromium.org/2235043002 Cr-Commit-Position: refs/heads/master@{#411207} [modify] https://crrev.com/c66cdfb6f61518ea8b3e9187ffc451fe3716c246/components/exo/BUILD.gn
,
Aug 11 2016
Issue 636618 has been merged into this issue.
,
Aug 11 2016
Issue 636620 has been merged into this issue.
,
Aug 11 2016
Issue 636619 has been merged into this issue.
,
Aug 11 2016
Ok, root-caused this. I'm not sure there's a particular given CL that caused the flakiness, more like a timing difference made it start to happen. To repro, patch in https://codereview.chromium.org/2237723002 (reactivates the tests and makes the timing for GamepadTest.OnAxis ever so slightly deterministic) and run exo_unittests --gtest_filter="GamepadTest.OnAxis" The issue is an interference between various events that happen (racily) at test teardown, that causes an InProcessCommandBuffer leak, which is then caught by the assert. Test teardown has multiple RunUntilIdle() loops, which, depending on timing, can cause different tasks to be executed. In particular, AshTestBase::TearDown calls RunAllPendingInMessageLoop before ash_test_helper_->TearDown(), which is what causes the window/ui::Compositor/cc::Display to be destroyed. It is possible that a frame gets drawn depending on timing (race #1). If it is, Buffer::Texture::ReleaseTexImage is called with a valid sync_token (right after the draw). Otherwise it is called with a null sync_token. If ReleaseTexImage is called with a valid sync_token, the Texture is kept alive in a callback passed to ContextProvider::SignalQuery. This callback will be posted from the GPU thread, after the commands have been submitted (or completed). Normally, when InProcessCommandBuffer (that backs the ContextProvider) is destroyed, we guarantee that the callback is posted. However, the Texture keeps a reference to the ContextProvider, so we have essentially a circular reference, until it's resolved with the callback being called naturally. This may or may not happen within the various RunUntilIdle on test teardown (race #2). If it is not, the InProcessCommandBuffer is not destroyed by the time the SyncPointManager is destroyed, causing the assert. I think what's needed is for Texture to not keep a reference to the ContextProvider, or at least provide a way to drop that reference at tear down, so that the InProcessCommandBuffer can be destroyed correctly.
,
Aug 11 2016
Taking this off the sheriff queue, the tests are all disabled now.
,
Aug 11 2016
piman@, thanks for the detailed analysis and the repro. I've uploaded a potential fix here: https://codereview.chromium.org/2240503002
,
Aug 12 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/d0b75c385d2c1abf1ecff249d9ac4c79d157db73 commit d0b75c385d2c1abf1ecff249d9ac4c79d157db73 Author: reveman <reveman@chromium.org> Date: Fri Aug 12 02:46:52 2016 exo: Release context provider references when main thread context is lost. This allows use to break any circular reference to the context provider instance during tear down. BUG= 630625 TEST=exo_unittests --gtest_filter=GamepadTest.OnAxis (with https://codereview.chromium.org/2237723002 patched in) Review-Url: https://codereview.chromium.org/2240503002 Cr-Commit-Position: refs/heads/master@{#411529} [modify] https://crrev.com/d0b75c385d2c1abf1ecff249d9ac4c79d157db73/components/exo/BUILD.gn [modify] https://crrev.com/d0b75c385d2c1abf1ecff249d9ac4c79d157db73/components/exo/buffer.cc [modify] https://crrev.com/d0b75c385d2c1abf1ecff249d9ac4c79d157db73/ui/compositor/test/context_factories_for_test.cc
,
Aug 15 2016
,
Aug 29 2016
,
Oct 7 2016
,
Nov 19 2016
,
Jan 21 2017
,
Mar 4 2017
,
Apr 17 2017
,
May 30 2017
,
Aug 1 2017
,
Oct 14 2017
|
||||||||||||||||||||||||||
►
Sign in to add a comment |
||||||||||||||||||||||||||
Comment 1 by mpear...@chromium.org
, Jul 22 2016Components: Internals>GPU
Labels: -Sheriff-Chromium
Owner: dyen@chromium.org
Status: Assigned (was: Untriaged)