Issue metadata
Sign in to add a comment
|
"context_lost_tests (with patch)" is flaky |
||||||||||||||||||||||
Issue description"context_lost_tests (with patch)" is flaky. This issue was created automatically by the chromium-try-flakes app. Please find the right owner to fix the respective test/step and assign this issue to them. If the step/test is infrastructure-related, please add Infra-Troopers label and change issue status to Untriaged. When done, please remove the issue from Sheriff Bug Queue by removing the Sheriff-Chromium label. We have detected 3 recent flakes. List of all flakes can be found at https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNyKgsSBUZsYWtlIh9jb250ZXh0X2xvc3RfdGVzdHMgKHdpdGggcGF0Y2gpDA. This flaky test/step was previously tracked in issue 643399 .
,
Sep 12 2016
It seems GpuCrash.GPUProcessCrashesExactlyOnce is crashing like this: https://luci-logdog.appspot.com/v/?s=chromium%2Fbb%2Ftryserver.chromium.linux%2Flinux_chromium_rel_ng%2F295623%2F%2B%2Frecipes%2Fsteps%2Fcontext_lost_tests_on_NVIDIA_GPU_on_Linux__with_patch__on_Linux%2F0%2Fstdout
,
Sep 27 2016
Assigning to current pixel wrangler. brianderson@ could you please take a deeper look? We are DCHECKING on a bad Resource state after a forced context loss. [1:3:0909/211338:FATAL:resource_provider.cc(1392)] Check failed: !delegated_sync_points_required_ || resource->dirty_image || !resource->needs_sync_token(). #0 0x7fd5da07646e base::debug::StackTrace::StackTrace() #1 0x7fd5da08d72b logging::LogMessage::~LogMessage() #2 0x7fd5daaf3d32 cc::ResourceProvider::PrepareSendToParent() #3 0x7fd5dab12ab5 cc::LayerTreeHostImpl::DrawLayers() #4 0x7fd5dabc789d cc::ProxyImpl::DrawAndSwapInternal() #5 0x7fd5dabc75bf cc::ProxyImpl::ScheduledActionDrawAndSwapIfPossible() #6 0x7fd5dab94c39 cc::Scheduler::ProcessScheduledActions() #7 0x7fd5dab94829 cc::Scheduler::OnBeginImplFrameDeadline() #8 0x7fd5dab969ee cc::Scheduler::BeginImplFrameWithDeadline() #9 0x7fd5dab9646f cc::Scheduler::OnBeginFrameDerivedImpl() #10 0x7fd5daaf844d cc::BeginFrameObserverBase::OnBeginFrame() #11 0x7fd5daaf9f7c cc::ExternalBeginFrameSource::OnBeginFrame()
,
Sep 27 2016
If a recent CL caused the test to become flaky it should be reverted, investigated and relanded.
,
Sep 28 2016
Detected 3 new flakes for test/step "context_lost_tests (with patch)". To see the actual flakes, please visit https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNyKgsSBUZsYWtlIh9jb250ZXh0X2xvc3RfdGVzdHMgKHdpdGggcGF0Y2gpDA. This message was posted automatically by the chromium-try-flakes app.
,
Sep 29 2016
What is the status of this bug? It is significantly affecting the commit queue -- there have been 10 flakes of this test on Linux today. Brian, please treat this bug with the highest priority, find the CL that broke this and revert it. Thanks.
,
Sep 29 2016
Is this bug applicable to any specific OS or all OSs?
,
Sep 29 2016
Detected 8 new flakes for test/step "context_lost_tests (with patch)". To see the actual flakes, please visit https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNyKgsSBUZsYWtlIh9jb250ZXh0X2xvc3RfdGVzdHMgKHdpdGggcGF0Y2gpDA. This message was posted automatically by the chromium-try-flakes app.
,
Sep 30 2016
Detected 12 new flakes for test/step "context_lost_tests (with patch)". To see the actual flakes, please visit https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNyKgsSBUZsYWtlIh9jb250ZXh0X2xvc3RfdGVzdHMgKHdpdGggcGF0Y2gpDA. This message was posted automatically by the chromium-try-flakes app.
,
Sep 30 2016
Eric, do you think https://codereview.chromium.org/2353033003 could be related to the flakes? The patch landed on 9/26 at 20:35 and the flakes started about 2 hours later. From a quick look at the test (GpuCrash_GPUProcessCrashesExactlyOncePerVisitToAboutGpuCrash), it crashes the GPU service twice, with a wait for the first crash to finish before running the second crash. There's another test that crashes only once (ContextLost_WebGLContextLostFromGPUProcessExit), but that doesn't appear to flake. I wonder if the wait before the second crash sometimes lets the logic in the patch trigger, which somehow causes us to draw without a sync token since we've lost the context and purged skia's caches? I also notice the patch previously landed and was reverted on 9/22-9/23, but don't see any flakes during that period. So maybe it's not your patch, unless there was a significant change the second time it landed.
,
Sep 30 2016
Issue 651850 has been merged into this issue.
,
Sep 30 2016
Similar report in Issue 651850 . If https://codereview.chromium.org/2353033003 will revert cleanly then please revert it now and let's watch to see if it clears up the flakiness.
,
Sep 30 2016
That patch seems super unlikely to me, that is dropping caches inside skia, this is a problem of synchronization in cc.
,
Sep 30 2016
> I wonder if the wait before the second crash sometimes lets the logic in the > patch trigger, which somehow causes us to draw without a sync token since we've > lost the context and purged skia's caches? Can you explain what you mean here? Why would purging skia caches affect sync tokens?
,
Sep 30 2016
Hmm... seems unlikely given that it didn't flake the last time it was landed, and we're seeing a lot of flakes now (patch was not modified between revert/reland).... although it's possible my patch interacts with something else that was landed in that interval. please go ahead and revert and we'll see if that fixes the issue.
,
Sep 30 2016
// Check the synchronization and sync token state when delegated sync points
// are required. The only case where we allow a sync token to not be set is
// the case where the image is dirty. In that case we will bind the image
// lazily and generate a sync token at that point.
DCHECK(!delegated_sync_points_required_ || resource->dirty_image ||
!resource->needs_sync_token());
needs_sync_token_ can become true in
- Resource::set_mailbox()
- Resource::SetLocallyUsed()
it becomes false always in
- Resource::UpdateSyncToken()
So we're failing to UpdateSyncToken in LayerTreeHostImpl::DrawLayers on some resource before we draw the frame, or we're using a resource we shouldn't be yet.
This seems like a problem related to TileManager more likely.
,
Sep 30 2016
Yeah, I agree that it seems unlikely that this is my CL - the first flake I see at https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNyKgsSBUZsYWtlIh9jb250ZXh0X2xvc3RfdGVzdHMgKHdpdGggcGF0Y2gpDA is at 21:55UTC, ~17hrs after my patch landed, am I missing some? Looking at changes before this, there are a couple between my CL and the first flake that may also be worth looking into: Sunnyps@'s scheduler change - crrev.com/864a70f6f93a87ff374bf2aea2494d4d7d0150d7 vmpstr@'s re-ordering of cleanup in tile manager - crrev.com/63faa3a635d7ad979c9fc30de93ee0957003586d
,
Sep 30 2016
Sorry Eric. I was looking at the wrong day. I'm almost certain this is triggered by Sunny's scheduler change now. Looking at the flakes of just the linux_chromium_rel_ng bot, there was a previous streak of flakes from 9/10 to 9/12, where Sunny's patch went in temporarily and then it started again on 9/27 after the patch relanded. I canceled the revert of Eric's patch and will be reverting Sunny's to see if it helps.
,
Oct 1 2016
Detected 8 new flakes for test/step "context_lost_tests (with patch)". To see the actual flakes, please visit https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNyKgsSBUZsYWtlIh9jb250ZXh0X2xvc3RfdGVzdHMgKHdpdGggcGF0Y2gpDA. This message was posted automatically by the chromium-try-flakes app.
,
Oct 3 2016
Just to update M-55 gets branched in 2/3 days from now. If possible please get the fix landed/reverted before M-55 branch.
,
Oct 3 2016
Looks like OS-Linux at the very least. Please add OS tags to all release blockers, it's how the release team tracks them.
,
Oct 3 2016
Detected 3 new flakes for test/step "context_lost_tests (with patch)". To see the actual flakes, please visit https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNyKgsSBUZsYWtlIh9jb250ZXh0X2xvc3RfdGVzdHMgKHdpdGggcGF0Y2gpDA. This message was posted automatically by the chromium-try-flakes app.
,
Oct 3 2016
What's the status of this bug? The recent flakes are the same problem: [1:3:1003/105038:FATAL:resource_provider.cc(1415)] Check failed: !delegated_sync_points_required_ || resource->dirty_image || !resource->needs_sync_token(). #0 0x7f9da92c1b0e base::debug::StackTrace::StackTrace() #1 0x7f9da92d93fb logging::LogMessage::~LogMessage() #2 0x7f9da9d60122 cc::ResourceProvider::PrepareSendToParent() #3 0x7f9da9d79ea5 cc::LayerTreeHostImpl::DrawLayers() #4 0x7f9dacad944e cc::ProxyImpl::DrawAndSwapInternal() #5 0x7f9dacad918f cc::ProxyImpl::ScheduledActionDrawAndSwapIfPossible() #6 0x7f9dacad04f9 cc::Scheduler::ProcessScheduledActions() #7 0x7f9dacad01a9 cc::Scheduler::OnBeginImplFrameDeadline() #8 0x7f9dacad2096 cc::Scheduler::BeginImplFrameWithDeadline() #9 0x7f9dacad1b8c cc::Scheduler::OnBeginFrameDerivedImpl() #10 0x7f9da9d647ed cc::BeginFrameObserverBase::OnBeginFrame() #11 0x7f9da9d6631c cc::ExternalBeginFrameSource::OnBeginFrame() #12 0x7f9dac615abb _ZN3IPC8MessageTI23ViewMsg_BeginFrame_MetaSt5tupleIJN2cc14BeginFrameArgsEEEvE8DispatchIN7content34CompositorExternalBeginFrameSourceES9_vMS9_FvRKS4_EEEbPKNS_7MessageEPT_PT0_PT1_T2_ #13 0x7f9dac61589d content::CompositorExternalBeginFrameSource::OnMessageReceived() #14 0x7f9dac616409 content::CompositorForwardingMessageFilter::ProcessMessageOnCompositorThread() #15 0x7f9da9355c84 base::debug::TaskAnnotator::RunTask() Please point to any reverts that might have occurred; this bug wasn't updated.
,
Oct 3 2016
Revert failed to apply and is trying again today: https://codereview.chromium.org/2386183003/
,
Oct 4 2016
,
Oct 4 2016
A friendly reminder that M55 Beta launch is coming soon! Your bug is labelled as Beta ReleaseBlock, pls make sure to land the fix ASAP. Thank you.
,
Oct 4 2016
Not flaking since the revert landed so this isn't blocking any more. I was able to reproduce this failure. It happens when the worker context detects that the gpu process has crashed but the compositor context doesn't until later. During this time the worker context produces null sync tokens which causes ScopedWriteLockGL to not call UpdateSyncToken. Then during draw the DCHECK above fails. At no point does the compositor context notice the gpu process crash. I'm going to make ScopedWriteLockGL always set the sync token (even when it's null) on the resource and add a DCHECK that there was a GL error when we get a resource without a sync token in PrepareSendToParent. piman, danakj: Any other suggestions on how to improve this? Any other things that could be affected by this race? Note that in the async worker context mode (gpu streams) the compositor and worker contexts won't be in the same share group so context loss won't propagate via ShareGroup::IsLost and the race might be more visible. brianderson, jbauman, vmiura: FYI
,
Oct 4 2016
My #1 suggestion is to bind the worker context to the compositor thread so the compositor can listen to the lost callback. But that means either 1) stop sharing worker contexts 2) allow sharing contexts created on another thread. 2 is my long-term crusade right now.
,
Oct 5 2016
I uploaded a fix for the problem for now: https://codereview.chromium.org/2387333003/ #1 *should* just work with the --enable-gpu-async-worker-context flag. That was added so that gpu raster / uploads could happen on a different stream in the gpu process and contexts in different streams can't be in a share group. I don't know about other uses of worker context on the main thread but I'm guessing most (all?) of those use mailboxes and sync tokens to synchronize already.
,
Oct 5 2016
> #1 *should* just work with the --enable-gpu-async-worker-context flag. That was > added so that gpu raster / uploads could happen on a different stream in the > gpu process and contexts in different streams can't be in a share group. Oh by share I mean literally there is 1 worker context that all compositors share. We'd currently have to make 1 per compositor in order to bind it on the compositor thread. (https://bugs.chromium.org/p/chromium/issues/detail?id=487471)
,
Oct 6 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/e9fd952d2b376783ae04864e5fcdec647cf2942e commit e9fd952d2b376783ae04864e5fcdec647cf2942e Author: sunnyps <sunnyps@chromium.org> Date: Thu Oct 06 23:03:27 2016 cc: Set sync token on resource even if context is lost. If the gpu process crashes the worker context will return null sync tokens which the resource write lock ignores. It's possible for the compositor context to not detect the gpu process crash until much later. Until then the scheduler might initiate a draw which will crash when DCHECKing that resources being sent have sync tokens. This CL also enables sync tokens for LayerTreeTests by default and includes some minor cleanup in LayerTreeHostTestContext. R=brianderson@chromium.org,danakj@chromium.org BUG= 645736 CQ_INCLUDE_TRYBOTS=master.tryserver.blink:linux_precise_blink_rel Review-Url: https://codereview.chromium.org/2387333003 Cr-Commit-Position: refs/heads/master@{#423720} [modify] https://crrev.com/e9fd952d2b376783ae04864e5fcdec647cf2942e/cc/resources/resource_provider.cc [modify] https://crrev.com/e9fd952d2b376783ae04864e5fcdec647cf2942e/cc/resources/resource_provider.h [modify] https://crrev.com/e9fd952d2b376783ae04864e5fcdec647cf2942e/cc/test/test_compositor_frame_sink.cc [modify] https://crrev.com/e9fd952d2b376783ae04864e5fcdec647cf2942e/cc/test/test_compositor_frame_sink.h [modify] https://crrev.com/e9fd952d2b376783ae04864e5fcdec647cf2942e/cc/test/test_web_graphics_context_3d.cc [modify] https://crrev.com/e9fd952d2b376783ae04864e5fcdec647cf2942e/cc/trees/layer_tree_host_unittest_context.cc
,
Oct 27 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/e9fd952d2b376783ae04864e5fcdec647cf2942e commit e9fd952d2b376783ae04864e5fcdec647cf2942e Author: sunnyps <sunnyps@chromium.org> Date: Thu Oct 06 23:03:27 2016 cc: Set sync token on resource even if context is lost. If the gpu process crashes the worker context will return null sync tokens which the resource write lock ignores. It's possible for the compositor context to not detect the gpu process crash until much later. Until then the scheduler might initiate a draw which will crash when DCHECKing that resources being sent have sync tokens. This CL also enables sync tokens for LayerTreeTests by default and includes some minor cleanup in LayerTreeHostTestContext. R=brianderson@chromium.org,danakj@chromium.org BUG= 645736 CQ_INCLUDE_TRYBOTS=master.tryserver.blink:linux_precise_blink_rel Review-Url: https://codereview.chromium.org/2387333003 Cr-Commit-Position: refs/heads/master@{#423720} [modify] https://crrev.com/e9fd952d2b376783ae04864e5fcdec647cf2942e/cc/resources/resource_provider.cc [modify] https://crrev.com/e9fd952d2b376783ae04864e5fcdec647cf2942e/cc/resources/resource_provider.h [modify] https://crrev.com/e9fd952d2b376783ae04864e5fcdec647cf2942e/cc/test/test_compositor_frame_sink.cc [modify] https://crrev.com/e9fd952d2b376783ae04864e5fcdec647cf2942e/cc/test/test_compositor_frame_sink.h [modify] https://crrev.com/e9fd952d2b376783ae04864e5fcdec647cf2942e/cc/test/test_web_graphics_context_3d.cc [modify] https://crrev.com/e9fd952d2b376783ae04864e5fcdec647cf2942e/cc/trees/layer_tree_host_unittest_context.cc
,
Nov 4 2016
[Automated comment] removing mislabelled merge-merged-2840
,
Feb 10 2017
Detected 8 new flakes for test/step "context_lost_tests (with patch)". To see the actual flakes, please visit https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNyKgsSBUZsYWtlIh9jb250ZXh0X2xvc3RfdGVzdHMgKHdpdGggcGF0Y2gpDA. This message was posted automatically by the chromium-try-flakes app.
,
Feb 14 2017
All of the above flakes in #34 were from recipe changes in Issue 676210 . At this point the flakiness of this test suite is well within acceptable levels. There are 0 flakes on most days. Closing as Fixed. |
|||||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||||
Comment 1 Deleted