Synthetic context loss (e.g. OOM) on renderer raster worker fails to recover |
|||
Issue descriptionspin off from crbug.com/785279 1- patch in https://chromium-review.googlesource.com/#/c/chromium/src/+/777884 2- run chrome with gpu rasterization enabled (e.g. out/Release/chrome --force-gpu-rasterization ) 3- open https://github.com/brettwooldridge/HikariCP/issues/763#issuecomment-344457445 , produce damage / create new tiles (scroll, etc.) After 30 calls to TexStorage2DImpl (logged on output), we get into a state where the raster worker context is lost but we don't notice and don't try to recover. This is because the raster worker is now on its own (client-side) share group, and so losing it doesn't force losing the compositor context, but we don't have a lost context callback set up on the shared raster worker. -> Sunny because this comes from async worker, but would love Dana's suggestions about how to route this callback best.
,
Nov 17 2017
From the other bug also: Adding a lost context listener to the worker without moving worker context to the compositor thread will be a bit awkward. I guess we'd have to go and ReleaseFrameSink for every RenderWidget. Moving binding of the worker to the compositor thread is hard: https://bugs.chromium.org/p/chromium/issues/detail?id=487471 The latter would be nice, then we can handle it on the compositor thread just the same as compositor context. But.. without that RenderThreadImpl will need to listen for it and do basically what GpuProcessTransportFactory does to remove the CompositorFrameSink from each RenderWidget I think..
,
Nov 17 2017
,
Nov 21 2017
wdyt about a fine grained lock used only in AddObserver, RemoveObserver, and when calling OnContextLost? That way each frame sink can register an observer that forwards context lost from worker thread to compositor thread. We need to be able to add/remove observers from the compositor thread for that.
,
Nov 21 2017
I think we can do something simple-ish by adding and observer (RenderThreadImpl?) on the worker context (note: callback is dispatched on the main thread) that will notify all the frame sinks (since it creates them, it can keep a WeakPtr to them and post to the compositor thread). That way we don't need to deal with multiple locks all through the stack (or risk lock inversion issues).
,
Nov 21 2017
Yep that'll work without introducing wholly new paths to reset the framesink in cc.
,
Nov 22 2017
After off-bug discussion with danakj, I'm making RenderThreadImpl keep a map of routing id to frame sink weak ptr. Entries are added/replaced in RequestNewLayerTreeFrameSink and removed in RemoveRoute. This keeps the number of weak ptrs bounded in absence of a destroy frame sink callback. RenderThreadImpl uses a helper context context lost observer object and forwards context loss to the frame sink on the compositor thread. Experimenting with piman's repro suggests that the context doesn't detect context loss correctly. It seems ContextProviderCommandBuffer::OnLostContext isn't called at all. I'll keep digging. (tangential) btw for compositor context we call AddObserver after BindToCurrentThread. The comments in ContextProvider say that we must call AddObserver before to prevent a race. It might be ok for compositor context because we never use it on another thread, but I'm not sure.
,
Nov 22 2017
The race is if it's lost after being but before addobserver, then you'll be left with a lost context that you never hear about. Since they are in the same stack and the notification comes as a posted task, it should be okay I believe.
,
Nov 23 2017
WIP CL https://chromium-review.googlesource.com/c/chromium/src/+/777884 (piman's patch is included) This doesn't work yet. The problem I'm seeing is this: 1. Context lost on worker context. RenderThreadImpl forwards this to LTFS on compositor thread which initiates creation of new LTFS on main thread. 2. New compositor and worker contexts and LTFS created on main thread. ProxyMain posts InitializeLTFS task to ProxyImpl on compositor thread. 3. New LTFS is lost because TexStorageImpl keeps throwing errors. This makes its way to RenderThreadImpl which posts a task to compositor thread like in 1 but this never makes it. Maybe I'm holding weak ptrs wrong? LTFS has a weak ptr factory which vends weak ptrs on main thread but invalidates and checks them on compositor thread. I'm pretty sure this pattern is supposed to work. Or maybe the routing id to weak ptr map isn't working as expected? We replace the frame sink weak ptr in that map in RenderThreadImpl::DidCreateLayerTreeFrameSink whenever a new frame sink is created. I've verified that we call this for each new LTFS.
,
Nov 23 2017
> LTFS has a weak ptr factory which vends weak ptrs on main thread but > invalidates and checks them on compositor thread. GetWeakPtr() has to happen on the same thread as invalidate/checking. The flag inside WeakPtr isn't threadsafe, and the first call to GetWeakPtr makes that flag. What you said you're doing should DCHECK. https://cs.chromium.org/chromium/src/base/memory/weak_ptr.h?rcl=5718a54ca055445e157a85e7d6ac8bc864821577&l=55
,
Nov 27 2017
@#10: I don't think that's the problem. You're right that GetWeakPtr isn't synchronized, but in that CL the usage pattern is safe, because it is called before the LTFS/WeakPtrFactory is used on another thread, and there is a happens-before relationship (PostTask) between the GetWeakPtr and the InvalidateWeakPtrs. TBH that's a pattern we use in many places, so I'd be surprised if it was broken.
,
Nov 28 2017
What was happening: 1. RenderThreadImpl posts LTFS::OnWorkerContextLost task to compositor thread. 2. Meanwhile, compositor thread sees a channel error and initiates context loss. (haven't figured out why yet) 3. New context is created on main thread from 2. 4. Context loss from 1 runs on compositor thread. Changing LTHI::DidLoseLTFS to nop on repeated context lost for same context fixes the problem. Uploaded a new CL for that because the old one ended up being owned by piman@ for some reason: https://chromium-review.googlesource.com/#/c/chromium/src/+/777884 I'll clean it up for review tomorrow.
,
Dec 12 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/158d26db58f1ee35e04a6695cfbe1484a81ff875 commit 158d26db58f1ee35e04a6695cfbe1484a81ff875 Author: Sunny Sachanandani <sunnyps@chromium.org> Date: Tue Dec 12 19:44:10 2017 cc: Tear down frame sink when worker context is lost. Worker context is on a different share group from compositor context so worker context lost notification does not propagate to compositor context. This is mostly fine because we restart the gpu process anyway but we don't for synthetic context loss such as out of memory conditions. R=piman,danakj BUG= 786570 Cq-Include-Trybots: master.tryserver.blink:linux_trusty_blink_rel;master.tryserver.chromium.android:android_optional_gpu_tests_rel Change-Id: I4afda056ba7399806c9dd56d83a2efb208c95847 Reviewed-on: https://chromium-review.googlesource.com/795184 Commit-Queue: Sunny Sachanandani <sunnyps@chromium.org> Reviewed-by: Bo <boliu@chromium.org> Reviewed-by: Sadrul Chowdhury <sadrul@chromium.org> Reviewed-by: danakj <danakj@chromium.org> Reviewed-by: Antoine Labour <piman@chromium.org> Cr-Commit-Position: refs/heads/master@{#523513} [modify] https://crrev.com/158d26db58f1ee35e04a6695cfbe1484a81ff875/cc/test/fake_layer_tree_frame_sink.cc [modify] https://crrev.com/158d26db58f1ee35e04a6695cfbe1484a81ff875/cc/trees/layer_tree_frame_sink.cc [modify] https://crrev.com/158d26db58f1ee35e04a6695cfbe1484a81ff875/cc/trees/layer_tree_frame_sink.h [modify] https://crrev.com/158d26db58f1ee35e04a6695cfbe1484a81ff875/cc/trees/layer_tree_frame_sink_unittest.cc [modify] https://crrev.com/158d26db58f1ee35e04a6695cfbe1484a81ff875/cc/trees/layer_tree_host_impl.cc [modify] https://crrev.com/158d26db58f1ee35e04a6695cfbe1484a81ff875/cc/trees/layer_tree_host_impl_unittest.cc [modify] https://crrev.com/158d26db58f1ee35e04a6695cfbe1484a81ff875/cc/trees/layer_tree_host_unittest_context.cc [modify] https://crrev.com/158d26db58f1ee35e04a6695cfbe1484a81ff875/components/viz/client/client_layer_tree_frame_sink.cc [modify] https://crrev.com/158d26db58f1ee35e04a6695cfbe1484a81ff875/components/viz/client/client_layer_tree_frame_sink.h [modify] https://crrev.com/158d26db58f1ee35e04a6695cfbe1484a81ff875/components/viz/client/client_layer_tree_frame_sink_unittest.cc [modify] https://crrev.com/158d26db58f1ee35e04a6695cfbe1484a81ff875/components/viz/common/gpu/context_provider.h [modify] https://crrev.com/158d26db58f1ee35e04a6695cfbe1484a81ff875/components/viz/service/frame_sinks/direct_layer_tree_frame_sink.cc [modify] https://crrev.com/158d26db58f1ee35e04a6695cfbe1484a81ff875/components/viz/service/frame_sinks/direct_layer_tree_frame_sink.h [modify] https://crrev.com/158d26db58f1ee35e04a6695cfbe1484a81ff875/components/viz/service/frame_sinks/direct_layer_tree_frame_sink_unittest.cc [modify] https://crrev.com/158d26db58f1ee35e04a6695cfbe1484a81ff875/components/viz/test/test_layer_tree_frame_sink.cc [modify] https://crrev.com/158d26db58f1ee35e04a6695cfbe1484a81ff875/components/viz/test/test_layer_tree_frame_sink.h [modify] https://crrev.com/158d26db58f1ee35e04a6695cfbe1484a81ff875/content/browser/compositor/gpu_process_transport_factory.cc [modify] https://crrev.com/158d26db58f1ee35e04a6695cfbe1484a81ff875/content/browser/compositor/viz_process_transport_factory.cc [modify] https://crrev.com/158d26db58f1ee35e04a6695cfbe1484a81ff875/content/browser/renderer_host/compositor_impl_android.cc [modify] https://crrev.com/158d26db58f1ee35e04a6695cfbe1484a81ff875/content/renderer/android/synchronous_layer_tree_frame_sink.cc [modify] https://crrev.com/158d26db58f1ee35e04a6695cfbe1484a81ff875/content/renderer/android/synchronous_layer_tree_frame_sink.h [modify] https://crrev.com/158d26db58f1ee35e04a6695cfbe1484a81ff875/content/renderer/render_thread_impl.cc [modify] https://crrev.com/158d26db58f1ee35e04a6695cfbe1484a81ff875/content/renderer/webgraphicscontext3d_provider_impl.cc [modify] https://crrev.com/158d26db58f1ee35e04a6695cfbe1484a81ff875/services/ui/public/cpp/gpu/context_provider_command_buffer.cc [modify] https://crrev.com/158d26db58f1ee35e04a6695cfbe1484a81ff875/ui/aura/local/layer_tree_frame_sink_local.cc [modify] https://crrev.com/158d26db58f1ee35e04a6695cfbe1484a81ff875/ui/compositor/test/in_process_context_factory.cc
,
Dec 15 2017
The above CL fixes the simulated context loss in #0. |
|||
►
Sign in to add a comment |
|||
Comment 1 by piman@chromium.org
, Nov 17 2017