Mus clients don't deal with gpu-crash
Issue description
Run with SingleProcessMash enabled and go to chrome://gpucrash. Notice that all content from clients is black (or white in release builds). Resizing the window doesn't fix things. AFAICT, when going to chrome://gpucrash, MusContextFactory::OnEstablishedGpuChannel() is called and a new AsyncLayerTreeFrameSink is created and set on the Compositor. Further, on the ash side, HostFrameSinkManager::CreateCompositorFrameSink() is called. What else needs to happen when the client loses its connection to the GPU?
Jan 15
> I assume, but worth verifying, that the lost context signal is getting through to cc? What function are you referring to?
Jan 15
The LayerTreeFrameSinkClient::DidLoseLayerTreeFrameSink() override in LayerTreeHostImpl. That gets it to request a new frame sink from the embedder, which it /sounds/ like is happening, but I don't want to assume: maybe code is set up to do that for some other reason, since I know aura is using LayerTreeFrameSink outside of ui::Compositor/cc::LayerTreeHost (which is awkward).
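The recovery path described above can be sketched as a toy, single-threaded model. This is only an illustration under stated assumptions: `FrameSink`, `FrameSinkClient` and `Host` are hypothetical stand-ins for LayerTreeFrameSink, LayerTreeFrameSinkClient and the LayerTreeHostImpl/LayerTreeHost pair, and `MaybeRecreateFrameSink()` is an invented name for the point where cc asks the embedder for a new sink. None of these are the real cc signatures.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical stand-in for cc::LayerTreeFrameSinkClient.
class FrameSinkClient {
 public:
  virtual ~FrameSinkClient() = default;
  virtual void DidLoseFrameSink() = 0;
};

// Hypothetical stand-in for a LayerTreeFrameSink bound to one GPU channel.
class FrameSink {
 public:
  explicit FrameSink(int generation) : generation_(generation) {}
  void BindToClient(FrameSinkClient* client) { client_ = client; }
  // Simulates the GPU process crashing: the sink tells its client.
  void SimulateContextLost() {
    if (client_) client_->DidLoseFrameSink();
  }
  int generation() const { return generation_; }

 private:
  int generation_;
  FrameSinkClient* client_ = nullptr;
};

// Hypothetical host. It does not recreate the sink inside the loss callback
// (that would destroy the sink while its method is still on the stack); it
// records the loss and asks the embedder for a replacement later.
class Host : public FrameSinkClient {
 public:
  using Factory = std::function<std::unique_ptr<FrameSink>()>;

  explicit Host(Factory embedder_factory)
      : embedder_factory_(std::move(embedder_factory)) {
    BindNewSink();
  }

  void DidLoseFrameSink() override { needs_new_sink_ = true; }

  // Invented name: where the host requests a new sink from the embedder.
  void MaybeRecreateFrameSink() {
    if (!needs_new_sink_) return;
    needs_new_sink_ = false;
    BindNewSink();
  }

  FrameSink* sink() const { return sink_.get(); }

 private:
  void BindNewSink() {
    sink_ = embedder_factory_();
    sink_->BindToClient(this);
  }

  Factory embedder_factory_;
  std::unique_ptr<FrameSink> sink_;
  bool needs_new_sink_ = false;
};
```

Usage: construct `Host` with a factory that plays the embedder's role, call `sink()->SimulateContextLost()`, then `MaybeRecreateFrameSink()`; the host now holds a second-generation sink. Seeing this path fire only shows that a new sink was requested; whether the embedder (and mus) then picks up the new sink's surfaces correctly is a separate question.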
Jan 15
In SingleProcessMash, ash/browser are on separate threads, right? Which thread uses MustContextProvider, ash or chrome? There is some missing code there for handling context errors and such, but it doesn't look like the problem here. If the browser is getting a new LayerTreeFrameSink and then submitting CompositorFrames after the GPU crashes, then it sounds like ash isn't embedding the browser anymore. Is ash submitting a CompositorFrame containing a SurfaceDrawQuad with the new browser SurfaceId after the GPU crashes? It looks like [1] is getting hit after the crash, which means ash has the wrong LocalSurfaceId or something like that: when SurfaceAggregator tries to look up the browser SurfaceId, it doesn't find a Surface.
[1] https://cs.chromium.org/chromium/src/components/viz/service/display/surface_aggregator.cc?l=201&rcl=8ee8373ae9828abad9ed511dff1fddaac7479955
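To illustrate why an embedder holding a stale id ends up drawing nothing, here is a toy model of the lookup described above. The id structs are simplified (the real viz::LocalSurfaceId also carries an embed token), and `Surface` and `Aggregate()` are hypothetical; this is not the code at [1], only the shape of the failure.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <tuple>
#include <vector>

// Simplified ids; real viz types carry more state.
struct FrameSinkId {
  uint32_t client_id;
  uint32_t sink_id;
};

struct LocalSurfaceId {
  uint32_t parent_seq;
  uint32_t child_seq;
};

struct SurfaceId {
  FrameSinkId frame_sink_id;
  LocalSurfaceId local_surface_id;

  // Ordering so SurfaceId can key a std::map.
  bool operator<(const SurfaceId& other) const {
    return std::tie(frame_sink_id.client_id, frame_sink_id.sink_id,
                    local_surface_id.parent_seq, local_surface_id.child_seq) <
           std::tie(other.frame_sink_id.client_id, other.frame_sink_id.sink_id,
                    other.local_surface_id.parent_seq,
                    other.local_surface_id.child_seq);
  }
};

// Hypothetical stand-in for the content of a submitted CompositorFrame.
struct Surface {
  std::string content;
};

// Toy aggregator: resolves each embedded SurfaceId (as a SurfaceDrawQuad
// would reference it) against the surfaces clients actually created.
std::vector<std::string> Aggregate(const std::map<SurfaceId, Surface>& surfaces,
                                   const std::vector<SurfaceId>& embedded) {
  std::vector<std::string> drawn;
  for (const SurfaceId& id : embedded) {
    auto it = surfaces.find(id);
    if (it == surfaces.end()) {
      // No surface for this id: nothing is drawn for the quad, which shows
      // up on screen as blank (black or white) content.
      continue;
    }
    drawn.push_back(it->second.content);
  }
  return drawn;
}
```

If ash embeds the browser's FrameSinkId with a LocalSurfaceId the browser never submitted, `Aggregate()` finds nothing and the browser region stays empty, even though the browser itself is producing frames.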
Jan 15
In SingleProcessMash, ash/browser are on the same thread, the main thread. The big difference with SPM is that non-ash code (which includes content and browser windows) uses aura configured with mus. This means content/browser use MusContextFactory for top-level windows. I will trace it, but I don't think a new surface id is generated when the GPU crashes. What combination of FrameSinkId/LocalSurfaceId needs to change when the GPU crashes?
Jan 15
The FrameSinkId should stay the same, and probably a new LocalSurfaceId should be generated. samans/jonross would know this better. However, since nothing shows up even after a resize, might it be that ash isn't finding out about the new LocalSurfaceIds at all?
Jan 15
It's actually ash that currently allocates the LSIs for top-levels. I will verify these are making their way to the client correctly.
Jan 15
I'm pretty sure we always recover without allocating new LocalSurfaceIds on other platforms. Do you still see gl errors?
Jan 15
> However, since after resize nothing shows up still it might be that ash isn't finding out about the new LocalSurfaceIds at all?
AFAICT ash is still generating new ids during a resize, and the client is picking them up appropriately.
> The LayerTreeFrameSinkClient::DidLoseLayerTreeFrameSink() override in LayerTreeHostImpl.
This is being called as well, and makes its way to LayerTreeHost.
Jan 15
> Do you still see gl errors?
The sequence I'm trying is (chromeos build on linux-desktop):
1. Launch chrome. I get a single window with the NTP.
2. Type in chrome://gpucrash.
3. Resize the window.
At step 1 I get:
[260197:260197:0115/153105.564810:ERROR:GrGLInterface.cpp(455)] ../../third_party/skia/src/gpu/gl/GrGLInterface.cpp:455 GrGLInterface::validate() failed.
[260197:260197:0115/153105.579017:ERROR:GrGLInterface.cpp(455)] ../../third_party/skia/src/gpu/gl/GrGLInterface.cpp:455 GrGLInterface::validate() failed.
And after step 2:
[260415:260415:0115/153144.526995:ERROR:GrGLInterface.cpp(455)] ../../third_party/skia/src/gpu/gl/GrGLInterface.cpp:455 GrGLInterface::validate() failed.
[260415:260415:0115/153144.543428:ERROR:GrGLInterface.cpp(455)] ../../third_party/skia/src/gpu/gl/GrGLInterface.cpp:455 GrGLInterface::validate() failed.
[260415:260415:0115/153144.544821:ERROR:sandbox_linux.cc(364)] InitializeSandbox() called with multiple threads in process gpu-process.
A bit later, say 10 seconds or so after resizing and waiting, I get:
[260163:260163:0115/153214.706810:ERROR:surface_manager.cc(485)] Old/orphaned temporary reference to SurfaceId(FrameSinkId[](4294967295, 2), LocalSurfaceId(77, 2, 90B8...))
[260163:260163:0115/153214.706862:ERROR:surface_manager.cc(485)] Old/orphaned temporary reference to SurfaceId(FrameSinkId[](4294967295, 2), LocalSurfaceId(61, 2, 90B8...))
[260163:260163:0115/153214.706883:ERROR:surface_manager.cc(485)] Old/orphaned temporary reference to SurfaceId(FrameSinkId[](4294967295, 2), LocalSurfaceId(13, 2, 90B8...))
The FrameSinkId in the log output is the frame sink id for the browser window. I only get this log output if I do step 3; without step 3 I don't see it.
Jan 16
That error message you get after step 3 means something like the following happened:
1. Chrome browser submitted a CompositorFrame with a new LSI to create surface S1.
2. Chrome browser submitted a CompositorFrame with a new LSI to create surface S2.
3. Chrome browser submitted a CompositorFrame with a new LSI to create surface S3.
4. Chrome browser submitted a CompositorFrame with a new LSI to create surface S4.
5. After 10-20 seconds, ash still hasn't embedded S1, S2, S3 or S4. SurfaceManager expires the temporary references to S1, S2 and S3 so that they get deleted. Since S4 is the latest surface from an active client, it's kept around forever.
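The five steps above can be modeled with a toy sketch of temporary-reference expiry. Assumptions, stated loudly: `ToySurfaceManager` and its methods are hypothetical, surfaces are keyed by a single sequence number within one FrameSinkId, and the timeout passed in is an illustrative value, not Chromium's actual setting. Only the behavior (unembedded surfaces expire, except the newest from a live client) comes from the comment.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Toy model of the temporary-reference behavior; real SurfaceManager
// bookkeeping is considerably more involved.
class ToySurfaceManager {
 public:
  // A client's first CompositorFrame for a new LocalSurfaceId creates a
  // surface that starts with only a temporary reference.
  void CreateSurface(uint32_t seq, int64_t now_ms) {
    temporary_refs_[seq] = now_ms;
    latest_created_ = seq;
    has_latest_ = true;
  }

  // The embedder (ash) references the surface; the temporary ref goes away.
  void Embed(uint32_t seq) { temporary_refs_.erase(seq); }

  bool HasTemporaryReference(uint32_t seq) const {
    return temporary_refs_.count(seq) != 0;
  }

  // Drops temporary references at least `timeout_ms` old, except for the
  // most recently created surface, and returns the dropped ids.
  std::vector<uint32_t> ExpireTemporaryReferences(int64_t now_ms,
                                                  int64_t timeout_ms) {
    std::vector<uint32_t> expired;
    for (auto it = temporary_refs_.begin(); it != temporary_refs_.end();) {
      const bool too_old = now_ms - it->second >= timeout_ms;
      const bool is_latest = has_latest_ && it->first == latest_created_;
      if (too_old && !is_latest) {
        expired.push_back(it->first);
        it = temporary_refs_.erase(it);
      } else {
        ++it;
      }
    }
    return expired;
  }

 private:
  std::map<uint32_t, int64_t> temporary_refs_;  // seq -> creation time (ms)
  uint32_t latest_created_ = 0;
  bool has_latest_ = false;
};
```

In this model, a correctly behaving ash would call `Embed()` on each new browser surface before the sweep runs; in the bug, ash never references them, so S1 to S3 are dropped and only S4 survives.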
Jan 16
I'm intrigued by the erratic jumps in LocalSurfaceId. Anyway, it seems like the mus client has recovered properly, but there is a mismatch between what mus thinks the client's LocalSurfaceId is and what it actually is. Unfortunately, I'm not familiar enough with how the coordination between mus and mus clients works to comment on why things don't work.
Jan 16
I believe the issue is that under the right conditions (which include a GPU crash) LayerTreeHostImpl generates a new LocalSurfaceId. This id is not communicated back to mus, so we get the id mismatch and nothing paints. The fix is likely something along the lines of what Jon outlined in bug 921129.
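A toy model of the mismatch described here, under two loud assumptions: `ParentSide` and `ChildSide` are hypothetical stand-ins for ash's allocator and the one in LayerTreeHostImpl, and the child is assumed to merge ids from the parent by keeping the newer value of each component (a simplification, not a statement about the real viz allocators). Under that assumption it also shows why a resize alone does not resync the two sides.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Simplified LocalSurfaceId: real ids also carry an embed token.
struct LocalSurfaceId {
  uint32_t parent_seq = 0;
  uint32_t child_seq = 0;
  bool operator==(const LocalSurfaceId& other) const {
    return parent_seq == other.parent_seq && child_seq == other.child_seq;
  }
  bool operator!=(const LocalSurfaceId& other) const {
    return !(*this == other);
  }
};

// Hypothetical parent (ash): bumps the parent sequence number on resize and
// embeds the id it *believes* the client is using.
class ParentSide {
 public:
  LocalSurfaceId Resize() {
    ++embedded_.parent_seq;
    return embedded_;
  }
  const LocalSurfaceId& embedded() const { return embedded_; }

 private:
  LocalSurfaceId embedded_;
};

// Hypothetical client-side allocator (standing in for LayerTreeHostImpl).
class ChildSide {
 public:
  // Assumption: merge by keeping the newer value of each component.
  void UpdateFromParent(const LocalSurfaceId& from_parent) {
    current_.parent_seq = std::max(current_.parent_seq, from_parent.parent_seq);
    current_.child_seq = std::max(current_.child_seq, from_parent.child_seq);
  }
  // E.g. after recovering from a lost frame sink. Nothing tells the parent.
  void AllocateNewChildId() { ++current_.child_seq; }
  const LocalSurfaceId& current() const { return current_; }

 private:
  LocalSurfaceId current_;
};
```

After the simulated GPU crash the client submits {1, 1} while ash embeds {1, 0}; after the next resize they become {2, 1} and {2, 0}, so the two never line up, which matches the symptom in the issue description.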
Jan 18
Saman, Jon, Sadrul and I met on this. We decided the best approach appears to be to have LayerTreeHostImpl own allocation. WindowPortMus will observe new surface ids via a new CompositorObserver function that is called through LayerTreeHostImplClient when a new id is generated.
Jan 18
LayerTreeHostImplClient is the proxy in cc; I presume this was meant to be LayerTreeHostClient?
Jan 18
Yes, sorry, I didn't list the full chain. Any time LayerTreeHostImpl generates an id, it'll call something like OnAllocatedNewChildLocalSurfaceId() on each of the following classes in turn: SingleThreadProxy -> LayerTreeHost -> LayerTreeHostClient (which is UICompositor) -> CompositorObserver.
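The notification chain listed above can be sketched as plain forwarding calls. Everything here lives in a `sketch` namespace because the class shapes are hypothetical simplifications: only the hop order comes from the comment, `OnAllocatedNewChildLocalSurfaceId()` is the tentative name it suggests rather than an existing API, and `RecordingWindowPort` is an invented stand-in for WindowPortMus.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

namespace sketch {

struct LocalSurfaceId {
  uint32_t parent_seq = 0;
  uint32_t child_seq = 0;
};

// Observer interface; WindowPortMus would implement something like this.
class CompositorObserver {
 public:
  virtual ~CompositorObserver() = default;
  virtual void OnAllocatedNewChildLocalSurfaceId(const LocalSurfaceId& id) = 0;
};

// Plays the LayerTreeHostClient role: fans the event out to observers.
class Compositor {
 public:
  void AddObserver(CompositorObserver* observer) {
    observers_.push_back(observer);
  }
  void OnAllocatedNewChildLocalSurfaceId(const LocalSurfaceId& id) {
    for (CompositorObserver* observer : observers_)
      observer->OnAllocatedNewChildLocalSurfaceId(id);
  }

 private:
  std::vector<CompositorObserver*> observers_;
};

class LayerTreeHost {
 public:
  explicit LayerTreeHost(Compositor* client) : client_(client) {}
  void OnAllocatedNewChildLocalSurfaceId(const LocalSurfaceId& id) {
    client_->OnAllocatedNewChildLocalSurfaceId(id);
  }

 private:
  Compositor* client_;
};

class SingleThreadProxy {
 public:
  explicit SingleThreadProxy(LayerTreeHost* host) : host_(host) {}
  void OnAllocatedNewChildLocalSurfaceId(const LocalSurfaceId& id) {
    host_->OnAllocatedNewChildLocalSurfaceId(id);
  }

 private:
  LayerTreeHost* host_;
};

// The impl side owns allocation and reports every new id up the chain.
class LayerTreeHostImpl {
 public:
  explicit LayerTreeHostImpl(SingleThreadProxy* proxy) : proxy_(proxy) {}
  void AllocateNewChildId() {
    ++current_.child_seq;
    proxy_->OnAllocatedNewChildLocalSurfaceId(current_);
  }

 private:
  SingleThreadProxy* proxy_;
  LocalSurfaceId current_;
};

// Invented WindowPortMus stand-in: remembers the id it would report to mus.
class RecordingWindowPort : public CompositorObserver {
 public:
  void OnAllocatedNewChildLocalSurfaceId(const LocalSurfaceId& id) override {
    last_reported = id;
    ++count;
  }
  LocalSurfaceId last_reported;
  int count = 0;
};

}  // namespace sketch
```

Each hop only forwards, so the single source of truth for the id is the impl side; the observer at the end of the chain is what would let mus learn about ids minted during GPU-crash recovery.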
Jan 18
Does the WindowPortMus itself allocate a new LSI, or does it ask LTHI to allocate the LSI?
Today
Ignoring the cases in comment 14, WindowPortMus itself allocates a new LSI. I'm looking into having a mode for LTHI in which it *always* allocates LSIs and informs the delegate when it does so. This would mean WindowPortMus, at least for top-levels, does not allocate LSIs at all.
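A sketch of the ownership difference behind the proposed mode. Assumptions: `ToyCompositor`, its `always_allocate` flag, the delegate callback and `ToyWindowPort` are hypothetical names, not the actual change, and the id is collapsed to a single counter. With the flag off, the window port allocates and pushes ids down, so an id the compositor mints on its own (e.g. after a GPU crash) is invisible to it; with the flag on, the compositor is the single allocator and the window port only mirrors what it is told.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

struct LocalSurfaceId {
  uint32_t seq = 0;  // Collapsed to one counter for this sketch.
};

// Hypothetical compositor with an "always allocate" mode.
class ToyCompositor {
 public:
  using Delegate = std::function<void(const LocalSurfaceId&)>;

  ToyCompositor(bool always_allocate, Delegate delegate)
      : always_allocate_(always_allocate), delegate_(std::move(delegate)) {}

  // Legacy path: the window port allocated the id and pushes it down.
  void SetLocalSurfaceIdFromWindowPort(const LocalSurfaceId& id) {
    if (!always_allocate_) current_ = id;
  }

  // Any internal reason to mint a new id (resize, lost frame sink, ...).
  // Only in always-allocate mode is the delegate told about it.
  void AllocateInternally() {
    ++current_.seq;
    if (always_allocate_ && delegate_) delegate_(current_);
  }

  const LocalSurfaceId& current() const { return current_; }

 private:
  bool always_allocate_;
  Delegate delegate_;
  LocalSurfaceId current_;
};

// Hypothetical window port tracking which id it would report to mus.
struct ToyWindowPort {
  LocalSurfaceId reported_to_mus;
  uint32_t next_seq = 0;

  // Legacy mode only: the window port allocates its own ids.
  LocalSurfaceId Allocate() {
    reported_to_mus.seq = ++next_seq;
    return reported_to_mus;
  }
};
```

In legacy mode, an internal allocation leaves the window port reporting a stale id; in always-allocate mode the delegate callback keeps the two in step, which is the property the proposed mode is after for top-levels.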
Comment 1 by danakj@chromium.org, Jan 15