151.4% regression in rendering.desktop/mean_frame_time_renderer_compositor at 594614:594797
Issue description: See the link to graphs below.
Comment 1 by 42576172...@developer.gserviceaccount.com, Oct 3
📍 Pinpoint job started. https://pinpoint-dot-chromeperf.appspot.com/job/128ca552e40000
Oct 3
📍 Found a significant difference after 1 commit. https://pinpoint-dot-chromeperf.appspot.com/job/128ca552e40000

Reland "Use SharedImageInterface for gpu and OOP raster" by piman@chromium.org
https://chromium.googlesource.com/chromium/src/+/6283c799ca8e416c3ee6fb7d195191601bb5663b

mean_frame_time_renderer_compositor: 36.69 → 92.42 (+55.73)

Understanding performance regressions: http://g.co/ChromePerformanceRegressions
Benchmark documentation link: https://bit.ly/rendering-benchmarks
Oct 4
So, this benchmark is terrible (500 layers, each animating with an independent setTimeout, drifting over time), so it's hard to get something completely reproducible from run to run, but in a nutshell here's what's happening.

First, this benchmark (with GPU raster) is GPU main-thread bound (possibly GPU-bound). Before my patch, the compositor command buffer runs out of space when creating textures (creating 500 textures takes more than the 64 KB we have), so it ends up synchronizing with the GPU process main thread. That provides some accidental throttling of the work it submits to the GPU process (the compositor thread is blocked, and the main thread is blocked). My patch bypasses the command buffer to create/destroy the textures, so it doesn't run into this accidental throttling; the compositor thread and main thread are free. But in turn it looks like we're submitting more work, so the display compositor appears to run less often and we end up with longer thread times. This behavior is very easy to reproduce before my patch just by increasing the command buffer size to 1 MB.

I'm not actually sure why the display compositor runs less often; I would expect the GPU scheduler to prioritize it over tiles that it doesn't depend on (e.g. tiles for the next frame) - the tasks are small, there's just a lot of them. It's somewhat difficult to investigate because the benchmark is not very reproducible.

In the short term, I don't think we should revert anything for this particular regression. It's enough of an edge case (500 large overlapping layers is extreme), and the patch even has (some) better properties (it doesn't block the compositor thread / main thread). We should investigate the scheduling behavior, though, and see whether we should consciously (as opposed to accidentally) throttle frame production based on GPU execution.
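To make the back-pressure effect above concrete, here is a minimal standalone sketch (not Chromium code; the class names, the 300-byte command size, and the 50 µs per-command cost are illustrative assumptions). A producer writing into a fixed-size buffer blocks whenever the buffer is full, so it is implicitly rate-limited to the consumer's speed; raising the capacity, or bypassing the buffer the way the patch does for texture creation, removes that stall.

// Sketch of implicit back-pressure from a bounded command buffer.
// Not Chromium code; sizes and timings are made up for illustration.
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <iostream>
#include <mutex>
#include <thread>

class BoundedCommandBuffer {
 public:
  explicit BoundedCommandBuffer(size_t capacity_bytes)
      : capacity_bytes_(capacity_bytes) {}

  // Producer side (compositor thread). Blocks while the buffer is full,
  // i.e. the producer synchronizes with the consumer -- the "accidental
  // throttling" described above.
  void PutCommand(size_t size_bytes) {
    std::unique_lock<std::mutex> lock(mutex_);
    not_full_.wait(lock,
                   [&] { return used_bytes_ + size_bytes <= capacity_bytes_; });
    pending_.push_back(size_bytes);
    used_bytes_ += size_bytes;
    not_empty_.notify_one();
  }

  // Consumer side (GPU process main thread). Returns false once the
  // producer is done and the buffer has drained.
  bool TakeCommand(size_t* size_bytes) {
    std::unique_lock<std::mutex> lock(mutex_);
    not_empty_.wait(lock, [&] { return !pending_.empty() || done_; });
    if (pending_.empty())
      return false;
    *size_bytes = pending_.front();
    pending_.pop_front();
    used_bytes_ -= *size_bytes;
    not_full_.notify_one();
    return true;
  }

  void Finish() {
    std::lock_guard<std::mutex> lock(mutex_);
    done_ = true;
    not_empty_.notify_all();
  }

 private:
  const size_t capacity_bytes_;
  size_t used_bytes_ = 0;
  bool done_ = false;
  std::deque<size_t> pending_;
  std::mutex mutex_;
  std::condition_variable not_empty_, not_full_;
};

int main() {
  // A 64 KB buffer vs. ~500 texture-creation commands of a few hundred
  // bytes each: the producer overflows the buffer and stalls until the
  // consumer drains it. A 1 MB buffer (or no buffer at all) never stalls.
  BoundedCommandBuffer buffer(64 * 1024);

  std::thread gpu_main_thread([&] {
    size_t size;
    while (buffer.TakeCommand(&size)) {
      // Simulate the GPU service being the slow side.
      std::this_thread::sleep_for(std::chrono::microseconds(50));
    }
  });

  for (int i = 0; i < 500; ++i)
    buffer.PutCommand(300);  // ~500 "create texture" commands.
  buffer.Finish();
  gpu_main_thread.join();
  std::cout << "done\n";
  return 0;
}

The "conscious" throttling suggested at the end of the comment would replace this implicit back-pressure with an explicit cap, e.g. only producing a new frame while the number of frames submitted but not yet executed on the GPU stays below some limit, independent of how full any command buffer happens to be.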