Issue 891784

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug-Regression



151.4% regression in rendering.desktop/mean_frame_time_renderer_compositor at 594614:594797

Project Member Reported by chiniforooshan@chromium.org, Oct 3

Issue description

See the link to graphs below.
 
All graphs for this bug:
  https://chromeperf.appspot.com/group_report?bug_id=891784

(For debugging:) Original alerts at time of bug-filing:
  https://chromeperf.appspot.com/group_report?sid=651b94d19de4b03b6511c24d2630029e615cc55ce133cde21a82e293b3b7993d


Bot(s) for this bug's original alert(s):

Win 7 Nvidia GPU Perf

rendering.desktop - Benchmark documentation link:
  https://bit.ly/rendering-benchmarks
Cc: piman@chromium.org
Owner: piman@chromium.org
Status: Assigned (was: Untriaged)
📍 Found a significant difference after 1 commit.
https://pinpoint-dot-chromeperf.appspot.com/job/128ca552e40000

Reland "Use SharedImageInterface for gpu and OOP raster" by piman@chromium.org
https://chromium.googlesource.com/chromium/src/+/6283c799ca8e416c3ee6fb7d195191601bb5663b
mean_frame_time_renderer_compositor: 36.69 → 92.42 (+55.73)

Understanding performance regressions:
  http://g.co/ChromePerformanceRegressions

Cc: sunn...@chromium.org
So, this benchmark is terrible (500 layers, each animated by an independent setTimeout, and the timers drift over time), so it's hard to get something completely reproducible from run to run. In a nutshell, here's what's happening. First, this benchmark (with GPU raster) is bound by the GPU process main thread (possibly GPU-bound).

Before my patch, the compositor command buffer runs out of space when creating textures (creating 500 textures takes more than the 64 KB we have), so it ends up synchronizing with the GPU process main thread. That provides some accidental throttling of the work submitted to the GPU process (the compositor thread is blocked, and the main thread is blocked).

My patch bypasses the command buffer to create/destroy the textures, so it doesn't run into this accidental throttling. The compositor thread and main thread are free. In turn, though, it looks like we're submitting more work, so the display compositor appears to run less often and we end up with longer thread times. It's very easy to reproduce this behavior before my patch just by increasing the command buffer size to 1 MB.
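
To make the accidental throttling concrete, here is a toy model (not Chromium code). It assumes a producer that blocks until the buffer is fully drained whenever the next command doesn't fit. The function name and the 256-byte per-texture command size are hypothetical, picked only so that 500 creations exceed 64 KB as described above.

```python
def count_producer_stalls(num_commands, command_bytes, buffer_bytes):
    """Count how often a producer has to wait for a fixed-size buffer to drain.

    Toy model: when the next command does not fit, the producer blocks until
    the consumer has drained the whole buffer, then continues.
    """
    if command_bytes > buffer_bytes:
        raise ValueError("a single command must fit in the buffer")
    used = 0
    stalls = 0
    for _ in range(num_commands):
        if used + command_bytes > buffer_bytes:
            # Buffer is full: this is the synchronization point that
            # accidentally throttles the producer.
            stalls += 1
            used = 0
        used += command_bytes
    return stalls


# Hypothetical per-texture command size; the real value is not in this report.
TEXTURE_COMMAND_BYTES = 256

stalls_64kb = count_producer_stalls(500, TEXTURE_COMMAND_BYTES, 64 * 1024)
stalls_1mb = count_producer_stalls(500, TEXTURE_COMMAND_BYTES, 1024 * 1024)
print(stalls_64kb, stalls_1mb)
```

In this model the 64 KB buffer forces at least one wait per batch of 500 creations, while the 1 MB buffer never does, which matches the observation that enlarging the buffer removes the throttling.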


I'm not actually sure why the display compositor runs less often; I would expect the GPU scheduler to prioritize it over tiles that it doesn't depend on (e.g. for the next frame) - tasks are small, there's just a lot of them. It's somewhat difficult to investigate because the benchmark is not very reproducible.


In the short term, I don't think we should revert or anything for this particular regression. It's enough of an edge case (500 large overlapping layers is extreme), and the patch even has (some) better properties (not blocking the compositor thread / main thread).
We should investigate the scheduling behavior though, and see if we should consciously (as opposed to accidentally) throttle frame production based on GPU execution.
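
One possible shape for conscious throttling is a cap on frames in flight. This is only a sketch of the idea, not a proposal for the actual scheduler: the class name, method names, and the limit of 2 are all hypothetical, and it assumes the GPU side can signal when a frame's work has finished executing.

```python
class FrameThrottle:
    """Caps how many frames may be submitted but not yet executed by the GPU.

    Hypothetical helper for illustration; none of these names exist in Chromium.
    """

    def __init__(self, max_pending_frames=2):
        if max_pending_frames < 1:
            raise ValueError("max_pending_frames must be at least 1")
        self.max_pending_frames = max_pending_frames
        self.pending_frames = 0

    def try_begin_frame(self):
        """Return True if a new frame may be produced now, False to skip it."""
        if self.pending_frames >= self.max_pending_frames:
            return False
        self.pending_frames += 1
        return True

    def on_gpu_frame_complete(self):
        """Call when the GPU reports that one submitted frame has finished."""
        if self.pending_frames > 0:
            self.pending_frames -= 1


throttle = FrameThrottle(max_pending_frames=2)
results = [throttle.try_begin_frame() for _ in range(3)]  # third one is refused
throttle.on_gpu_frame_complete()
results.append(throttle.try_begin_frame())  # allowed again after completion
print(results)
```

Unlike the command buffer filling up, a limit like this ties frame production directly to GPU execution without blocking the compositor thread or main thread; the producer just skips a frame and tries again later.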
