GPU Process Lifetime Event UMA problems |
||||
Issue descriptionWe noticed some oddities in the UMA data for GPU.GPUProcessLifetimeEvents, GPU.SwiftShaderLifetimeEvents, GPU.DisplayCompositorLifetimeEvents during the OOP-D finch trial. https://uma.googleplex.com/p/chrome/variations/?sid=e68a5511c891c0f3139ea7287ef0ee0b I've taken a look at the code and roughly understand how it works. On GPU process launch, for any reason, GPU.GPUProcessLifetimeEvents.LAUNCHED is incremented. If the GPU process, which is running GPU compositing crashes GPU.GPUProcessLifetimeEvents.DIED_FIRST_TIME is supposed to be incremented. However, the order is reversed and count is incremented then UMA recorded so GPU.GPUProcessLifetimeEvents.DIED_SECOND_TIME is increment instead of DIED_FIRST_TIME. This is a bug. The GPU process can crash three times (recently, I'll come back to this) before we give up on GPU compositing and switch to software compositing with SwiftShader GL. So after three crashes we'd have the following histograms: GPU.GPUProcessLifetimeEvents.LAUNCHED = 4 GPU.GPUProcessLifetimeEvents.DIED_SECOND_TIME = 1 GPU.GPUProcessLifetimeEvents.DIED_THIRD_TIME = 1 GPU.GPUProcessLifetimeEvents.DIED_FOURTH_TIME = 1 The fourth launch of the GPU process is (likely) only going to be for SwiftShader GL with software compositing. There is a different UMA histogram GPU.SwiftShaderLifetimeEvents.LAUNCHED that probably should have been incremented instead of GPU.GPUProcessLifetimeEvents.LAUNCHED. If another GPU process crash happens now then GPU.SwiftShaderLifetimeEvents.DIED_FIRST_TIME is incremented (which is correct). We'd have the following histograms now after four crashes: GPU.GPUProcessLifetimeEvents.LAUNCHED = 5 GPU.GPUProcessLifetimeEvents.DIED_SECOND_TIME = 1 GPU.GPUProcessLifetimeEvents.DIED_THIRD_TIME = 1 GPU.GPUProcessLifetimeEvents.DIED_FOURTH_TIME = 1 GPU.SwiftShaderLifetimeEvents.DIED_FIRST_TIME = 1 There is a similar fallback behaviour with SwiftShader GL where GPU process can crash 3 times before we give up on it and just use software compositing with WebGL disabled. For non-OOP-D we wouldn't start the GPU process again at this point. After six GPU process crashes have the following histograms: GPU.GPUProcessLifetimeEvents.LAUNCHED = 6 GPU.GPUProcessLifetimeEvents.DIED_SECOND_TIME = 1 GPU.GPUProcessLifetimeEvents.DIED_THIRD_TIME = 1 GPU.GPUProcessLifetimeEvents.DIED_FOURTH_TIME = 1 GPU.SwiftShaderLifetimeEvents.DIED_FIRST_TIME = 1 GPU.SwiftShaderLifetimeEvents.DIED_SECOND_TIME = 1 GPU.SwiftShaderLifetimeEvents.DIED_THIRD_TIME = 1 What I think the histograms should look at this point is this instead are as follows: GPU.GPUProcessLifetimeEvents.LAUNCHED = 3 GPU.GPUProcessLifetimeEvents.DIED_FIRST_TIME = 1 GPU.GPUProcessLifetimeEvents.DIED_SECOND_TIME = 1 GPU.GPUProcessLifetimeEvents.DIED_THIRD_TIME = 1 GPU.SwiftShaderLifetimeEvents.LAUNCHED = 3 GPU.SwiftShaderLifetimeEvents.DIED_FIRST_TIME = 1 GPU.SwiftShaderLifetimeEvents.DIED_SECOND_TIME = 1 GPU.SwiftShaderLifetimeEvents.DIED_THIRD_TIME = 1 Another thing to note is that we forgive one crash per hour when deciding if falling back to software compositing is necessary. So GPU.GPUProcessLifetimeEvents.5, GPU.GPUProcessLifetimeEvents.6, etc. can also get incremented since we record the actual crash count not the recent crash count. Another thing to note is that with OOP-D the GPU process is still launched for pure software compositing. There is another UMA histogram category added GPU.DisplayCompositorLifetimeEvents. It works similar to GPU.SwiftShaderLifetimeEvents where the LAUNCHED is attributed GPU.GPUProcessLifetimeEvents.LAUNCHED but the crash bucket is correct. I can fix the bucketing if that sounds reasonable? It will be problematic for historical comparisons however.
,
May 25 2018
>Will DisplayCompositorLifetimeEvents.LAUNCHED only be incremented when we're doing software compositing no webgl, ie when GPUProcessLifetimeEvents.LAUNCHED and SwiftShaderLifetimeEvents.LAUNCHED don't change? Yes. The histogram names aren't ideal but I can also improve the descriptions so it's clear the following: GPU.GPUProcessLifetimeEvents: GPU process running for GPU compositing (and display compositor if OOP-D). GPU.SwiftShaderLifetimeEvents: GPU process running for SwiftShader WebGL (and display compositor if OOP-D). GPU.GPUProcessLifetimeEvents: GPU process running only for display compositor (OOP-D only).
,
May 25 2018
I assume u mean DisplayCompositorLifetimeEvents for the 3rd one, so ok cool thanks.
,
May 28 2018
Oops, yes, DisplayCompositorLifetimeEvents is what I meant for the third one.
,
May 28 2018
+capn and vmiura who are listed as owners for the UMA histograms. Just an FYI about the changes I intended to make.
,
Jun 5 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/2dd7c4195127ffe7f4445aee9ffe6a8636f86f38 commit 2dd7c4195127ffe7f4445aee9ffe6a8636f86f38 Author: kylechar <kylechar@chromium.org> Date: Tue Jun 05 01:59:07 2018 Fix GPU process lifetime UMA histograms. There are a couple issues with the GPU process UMA histograms that this CL fixes. 1. GPU.GPUProcessLifetimeEvents, DIED_FIRST_TIME was never incremented. The first crash had DIED_SECOND_TIME incremented instead. Increment crash count after UMA output instead. 2. When the GPU process was started for SwiftShader WebGL or to run the display compositor with OOP-D the wrong histogram had LAUNCHED incremented. Make launch and crash lifetime events use the same histogram. 3. Deprecate old histograms since fixes will make data difficult to compare. Add new histograms with improved descriptions. 4. Change histograms to record the recent crash count instead of total crash count. This provides information about outcome of the crash, for example if a feature gets disabled or not. 5. Fix GPUProcessLifetimeEvent enum description. 6. Mark GPU.SoftwareRendererLifetimeEvents as obsolete. As far as I can tell there is no UMA data ever recorded for it. 7. Make GpuProcessHost::GpuCrashCount() return the total number of GPU crashes for about://gpu. Bug: 846333 Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel Change-Id: Icb74ec9c52b450c8c5154f513e825d224bb7ba9e Reviewed-on: https://chromium-review.googlesource.com/1075513 Reviewed-by: Mark Pearson <mpearson@chromium.org> Reviewed-by: Mohsen Izadi <mohsen@chromium.org> Reviewed-by: Alexis Hétu <sugoi@chromium.org> Reviewed-by: Victor Miura <vmiura@chromium.org> Reviewed-by: Zhenyao Mo <zmo@chromium.org> Commit-Queue: kylechar <kylechar@chromium.org> Cr-Commit-Position: refs/heads/master@{#564345} [modify] https://crrev.com/2dd7c4195127ffe7f4445aee9ffe6a8636f86f38/content/browser/gpu/gpu_process_host.cc [modify] https://crrev.com/2dd7c4195127ffe7f4445aee9ffe6a8636f86f38/content/browser/gpu/gpu_process_host.h [modify] https://crrev.com/2dd7c4195127ffe7f4445aee9ffe6a8636f86f38/tools/metrics/histograms/enums.xml [modify] https://crrev.com/2dd7c4195127ffe7f4445aee9ffe6a8636f86f38/tools/metrics/histograms/histograms.xml
,
Jun 7 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/6549357808ca40f6995aad3c67c8aa89400f3e15 commit 6549357808ca40f6995aad3c67c8aa89400f3e15 Author: kylechar <kylechar@chromium.org> Date: Thu Jun 07 23:17:47 2018 Don't increment LAUNCHED UMA for unsandboxed. We don't increment the crashed UMA for GPU_PROCESS_KIND_UNSANDBOXED. Don't increment the launched UMA either. Bug: 846333 Change-Id: I3f5845c3bd1b221640a11098ed7c5b7b1da2148b Reviewed-on: https://chromium-review.googlesource.com/1091492 Reviewed-by: Zhenyao Mo <zmo@chromium.org> Commit-Queue: kylechar <kylechar@chromium.org> Cr-Commit-Position: refs/heads/master@{#565455} [modify] https://crrev.com/6549357808ca40f6995aad3c67c8aa89400f3e15/content/browser/gpu/gpu_process_host.cc
,
Jun 12 2018
,
Jun 14 2018
|
||||
►
Sign in to add a comment |
||||
Comment 1 by danakj@chromium.org
, May 25 2018