New issue
Advanced search Search tips

Issue 846333 link

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Jun 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug

Blocked on:
issue 852063

Blocking:
issue 787097



Sign in to add a comment

GPU Process Lifetime Event UMA problems

Project Member Reported by kylec...@chromium.org, May 24 2018

Issue description

We noticed some oddities in the UMA data for GPU.GPUProcessLifetimeEvents, GPU.SwiftShaderLifetimeEvents, GPU.DisplayCompositorLifetimeEvents during the OOP-D finch trial.

https://uma.googleplex.com/p/chrome/variations/?sid=e68a5511c891c0f3139ea7287ef0ee0b

I've taken a look at the code and roughly understand how it works. On GPU process launch, for any reason, GPU.GPUProcessLifetimeEvents.LAUNCHED is incremented. If the GPU process, which is running GPU compositing crashes GPU.GPUProcessLifetimeEvents.DIED_FIRST_TIME is supposed to be incremented. However, the order is reversed and count is incremented then UMA recorded so GPU.GPUProcessLifetimeEvents.DIED_SECOND_TIME is increment instead of DIED_FIRST_TIME. This is a bug.

The GPU process can crash three times (recently, I'll come back to this) before we give up on GPU compositing and switch to software compositing with SwiftShader GL. So after three crashes we'd have the following histograms:

GPU.GPUProcessLifetimeEvents.LAUNCHED = 4
GPU.GPUProcessLifetimeEvents.DIED_SECOND_TIME = 1
GPU.GPUProcessLifetimeEvents.DIED_THIRD_TIME = 1
GPU.GPUProcessLifetimeEvents.DIED_FOURTH_TIME = 1

The fourth launch of the GPU process is (likely) only going to be for SwiftShader GL with software compositing. There is a different UMA histogram GPU.SwiftShaderLifetimeEvents.LAUNCHED that probably should have been incremented instead of GPU.GPUProcessLifetimeEvents.LAUNCHED.  If another GPU process crash happens now then GPU.SwiftShaderLifetimeEvents.DIED_FIRST_TIME is incremented (which is correct). We'd have the following histograms now after four crashes:

GPU.GPUProcessLifetimeEvents.LAUNCHED = 5
GPU.GPUProcessLifetimeEvents.DIED_SECOND_TIME = 1
GPU.GPUProcessLifetimeEvents.DIED_THIRD_TIME = 1
GPU.GPUProcessLifetimeEvents.DIED_FOURTH_TIME = 1
GPU.SwiftShaderLifetimeEvents.DIED_FIRST_TIME = 1

There is a similar fallback behaviour with SwiftShader GL where GPU process can crash 3 times before we give up on it and just use software compositing with WebGL disabled. For non-OOP-D we wouldn't start the GPU process again at this point. After six GPU process crashes have the following histograms:

GPU.GPUProcessLifetimeEvents.LAUNCHED = 6
GPU.GPUProcessLifetimeEvents.DIED_SECOND_TIME = 1
GPU.GPUProcessLifetimeEvents.DIED_THIRD_TIME = 1
GPU.GPUProcessLifetimeEvents.DIED_FOURTH_TIME = 1
GPU.SwiftShaderLifetimeEvents.DIED_FIRST_TIME = 1
GPU.SwiftShaderLifetimeEvents.DIED_SECOND_TIME = 1
GPU.SwiftShaderLifetimeEvents.DIED_THIRD_TIME = 1

What I think the histograms should look at this point is this instead are as follows:

GPU.GPUProcessLifetimeEvents.LAUNCHED = 3
GPU.GPUProcessLifetimeEvents.DIED_FIRST_TIME = 1
GPU.GPUProcessLifetimeEvents.DIED_SECOND_TIME = 1
GPU.GPUProcessLifetimeEvents.DIED_THIRD_TIME = 1
GPU.SwiftShaderLifetimeEvents.LAUNCHED = 3
GPU.SwiftShaderLifetimeEvents.DIED_FIRST_TIME = 1
GPU.SwiftShaderLifetimeEvents.DIED_SECOND_TIME = 1
GPU.SwiftShaderLifetimeEvents.DIED_THIRD_TIME = 1

Another thing to note is that we forgive one crash per hour when deciding if falling back to software compositing is necessary. So GPU.GPUProcessLifetimeEvents.5, GPU.GPUProcessLifetimeEvents.6, etc. can also get incremented since we record the actual crash count not the recent crash count.

Another thing to note is that with OOP-D the GPU process is still launched for pure software compositing. There is another UMA histogram category added GPU.DisplayCompositorLifetimeEvents. It works similar to GPU.SwiftShaderLifetimeEvents where the LAUNCHED is attributed GPU.GPUProcessLifetimeEvents.LAUNCHED but the crash bucket is correct.

I can fix the bucketing if that sounds reasonable? It will be problematic for historical comparisons however.
 

Comment 1 by danakj@chromium.org, May 25 2018

Sounds good to me, thanks for the explanation.

Will DisplayCompositorLifetimeEvents.LAUNCHED only be incremented when we're doing software compositing no webgl, ie when GPUProcessLifetimeEvents.LAUNCHED and SwiftShaderLifetimeEvents.LAUNCHED don't change? Or should we have something like that (instead) for SoftwareCompositorOnlyLifetimeEvents or so?
>Will DisplayCompositorLifetimeEvents.LAUNCHED only be incremented when we're doing software compositing no webgl, ie when GPUProcessLifetimeEvents.LAUNCHED and SwiftShaderLifetimeEvents.LAUNCHED don't change? 

Yes. The histogram names aren't ideal but I can also improve the descriptions so it's clear the following:

GPU.GPUProcessLifetimeEvents:
GPU process running for GPU compositing (and display compositor if OOP-D).

GPU.SwiftShaderLifetimeEvents:
GPU process running for SwiftShader WebGL (and display compositor if OOP-D).

GPU.GPUProcessLifetimeEvents:
GPU process running only for display compositor (OOP-D only).

Comment 3 by danakj@chromium.org, May 25 2018

I assume u mean DisplayCompositorLifetimeEvents for the 3rd one, so ok cool thanks.
Oops, yes, DisplayCompositorLifetimeEvents is what I meant for the third one.
Cc: capn@chromium.org vmi...@chromium.org
+capn and vmiura who are listed as owners for the UMA histograms. Just an FYI about the changes I intended to make.
Project Member

Comment 6 by bugdroid1@chromium.org, Jun 5 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/2dd7c4195127ffe7f4445aee9ffe6a8636f86f38

commit 2dd7c4195127ffe7f4445aee9ffe6a8636f86f38
Author: kylechar <kylechar@chromium.org>
Date: Tue Jun 05 01:59:07 2018

Fix GPU process lifetime UMA histograms.

There are a couple issues with the GPU process UMA histograms that this
CL fixes.

1. GPU.GPUProcessLifetimeEvents, DIED_FIRST_TIME was never incremented.
   The first crash had DIED_SECOND_TIME incremented instead. Increment
   crash count after UMA output instead.
2. When the GPU process was started for SwiftShader WebGL or to run the
   display compositor with OOP-D the wrong histogram had LAUNCHED
   incremented. Make launch and crash lifetime events use the same
   histogram.
3. Deprecate old histograms since fixes will make data difficult to
   compare. Add new histograms with improved descriptions.
4. Change histograms to record the recent crash count instead of total
   crash count. This provides information about outcome of the crash,
   for example if a feature gets disabled or not.
5. Fix GPUProcessLifetimeEvent enum description.
6. Mark GPU.SoftwareRendererLifetimeEvents as obsolete. As far as I can
   tell there is no UMA data ever recorded for it.
7. Make GpuProcessHost::GpuCrashCount() return the total number of GPU
   crashes for about://gpu.

Bug:  846333 
Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel
Change-Id: Icb74ec9c52b450c8c5154f513e825d224bb7ba9e
Reviewed-on: https://chromium-review.googlesource.com/1075513
Reviewed-by: Mark Pearson <mpearson@chromium.org>
Reviewed-by: Mohsen Izadi <mohsen@chromium.org>
Reviewed-by: Alexis Hétu <sugoi@chromium.org>
Reviewed-by: Victor Miura <vmiura@chromium.org>
Reviewed-by: Zhenyao Mo <zmo@chromium.org>
Commit-Queue: kylechar <kylechar@chromium.org>
Cr-Commit-Position: refs/heads/master@{#564345}
[modify] https://crrev.com/2dd7c4195127ffe7f4445aee9ffe6a8636f86f38/content/browser/gpu/gpu_process_host.cc
[modify] https://crrev.com/2dd7c4195127ffe7f4445aee9ffe6a8636f86f38/content/browser/gpu/gpu_process_host.h
[modify] https://crrev.com/2dd7c4195127ffe7f4445aee9ffe6a8636f86f38/tools/metrics/histograms/enums.xml
[modify] https://crrev.com/2dd7c4195127ffe7f4445aee9ffe6a8636f86f38/tools/metrics/histograms/histograms.xml

Project Member

Comment 7 by bugdroid1@chromium.org, Jun 7 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/6549357808ca40f6995aad3c67c8aa89400f3e15

commit 6549357808ca40f6995aad3c67c8aa89400f3e15
Author: kylechar <kylechar@chromium.org>
Date: Thu Jun 07 23:17:47 2018

Don't increment LAUNCHED UMA for unsandboxed.

We don't increment the crashed UMA for GPU_PROCESS_KIND_UNSANDBOXED. Don't
increment the launched UMA either.

Bug:  846333 
Change-Id: I3f5845c3bd1b221640a11098ed7c5b7b1da2148b
Reviewed-on: https://chromium-review.googlesource.com/1091492
Reviewed-by: Zhenyao Mo <zmo@chromium.org>
Commit-Queue: kylechar <kylechar@chromium.org>
Cr-Commit-Position: refs/heads/master@{#565455}
[modify] https://crrev.com/6549357808ca40f6995aad3c67c8aa89400f3e15/content/browser/gpu/gpu_process_host.cc

Blockedon: 852063
Status: Fixed (was: Assigned)

Sign in to add a comment