New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 869419 link

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Aug 3
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux , Windows , Chrome , Mac
Pri: 1
Type: Bug

Blocking:
issue 849639



Sign in to add a comment

GpuProcessHost GpuMode fallback not working correctly

Project Member Reported by kylec...@chromium.org, Jul 31

Issue description

I aliased the crash counts in GpuProcessHost::RecordProcessFallback() to get a bit more information for https://crbug.com/849639. With OOP-D after enough crashes we crash the browser intentionally. I would expect the recent crash counts for hardware_accelerated_recent_crash_count_, swiftshader_recent_crash_count_ and display_compositor_recent_crash_count_ to be 3/3/3 by the time we crash the browser process or potentially 0/3/3 if hardware acceleration is initially disabled through preferences or a command line flag. I was looking at crash dumps and found that wasn't always the case, for example I saw multiple crash counts that were 4/0/3. 

In RecordProcessCrash() when |hardware_accelerated_recent_crash_count_| hits 3 I would expect GpuDataManagerImpl::GetInstance()->FallBackToNextGpuMode() to be called and the next GpuProcessHost |mode_| to be GpuMode::SWIFTSHADER instead of GpuMode::HARDWARE_ACCELERATED. The next GPU process crash would increment |swiftshader_recent_crash_count_| and |hardware_accelerated_recent_crash_count_| should never hit 4.

So there are still some bug(s) in the fallback logic. I can see two ways this might happen:
1. The same GpuProcessHost instance calls RecordProcessCrash() more than once. If both GpuProcessHost::OnProcessLaunchFailed() and GpuProcessHost::OnProcessCrashed() could run this would possible. Alternatively if GpuProcessHost::OnProcessCrashed() was called more than once it would also happen.
2. A new GpuProcessHost is created before RecordProcessCrash() for the original GpuProcessHost. The new GpuProcessHost would have |mode_| GpuMode::HARDWARE_ACCELERATED instead of GpuMode::SWIFTSHADER and when it crashes it would increment |hardware_accelerated_recent_crash_count_|  to 4 (and trigger the fallback logic again).

I hope that (1) isn't possible. Looking at (2) I think it's possible if GpuProcessHost::Send() fails, which triggers GpuProcessHost::SendOutstandingReplies() and sets |valid_| to false. If something calls GpuProcessHost::Get() before GpuProcessHost::OnProcessCrashed() is called for the existing GpuProcessHost, then Get() will see that |valid_| is false for the existing GpuProcessHost instance and create a new instance immediately. The new GpuProcessHost instance won't have changed modes because the old instance doesn't know it's crashed yet and hasn't run RecordProcessCrash().

I *think* there is another problem with the fallback on Windows as well. Windows starts an unsandboxed GPU process to do info collection. RecordProcessFallback() is smart enough not to record UMA or trigger the fallback code if the unsandboxed GPU process crashes. However, GpuProcessHost::DidFailInitialize() doesn't have a check for unsandboxed process and if initialization fails it could trigger fallback unnecessarily? This wouldn't increment the crash counts, so it can't be the reason for the problem I observed in crash dumps, but it would run the fallback code and be problematic.
 
While GpuDataManagerImpl is thread-safe, GpuProcessHost is not. It seems we need to add a lock or PostTask to a single thread where all mode related codes run.


Good point on unsandboxed gpu process issue. It should be an easy fix by adding a sandboxed field to the GpuProcessHost constructor.
I don't think this is so much an issue needing a lock as just an ordering problem on the IO thread.

I was able to reproduce something like (2) pretty easily by introducing a crash on GPU process initialization. The following sequence can happen where the next GpuProcessHost instance is created before fallback code runs.

GpuProcessHost::OnProcessCrashed()
  GpuProcessHost::SendOutstandingReplies()
    GpuClientImpl::OnEstablishGpuChannel() (via callback)
      GpuClientImpl::EstablishGpuChannel()
        GpuProcessHost::Get()
        * Create a new GpuProcessHost with wrong GpuMode here
  GpuProcessHost::RecordProcessCrash()
  * Fallback logic happens here to change GpuMode

That is fixable by reordering GpuProcessHost::OnProcessCrashed(). In general an easy fix for running the fallback logic more than once GpuMode is to pass in the current mode to FallBackToNextGpuMode(), so that the fallback only happens once per mode?

Additionally we do have a field in GpuProcessHost for sandbox/unsandboxed, it's |kind_|, we just don't check it in GpuProcessHost::DidFailInitialize(). That should be a simple fix.
Project Member

Comment 3 by bugdroid1@chromium.org, Jul 31

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/b90c7c8c335a2e2a4abdd7bde17a44f92c8b3a54

commit b90c7c8c335a2e2a4abdd7bde17a44f92c8b3a54
Author: kylechar <kylechar@chromium.org>
Date: Tue Jul 31 23:51:12 2018

Fix GPU process fallback logic.

1. In GpuProcessHost::OnProcessCrashed() record the process crash first.
   This means the GPU mode fallback will happen before a new GPU process
   is started.
2. Don't call FallBackToNextGpuMode() if GPU process initialization
   fails for an unsandboxed GPU process. The unsandboxed GPU is only
   used for collect information and it's failure doesn't indicate a need
   to change GPU modes.

Bug:  869419 
Change-Id: I8bd0a03268f0ea8809f3df8458d4e6a92db9391f
Reviewed-on: https://chromium-review.googlesource.com/1157164
Reviewed-by: Zhenyao Mo <zmo@chromium.org>
Commit-Queue: kylechar <kylechar@chromium.org>
Cr-Commit-Position: refs/heads/master@{#579625}
[modify] https://crrev.com/b90c7c8c335a2e2a4abdd7bde17a44f92c8b3a54/content/browser/gpu/gpu_process_host.cc

Labels: Merge-Request-69
Requesting merge of #3 to M69. It fixes two bugs with GPU mode fallback when the GPU process crashes/fails. It's also pretty trivial/low risk.
Project Member

Comment 5 by sheriffbot@chromium.org, Aug 3

Labels: -Merge-Request-69 Merge-Review-69 Hotlist-Merge-Review
This bug requires manual review: M69 has already been promoted to the beta branch, so this requires manual review
Please contact the milestone owner if you have questions.
Owners: amineer@(Android), kariahda@(iOS), cindyb@(ChromeOS), govind@(Desktop)

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Is the change looking good so far in canary and fully safe to merge?
Yes, the change looks good in canary so far and should be safe.
Labels: -Merge-Review-69 Merge-Approved-69
Approving merge to M69 branch 3497 based on comment #4 and #7. Please merge ASAP. Thank you.
Project Member

Comment 9 by bugdroid1@chromium.org, Aug 3

Labels: -merge-approved-69 merge-merged-3497
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/40337e43d620868cfcf7e39fbca2f6e412d5c8ad

commit 40337e43d620868cfcf7e39fbca2f6e412d5c8ad
Author: kylechar <kylechar@chromium.org>
Date: Fri Aug 03 14:36:00 2018

Fix GPU process fallback logic.

1. In GpuProcessHost::OnProcessCrashed() record the process crash first.
   This means the GPU mode fallback will happen before a new GPU process
   is started.
2. Don't call FallBackToNextGpuMode() if GPU process initialization
   fails for an unsandboxed GPU process. The unsandboxed GPU is only
   used for collect information and it's failure doesn't indicate a need
   to change GPU modes.

Bug:  869419 
Change-Id: I8bd0a03268f0ea8809f3df8458d4e6a92db9391f
Reviewed-on: https://chromium-review.googlesource.com/1157164
Reviewed-by: Zhenyao Mo <zmo@chromium.org>
Commit-Queue: kylechar <kylechar@chromium.org>
Cr-Original-Commit-Position: refs/heads/master@{#579625}(cherry picked from commit b90c7c8c335a2e2a4abdd7bde17a44f92c8b3a54)
Reviewed-on: https://chromium-review.googlesource.com/1162080
Reviewed-by: kylechar <kylechar@chromium.org>
Cr-Commit-Position: refs/branch-heads/3497@{#379}
Cr-Branched-From: 271eaf50594eb818c9295dc78d364aea18c82ea8-refs/heads/master@{#576753}
[modify] https://crrev.com/40337e43d620868cfcf7e39fbca2f6e412d5c8ad/content/browser/gpu/gpu_process_host.cc

Status: Fixed (was: Assigned)

Sign in to add a comment