GpuProcessHost GpuMode fallback not working correctly |
||||||
Issue descriptionI aliased the crash counts in GpuProcessHost::RecordProcessFallback() to get a bit more information for https://crbug.com/849639. With OOP-D after enough crashes we crash the browser intentionally. I would expect the recent crash counts for hardware_accelerated_recent_crash_count_, swiftshader_recent_crash_count_ and display_compositor_recent_crash_count_ to be 3/3/3 by the time we crash the browser process or potentially 0/3/3 if hardware acceleration is initially disabled through preferences or a command line flag. I was looking at crash dumps and found that wasn't always the case, for example I saw multiple crash counts that were 4/0/3. In RecordProcessCrash() when |hardware_accelerated_recent_crash_count_| hits 3 I would expect GpuDataManagerImpl::GetInstance()->FallBackToNextGpuMode() to be called and the next GpuProcessHost |mode_| to be GpuMode::SWIFTSHADER instead of GpuMode::HARDWARE_ACCELERATED. The next GPU process crash would increment |swiftshader_recent_crash_count_| and |hardware_accelerated_recent_crash_count_| should never hit 4. So there are still some bug(s) in the fallback logic. I can see two ways this might happen: 1. The same GpuProcessHost instance calls RecordProcessCrash() more than once. If both GpuProcessHost::OnProcessLaunchFailed() and GpuProcessHost::OnProcessCrashed() could run this would possible. Alternatively if GpuProcessHost::OnProcessCrashed() was called more than once it would also happen. 2. A new GpuProcessHost is created before RecordProcessCrash() for the original GpuProcessHost. The new GpuProcessHost would have |mode_| GpuMode::HARDWARE_ACCELERATED instead of GpuMode::SWIFTSHADER and when it crashes it would increment |hardware_accelerated_recent_crash_count_| to 4 (and trigger the fallback logic again). I hope that (1) isn't possible. Looking at (2) I think it's possible if GpuProcessHost::Send() fails, which triggers GpuProcessHost::SendOutstandingReplies() and sets |valid_| to false. If something calls GpuProcessHost::Get() before GpuProcessHost::OnProcessCrashed() is called for the existing GpuProcessHost, then Get() will see that |valid_| is false for the existing GpuProcessHost instance and create a new instance immediately. The new GpuProcessHost instance won't have changed modes because the old instance doesn't know it's crashed yet and hasn't run RecordProcessCrash(). I *think* there is another problem with the fallback on Windows as well. Windows starts an unsandboxed GPU process to do info collection. RecordProcessFallback() is smart enough not to record UMA or trigger the fallback code if the unsandboxed GPU process crashes. However, GpuProcessHost::DidFailInitialize() doesn't have a check for unsandboxed process and if initialization fails it could trigger fallback unnecessarily? This wouldn't increment the crash counts, so it can't be the reason for the problem I observed in crash dumps, but it would run the fallback code and be problematic.
,
Jul 31
I don't think this is so much an issue needing a lock as just an ordering problem on the IO thread.
I was able to reproduce something like (2) pretty easily by introducing a crash on GPU process initialization. The following sequence can happen where the next GpuProcessHost instance is created before fallback code runs.
GpuProcessHost::OnProcessCrashed()
GpuProcessHost::SendOutstandingReplies()
GpuClientImpl::OnEstablishGpuChannel() (via callback)
GpuClientImpl::EstablishGpuChannel()
GpuProcessHost::Get()
* Create a new GpuProcessHost with wrong GpuMode here
GpuProcessHost::RecordProcessCrash()
* Fallback logic happens here to change GpuMode
That is fixable by reordering GpuProcessHost::OnProcessCrashed(). In general an easy fix for running the fallback logic more than once GpuMode is to pass in the current mode to FallBackToNextGpuMode(), so that the fallback only happens once per mode?
Additionally we do have a field in GpuProcessHost for sandbox/unsandboxed, it's |kind_|, we just don't check it in GpuProcessHost::DidFailInitialize(). That should be a simple fix.
,
Jul 31
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/b90c7c8c335a2e2a4abdd7bde17a44f92c8b3a54 commit b90c7c8c335a2e2a4abdd7bde17a44f92c8b3a54 Author: kylechar <kylechar@chromium.org> Date: Tue Jul 31 23:51:12 2018 Fix GPU process fallback logic. 1. In GpuProcessHost::OnProcessCrashed() record the process crash first. This means the GPU mode fallback will happen before a new GPU process is started. 2. Don't call FallBackToNextGpuMode() if GPU process initialization fails for an unsandboxed GPU process. The unsandboxed GPU is only used for collect information and it's failure doesn't indicate a need to change GPU modes. Bug: 869419 Change-Id: I8bd0a03268f0ea8809f3df8458d4e6a92db9391f Reviewed-on: https://chromium-review.googlesource.com/1157164 Reviewed-by: Zhenyao Mo <zmo@chromium.org> Commit-Queue: kylechar <kylechar@chromium.org> Cr-Commit-Position: refs/heads/master@{#579625} [modify] https://crrev.com/b90c7c8c335a2e2a4abdd7bde17a44f92c8b3a54/content/browser/gpu/gpu_process_host.cc
,
Aug 3
Requesting merge of #3 to M69. It fixes two bugs with GPU mode fallback when the GPU process crashes/fails. It's also pretty trivial/low risk.
,
Aug 3
This bug requires manual review: M69 has already been promoted to the beta branch, so this requires manual review Please contact the milestone owner if you have questions. Owners: amineer@(Android), kariahda@(iOS), cindyb@(ChromeOS), govind@(Desktop) For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Aug 3
Is the change looking good so far in canary and fully safe to merge?
,
Aug 3
Yes, the change looks good in canary so far and should be safe.
,
Aug 3
Approving merge to M69 branch 3497 based on comment #4 and #7. Please merge ASAP. Thank you.
,
Aug 3
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/40337e43d620868cfcf7e39fbca2f6e412d5c8ad commit 40337e43d620868cfcf7e39fbca2f6e412d5c8ad Author: kylechar <kylechar@chromium.org> Date: Fri Aug 03 14:36:00 2018 Fix GPU process fallback logic. 1. In GpuProcessHost::OnProcessCrashed() record the process crash first. This means the GPU mode fallback will happen before a new GPU process is started. 2. Don't call FallBackToNextGpuMode() if GPU process initialization fails for an unsandboxed GPU process. The unsandboxed GPU is only used for collect information and it's failure doesn't indicate a need to change GPU modes. Bug: 869419 Change-Id: I8bd0a03268f0ea8809f3df8458d4e6a92db9391f Reviewed-on: https://chromium-review.googlesource.com/1157164 Reviewed-by: Zhenyao Mo <zmo@chromium.org> Commit-Queue: kylechar <kylechar@chromium.org> Cr-Original-Commit-Position: refs/heads/master@{#579625}(cherry picked from commit b90c7c8c335a2e2a4abdd7bde17a44f92c8b3a54) Reviewed-on: https://chromium-review.googlesource.com/1162080 Reviewed-by: kylechar <kylechar@chromium.org> Cr-Commit-Position: refs/branch-heads/3497@{#379} Cr-Branched-From: 271eaf50594eb818c9295dc78d364aea18c82ea8-refs/heads/master@{#576753} [modify] https://crrev.com/40337e43d620868cfcf7e39fbca2f6e412d5c8ad/content/browser/gpu/gpu_process_host.cc
,
Aug 3
|
||||||
►
Sign in to add a comment |
||||||
Comment 1 by zmo@chromium.org
, Jul 31