Improve GPU watchdog to postpone crashing when I/O queue is saturated
Issue description
The idea is to attempt an unbuffered write in the GPU watchdog just before crashing the process. If the process is slow because of heavy I/O, the write should give it more time to unblock. In theory, this should help with some GPU hangs.
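A minimal sketch of such an I/O check, for illustration only. It is not the code from the CL: the helper name `MeasureIoCheckSeconds` is hypothetical, and it uses POSIX `mkstemp`/`write`/`fsync` as a stand-in for an unbuffered write on Windows. The 32-byte payload matches the size mentioned later in this thread. The point is that the write and flush block while the I/O queue is saturated, so the watchdog is naturally delayed and the elapsed time can be recorded.

```cpp
#include <cassert>
#include <chrono>
#include <stdlib.h>
#include <string>
#include <unistd.h>
#include <vector>

// Hypothetical helper (an assumption, not Chromium's implementation).
// Writes 32 bytes to a fresh temp file in `temp_dir`, flushes them to disk,
// and returns the elapsed time in seconds, or a negative value on failure.
double MeasureIoCheckSeconds(const std::string& temp_dir) {
  assert(!temp_dir.empty());

  // mkstemp needs a mutable, NUL-terminated template ending in XXXXXX.
  const std::string pattern = temp_dir + "/io_check_XXXXXX";
  std::vector<char> path(pattern.begin(), pattern.end());
  path.push_back('\0');

  const auto start = std::chrono::steady_clock::now();

  const int fd = mkstemp(path.data());
  if (fd < 0)
    return -1.0;

  const char payload[32] = {};
  bool ok = write(fd, payload, sizeof(payload)) ==
            static_cast<ssize_t>(sizeof(payload));
  // fsync blocks until the data reaches the device, which is where a
  // saturated I/O queue would show up as a long delay.
  ok = ok && fsync(fd) == 0;

  close(fd);
  unlink(path.data());
  if (!ok)
    return -1.0;

  const std::chrono::duration<double> elapsed =
      std::chrono::steady_clock::now() - start;
  return elapsed.count();
}
```

A watchdog could call something like `MeasureIoCheckSeconds("/tmp")` right before terminating and attach the result to the crash report.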
,
Jun 15 2016
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/4015b488f743a7399e3362fd49917f494ff7caaf

commit 4015b488f743a7399e3362fd49917f494ff7caaf
Author: stanisc <stanisc@chromium.org>
Date: Fri Jun 10 17:47:04 2016

GPU Watchdog to check I/O before terminating

The idea is to try to do an unbuffered write in GPU watchdog
just before crashing the process. If the process is slow due
to heavy I/O this should give it more time to unblock.

This should theoretically help with some of GPU hangs.

BUG=612607

Review-Url: https://codereview.chromium.org/1980263002
Cr-Commit-Position: refs/heads/master@{#399222}

[modify] https://crrev.com/4015b488f743a7399e3362fd49917f494ff7caaf/content/browser/gpu/gpu_process_host.cc
[add] https://crrev.com/4015b488f743a7399e3362fd49917f494ff7caaf/content/common/gpu_watchdog_utils.cc
[add] https://crrev.com/4015b488f743a7399e3362fd49917f494ff7caaf/content/common/gpu_watchdog_utils.h
[modify] https://crrev.com/4015b488f743a7399e3362fd49917f494ff7caaf/content/content_common.gypi
[modify] https://crrev.com/4015b488f743a7399e3362fd49917f494ff7caaf/content/gpu/gpu_watchdog_thread.cc
[modify] https://crrev.com/4015b488f743a7399e3362fd49917f494ff7caaf/content/gpu/gpu_watchdog_thread.h
,
Jun 15 2016
,
Jun 15 2016
Note that the CL above caused flakiness in the context_lost_tests on the GPU bots, affecting the commit queue as well as some of the waterfall bots. See Issue 619196. A revert is in progress in https://codereview.chromium.org/2071613002/.
,
Jun 15 2016
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/40684e97fcc0cc24a2504f0dc2679a3b88557d9b

commit 40684e97fcc0cc24a2504f0dc2679a3b88557d9b
Author: kbr <kbr@chromium.org>
Date: Wed Jun 15 20:18:29 2016

Revert of GPU Watchdog to check I/O before terminating GPU process (patchset #5 id:120001 of https://codereview.chromium.org/1980263002/ )

Reason for revert:
This CL seems to have caused intermittent assertion failures in the context_lost_tests on the commit queue and reliable assertion failures on some of the GPU bots. See http://crbug.com/619196 .

Original issue's description:
> GPU Watchdog to check I/O before terminating
>
> The idea is to try to do an unbuffered write in GPU watchdog
> just before crashing the process. If the process is slow due
> to heavy I/O this should give it more time to unblock.
>
> This should theoretically help with some of GPU hangs.
>
> BUG=612607
>
> Committed: https://crrev.com/4015b488f743a7399e3362fd49917f494ff7caaf
> Cr-Commit-Position: refs/heads/master@{#399222}

TBR=jbauman@chromium.org,wfh@chromium.org,nick@chromium.org,manzagop@chromium.org,pmonette@chromium.org,brucedawson@chromium.org,stanisc@chromium.org
# Not skipping CQ checks because original CL landed more than 1 days ago.
BUG=612607

Review-Url: https://codereview.chromium.org/2071613002
Cr-Commit-Position: refs/heads/master@{#399998}

[modify] https://crrev.com/40684e97fcc0cc24a2504f0dc2679a3b88557d9b/content/browser/gpu/gpu_process_host.cc
[delete] https://crrev.com/cd6c0ba34a71547db27d1abf9016d86e13e1b7ea/content/common/gpu_watchdog_utils.cc
[delete] https://crrev.com/cd6c0ba34a71547db27d1abf9016d86e13e1b7ea/content/common/gpu_watchdog_utils.h
[modify] https://crrev.com/40684e97fcc0cc24a2504f0dc2679a3b88557d9b/content/content_common.gypi
[modify] https://crrev.com/40684e97fcc0cc24a2504f0dc2679a3b88557d9b/content/gpu/gpu_watchdog_thread.cc
[modify] https://crrev.com/40684e97fcc0cc24a2504f0dc2679a3b88557d9b/content/gpu/gpu_watchdog_thread.h
,
Jun 16 2016
This change ran on builds 53.0.2765.0 - 53.0.2768.0. It looks like it had a positive impact on the crash rate.

For MessagePumpForGpu::WaitForWork, the CPM for the 5 builds before the fix was:
7.358, 11.46, 11.282, 9.809, 10.437
For the 5 builds after the fix, the CPM was:
5.911, 7.101, 7.324, 3.831, 5.327

I looked at a number of crash dumps for [GPU hang] MessagePumpForGpu::WaitForWork and other [GPU hang] crash signatures. The interesting number captured there is the duration of the I/O check, that is, how long it took to write 32 bytes of data to the temp file and flush the changes to disk. Here are some examples of I/O check duration (in seconds):
MessagePumpForGpu::WaitForWork: 2.605, 2.243, 1.023, 0.617, 0.543, 0.287, 0.231, 0.166, 0.129, 0.083, 0.072, 0.044, 0.018, 0.001
d3dcompiler_47.dll: 1.812, 0.621, 0.107
CreateD3DDevManager: 0.487
MessagePumpForGpu::DoRunLoop (with PeekMessage at the top): 0.008, 0.005, 0.004

It is interesting that the I/O check duration was consistently low for hangs with a PeekMessage call at the top of the call stack. Those look like true deadlocks to me.
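To put a number on the CPM change, here is a small sketch that averages the two sets of five values quoted above and computes the relative drop. The names `Mean`, `RelativeReduction`, `kCpmBefore` and `kCpmAfter` are made up for this example. With only five builds on each side this is a rough indication, not a statistical result.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// CPM values for MessagePumpForGpu::WaitForWork, copied from the comment above.
const std::vector<double> kCpmBefore = {7.358, 11.46, 11.282, 9.809, 10.437};
const std::vector<double> kCpmAfter = {5.911, 7.101, 7.324, 3.831, 5.327};

// Arithmetic mean of a non-empty sample.
double Mean(const std::vector<double>& xs) {
  assert(!xs.empty());
  return std::accumulate(xs.begin(), xs.end(), 0.0) / xs.size();
}

// Fractional drop of the mean, e.g. 0.25 means the mean fell by 25%.
double RelativeReduction(const std::vector<double>& before,
                         const std::vector<double>& after) {
  return 1.0 - Mean(after) / Mean(before);
}
```

On these inputs the mean goes from about 10.07 to about 5.90, a drop of roughly 41%.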
,
Aug 12 2016
Results of investigating another batch of crash dumps from one of the M53 Dev builds:

For MessagePumpForGpu::WaitForWork (19 samples): 52.6% of cases had an I/O check duration longer than 1 second, the longest duration was 5.6 seconds, and the average was 1.4 seconds.

For [GPU hang] overall, i.e. GPU hangs with all signatures including WaitForWork (68 samples): 40% of cases had an I/O check duration longer than 1 second, the longest I/O check was 8.9 seconds, and the average was 1.2 seconds.
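The three figures reported for each group (share above 1 second, longest, average) can be computed as in the sketch below. The per-dump durations for this batch are not listed in the thread, so the sample data here is the WaitForWork list from the Jun 16 comment instead. `IoCheckSummary`, `Summarize` and `kWaitForWorkSeconds` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// I/O check durations (seconds) for WaitForWork from the Jun 16 comment.
// These are NOT the 19 samples summarized above, which are not listed.
const std::vector<double> kWaitForWorkSeconds = {
    2.605, 2.243, 1.023, 0.617, 0.543, 0.287, 0.231,
    0.166, 0.129, 0.083, 0.072, 0.044, 0.018, 0.001};

struct IoCheckSummary {
  double fraction_over_threshold;  // share of samples above the threshold
  double longest;                  // maximum duration in seconds
  double average;                  // mean duration in seconds
};

// Summarizes a non-empty set of durations against a threshold in seconds.
IoCheckSummary Summarize(const std::vector<double>& seconds, double threshold) {
  assert(!seconds.empty());
  const auto over = std::count_if(seconds.begin(), seconds.end(),
                                  [threshold](double s) { return s > threshold; });
  IoCheckSummary summary;
  summary.fraction_over_threshold =
      static_cast<double>(over) / seconds.size();
  summary.longest = *std::max_element(seconds.begin(), seconds.end());
  summary.average =
      std::accumulate(seconds.begin(), seconds.end(), 0.0) / seconds.size();
  return summary;
}
```

Calling `Summarize(kWaitForWorkSeconds, 1.0)` on the earlier sample gives 3 of 14 cases above 1 second, a longest check of 2.605 seconds, and an average of about 0.58 seconds.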
,
Aug 26 2016
If we end up re-implementing this, the code in gpu_process_host.cc would need to make sure that the temp path used with the sandbox rule isn't a reparse point.
Previously it failed here:
  if (!PreProcessName(&mod_name)) {
    // The path to be added might contain a reparse point.
>>>>NOTREACHED();
    return false;
  }
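A rough sketch of the kind of pre-check meant here, assuming a portable approximation: it walks the path and reports whether any component is a symbolic link, using `std::filesystem`. This is only an analogy. On Windows, reparse points also include junctions and other tags that `is_symlink` may not report, so a real fix would need to inspect file attributes directly. `PathContainsSymlink` is a hypothetical name and is not part of the sandbox code.

```cpp
#include <cassert>
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: returns true if `path` or any of its ancestors is a
// symbolic link. Components that do not exist are treated as non-links.
bool PathContainsSymlink(const fs::path& path) {
  assert(!path.empty());
  fs::path prefix;
  for (const fs::path& part : path) {
    prefix /= part;
    std::error_code ec;
    // symlink_status does not follow the final link, so a link is reported
    // as a link rather than as its target.
    if (fs::is_symlink(fs::symlink_status(prefix, ec)))
      return true;
  }
  return false;
}
```

The caller could resolve the temp path (for example with `fs::canonical`) or pick a different directory when this returns true, before adding the sandbox rule.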
,
Nov 1 2017
,
Jan 10
Downgrading P2s that haven't been modified in more than 6 months, which have no component or owner.
Comment 1 by bugdroid1@chromium.org
, Jun 10 2016