Crash: [Shutdown hang] gpu::GpuChannelHost::Send
Issue description
Crash Signature: [Shutdown hang] gpu::GpuChannelHost::Send
Process Type: Browser
Platform: Mac
Channel: Dev
Version: 53.0.2763.0
Distinct Clients: 1
CPM: 0.20
Crash Reports: 1
Median Uptime: shutdown
Infected Clients: 0.0%

Sample Reports:
https://crash.corp.google.com/browse?q=reportid=%270e205e3c00000000%27
https://crash.corp.google.com/browse?q=reportid=%273c57d3dc00000000%27
https://crash.corp.google.com/browse?q=reportid=%27b9f70ddc00000000%27
https://crash.corp.google.com/browse?q=reportid=%2735c856fa00000000%27
https://crash.corp.google.com/browse?q=reportid=%27e7ec7afa00000000%27

Crash Link: https://crash.corp.google.com/browse?q=product.name%3D%27Chrome_Mac%27%20AND%20product.version%3D%2753.0.2763.0%27%20AND%20custom_data.ChromeCrashProto.magic_signature_1.name%3D%27%5BShutdown%20hang%5D%20gpu%3A%3AGpuChannelHost%3A%3ASend%27
Crash Link (with version impact distribution): https://crash.corp.google.com/browse?q=product.name%3D%27Chrome_Mac%27%20AND%20custom_data.ChromeCrashProto.magic_signature_1.name%3D%27%5BShutdown%20hang%5D%20gpu%3A%3AGpuChannelHost%3A%3ASend%27

Crash Stacktrace:
0x43507378 (0x105ad061b)
#1 0x105b3ac77 in base::WaitableEvent::WaitMany base/synchronization/waitable_event_posix.cc:283
#2 0x10671d952 in IPC::SyncChannel::WaitForReply ipc/ipc_sync_channel.cc:524
#3 0x10671d63c in IPC::SyncChannel::Send ipc/ipc_sync_channel.cc:505
#4 0x106a58c16 in gpu::GpuChannelHost::Send gpu/ipc/client/gpu_channel_host.cc:121
#5 0x106a53d9f in gpu::CommandBufferProxyImpl::DisconnectChannel gpu/ipc/client/command_buffer_proxy_impl.cc:868
#6 0x106a53b16 in gpu::CommandBufferProxyImpl::~CommandBufferProxyImpl gpu/ipc/client/command_buffer_proxy_impl.cc:109
#7 0x106a53e4d in <name omitted> gpu/ipc/client/command_buffer_proxy_impl.cc:107
#8 0x10678f056 in content::ContextProviderCommandBuffer::~ContextProviderCommandBuffer third_party/llvm-build/Release+Asserts/include/c++/v1/memory:2540
#9 0x10678f15d in content::ContextProviderCommandBuffer::~ContextProviderCommandBuffer content/common/gpu/client/context_provider_command_buffer.cc:88
#10 0x10934bd07 in content::GpuProcessTransportFactory::~GpuProcessTransportFactory base/memory/ref_counted.h:196
#11 0x10934be34 in non-virtual thunk to content::GpuProcessTransportFactory::~GpuProcessTransportFactory content/browser/compositor/gpu_process_transport_factory.cc:180
#12 0x10934ede7 in content::ImageTransportFactory::Terminate content/browser/compositor/image_transport_factory.cc:50
#13 0x1090156dc in content::BrowserMainLoop::ShutdownThreadsAndCleanUp content/browser/browser_main_loop.cc:1033
#14 0x10901788d in content::BrowserMainRunnerImpl::Shutdown content/browser/browser_main_runner.cc:210
#15 0x109011818 in content::BrowserMain content/browser/browser_main.cc:48
#16 0x105a9faaf in content::ContentMainRunnerImpl::Run content/app/content_main_runner.cc:787
#17 0x105a9ecf5 in content::ContentMain content/app/content_main.cc:20
#18 0x10556bb69 in ChromeMain chrome/app/chrome_main.cc:84
#19 0x1054edd41 in main chrome/app/chrome_exe_main_mac.c:87
#20 0x1054edb23 in start

Reporter: beherad
,
Jun 15 2016
Users experienced this crash on the following builds: Mac Dev 53.0.2763.0 - 0.55 CPM, 5 reports, 4 clients (signature [Shutdown hang] gpu::GpuChannelHost::Send) If this update was incorrect, please add "Fracas-Wrong" label to prevent future updates. - Go/Fracas
,
Jun 15 2016
This is not related to my CL. +piman : Can you find the right owner for this ?
,
Jun 15 2016
The rate is very low on 51. This is not a crash, but a hang on shutdown. The browser is waiting for the GPU process to finish, so I suspect the GPU itself is hung.
My suspicion for the uptick in 53 is Eric's https://codereview.chromium.org/2028303002: it fits in the 53.0.2757-53.0.2763.0 range, and it looks like there are possible cases where the fence doesn't trigger for unknown reasons. Some mitigation just landed (https://codereview.chromium.org/2064853002 ), so maybe this will resolve itself?
Another possibility would be Erik's changes around glDescheduleUntilFinishedCHROMIUM (which would also delay the GpuChannelMsg_DestroyCommandBuffer), but those were reverted and relanded a couple of times, and that doesn't seem to fit the uptick pattern.
An alternative theory is that there is a regression at the IPC level, where sync messages are not properly cancelled in some cases when the other end is terminated (GPU process terminated) - but I don't think that should be the case at this stage of shutdown, unless the GPU process crashed (and I don't see evidence of that).
,
Jun 16 2016
The mitigation I landed for the other issue (https://codereview.chromium.org/2064853002) doesn't seem to have fixed this issue, so I wonder if this is a different issue? (My mitigation *should* stop us from ever waiting more than 32ms - would the browser ever trigger a hang due to a 32ms delay?)
,
Jun 16 2016
Users experienced this crash on the following builds: Mac Beta 52.0.2743.33 - 0.11 CPM, 6 reports, 6 clients (signature [Shutdown hang] gpu::GpuChannelHost::Send) If this update was incorrect, please add "Fracas-Wrong" label to prevent future updates. - Go/Fracas
,
Jun 16 2016
I think the browser timeout is much longer than that (seconds).
,
Jun 16 2016
FYI, a bit more data:
Looking at Canary crashes only (for more granularity), we see the following:
| 2757 - 1 crash
2758 - 0 crashes < My change landed
2759 - 0 crashes
2760 - 0 crashes
2761 - 0 crashes
2762 - 0 crashes
| 2763 - 1 crash
2764 - 0 crashes
|| 2765 - 2 crashes
|||| 2766 - 4 crashes
|| 2767 - 2 crashes < My mitigation landed
||| 2768 - 3 crashes
This doesn't quite line up with my change (we had five zero-crash builds after my CL landed). Instead, I'd suspect some change between 2761/2762 and 2763.
Also interesting is that Beta seems to spike as well (https://crash.corp.google.com/browse?q=product.name%3D%27Chrome_Mac%27%20AND%20custom_data.ChromeCrashProto.magic_signature_1.name%3D%27%5BShutdown%20hang%5D%20gpu%3A%3AGpuChannelHost%3A%3ASend%27%20AND%20custom_data.ChromeCrashProto.channel%3D%27beta%27&ignore_case=false&enable_rewrite=true&omit_field_name=&omit_field_value=&omit_field_opt=%3D)
Beta goes from 0 reports in 2743.24 to 6 reports in 2743.33 - this may indicate that the regression was merged into Beta between Jun 02 (2743.24) and Jun 09 (2743.33). So my guess is we're looking at something that was checked in between 2761.00 and 2763.00, and was then merged to Beta between 2743.24 and 2743.33. I did a manual scan of these changes and nothing stood out :/ So not 100% sure.
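The Canary counts listed above can be checked mechanically. The following is a minimal sketch and not part of the original investigation: the helper names and the "two consecutive non-zero builds" rule for a sustained uptick are assumptions made purely for illustration.

```python
# Canary crash counts per build, copied from the list above.
canary_crashes = {
    2757: 1, 2758: 0, 2759: 0, 2760: 0, 2761: 0, 2762: 0,
    2763: 1, 2764: 0, 2765: 2, 2766: 4, 2767: 2, 2768: 3,
}
CHANGE_LANDED = 2758  # build marked "My change landed"


def zero_crash_run_from(counts, start):
    """Count consecutive zero-crash builds beginning at `start`."""
    run = 0
    for build in sorted(b for b in counts if b >= start):
        if counts[build] != 0:
            break
        run += 1
    return run


def first_sustained_uptick(counts, min_builds=2):
    """Return the first build that begins `min_builds` consecutive
    builds with at least one crash each (an assumed heuristic)."""
    builds = sorted(counts)
    for i in range(len(builds) - min_builds + 1):
        window = builds[i:i + min_builds]
        if all(counts[b] > 0 for b in window):
            return window[0]
    return None


quiet_builds = zero_crash_run_from(canary_crashes, CHANGE_LANDED)
uptick_build = first_sustained_uptick(canary_crashes)
print(quiet_builds, uptick_build)
```

With these counts, `quiet_builds` is 5 and `uptick_build` is 2765; under this rule the single crash in 2763 does not count as a sustained uptick on its own.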
,
Jun 16 2016
Ok, I have a new theory here - if you look at the current crash rates, you'll see that the crashes occur on 3 OSs:
10.1 (puma) <<<<<< Misreported - this is actually 10.12 Sierra Beta
10.11 (el capitan)
10.10 (yosemite)
On 10.11 and 10.10, the crash rate is fairly steady (no large spike in counts, sub 0.75% browser crash percentages): https://crash.corp.google.com/browse?q=product.name%3D%27Chrome_Mac%27%20AND%20custom_data.ChromeCrashProto.magic_signature_1.name%3D%27%5BShutdown%20hang%5D%20gpu%3A%3AGpuChannelHost%3A%3ASend%27%20AND%20custom_data.ChromeCrashProto.os_family!%3D%2710.1%20(Puma)%27&ignore_case=false&enable_rewrite=true&omit_field_name=&omit_field_value=&omit_field_opt=%3D
If you look at 10.12 only, the crash rate is much higher - it's consistently at 2 or 3 crashes a day, which, given the smaller 10.12 population, accounts for a 50-85% browser crash percentage.
Taking these two things together: 10.12 only started appearing this week (when Apple released it at WWDC), so the impact of 10.12's much higher crash rate is only seen in recent Chrome builds. This is why Canary doesn't seem to spike until 2765/2766 (releases from Sun/Mon, when Sierra became available to devs), while Dev spikes at 2763 (the Dev release available at that time).
Given this, I think the issue is 10.12 specific. Looking at 53.0.2763.0, this is the #1 browser crash on 10.12 at 87.5%. In comparison, on 10.11/10.10, this is the #64 browser crash at 0.53%.
Given the really high crash rate on 10.12, I'm hoping this is reproducible. I'll try to get a 10.12 machine to experiment with.
,
Jun 16 2016
Thank you, Eric, great find. I think you're spot on.
,
Jun 20 2016
,
Jun 29 2016
Users experienced this crash on the following builds: Mac Stable 51.0.2704.106 - 0.07 CPM, 4 reports, 4 clients (signature [Shutdown hang] gpu::GpuChannelHost::Send) If this update was incorrect, please add "Fracas-Wrong" label to prevent future updates. - Go/Fracas
,
Jul 1 2016
This crash has high impact on Chrome's stability. Signature: [Shutdown hang] gpu::GpuChannelHost::Send. Channel: canary. Platform: mac. Labeling issue 620259 with Pri-0. Labeling issue 620259 with ReleaseBlock-Dev. If this update was incorrect, please add "Fracas-Wrong" label to prevent future updates. - Go/Fracas
,
Jul 1 2016
I set up a Mac Sierra machine this week; I'll start investigating the issue. I'm not sure that Pri-0 / ReleaseBlock-Dev makes sense for a non-regression that only affects a pre-release OS?
,
Jul 3 2016
Users experienced this crash on the following builds: Mac Canary 54.0.2786.0 - 6.66 CPM, 5 reports, 5 clients (signature [Shutdown hang] gpu::GpuChannelHost::Send) If this update was incorrect, please add "Fracas-Wrong" label to prevent future updates. - Go/Fracas
,
Jul 4 2016
I changed this to Pri-1, RB-B. Note that we may have a significant number of users on M53 by the time Sierra is released to the public, so we do need to get it done for M53.
,
Jul 6 2016
This is very reproducible - just launch Chrome under Sierra and ctrl+Q - may take 2 or 3 attempts, but should repro.
,
Jul 6 2016
This looks like an IPC issue - I added tracing to my binary and am seeing the following:
1) Browser sends GpuChannelMsg_DestroyCommandBuffer.
2) GPU Proc receives the message and shuts down (no hang during shutdown/etc...).
3) GPU Proc believes it successfully sends the response (traced this down through the various levels to ChannelPosix::ProcessOutgoingMessages, where the fn believes it successfully sends the message without error).
4) Browser process appears to never get the response.
I'll look at the browser process next to see why the message seems to be dropped. However, it might make sense for a Mac IPC expert to take a look at this.
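Steps 1-4 describe a synchronous send whose reply never reaches the caller. Here is a minimal sketch of that pattern, using a plain socket pair rather than Chromium's IPC::SyncChannel. The names `send_sync` and `run_exchange` and the message bytes are hypothetical. The sketch models a lost reply by having the peer stay silent, which from the caller's side looks the same as a reply that was written but never observed. The bounded wait is only there so the demonstration terminates; it is not a claim about how Chromium handles this.

```python
import socket
import threading


def send_sync(sock, request, timeout_s):
    """Send `request`, then block for a reply for at most `timeout_s`.

    Returns the reply bytes, or None if nothing arrived in time.
    """
    sock.sendall(request)
    sock.settimeout(timeout_s)
    try:
        return sock.recv(1024)
    except socket.timeout:
        return None


def run_exchange(peer_replies, timeout_s=1.0):
    """Run one request/reply round trip over a local socket pair."""
    browser_end, gpu_end = socket.socketpair()

    def peer():
        # Stand-in for the GPU process: read the request, maybe answer.
        gpu_end.recv(1024)
        if peer_replies:
            gpu_end.sendall(b"ack")

    worker = threading.Thread(target=peer)
    worker.start()
    try:
        return send_sync(browser_end, b"destroy", timeout_s)
    finally:
        worker.join()
        browser_end.close()
        gpu_end.close()


reply_when_delivered = run_exchange(peer_replies=True)
reply_when_lost = run_exchange(peer_replies=False)
print(reply_when_delivered, reply_when_lost)
```

Without the timeout, the second call would block indefinitely, which is the shape of the shutdown hang seen in the stack trace (Send waiting in WaitForReply).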
,
Jul 6 2016
Adding mark@ and rsesek@ to help triage this Sierra issue.
,
Jul 7 2016
quick note: in comment 17, I mean cmd+Q, not ctrl+Q (basically just anything that quits Chrome).
,
Jul 7 2016
,
Jul 7 2016
Here (attached) is how I see the browser’s main thread getting stuck. As Eric found, I’m able to reproduce this in about ¼–⅓ of all quits. All child processes except for the GPU process are gone. The GPU process has a main thread sitting idly in its MessagePumpCFRunLoop run loop, an idle work queue thread, and a Chrome_ChildIOThread sitting idly in its MessagePumpLibevent run loop.
,
Jul 7 2016
,
Jul 8 2016
Here's what I know so far:
1. I've confirmed that the GPU process is writing the response to the underlying socket. (Similar to c#18, but I traced the message all the way to sendmsg()).
2. The browser process waits forever for the message. The message is never parsed by IPC::ChannelReader. This suggests that there may be an issue with the underlying event waiting mechanism. Note the scary warning during startup:
"""
[warn] kq_init: detected broken kqueue; not using.: Undefined error: 0
https://bugs.chromium.org/p/chromium/issues/detail?id=626534
"""
3. rockot@ has been making non-trivial changes to SyncChannel in the last two weeks, including reverts/relands. They appear to be independent of the problem, since the issue also occurs on M51. The latest reland https://codereview.chromium.org/2101163002 does not fix the problem.
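Point 2 puts the suspicion on the event-waiting layer rather than on the socket write. One generic way to sanity-check that layer is to confirm that the platform's readiness backend actually reports a socket as readable after its peer writes to it. The sketch below uses Python's standard `selectors` module, whose `DefaultSelector` resolves to a kqueue-based selector on macOS and an epoll-based one on Linux. It is not libevent's kq_init check, and the function name is hypothetical.

```python
import selectors
import socket


def readiness_backend_sees_data(timeout_s=1.0):
    """Check that the default readiness backend notices written data.

    Returns (idle_ok, readable_seen, backend_name):
      idle_ok       - no event was reported before anything was written
      readable_seen - a read event was reported after the peer wrote
      backend_name  - class name of the selector the platform chose
    """
    reader, writer = socket.socketpair()
    selector = selectors.DefaultSelector()
    try:
        selector.register(reader, selectors.EVENT_READ)
        # Nothing has been written yet, so a non-blocking poll should be empty.
        before = selector.select(timeout=0)
        writer.sendall(b"reply")
        # The backend should now report the reader as readable.
        after = selector.select(timeout=timeout_s)
        readable = any(key.fileobj is reader for key, _ in after)
        return len(before) == 0, readable, type(selector).__name__
    finally:
        selector.close()
        reader.close()
        writer.close()


idle_ok, readable_seen, backend_name = readiness_backend_sees_data()
print(backend_name, idle_ok, readable_seen)
```

If `readable_seen` came back false while the write had succeeded, that would match the symptom described here: data on the socket that the waiting side is never told about.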
,
Jul 8 2016
Bug 626534 comment 2 ought to contain the fix to point 2 in this bug’s comment 24, and may also be the key to the fix for this bug.
,
Jul 8 2016
Making mark's suggested change in Bug 626534 removes the warning, and also fixes the hang on shutdown bug.
,
Jul 8 2016
Note that my changes to SyncChannel have no effect on the underlying interprocess I/O; they only affect the mechanism used to synchronize between the I/O thread and the SyncChannel's owning thread. Having said that, I have no interesting ideas about what could be causing this.
,
Jul 8 2016
,
Jul 8 2016
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/3b07cd446f6bf33618ebae11ca68273b7a0de2f8

commit 3b07cd446f6bf33618ebae11ca68273b7a0de2f8
Author: erikchen <erikchen@chromium.org>
Date: Fri Jul 08 17:06:18 2016

Fix a logic bug in kqueue.c.

Remove an unnecessary workaround for OS X 10.4 from kqueue.c. It was causing problems on macOS Sierra. All credit for this CL goes to mark@chromium.org.

BUG= 626534 , 620259
Review-Url: https://codereview.chromium.org/2134603002
Cr-Commit-Position: refs/heads/master@{#404421}

[modify] https://crrev.com/3b07cd446f6bf33618ebae11ca68273b7a0de2f8/base/third_party/libevent/README.chromium
[modify] https://crrev.com/3b07cd446f6bf33618ebae11ca68273b7a0de2f8/base/third_party/libevent/kqueue.c
,
Jul 8 2016
,
Jul 11 2016
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/45a5e0217612dfb48e8c3fd5dcd7e186d8288c52

commit 45a5e0217612dfb48e8c3fd5dcd7e186d8288c52
Author: erikchen <erikchen@chromium.org>
Date: Mon Jul 11 17:21:58 2016

Fix a logic bug in kqueue.c.

Remove an unnecessary workaround for OS X 10.4 from kqueue.c. It was causing problems on macOS Sierra. All credit for this CL goes to mark@chromium.org.

BUG= 626534 , 620259
Review-Url: https://codereview.chromium.org/2134603002
Cr-Commit-Position: refs/heads/master@{#404421}
(cherry picked from commit 3b07cd446f6bf33618ebae11ca68273b7a0de2f8)

Review URL: https://codereview.chromium.org/2140723002 .
Cr-Commit-Position: refs/branch-heads/2785@{#83}
Cr-Branched-From: 68623971be0cfc492a2cb0427d7f478e7b214c24-refs/heads/master@{#403382}

[modify] https://crrev.com/45a5e0217612dfb48e8c3fd5dcd7e186d8288c52/base/third_party/libevent/README.chromium
[modify] https://crrev.com/45a5e0217612dfb48e8c3fd5dcd7e186d8288c52/base/third_party/libevent/kqueue.c
,
Jul 13 2016
Comment 1 by durga.behera@chromium.org
, Jun 15 2016
Components: Internals>GPU
Labels: -Type-Bug M-53 OS-Mac Type-Bug-Regression
Owner: jaydasika@chromium.org
Status: Assigned (was: Untriaged)