New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 866382 link

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Jul 25
Cc:
Components:
EstimatedDays: ----
NextAction: 2018-07-27
OS: Linux , Windows , Chrome , Mac
Pri: 1
Type: Bug-Regression

Blocking:
issue 866365



Sign in to add a comment

Chrome: Crash Report - content::RenderFrameHostImpl::InvalidateMojoConnection

Project Member Reported by pnangunoori@chromium.org, Jul 23

Issue description

Filing this manually as facing issue creating bug from go/crash due to b/111623637.

Stack trace of 9df2c5ced403ddea

Thread 0 (id: 0xbdc) CRASHED [EXCEPTION_ACCESS_VIOLATION_READ @ 0xffffffffffffffff ] MAGIC SIGNATURE THREAD
Stack Quality100%Show frame trust levels
0x000007fef073c883	(chrome.dll -xtree:2070 )	std::_Tree<std::_Tmap_traits<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::unique_ptr<BackgroundApplicationListModel::Application,std::default_delete<BackgroundApplicationListModel::Application> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::pair<const std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::unique_ptr<BackgroundApplicationListModel::Application,std::default_delete<BackgroundApplicationListModel::Application> > > >,0> >::_Erase
0x000007fef073c7c8	(chrome.dll -xtree:1444 )	std::_Tree<std::_Tmap_traits<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::unique_ptr<BackgroundApplicationListModel::Application,std::default_delete<BackgroundApplicationListModel::Application> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::pair<const std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::unique_ptr<BackgroundApplicationListModel::Application,std::default_delete<BackgroundApplicationListModel::Application> > > >,0> >::erase
0x000007fef073c70e	(chrome.dll -xtree:1214 )	std::_Tree<std::_Tmap_traits<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::unique_ptr<BackgroundApplicationListModel::Application,std::default_delete<BackgroundApplicationListModel::Application> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::pair<const std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::unique_ptr<BackgroundApplicationListModel::Application,std::default_delete<BackgroundApplicationListModel::Application> > > >,0> >::~_Tree
0x000007fef03323ac	(chrome.dll -memory:2304 )	std::unique_ptr<service_manager::BinderRegistryWithArgs<>,std::default_delete<service_manager::BinderRegistryWithArgs<> > >::reset(service_manager::BinderRegistryWithArgs<> *)
0x000007fef0cbcfaa	(chrome.dll -render_frame_host_impl.cc:4094 )	content::RenderFrameHostImpl::InvalidateMojoConnection()
0x000007fef0cb570b	(chrome.dll -render_frame_host_impl.cc:2014 )	content::RenderFrameHostImpl::OnRenderProcessGone(int,int)
0x000007fef0cb55bb	(chrome.dll -ipc_message_templates.h:146 )	IPC::MessageT<FrameHostMsg_RenderProcessGone_Meta,std::tuple<int,int>,void>::Dispatch<content::RenderFrameHostImpl,content::RenderFrameHostImpl,void,void (content::RenderFrameHostImpl::*)(int, int)>
0x000007fef06341a1	(chrome.dll -render_frame_host_impl.cc:944 )	content::RenderFrameHostImpl::OnMessageReceived(IPC::Message const &)
0x000007fef0de0e29	(chrome.dll -render_process_host_impl.cc:3884 )	content::RenderProcessHostImpl::ProcessDied(bool,content::ChildProcessTerminationInfo *)
0x000007fef0de0b77	(chrome.dll -render_process_host_impl.cc:2986 )	content::RenderProcessHostImpl::FastShutdownIfPossible(unsigned __int64,bool)
0x000007fef15fdc4b	(chrome.dll -tab_strip_model.cc:1318 )	TabStripModel::CloseWebContentses(base::span<content::WebContents * const,-1>,unsigned int)
0x000007fef15fbc00	(chrome.dll -tab_strip_model.cc:1285 )	TabStripModel::InternalCloseTabs(base::span<content::WebContents * const,-1>,unsigned int)
0x000007fef15fbd49	(chrome.dll -tab_strip_model.cc:588 )	TabStripModel::CloseWebContentsAt(int,unsigned int)
0x000007fef1ad6c9c	(chrome.dll -browser_tabstrip.cc:80 )	chrome::CloseWebContents(Browser *,content::WebContents *,bool)
0x000007fef0de72d9	(chrome.dll -ipc_message_templates.h:146 )	IPC::MessageT<ViewHostMsg_ClosePage_ACK_Meta,std::tuple<>,void>::Dispatch<content::RenderViewHostImpl,content::RenderViewHostImpl,void,void (content::RenderViewHostImpl::*)()>
0x000007fef0637a25	(chrome.dll -render_view_host_impl.cc:769 )	content::RenderViewHostImpl::OnMessageReceived(IPC::Message const &)
0x000007fef0637328	(chrome.dll -render_widget_host_impl.cc:612 )	content::RenderWidgetHostImpl::OnMessageReceived(IPC::Message const &)
0x000007fef0634012	(chrome.dll -render_process_host_impl.cc:3058 )	content::RenderProcessHostImpl::OnMessageReceived(IPC::Message const &)
0x000007fef0633ef8	(chrome.dll -ipc_channel_proxy.cc:320 )	IPC::ChannelProxy::Context::OnDispatchMessage(IPC::Message const &)
0x000007fef0060094	(chrome.dll -task_annotator.cc:101 )	base::debug::TaskAnnotator::RunTask(char const *,base::PendingTask *)
0x000007fef005fb6b	(chrome.dll -message_loop.cc:319 )	base::MessageLoop::RunTask(base::PendingTask *)
0x000007fef005f5b7	(chrome.dll -message_loop.cc:373 )	base::MessageLoop::DoWork()
0x000007fef0153a58	(chrome.dll -message_pump_win.cc:177 )	base::MessagePumpForUI::DoRunLoop()
0x000007fef00a4c47	(chrome.dll -message_pump_win.cc:56 )	base::MessagePumpWin::Run(base::MessagePump::Delegate *)
0x000007fef005f110	(chrome.dll -run_loop.cc:102 )	base::RunLoop::Run()
0x000007fef03eca51	(chrome.dll -chrome_browser_main.cc:2153 )	ChromeBrowserMainParts::MainMessageLoopRun(int *)
0x000007fef03ec847	(chrome.dll -browser_main_loop.cc:978 )	content::BrowserMainLoop::RunMainMessageLoopParts()
0x000007fef03ec7f2	(chrome.dll -browser_main_runner_impl.cc:169 )	content::BrowserMainRunnerImpl::Run()
0x000007fef0bdae52	(chrome.dll -browser_main.cc:51 )	content::BrowserMain(content::MainFunctionParams const &,std::unique_ptr<content::BrowserProcessSubThread,std::default_delete<content::BrowserProcessSubThread> >)
0x000007fef1121b3a	(chrome.dll -content_main_runner_impl.cc:620 )	content::RunBrowserProcessMain(content::MainFunctionParams const &,content::ContentMainDelegate *,std::unique_ptr<content::BrowserProcessSubThread,std::default_delete<content::BrowserProcessSubThread> >)
0x000007fef0055985	(chrome.dll -content_main_runner_impl.cc:964 )	content::ContentMainRunnerImpl::Run()
0x000007fef0045172	(chrome.dll -main.cc:459 )	service_manager::Main(service_manager::MainParams const &)
0x000007fef0044a07	(chrome.dll -content_main.cc:19 )	content::ContentMain(content::ContentMainParams const &)
0x000007fef0041af1	(chrome.dll -chrome_main.cc:101 )	ChromeMain
0x000000013f8e35d5	(chrome.exe -main_dll_loader_win.cc:201 )	MainDllLoader::Launch(HINSTANCE__ *,base::TimeTicks)
0x000000013f8e1698	(chrome.exe -chrome_exe_main_win.cc:230 )	wWinMain
0x000000013f9bbe35	(chrome.exe -exe_common.inl:283 )	__scrt_common_main_seh
0x775d59cc	(kernel32.dll + 0x000159cc )	BaseThreadInitThunk
0x7783383c	(ntdll.dll + 0x0005383c )	RtlUserThreadStart

Link to the crashes:
====================
https://crash.corp.google.com/browse?q=product_name%3D%27Chrome%27+AND+expanded_custom_data.ChromeCrashProto.ptype%3D%27browser%27+AND+expanded_custom_data.ChromeCrashProto.magic_signature_1.name%3D%27content%3A%3ARenderFrameHostImpl%3A%3AInvalidateMojoConnection%27

Crash data on other channels:
-----------------------------
69.0.3493.3	3.38%	5 - Dev
68.0.3440.68	10.14%	15 - Beta
67.0.3396.99	22.97%	34 - Stable

Crash data from previous Beta builds:
-------------------------------------
68.0.3440.68	83.33%	15
68.0.3440.25	5.56%	1
66.0.3359.106	5.56%	1
66.0.3359.45	5.56%	1

Note:
=====
1. This issue starting seen on M65-65.0.3325.181. But, recently spiked in latest Beta M68 - 68.0.3440.68
2. This is 'Browser' crash listed under Beta build 68.0.3440.68 for Windows on 15 different client ID's.
3. Currently this crash is ranked as number #27 with 15 instances.
4. Most of the crashes are observed on Windows 7 OS.
5. Though there is sudden spike in crashes in latest Beta, not marking this crash as RB-Stable as there are only 15 instances of crashes, please feel free to edit if this is not the case.
6. Crashes are not observed on Mac, Linux and Android.

Regression Range:
68.0.3440.59 - Good
68.0.3440.68 - Bad

From the above regression range, suspecting the change for the file 'render_view_host_impl.cc' - https://chromium.googlesource.com/chromium/src/+/8da446508bb2890542a71ea6814cc1ef24951a34
creis@ -- Could you please take a look into this issue.

Thanks!
 
Cc: abdulsyed@chromium.org manoranj...@chromium.org
Crashes are also seen with other magic signatures as well and the above CL could be the root cause for the crashes.

Magic Signature: content::NavigationControllerImpl::GetLastCommittedEntry
Sample Crash: http://go/crash/61d1a7a67e429449

Magic Signature: content::CrossProcessFrameConnector::GetRootRenderWidgetHostView
Sample Crash: http://go/crash/9719963d879bcdac

creis@ - Could you also look into the above crashes and let us know if these crashes are to be tracked separately.


Labels: ReleaseBlock-Stable
Also, adding RB-Stable as the crashes are seen with multiple crash signatures.
This bug needs to be fixed and merged back within the next week or it will start blocking our R68 stable release.
Cc: creis@chromium.org
Components: UI>Browser>Navigation Internals>Sandbox>SiteIsolation
Owner: alex...@chromium.org
Alex, can you take a look?  The suspect is the merge of the beforeunload fix, but I thought we didn't see crashes from that on M69.  Could it have had a bad interaction with M68, or is something else the cause?
First look hasn't turned up much yet, but here's a few observations so far.

I haven't spotted a connection between the https://chromium-review.googlesource.com/c/chromium/src/+/1137839 merge and the InvalidateMojoConnection crash yet.  That crash seems to predate that CL, and I don't yet see a reason for the spike.

The GetRootRenderWidgetHostView crash in comment 1 is interesting:
https://crash.corp.google.com/browse?q=expanded_custom_data.ChromeCrashProto.magic_signature_1.name%3D%27content%3A%3ACrossProcessFrameConnector%3A%3AGetRootRenderWidgetHostView%27#-property-selector,productname:1000,productversion:100,-magicsignature:50,-magicsignature2:50,-stablesignature:50,-magicsignaturesorted:50
As of 68.0.3440.68, it seems to involve a re-entrant call to chrome::CloseWebContents, much like erikchen@ was fixing in https://chromium-review.googlesource.com/c/chromium/src/+/1101846, which was merged to M68 in 68.0.3440.59.  The new crashes do have RenderFrameHostImpl::ProcessBeforeUnloadACKFromFrame on the stack, which was introduced in Alex's https://chromium-review.googlesource.com/c/chromium/src/+/1137839, though I can't yet tell if that's the cause or not.

The GetLastCommittedEntry crashes from comment 1 seem tangentially related:
https://crash.corp.google.com/browse?q=expanded_custom_data.ChromeCrashProto.magic_signature_1.name%3D%27content%3A%3ANavigationControllerImpl%3A%3AGetLastCommittedEntry%27+AND+product.Version%3D%2768.0.3440.68%27&stbtiq=&reportid=&index=0
They have a call to chrome::CloseWebContents, but not a re-entrant one, and they don't seem to involve ProcessBeforeUnloadACKFromFrame (or beforeunload at all).
Cc: erikc...@chromium.org
+erikchen, given the re-entrant call to CloseWebContents mentioned in the GetRootRenderWidgetHostView crashes, similar to issue 851400.

Alex, I suspect the ProcessBeforeUnloadACKFromFrame call from RenderFrameHostImpl's destructor is involved there, if you look at a crash like 3abb5276279975f8 in 69.0.3496.0 or 9719963d879bcdac from 68.0.3440.68.

Still haven't spotted the connection with the RenderFrameHostImpl::InvalidateMojoConnection crashes yet.
Alex and I are still looking.  So far all crashes involve FastShutdownIfPossible after a ClosePage_ACK.

1) InvalidateMojoConnection
This seems to solidly predate 68.0.3440.68 and the stack trace didn't change at that version.  So far I'm not seeing much of a connection to the merge of the beforeunload fix in r575133 to M68, other than being related to closing tabs.  Unclear if the spike is concerning or not yet relative to the crashes in M65-M67, but see note below in #2 about similarity and a change in behavior at 68.0.3440.68.

2) GetLastCommittedEntry
This one did change its stack trace around 68.0.3440.68, and it has a very similar similar stack track to the InvalidateMojoConnection crash, involving closing, FastShutdownIfPossible, ProcessDied, and OnRenderProcessGone.  No obvious connection to the beforeunload CL yet.  Possible connection to erikchen@'s r556289 (68.0.3422.0), but that doesn't line up with the version number with the spike.

3) GetRootRenderWidgetHostView
This one also changed its stack trace around 68.0.3440.68, and it has a re-entrant call to CloseWebContents via ProcessBeforeUnloadACKFromFrame and ~RFHI, as noted in comments 5-6.  This is likely due to Alex's r575133 and might benefit from a similar fix as erikchen@'s r568534.  That has 17 crashes in M68 so far.

I'm not sure yet whether #1 and #2 are related to the beforeunload CL, but Alex is trying to find a repro involving what we know about the crashes so we can understand it more.
I've made some good progress in understanding this and found a repro for both the InvalidateMojoConnection and GetRootRenderWidgetHostView crashes.

I concentrated on the GetRootRenderWidgetHostView() variant which seemed to have the most detailed stack, and also poked at a couple of minidumps, leading to the following analysis:

1. ClosePageACK arrives and triggers TabStripModel::CloseWebContentses().  This calls RenderProcessHostImpl::FastShutdownIfPossible() and proceeds to shut down the main frame's process.  This implies that the main frame process does not have an unload handler defined, otherwise FastShutdownIfPossible() wouldn't proceed.  It also implies that this isn't part of browser shutdown, otherwise FastShutdownIfPossible wouldn't be used.

2. As part of main frame's process shutdown, we call FTN::ResetForNewProcess() on the main frame.  That starts to destroy children FTNs, which triggers destruction of each child's RFHM and RFHI.

3. One subframe's RFHI destructor (let's call that subframe X) finds a valid |beforeunload_initiator|, namely, the main frame.  That implies the main frame has is_waiting_for_beforeunload_ack_ set to true.  This is where things get interesting.  On regular tab close, this isn't possible, because tab close will wait for all beforeunload handlers to run (as indicated by RFHM::OnBeforeUnloadACK) before proceeding, and prior to calling RFHM::OnBeforeUnloadACK(), we reset is_waiting_for_beforeunload_ack_.  So either (1) the ClosePage ACK isn't part of a browser-initiated tab close, or (2) it is, but something else reset is_waiting_for_beforeunload_ack_ to true while we were waiting for the ClosePage ACK. 

4. Subframe X is the final beforeunload ACK that the main frame is waiting for, and it proceeds to call RFHM::OnBeforeUnloadACK().  This implies that the beforeunload ACK was for a tab close and not a navigation.  A navigation would've set unload_ack_is_for_navigation_ to true and we wouldn't end up calling RFHM::OnBeforeUnloadACK at all (instead calling NavigatorImpl::OnBeforeUnloadACK).

5. RFHM::OnBeforeUnloadACK proceeds to close the page via RenderViewHostImpl::ClosePage().  This leads to the reentrant call to TabStripModel::CloseWebContentses.  Getting here already is a problem - see issue 851400.

6. Next, that leads to destruction of the WebContentsImpl, FrameTree, main frame FTN, main frame RFHM (which zeroes out current_frame_host()), and main frame RFHI.  The latter is the last active frame in the process, so it tries to clear out remaining proxies in main frame's SiteInstance.

7. One of those proxies is for subframe Y.  It's important that this is a different subframe than subframe X, since subframe X cleared out all proxies in its FTN as part of destroying its RFHM in step 2.

8. Destroying that proxy crashes inside CFPC::SetView(nullptr) -> GetRootRenderWidgetHostView(), because it looks up root frame's current_frame_host() and deferences it directly, but that's actually null and already destroyed above in step 6.  There's even a comment describing exactly what happened in ~RFHM at https://cs.chromium.org/chromium/src/content/browser/frame_host/render_frame_host_manager.cc?l=92&rcl=5f53767331d74e75e7e055542951e4d7f2c01f45.

The crux of this is how to have unload_ack_is_for_navigation_ set to true when ClosePage ACK arrives.  A navigation can't do this (see step 4).  I've tried a repeated tab close (clicking on "x" twice), but couldn't get that to work also, since it asks the main frame to run beforeunload again, and that means the new beforeunload ACK wouldn't arrive prior to the ClosePage ACK, and that means step 4 doesn't work because the subframe beforeunload ACK isn't the final beforeunload ACK.

I finally found that one way to get this is with window.close() when called on a *cross-process* window.  Same-process window.close works fine because it just asks the browser process to close the window ignoring unload events completely (see ViewHostMsg_Close, which calls ClosePageIgnoringUnloadEvents()).  However, cross-process window.close() goes through ViewHostMsg_RouteCloseEvent, which finds the active RVH and calls ClosePage() on it, which asks the right renderer to run unload events and then close.  That doesn't touch beforeunload at all, and it also waits for the ClosePage_ACK.  So if we do such a window.close() on a window that's in the middle of waiting for beforeunload, we can get the crashes here.

Repro steps:
1. Increase RenderViewHostImpl::kUnloadTimeoutMS to 10000ms (makes it much easier to repro)
2. Go to https://csreis.github.io
3. In devtools, execute 
   var w = open('http://alexmos17.github.io/tests/beforeunload.html');
4. In the new tab, click "Add frame"
5a. Click "Add beforeunload (slow, no dialog)" in the first frame, *or*
5b. Click "Add beforeunload (slow, no dialog)" in the second frame.  These just install a beforeunload handler that takes 5 seconds to run.
6. Click on the "x" to close the new frame.  That should hang waiting for subframe beforeunload handler.
7. Within the next 5 seconds, go back to first tab and execute w.close().
8. If you did 5a, you get the InvalidateMojoConnection crash.  If you did 5b, you get the GetRootRenderWidgetHostView crash.

Note also that if you're in a debug build or have dcheck_always_on, you'll hit erikchen@'s DCHECK(!reentrancy_guard_) in TabStripModel::InternalCloseTabs from issue 851400.  To repro the crashes here, I've disabled that temporarily.

I've pasted both stacks from my debug build below.

Received signal 11 SEGV_MAPERR 000000000130
#0 0x7f330138705c base::debug::StackTrace::StackTrace()
#1 0x7f3301386b31 base::debug::(anonymous namespace)::StackDumpSignalHandler()
#2 0x7f32f5c590c0 <unknown>
#3 0x7f32fe7e70e2 content::CrossProcessFrameConnector::GetRootRenderWidgetHostView()
#4 0x7f32fe7e6687 content::CrossProcessFrameConnector::SetView()
#5 0x7f32fe7e579b content::CrossProcessFrameConnector::~CrossProcessFrameConnector()
#6 0x7f32fe7e592e content::CrossProcessFrameConnector::~CrossProcessFrameConnector()
#7 0x7f32fe8500b5 content::RenderFrameProxyHost::~RenderFrameProxyHost()
#8 0x7f32fe85011e content::RenderFrameProxyHost::~RenderFrameProxyHost()
#9 0x7f32fe5c2f0f std::__1::__hash_table<>::__erase_unique<>()
#10 0x7f32fe84592b content::RenderFrameHostManager::ActiveFrameCountIsZero()
#11 0x7f32feb6f2d4 content::SiteInstanceImpl::DecrementActiveFrameCount()
#12 0x7f32fe81b8e2 content::RenderFrameHostImpl::~RenderFrameHostImpl()
#13 0x7f32fe81ce3e content::RenderFrameHostImpl::~RenderFrameHostImpl()
#14 0x7f32fe841984 content::RenderFrameHostManager::~RenderFrameHostManager()
#15 0x7f32fe7eae57 content::FrameTreeNode::~FrameTreeNode()
#16 0x7f32fe7e8f9e content::FrameTree::~FrameTree()
#17 0x7f32feba926b content::WebContentsImpl::~WebContentsImpl()
#18 0x7f32feba96be content::WebContentsImpl::~WebContentsImpl()
#19 0x563465eebd1f TabStripModel::SendDetachWebContentsNotifications()
#20 0x563465ef0983 TabStripModel::CloseWebContentses()
#21 0x563465eed4b3 TabStripModel::InternalCloseTabs()
#22 0x563465eed739 TabStripModel::CloseWebContentsAt()
#23 0x563465ebef96 chrome::CloseWebContents()
#24 0x7f32fea6061f content::RenderViewHostImpl::ClosePage()
#25 0x7f32fe84274f content::RenderFrameHostManager::OnBeforeUnloadACK()
#26 0x7f32fe81cdc0 content::RenderFrameHostImpl::ProcessBeforeUnloadACKFromFrame()
#27 0x7f32fe81bce1 content::RenderFrameHostImpl::~RenderFrameHostImpl()
#28 0x7f32fe81ce3e content::RenderFrameHostImpl::~RenderFrameHostImpl()
#29 0x7f32fe841984 content::RenderFrameHostManager::~RenderFrameHostManager()
#30 0x7f32fe7eae57 content::FrameTreeNode::~FrameTreeNode()
#31 0x7f32fe7eb2a0 content::FrameTreeNode::ResetForNewProcess()
#32 0x7f32fe81f28e content::RenderFrameHostImpl::OnRenderProcessGone()
#33 0x7f32fe81f154 _ZN3IPC8MessageTI35FrameHostMsg_RenderProcessGone_MetaNSt3__15tupleIJiiEEEvE8DispatchIN7content19RenderFrameHostImplES8_vMS8_FviiEEEbPKNS_7MessageEPT_PT0_PT1_T2_
#34 0x7f32fe81e7e4 content::RenderFrameHostImpl::OnMessageReceived()
#35 0x7f32fea4ff28 content::RenderProcessHostImpl::ProcessDied()
#36 0x7f32fea4facb content::RenderProcessHostImpl::FastShutdownIfPossible()
#37 0x563465ef0612 TabStripModel::CloseWebContentses()
#38 0x563465eed4b3 TabStripModel::InternalCloseTabs()
#39 0x563465eed739 TabStripModel::CloseWebContentsAt()
#40 0x563465ebef96 chrome::CloseWebContents()
#41 0x7f32fea62266 content::RenderViewHostImpl::OnClosePageACK()
#42 0x7f32fea620a6 _ZN3IPC8MessageTI30ViewHostMsg_ClosePage_ACK_MetaNSt3__15tupleIJEEEvE8DispatchIN7content18RenderViewHostImplES8_vMS8_FvvEEEbPKNS_7MessageEPT_PT0_PT1_T2_
#43 0x7f32fea60f0e content::RenderViewHostImpl::OnMessageReceived()
#44 0x7f32fea505db content::RenderProcessHostImpl::OnMessageReceived()
#45 0x7f32ff8cf311 IPC::ChannelProxy::Context::OnDispatchMessage()
#46 0x7f32ff8d2228 _ZN4base8internal7InvokerINS0_9BindStateIMN3IPC12ChannelProxy7ContextEFvRKNS3_7MessageEEJ13scoped_refptrIS5_ES6_EEEFvvEE3RunEPNS0_13BindStateBaseE
#47 0x7f3301291e5d base::debug::TaskAnnotator::RunTask()
#48 0x7f33012c17d6 base::MessageLoop::RunTask()
#49 0x7f33012c1bea base::MessageLoop::DeferOrRunPendingTask()
#50 0x7f33012c1f1c base::MessageLoop::DoWork()
#51 0x7f33012c518f base::(anonymous namespace)::WorkSourceDispatch()
#52 0x7f32f53e8f07 g_main_context_dispatch
#53 0x7f32f53e9138 <unknown>
#54 0x7f32f53e91cc g_main_context_iteration
#55 0x7f33012c4ec2 base::MessagePumpGlib::Run()
#56 0x7f33012c1141 base::MessageLoop::Run()
#57 0x7f33012f4616 base::RunLoop::Run()
#58 0x563464cedf58 ChromeBrowserMainParts::MainMessageLoopRun()
#59 0x7f32fe679057 content::BrowserMainLoop::RunMainMessageLoopParts()
#60 0x7f32fe67c1f3 content::BrowserMainRunnerImpl::Run()
#61 0x7f32fe6750e9 content::BrowserMain()


[134254:134254:0723/224138.751313:FATAL:lock_impl_posix.cc(103)] Check failed: rv == 0 (22 vs. 0). Invalid argument. Hint: This is often related to a use-after-free.
#0 0x7fe7137ec05c base::debug::StackTrace::StackTrace()
#1 0x7fe71371623b logging::LogMessage::~LogMessage()
#2 0x7fe7137fc217 base::internal::LockImpl::Lock()
#3 0x7fe71375e55e base::SequenceCheckerImpl::CalledOnValidSequence()
#4 0x7fe71289ef9a mojo::internal::MultiplexRouter::CloseMessagePipe()
#5 0x7fe71289b40e mojo::internal::InterfacePtrStateBase::~InterfacePtrStateBase()
#6 0x7fe710c90555 content::RenderFrameHostImpl::InvalidateMojoConnection()
#7 0x7fe710c842bc content::RenderFrameHostImpl::OnRenderProcessGone()
#8 0x7fe710c84154 _ZN3IPC8MessageTI35FrameHostMsg_RenderProcessGone_MetaNSt3__15tupleIJiiEEEvE8DispatchIN7content19RenderFrameHostImplES8_vMS8_FviiEEEbPKNS_7MessageEPT_PT0_PT1_T2_
#9 0x7fe710c837e4 content::RenderFrameHostImpl::OnMessageReceived()
#10 0x7fe710eb4f28 content::RenderProcessHostImpl::ProcessDied()
#11 0x7fe710eb4acb content::RenderProcessHostImpl::FastShutdownIfPossible()
#12 0x561151fd2612 TabStripModel::CloseWebContentses()
#13 0x561151fcf4b3 TabStripModel::InternalCloseTabs()
#14 0x561151fcf739 TabStripModel::CloseWebContentsAt()
#15 0x561151fa0f96 chrome::CloseWebContents()
#16 0x7fe710ec7266 content::RenderViewHostImpl::OnClosePageACK()
#17 0x7fe710ec70a6 _ZN3IPC8MessageTI30ViewHostMsg_ClosePage_ACK_MetaNSt3__15tupleIJEEEvE8DispatchIN7content18RenderViewHostImplES8_vMS8_FvvEEEbPKNS_7MessageEPT_PT0_PT1_T2_
#18 0x7fe710ec5f0e content::RenderViewHostImpl::OnMessageReceived()
#19 0x7fe710eb55db content::RenderProcessHostImpl::OnMessageReceived()
#20 0x7fe711d34311 IPC::ChannelProxy::Context::OnDispatchMessage()
#21 0x7fe711d37228 _ZN4base8internal7InvokerINS0_9BindStateIMN3IPC12ChannelProxy7ContextEFvRKNS3_7MessageEEJ13scoped_refptrIS5_ES6_EEEFvvEE3RunEPNS0_13BindStateBaseE
#22 0x7fe7136f6e5d base::debug::TaskAnnotator::RunTask()
#23 0x7fe7137267d6 base::MessageLoop::RunTask()
#24 0x7fe713726bea base::MessageLoop::DeferOrRunPendingTask()
#25 0x7fe713726f1c base::MessageLoop::DoWork()
#26 0x7fe71372a18f base::(anonymous namespace)::WorkSourceDispatch()
#27 0x7fe70784df07 g_main_context_dispatch
#28 0x7fe70784e138 <unknown>
#29 0x7fe70784e1cc g_main_context_iteration
#30 0x7fe713729ec2 base::MessagePumpGlib::Run()
#31 0x7fe713726141 base::MessageLoop::Run()
#32 0x7fe713759616 base::RunLoop::Run()
#33 0x561150dcff58 ChromeBrowserMainParts::MainMessageLoopRun()
#34 0x7fe710ade057 content::BrowserMainLoop::RunMainMessageLoopParts()
#35 0x7fe710ae11f3 content::BrowserMainRunnerImpl::Run()
#36 0x7fe710ada0e9 content::BrowserMain()
#37 0x7fe7115d1694 content::ContentMainRunnerImpl::Run()
#38 0x7fe713a5f8b9 service_manager::Main()
#39 0x7fe7115cf6f1 content::ContentMain()
#40 0x5611507d31b3 ChromeMain
#41 0x7fe705f3f2b1 __libc_start_main
#42 0x5611507d302a _start

Slightly easier repro steps that don't involve modifying RenderViewHostImpl::kUnloadTimeoutMS, but instead rely on the beforeunload dialog being up:

1. Go to https://csreis.github.io
2. In devtools, execute 
   var w = open('http://alexmos17.github.io/tests/beforeunload.html');
3. In the new tab, click "Add frame"
4a. Click "Add beforeunload (slow, no dialog)" in the first frame, *or*
4b. Click "Add beforeunload (slow, no dialog)" in the second frame.  These just install a beforeunload handler that takes 5 seconds to run.
5. Go back to first tab and execute 
   setTimeout(() => w.close(), 5000)
6. Within the next 5 seconds, click on the "x" on the second tab.  That should bring up the subframe's beforeunload dialog.  Leave it up.  In 5 seconds, if you did 4a, you get the InvalidateMojoConnection crash.  If you did 4b, you get the GetRootRenderWidgetHostView crash.

Correction to steps in #9: in step 4, use the "Add beforeunload (with dialog)" button instead of "slow, no dialog".
Labels: OS-Linux OS-Mac OS-Windows
Verified that steps in #10 work on latest Mac canary (see crash ID 4c7fccd7bb9e81cd), in addition to Linux where I originally found them.  Some crash reports are also seen on Windows.
And just to complete the explanation of why we crash in InvalidateMojoConnection() - as part of the reentrant call to TabStripModel::CloseWebContentses, we destroyed the WebContents, FTN, RFHM, and then the main frame RFHI as part of TabStripModel::SendDetachWebContentsNotifications().  Then as we unwind the stack back, we come back to RenderFrameHostImpl::OnRenderProcessGone() in the main frame's RFHI which has already been freed.  Having just finished ResetForNewProcess(), the next call after that is InvalidateMojoConnection(), and it crashes when it tries to access the freed RFHI state.

Blocking: 866365
Labels: Restrict-View-SecurityTeam
Status: Started (was: Assigned)
Proactively marking Restrict-View-Security given the discussion of returning to freed objects, in case there's a UaF here.  Alex thinks the repro steps at least require user interaction, which would be a mitigating factor if so.
I've got a fix in progress at https://chromium-review.googlesource.com/c/chromium/src/+/1148775.  Hoping that it can land in time for tomorrow's canary.
Project Member

Comment 16 by bugdroid1@chromium.org, Jul 25

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/fe000eecfd314aff088f7d981f96b19a80a20515

commit fe000eecfd314aff088f7d981f96b19a80a20515
Author: Alex Moshchuk <alexmos@chromium.org>
Date: Wed Jul 25 00:01:13 2018

Schedule OnBeforeUnloadACK to prevent reentrancy when destroying a RFHI.

Previously, it was possible that in the middle of closing a tab and
shutting down the main frame's process, we could call
ResetForNewProcess() to clear the main frame's children, but
destroying a child's RenderFrameHostImpl could reenter
TabStripModel::CloseWebContentses if the main frame was also waiting
for that child's beforeunload ACK.  This could result in freeing state
such as the WebContents or main frame's RFHM and RFHI, and then later
unwinding the stack to proceed with the first CloseWebContentses and
running into crashes when trying to dereference pointers which were
already freed or nulled out.

To prevent this reentrancy, schedule the beforeunload ACK in the cases
where it's used for tab close.  This is similar to a fix for issue
851400, but addresses a new way to get the reentrant calls that became
possible after r575133.

Bug:  866382 ,  866365 
Change-Id: Id20068cb24f457f1fae1005050d46441dac78d9a
Reviewed-on: https://chromium-review.googlesource.com/1148775
Reviewed-by: Charlie Reis <creis@chromium.org>
Commit-Queue: Alex Moshchuk <alexmos@chromium.org>
Cr-Commit-Position: refs/heads/master@{#577736}
[modify] https://crrev.com/fe000eecfd314aff088f7d981f96b19a80a20515/chrome/browser/chrome_site_per_process_browsertest.cc
[modify] https://crrev.com/fe000eecfd314aff088f7d981f96b19a80a20515/content/browser/frame_host/render_frame_host_impl.cc
[modify] https://crrev.com/fe000eecfd314aff088f7d981f96b19a80a20515/content/browser/frame_host/render_frame_host_impl.h

Status: Fixed (was: Started)
I can verify that the repro steps in comments 9-10 cause a crash before the fix on Mac Canary 69.0.3495.0 (259c44fe9fafece4, though in this case it's a CoreTabHelper::OnCloseStarted crash).  After the fix in today's Mac Canary 70.0.3502.0, the crash no longer occurs with either variation of the repro steps.  (The tab closes when the 5 second timeout occurs, which is expected given the TODO around the window.close case.)  Thanks Alex!

Let's keep an eye on the crashes today to make sure they go away and then figure out  merge details.
Crash links for monitoring below.  Hoping these are fixed in 70.0.3502.0, at least for the relevant stack traces, if not any pre-existing stack traces.

1) content::RenderFrameHostImpl::InvalidateMojoConnection
https://crash.corp.google.com/browse?q=expanded_custom_data.ChromeCrashProto.magic_signature_1.name%3D%27content%3A%3ARenderFrameHostImpl%3A%3AInvalidateMojoConnection%27#-property-selector,productname:1000,productversion:1000,-magicsignature:50,-magicsignature2:50,-stablesignature:50,-magicsignaturesorted:50
(This was the #23 browser crash on 68.0.3440.68 beta, and currently the #77 browser crash on 68.0.3440.75 stable.)

2) content::NavigationControllerImpl::GetLastCommittedEntry 
https://crash.corp.google.com/browse?q=expanded_custom_data.ChromeCrashProto.magic_signature_1.name%3D%27content%3A%3ANavigationControllerImpl%3A%3AGetLastCommittedEntry%27#-property-selector,productname:1000,productversion:1000,-magicsignature:50,-magicsignature2:50,-stablesignature:50,-magicsignaturesorted:50
(This was the #25 browser crash on 68.0.3440.68 beta, and currently the #64 browser crash on 68.0.3440.75 stable.)

3) content::CrossProcessFrameConnector::GetRootRenderWidgetHostView
https://crash.corp.google.com/browse?q=expanded_custom_data.ChromeCrashProto.magic_signature_1.name%3D%27content%3A%3ACrossProcessFrameConnector%3A%3AGetRootRenderWidgetHostView%27#-property-selector,productname:1000,productversion:1000,-magicsignature:50,-magicsignature2:50,-stablesignature:50,-magicsignaturesorted:50
(This was the #30 browser crash on 68.0.3440.68 beta, and currently the #107 browser crash on 68.0.3440.75 stable.)

4) content::WebContentsImpl::NotifySwappedFromRenderManager (from  issue 866365 )
https://crash.corp.google.com/browse?q=expanded_custom_data.ChromeCrashProto.magic_signature_1.name%3D%27content%3A%3AWebContentsImpl%3A%3ANotifySwappedFromRenderManager%27#-property-selector,productname:1000,productversion:1000,-magicsignature:50,-magicsignature2:50,-stablesignature:50,-magicsignaturesorted:50
(This was the #19 browser crash on 68.0.3440.68 beta, and currently the #21 browser crash on 68.0.3440.75 stable.)

No crashes in 70.0.3502.0 yet, but note that the Windows Canary hasn't updated to that version yet (per issue 867322).
Project Member

Comment 19 by sheriffbot@chromium.org, Jul 26

Labels: -Restrict-View-SecurityTeam Restrict-View-SecurityNotify
#18: Thanks for posting these monitoring links!  As of this morning, there still haven't been any of those crashes in 70.0.3502.0+.  Also confirming that the repro steps no longer lead to a crash for me either on Mac Canary 70.0.3503.0.
Labels: Merge-Request-69 M-69 M-70
Great!  Let's get the merge request going for M69, and I'll chat with Abdul about M68.
NextAction: 2018-07-27
Setting a reminder to request merge to M68 on Friday, after another day of bake time.
The NextAction date has arrived: 2018-07-27
Project Member

Comment 24 by sheriffbot@chromium.org, Jul 27

Labels: -Merge-Request-69 Hotlist-Merge-Approved Merge-Approved-69
Your change meets the bar and is auto-approved for M69. Please go ahead and merge the CL to branch 3497 manually. Please contact milestone owner if you have questions.
Owners: amineer@(Android), kariahda@(iOS), cindyb@(ChromeOS), govind@(Desktop)

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Labels: Merge-Request-68
I've checked the monitoring links in comment 18 and there still aren't any instances of the 4 crash signatures since r577736 landed in 70.0.3502.0.

I've also tested a merge of that CL to a local M68 branch and it fixes the crash(es) using the repro steps in comments 9-10, and it passes the new tests.  There was one missing include in the test file as a merge conflict, which was easy to fix.  (Draft merge CL is here: https://chromium-review.googlesource.com/c/chromium/src/+/1153223.)

Given discussion with Abdul, we're requesting merge to M68 to resolve the crashes.  Thanks!

(Alex has a merge on the way for M69 already.)
Project Member

Comment 26 by sheriffbot@chromium.org, Jul 27

Labels: -Merge-Request-68 Hotlist-Merge-Review Merge-Review-68
This bug requires manual review: Request affecting a post-stable build
Please contact the milestone owner if you have questions.
Owners: cmasso@(Android), kariahda@(iOS), bhthompson@(ChromeOS), abdulsyed@(Desktop)

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Project Member

Comment 27 by bugdroid1@chromium.org, Jul 27

Labels: -merge-approved-69 merge-merged-3497
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/838874422ea8a995a120ac284e386b209ce4c533

commit 838874422ea8a995a120ac284e386b209ce4c533
Author: Alex Moshchuk <alexmos@chromium.org>
Date: Fri Jul 27 17:22:48 2018

Schedule OnBeforeUnloadACK to prevent reentrancy when destroying a RFHI (Merge to M69)

Previously, it was possible that in the middle of closing a tab and
shutting down the main frame's process, we could call
ResetForNewProcess() to clear the main frame's children, but
destroying a child's RenderFrameHostImpl could reenter
TabStripModel::CloseWebContentses if the main frame was also waiting
for that child's beforeunload ACK.  This could result in freeing state
such as the WebContents or main frame's RFHM and RFHI, and then later
unwinding the stack to proceed with the first CloseWebContentses and
running into crashes when trying to dereference pointers which were
already freed or nulled out.

To prevent this reentrancy, schedule the beforeunload ACK in the cases
where it's used for tab close.  This is similar to a fix for issue
851400, but addresses a new way to get the reentrant calls that became
possible after r575133.

TBR=alexmos@chromium.org

(cherry picked from commit fe000eecfd314aff088f7d981f96b19a80a20515)

Bug:  866382 ,  866365 
Change-Id: Id20068cb24f457f1fae1005050d46441dac78d9a
Reviewed-on: https://chromium-review.googlesource.com/1148775
Reviewed-by: Charlie Reis <creis@chromium.org>
Commit-Queue: Alex Moshchuk <alexmos@chromium.org>
Cr-Original-Commit-Position: refs/heads/master@{#577736}
Reviewed-on: https://chromium-review.googlesource.com/1153366
Reviewed-by: Alex Moshchuk <alexmos@chromium.org>
Cr-Commit-Position: refs/branch-heads/3497@{#162}
Cr-Branched-From: 271eaf50594eb818c9295dc78d364aea18c82ea8-refs/heads/master@{#576753}
[modify] https://crrev.com/838874422ea8a995a120ac284e386b209ce4c533/chrome/browser/chrome_site_per_process_browsertest.cc
[modify] https://crrev.com/838874422ea8a995a120ac284e386b209ce4c533/content/browser/frame_host/render_frame_host_impl.cc
[modify] https://crrev.com/838874422ea8a995a120ac284e386b209ce4c533/content/browser/frame_host/render_frame_host_impl.h

Labels: -Merge-Review-68 Merge-Approved-68
Approving merge to M68. Branch:3440
Project Member

Comment 29 by bugdroid1@chromium.org, Jul 27

Labels: -merge-approved-68 merge-merged-3440
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/3cccf1a40279cf1f5d789358f80c15deb6c884c2

commit 3cccf1a40279cf1f5d789358f80c15deb6c884c2
Author: Charlie Reis <creis@chromium.org>
Date: Fri Jul 27 18:31:35 2018

Merge to M68: Schedule OnBeforeUnloadACK to prevent reentrancy when destroying a RFHI.

Previously, it was possible that in the middle of closing a tab and
shutting down the main frame's process, we could call
ResetForNewProcess() to clear the main frame's children, but
destroying a child's RenderFrameHostImpl could reenter
TabStripModel::CloseWebContentses if the main frame was also waiting
for that child's beforeunload ACK.  This could result in freeing state
such as the WebContents or main frame's RFHM and RFHI, and then later
unwinding the stack to proceed with the first CloseWebContentses and
running into crashes when trying to dereference pointers which were
already freed or nulled out.

To prevent this reentrancy, schedule the beforeunload ACK in the cases
where it's used for tab close.  This is similar to a fix for issue
851400, but addresses a new way to get the reentrant calls that became
possible after r575133.

Bug:  866382 ,  866365 
Change-Id: Id20068cb24f457f1fae1005050d46441dac78d9a
Reviewed-on: https://chromium-review.googlesource.com/1148775
Reviewed-by: Charlie Reis <creis@chromium.org>
Commit-Queue: Alex Moshchuk <alexmos@chromium.org>
Cr-Original-Commit-Position: refs/heads/master@{#577736}
Reviewed-on: https://chromium-review.googlesource.com/1153223
Reviewed-by: Alex Moshchuk <alexmos@chromium.org>
Cr-Commit-Position: refs/branch-heads/3440@{#761}
Cr-Branched-From: 010ddcfda246975d194964ccf20038ebbdec6084-refs/heads/master@{#561733}
[modify] https://crrev.com/3cccf1a40279cf1f5d789358f80c15deb6c884c2/chrome/browser/chrome_site_per_process_browsertest.cc
[modify] https://crrev.com/3cccf1a40279cf1f5d789358f80c15deb6c884c2/content/browser/frame_host/render_frame_host_impl.cc
[modify] https://crrev.com/3cccf1a40279cf1f5d789358f80c15deb6c884c2/content/browser/frame_host/render_frame_host_impl.h

Project Member

Comment 30 by sheriffbot@chromium.org, Nov 1

Labels: -Restrict-View-SecurityNotify allpublic
This bug has been closed for more than 14 weeks. Removing security view restrictions.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot

Sign in to add a comment