Issue metadata
Sign in to add a comment
|
Flaky UaF when running TabRestoreTest.RestoreFirstBrowserWhenSessionServiceEnabled |
||||||||||||||||||||||||
Issue descriptionFindit has detected a flake at test TabRestoreTest.RestoreFirstBrowserWhenSessionServiceEnabled. Culprit (70.0% confidence): https://chromium-review.googlesource.com/q/Ie87e48658864203d33d1ae647fb2d98a7c488765 Regression range: None Analysis: https://findit-for-me.appspot.com/waterfall/flake?key=ag9zfmZpbmRpdC1mb3ItbWVywQELEhdNYXN0ZXJGbGFrZUFuYWx5c2lzUm9vdCKKAWNocm9taXVtLm1lbW9yeS9NYWMgQVNhbiA2NCBUZXN0cyAoMSkvNDAxNzYvYnJvd3Nlcl90ZXN0cy9WR0ZpVW1WemRHOXlaVlJsYzNRdVVtVnpkRzl5WlVacGNuTjBRbkp2ZDNObGNsZG9aVzVUWlhOemFXOXVVMlZ5ZG1salpVVnVZV0pzWldRPQwLEhNNYXN0ZXJGbGFrZUFuYWx5c2lzGAEM If this result was incorrect, apply the label Findit-Incorrect-Result, mark the bug as Untriaged and the component Tools>Test>Findit>Flakiness.
,
Apr 23 2018
,
Apr 23 2018
==59818==ERROR: AddressSanitizer: heap-use-after-free on address 0x6180000e8ca8 at pc 0x0001139150b8 bp 0x700006891150 sp 0x700006891148
READ of size 8 at 0x6180000e8ca8 thread T7
#0 0x1139150b7 in mojo::BindingSetBase<ui::mojom::Gpu, mojo::Binding<ui::mojom::Gpu, mojo::RawPtrImplRefTraits<ui::mojom::Gpu> >, void>::AddBindingImpl(ui::mojom::Gpu*, mojo::InterfaceRequest<ui::mojom::Gpu>, bool) ??:0:0
#1 0x11391209b in content::GpuClient::Add(mojo::InterfaceRequest<ui::mojom::Gpu>) ??:0:0
#2 0x113f2990a in base::internal::Invoker<base::internal::BindState<void (content::GpuClient::*)(mojo::InterfaceRequest<ui::mojom::Gpu>), base::internal::UnretainedWrapper<content::GpuClient> >, void (mojo::InterfaceRequest<ui::mojom::Gpu>)>::Run(base::internal::BindStateBase*, mojo::InterfaceRequest<ui::mojom::Gpu>&&) ??:0:0
#3 0x113f29f08 in service_manager::CallbackBinder<ui::mojom::Gpu>::BindInterface(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, mojo::ScopedHandleBase<mojo::MessagePipeHandle>) ??:0:0
#4 0x110744868 in service_manager::BinderRegistryWithArgs<>::BindInterface(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, mojo::ScopedHandleBase<mojo::MessagePipeHandle>) ??:0:0
#5 0x113f05279 in content::RenderProcessHostImpl::ConnectionFilterImpl::OnBindInterface(service_manager::BindSourceInfo const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, mojo::ScopedHandleBase<mojo::MessagePipeHandle>*, service_manager::Connector*) ??:0:0
#6 0x1117a85b1 in content::ServiceManagerConnectionImpl::IOThreadContext::OnBindInterface(service_manager::BindSourceInfo const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, mojo::ScopedHandleBase<mojo::MessagePipeHandle>) ??:0:0
#7 0x11a29269c in service_manager::ForwardingService::OnBindInterface(service_manager::BindSourceInfo const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, mojo::ScopedHandleBase<mojo::MessagePipeHandle>) ??:0:0
#8 0x11a295075 in service_manager::ServiceContext::OnBindInterface(service_manager::BindSourceInfo const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, mojo::ScopedHandleBase<mojo::MessagePipeHandle>, base::OnceCallback<void ()>) ??:0:0
#9 0x11be1c586 in service_manager::mojom::ServiceStubDispatch::AcceptWithResponder(service_manager::mojom::Service*, mojo::Message*, std::__1::unique_ptr<mojo::MessageReceiverWithStatus, std::__1::default_delete<mojo::MessageReceiverWithStatus> >) ??:0:0
#10 0x11a297510 in service_manager::mojom::ServiceStub<mojo::RawPtrImplRefTraits<service_manager::mojom::Service> >::AcceptWithResponder(mojo::Message*, std::__1::unique_ptr<mojo::MessageReceiverWithStatus, std::__1::default_delete<mojo::MessageReceiverWithStatus> >) ??:0:0
#11 0x11bd26b84 in mojo::InterfaceEndpointClient::HandleValidatedMessage(mojo::Message*) ??:0:0
#12 0x11bd25480 in mojo::FilterChain::Accept(mojo::Message*) ??:0:0
#13 0x11bd2a77b in mojo::InterfaceEndpointClient::HandleIncomingMessage(mojo::Message*) ??:0:0
...
0x6180000e8ca8 is located 40 bytes inside of 856-byte region [0x6180000e8c80,0x6180000e8fd8)
freed by thread T7 here:
#0 0x1382086c2 in __sanitizer_finish_switch_fiber ??:0:0
#1 0x117dbd6a8 in base::debug::TaskAnnotator::RunTask(char const*, base::PendingTask*) ??:0:0
#2 0x117e3a5ff in base::internal::IncomingTaskQueue::RunTask(base::PendingTask*) ??:0:0
#3 0x117e44e4e in base::MessageLoop::RunTask(base::PendingTask*) ??:0:0
...
previously allocated by thread T0 here:
#0 0x1382080c2 in __sanitizer_finish_switch_fiber ??:0:0
#1 0x113ecc924 in content::RenderProcessHostImpl::RenderProcessHostImpl(content::BrowserContext*, content::StoragePartitionImpl*, bool) ??:0:0
#2 0x113ef9b5b in content::(anonymous namespace)::SpareRenderProcessHostManager::WarmupSpareRenderProcessHost(content::BrowserContext*) ??:0:0
#3 0x113efcc9f in content::RenderProcessHostImpl::GetProcessHostForSiteInstance(content::BrowserContext*, content::SiteInstanceImpl*) ??:0:0
#4 0x1142996f9 in content::SiteInstanceImpl::GetProcess() ??:0:0
#5 0x114358e83 in content::WebContentsImpl::Init(content::WebContents::CreateParams const&) ??:0:0
#6 0x11433848b in content::WebContents::CreateWithSessionStorage(...) ??:0:0
...
,
Apr 23 2018
,
Apr 23 2018
+zmo@ and piman@ - could you PTAL since you've added in r520719 the code that uses base::Unretained(gpu_client_.get()): if (gpu_client_) { // |gpu_client_| outlives the registry, because its destruction is posted to // IO thread from the destructor of |this|. registry->AddInterface( base::Bind(&GpuClient::Add, base::Unretained(gpu_client_.get()))); }
,
Apr 23 2018
UAF -> P1.
,
Apr 24 2018
There are similar failures with SessionRestoreTest.RestoreForeignSession. https://logs.chromium.org/v/?s=chromium%2Fbb%2Fchromium.memory%2FMac_ASan_64_Tests__1_%2F40217%2F%2B%2Frecipes%2Fsteps%2Fbrowser_tests%2F0%2Flogs%2FSessionRestoreTest.RestoreForeignSession%2F0
,
Apr 24 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/4572fe082970e7106968514059358385f0d60b58 commit 4572fe082970e7106968514059358385f0d60b58 Author: Lukasz Anforowicz <lukasza@chromium.org> Date: Tue Apr 24 01:42:45 2018 Disable tests that flakily repro a UaF when run with --site-per-process. Bug: 835577 , 835578 Change-Id: I4461cec21b31ee9945afe4e2b1df7c89d9905168 Reviewed-on: https://chromium-review.googlesource.com/1024401 Commit-Queue: Łukasz Anforowicz <lukasza@chromium.org> Reviewed-by: Scott Violet <sky@chromium.org> Cr-Commit-Position: refs/heads/master@{#552949} [modify] https://crrev.com/4572fe082970e7106968514059358385f0d60b58/chrome/browser/sessions/session_restore_browsertest.cc [modify] https://crrev.com/4572fe082970e7106968514059358385f0d60b58/chrome/browser/sessions/tab_restore_browsertest.cc
,
Apr 26 2018
Very unfortunately, I am not able to reproduce this flaky crash locally. I build ASAN release build of browser_tests, and run with --site-per-process, and ran many times, but no luck. I am reading code a bit to see if I can find a potential racing here.
,
Apr 26 2018
+rockot@ for mojo expertise. #c3 seems to indicate that interface binding requests are still handled on the IO thread after RenderProcessHostImpl::gpu_client_ got deleted on the IO thread. Deletion of gpu_client_ is probably triggered by an earlier RenderProcessHostImpl::Cleanup call on the UI thread. In theory RenderProcessHostImpl::Cleanup should stop handling interface binding requests, but I see that a doc comment for ServiceManagerConnection::RemoveConnectionFilter says the removal (and destruction) happens *asynchronously* on the IO thread. This is worrisome, because RPHI should not be deleted until we can guarantee no more interface binding requests (judging by the amount of base::Unretained uses in RenderProcessHostImpl::RegisterMojoInterfaces).
,
Apr 26 2018
Hmmm... but even with the theory from #c10, I think that 1) IOThreadContext::RemoveConnectionFilterOnIOThread gets posted to the IO thread first (indirectly from RenderProcessHostImpl::Cleanup) before 2) deletion of RenderProcessHostImpl::gpu_client_ gets posted to the UI thread (from destructor of RenderProcessHostImpl via std::unique_ptr<GpuClient, BrowserThread::DeleteOnIOThread>). So, I guess I still don't understand how the UaF can happen...
,
Apr 26 2018
gpu_client_ is deleted on RenderProcessHostImpl's destruction time, by posting the actual deleting to IO thread (it's a std::unique_ptr<_, BrowserThread::DeleteOnIOThread>). The mojo interface registry is removed in Cleanup(). So it has to be a situation where RenderProcessHostImpl is destroyed, but Cleanup() isn't called somehow. I have a CL (not for landing) trying with multiple ASAN bots and see if we can make sense of why this is happening: https://chromium-review.googlesource.com/c/chromium/src/+/1030903
,
Apr 26 2018
Ooops... I meant: before 2) deletion of RenderProcessHostImpl::gpu_client_ gets posted to the *IO* thread.
,
Apr 26 2018
The filter is synchronously disabled from within Cleanup anyway, so I agree that this must mean an RPHI is destroyed without first calling Cleanup().
,
Apr 27 2018
OK, the situation is we called RegisterMojoInterface() twice on the RPHI without calling Cleanup(). I am still trying to upload a new CL to confirm why this happened, but looking at code, it seems in ProcessDies(), if delayed_cleanup_needed_ is false, then Cleanup() will not be called, but is_dead_ is set to true, which allows Init() to be called and in turn RegisterMojoInterface() to be called. So the second time we call RegisterMojoInterface, we clear the previous connection_filter_id_ without removing the registry, so in theory it is still alive and there may be a relayed message pending. I'll confirm when my new CL gathers some try bot results.
,
Apr 27 2018
Confirmed, it is ProcessDied() but Cleanup() isn't called. Now we can re-enter Init() to call RegisterMojoInterface(). Per offline discussion with rockot@, the fix will be in RegisterMojoInterface, we clear the previous one if it exists.
,
Apr 30 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/0531101913581ee291a66463133de9233d3e26f3 commit 0531101913581ee291a66463133de9233d3e26f3 Author: Zhenyao Mo <zmo@chromium.org> Date: Mon Apr 30 21:16:32 2018 Always remove previous connection filter in RenderProcessHostImpl before creating a new one. Otherwise the previous one will outlive the RenderProcessHostImpl and may trigger flaky crashes in some browser_tests. BUG= 835577 TEST=asan bots R=rockot@chromium.org,piman@chromium.org TBR=sky@chromium.org Change-Id: Ie4c186866a7ada2dff523e3d9168f44b97503472 Reviewed-on: https://chromium-review.googlesource.com/1034017 Reviewed-by: Zhenyao Mo <zmo@chromium.org> Reviewed-by: Ken Rockot <rockot@chromium.org> Reviewed-by: Antoine Labour <piman@chromium.org> Commit-Queue: Zhenyao Mo <zmo@chromium.org> Cr-Commit-Position: refs/heads/master@{#554876} [modify] https://crrev.com/0531101913581ee291a66463133de9233d3e26f3/chrome/browser/sessions/session_restore_browsertest.cc [modify] https://crrev.com/0531101913581ee291a66463133de9233d3e26f3/chrome/browser/sessions/tab_restore_browsertest.cc [modify] https://crrev.com/0531101913581ee291a66463133de9233d3e26f3/content/browser/renderer_host/render_process_host_impl.cc
,
Apr 30 2018
,
Apr 30 2018
Is it too late to merge to M67?
,
Apr 30 2018
Pls apply appropriate OSs label. Thank you.
,
Apr 30 2018
,
Apr 30 2018
This bug requires manual review: M67 has already been promoted to the beta branch, so this requires manual review Please contact the milestone owner if you have questions. Owners: cmasso@(Android), cmasso@(iOS), kbleicher@(ChromeOS), govind@(Desktop) For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Apr 30 2018
Which Cls you're requesting a merge for? How safe they're? And why we need to merge? Note: CL listed at #17 landed in trunk 2 hrs back so not in canary yet.
,
Apr 30 2018
The first CL is disabling the failing test -> no need to merge. The second CL fixes the issue, re-enabling the test -> we need to merge the fix. This is a memory use-after-free case, so a potential security risk. From my analysis, this can happen to real users, so I highly recommend merging it. The fix is very localized, so I consider it pretty safe.
,
Apr 30 2018
Thank you zmo@. Pls update the bug with canary resul tomorrow for second CL listed at #17. +awhalley@ for M67 merge review (PTAL comment #24).
,
Apr 30 2018
Thanks for flagging, setting labels given UAF. We should wait to have this in Canary for a few days before merging, but I agree we should then take it in 67.
,
May 1 2018
The NextAction date has arrived: 2018-05-01
,
May 1 2018
,
May 1 2018
My Canary on Windows updated to include the fix. I visited a bunch of sites, things seem working fine to me.
,
May 1 2018
govind - good for 67
,
May 1 2018
Approving merge for CL listed at #17 to M67 branch 3396 based on comments #29 and #30. Please merge ASAP so we can pick it up for tomorrow's Beta release. Thank you.
,
May 1 2018
Per offline discussion with govind@, I'll merge CLs in #8 and #17. #8 is simply disabling two flaky tests, and #17 is negating #8 plus bug fix. Merging both can avoid manual editing of CL in #17.
,
May 1 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/a556e346add4a254d4affc0584adcf46a37c0fd8 commit a556e346add4a254d4affc0584adcf46a37c0fd8 Author: Lukasz Anforowicz <lukasza@chromium.org> Date: Tue May 01 17:17:50 2018 Disable tests that flakily repro a UaF when run with --site-per-process. Bug: 835577 , 835578 Change-Id: I4461cec21b31ee9945afe4e2b1df7c89d9905168 Reviewed-on: https://chromium-review.googlesource.com/1024401 Commit-Queue: Łukasz Anforowicz <lukasza@chromium.org> Reviewed-by: Scott Violet <sky@chromium.org> Cr-Original-Commit-Position: refs/heads/master@{#552949}(cherry picked from commit 4572fe082970e7106968514059358385f0d60b58) Reviewed-on: https://chromium-review.googlesource.com/1037603 Reviewed-by: Zhenyao Mo <zmo@chromium.org> Cr-Commit-Position: refs/branch-heads/3396@{#408} Cr-Branched-From: 9ef2aa869bc7bc0c089e255d698cca6e47d6b038-refs/heads/master@{#550428} [modify] https://crrev.com/a556e346add4a254d4affc0584adcf46a37c0fd8/chrome/browser/sessions/session_restore_browsertest.cc [modify] https://crrev.com/a556e346add4a254d4affc0584adcf46a37c0fd8/chrome/browser/sessions/tab_restore_browsertest.cc
,
May 1 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/c04550754b1f1ef8900d8471302d6591f5dbe113 commit c04550754b1f1ef8900d8471302d6591f5dbe113 Author: Zhenyao Mo <zmo@chromium.org> Date: Tue May 01 17:18:29 2018 Always remove previous connection filter in RenderProcessHostImpl before creating a new one. Otherwise the previous one will outlive the RenderProcessHostImpl and may trigger flaky crashes in some browser_tests. BUG= 835577 TEST=asan bots R=rockot@chromium.org,piman@chromium.org TBR=sky@chromium.org Change-Id: Ie4c186866a7ada2dff523e3d9168f44b97503472 Reviewed-on: https://chromium-review.googlesource.com/1034017 Reviewed-by: Zhenyao Mo <zmo@chromium.org> Reviewed-by: Ken Rockot <rockot@chromium.org> Reviewed-by: Antoine Labour <piman@chromium.org> Commit-Queue: Zhenyao Mo <zmo@chromium.org> Cr-Original-Commit-Position: refs/heads/master@{#554876}(cherry picked from commit 0531101913581ee291a66463133de9233d3e26f3) Reviewed-on: https://chromium-review.googlesource.com/1037643 Cr-Commit-Position: refs/branch-heads/3396@{#409} Cr-Branched-From: 9ef2aa869bc7bc0c089e255d698cca6e47d6b038-refs/heads/master@{#550428} [modify] https://crrev.com/c04550754b1f1ef8900d8471302d6591f5dbe113/chrome/browser/sessions/session_restore_browsertest.cc [modify] https://crrev.com/c04550754b1f1ef8900d8471302d6591f5dbe113/chrome/browser/sessions/tab_restore_browsertest.cc [modify] https://crrev.com/c04550754b1f1ef8900d8471302d6591f5dbe113/content/browser/renderer_host/render_process_host_impl.cc
,
Aug 7
This bug has been closed for more than 14 weeks. Removing security view restrictions. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot |
|||||||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||||||
Comment 1 by Findit
, Apr 21 2018