New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 835577 link

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Apr 2018
Cc:
Components:
EstimatedDays: ----
NextAction: 2018-05-01
OS: Linux , Android , Windows , Chrome , Mac , Fuchsia
Pri: 1
Type: Bug-Security
Flaky-Test: TabRestoreTest.RestoreFirstBrowserWhenSessionServiceEnabled



Sign in to add a comment

Flaky UaF when running TabRestoreTest.RestoreFirstBrowserWhenSessionServiceEnabled

Project Member Reported by Findit, Apr 21 2018

Issue description

Findit has detected a flake at test TabRestoreTest.RestoreFirstBrowserWhenSessionServiceEnabled.

Culprit (70.0% confidence): https://chromium-review.googlesource.com/q/Ie87e48658864203d33d1ae647fb2d98a7c488765
Regression range: None

Analysis: https://findit-for-me.appspot.com/waterfall/flake?key=ag9zfmZpbmRpdC1mb3ItbWVywQELEhdNYXN0ZXJGbGFrZUFuYWx5c2lzUm9vdCKKAWNocm9taXVtLm1lbW9yeS9NYWMgQVNhbiA2NCBUZXN0cyAoMSkvNDAxNzYvYnJvd3Nlcl90ZXN0cy9WR0ZpVW1WemRHOXlaVlJsYzNRdVVtVnpkRzl5WlVacGNuTjBRbkp2ZDNObGNsZG9aVzVUWlhOemFXOXVVMlZ5ZG1salpVVnVZV0pzWldRPQwLEhNNYXN0ZXJGbGFrZUFuYWx5c2lzGAEM

If this result was incorrect, apply the label Findit-Incorrect-Result, mark the bug as Untriaged and the component Tools>Test>Findit>Flakiness.
 
Project Member

Comment 1 by Findit, Apr 21 2018

Findit identified the culprit r552589 with confidence 70.0% in the config "chromium.memory / Mac ASan 64 Tests (1)"
based on the flakiness trend:

https://findit-for-me.appspot.com/waterfall/flake?key=ag9zfmZpbmRpdC1mb3ItbWVywQELEhdNYXN0ZXJGbGFrZUFuYWx5c2lzUm9vdCKKAWNocm9taXVtLm1lbW9yeS9NYWMgQVNhbiA2NCBUZXN0cyAoMSkvNDAxNzYvYnJvd3Nlcl90ZXN0cy9WR0ZpVW1WemRHOXlaVlJsYzNRdVVtVnpkRzl5WlVacGNuTjBRbkp2ZDNObGNsZG9aVzVUWlhOemFXOXVVMlZ5ZG1salpVVnVZV0pzWldRPQwLEhNNYXN0ZXJGbGFrZUFuYWx5c2lzGAEM


Automatically posted by the findit-for-me app (https://goo.gl/Ot9f7N).
Flake Analyzer is in beta.
Feedback is welcome! Please use component Tools>Test>FindIt>Flakiness
Labels: -Sheriff-Chromium
Owner: lukasza@chromium.org
Summary: Flaky UaF when running TabRestoreTest.RestoreFirstBrowserWhenSessionServiceEnabled (was: TabRestoreTest.RestoreFirstBrowserWhenSessionServiceEnabled is Flaky)
==59818==ERROR: AddressSanitizer: heap-use-after-free on address 0x6180000e8ca8 at pc 0x0001139150b8 bp 0x700006891150 sp 0x700006891148
READ of size 8 at 0x6180000e8ca8 thread T7
    #0 0x1139150b7 in mojo::BindingSetBase<ui::mojom::Gpu, mojo::Binding<ui::mojom::Gpu, mojo::RawPtrImplRefTraits<ui::mojom::Gpu> >, void>::AddBindingImpl(ui::mojom::Gpu*, mojo::InterfaceRequest<ui::mojom::Gpu>, bool) ??:0:0
    #1 0x11391209b in content::GpuClient::Add(mojo::InterfaceRequest<ui::mojom::Gpu>) ??:0:0
    #2 0x113f2990a in base::internal::Invoker<base::internal::BindState<void (content::GpuClient::*)(mojo::InterfaceRequest<ui::mojom::Gpu>), base::internal::UnretainedWrapper<content::GpuClient> >, void (mojo::InterfaceRequest<ui::mojom::Gpu>)>::Run(base::internal::BindStateBase*, mojo::InterfaceRequest<ui::mojom::Gpu>&&) ??:0:0
    #3 0x113f29f08 in service_manager::CallbackBinder<ui::mojom::Gpu>::BindInterface(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, mojo::ScopedHandleBase<mojo::MessagePipeHandle>) ??:0:0
    #4 0x110744868 in service_manager::BinderRegistryWithArgs<>::BindInterface(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, mojo::ScopedHandleBase<mojo::MessagePipeHandle>) ??:0:0
    #5 0x113f05279 in content::RenderProcessHostImpl::ConnectionFilterImpl::OnBindInterface(service_manager::BindSourceInfo const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, mojo::ScopedHandleBase<mojo::MessagePipeHandle>*, service_manager::Connector*) ??:0:0
    #6 0x1117a85b1 in content::ServiceManagerConnectionImpl::IOThreadContext::OnBindInterface(service_manager::BindSourceInfo const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, mojo::ScopedHandleBase<mojo::MessagePipeHandle>) ??:0:0
    #7 0x11a29269c in service_manager::ForwardingService::OnBindInterface(service_manager::BindSourceInfo const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, mojo::ScopedHandleBase<mojo::MessagePipeHandle>) ??:0:0
    #8 0x11a295075 in service_manager::ServiceContext::OnBindInterface(service_manager::BindSourceInfo const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, mojo::ScopedHandleBase<mojo::MessagePipeHandle>, base::OnceCallback<void ()>) ??:0:0
    #9 0x11be1c586 in service_manager::mojom::ServiceStubDispatch::AcceptWithResponder(service_manager::mojom::Service*, mojo::Message*, std::__1::unique_ptr<mojo::MessageReceiverWithStatus, std::__1::default_delete<mojo::MessageReceiverWithStatus> >) ??:0:0
    #10 0x11a297510 in service_manager::mojom::ServiceStub<mojo::RawPtrImplRefTraits<service_manager::mojom::Service> >::AcceptWithResponder(mojo::Message*, std::__1::unique_ptr<mojo::MessageReceiverWithStatus, std::__1::default_delete<mojo::MessageReceiverWithStatus> >) ??:0:0
    #11 0x11bd26b84 in mojo::InterfaceEndpointClient::HandleValidatedMessage(mojo::Message*) ??:0:0
    #12 0x11bd25480 in mojo::FilterChain::Accept(mojo::Message*) ??:0:0
    #13 0x11bd2a77b in mojo::InterfaceEndpointClient::HandleIncomingMessage(mojo::Message*) ??:0:0
...

0x6180000e8ca8 is located 40 bytes inside of 856-byte region [0x6180000e8c80,0x6180000e8fd8)
freed by thread T7 here:
    #0 0x1382086c2 in __sanitizer_finish_switch_fiber ??:0:0
    #1 0x117dbd6a8 in base::debug::TaskAnnotator::RunTask(char const*, base::PendingTask*) ??:0:0
    #2 0x117e3a5ff in base::internal::IncomingTaskQueue::RunTask(base::PendingTask*) ??:0:0
    #3 0x117e44e4e in base::MessageLoop::RunTask(base::PendingTask*) ??:0:0
...

previously allocated by thread T0 here:
    #0 0x1382080c2 in __sanitizer_finish_switch_fiber ??:0:0
    #1 0x113ecc924 in content::RenderProcessHostImpl::RenderProcessHostImpl(content::BrowserContext*, content::StoragePartitionImpl*, bool) ??:0:0
    #2 0x113ef9b5b in content::(anonymous namespace)::SpareRenderProcessHostManager::WarmupSpareRenderProcessHost(content::BrowserContext*) ??:0:0
    #3 0x113efcc9f in content::RenderProcessHostImpl::GetProcessHostForSiteInstance(content::BrowserContext*, content::SiteInstanceImpl*) ??:0:0
    #4 0x1142996f9 in content::SiteInstanceImpl::GetProcess() ??:0:0
    #5 0x114358e83 in content::WebContentsImpl::Init(content::WebContents::CreateParams const&) ??:0:0
    #6 0x11433848b in content::WebContents::CreateWithSessionStorage(...) ??:0:0
...
Cc: lukasza@chromium.org
 Issue 835578  has been merged into this issue.
Cc: piman@chromium.org
Owner: zmo@chromium.org
+zmo@ and piman@ - could you PTAL since you've added in r520719 the code that uses base::Unretained(gpu_client_.get()):

	  if (gpu_client_) {
	    // |gpu_client_| outlives the registry, because its destruction is posted to
	    // IO thread from the destructor of |this|.
	    registry->AddInterface(
	        base::Bind(&GpuClient::Add, base::Unretained(gpu_client_.get())));
	  }
Components: Internals>GPU
Labels: -Pri-3 Pri-1
UAF -> P1.
Project Member

Comment 8 by bugdroid1@chromium.org, Apr 24 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/4572fe082970e7106968514059358385f0d60b58

commit 4572fe082970e7106968514059358385f0d60b58
Author: Lukasz Anforowicz <lukasza@chromium.org>
Date: Tue Apr 24 01:42:45 2018

Disable tests that flakily repro a UaF when run with --site-per-process.

Bug:  835577 ,  835578 
Change-Id: I4461cec21b31ee9945afe4e2b1df7c89d9905168
Reviewed-on: https://chromium-review.googlesource.com/1024401
Commit-Queue: Łukasz Anforowicz <lukasza@chromium.org>
Reviewed-by: Scott Violet <sky@chromium.org>
Cr-Commit-Position: refs/heads/master@{#552949}
[modify] https://crrev.com/4572fe082970e7106968514059358385f0d60b58/chrome/browser/sessions/session_restore_browsertest.cc
[modify] https://crrev.com/4572fe082970e7106968514059358385f0d60b58/chrome/browser/sessions/tab_restore_browsertest.cc

Comment 9 by zmo@chromium.org, Apr 26 2018

Very unfortunately, I am not able to reproduce this flaky crash locally.

I build ASAN release build of browser_tests, and run with --site-per-process, and ran many times, but no luck.

I am reading code a bit to see if I can find a potential racing here.
Cc: roc...@chromium.org
+rockot@ for mojo expertise.  #c3 seems to indicate that interface binding requests are still handled on the IO thread after RenderProcessHostImpl::gpu_client_ got deleted on the IO thread.  Deletion of gpu_client_ is probably triggered by an earlier RenderProcessHostImpl::Cleanup call on the UI thread.  In theory RenderProcessHostImpl::Cleanup should stop handling interface binding requests, but I see that a doc comment for ServiceManagerConnection::RemoveConnectionFilter says the removal (and destruction) happens *asynchronously* on the IO thread.  This is worrisome, because RPHI should not be deleted until we can guarantee no more interface binding requests (judging by the amount of base::Unretained uses in RenderProcessHostImpl::RegisterMojoInterfaces).
Hmmm... but even with the theory from #c10, I think that 1) IOThreadContext::RemoveConnectionFilterOnIOThread gets posted to the IO thread first (indirectly from RenderProcessHostImpl::Cleanup) before 2) deletion of RenderProcessHostImpl::gpu_client_ gets posted to the UI thread (from destructor of RenderProcessHostImpl via std::unique_ptr<GpuClient, BrowserThread::DeleteOnIOThread>).  So, I guess I still don't understand how the UaF can happen...

Comment 12 by zmo@chromium.org, Apr 26 2018

gpu_client_ is deleted on RenderProcessHostImpl's destruction time, by posting the actual deleting to IO thread (it's a std::unique_ptr<_, BrowserThread::DeleteOnIOThread>). The mojo interface registry is removed in Cleanup(). So it has to be a situation where RenderProcessHostImpl is destroyed, but Cleanup() isn't called somehow.

I have a CL (not for landing) trying with multiple ASAN bots and see if we can make sense of why this is happening:

https://chromium-review.googlesource.com/c/chromium/src/+/1030903
Ooops... I meant: before 2) deletion of RenderProcessHostImpl::gpu_client_ gets posted to the *IO* thread.
The filter is synchronously disabled from within Cleanup anyway, so I agree that this must mean an RPHI is destroyed without first calling Cleanup().

Comment 15 by zmo@chromium.org, Apr 27 2018

OK, the situation is we called RegisterMojoInterface() twice on the RPHI without calling Cleanup(). I am still trying to upload a new CL to confirm why this happened, but looking at code, it seems in ProcessDies(), if delayed_cleanup_needed_ is false, then Cleanup() will not be called, but is_dead_ is set to true, which allows Init() to be called and in turn RegisterMojoInterface() to be called.

So the second time we call RegisterMojoInterface, we clear the previous connection_filter_id_ without removing the registry, so in theory it is still alive and there may be a relayed message pending.

I'll confirm when my new CL gathers some try bot results.

Comment 16 by zmo@chromium.org, Apr 27 2018

Status: Started (was: Available)
Confirmed, it is ProcessDied() but Cleanup() isn't called. Now we can re-enter Init() to call RegisterMojoInterface(). Per offline discussion with rockot@, the fix will be in RegisterMojoInterface, we clear the previous one if it exists.
Project Member

Comment 17 by bugdroid1@chromium.org, Apr 30 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/0531101913581ee291a66463133de9233d3e26f3

commit 0531101913581ee291a66463133de9233d3e26f3
Author: Zhenyao Mo <zmo@chromium.org>
Date: Mon Apr 30 21:16:32 2018

Always remove previous connection filter in RenderProcessHostImpl before creating a new one.

Otherwise the previous one will outlive the RenderProcessHostImpl and
may trigger flaky crashes in some browser_tests.

BUG= 835577 
TEST=asan bots
R=rockot@chromium.org,piman@chromium.org
TBR=sky@chromium.org

Change-Id: Ie4c186866a7ada2dff523e3d9168f44b97503472
Reviewed-on: https://chromium-review.googlesource.com/1034017
Reviewed-by: Zhenyao Mo <zmo@chromium.org>
Reviewed-by: Ken Rockot <rockot@chromium.org>
Reviewed-by: Antoine Labour <piman@chromium.org>
Commit-Queue: Zhenyao Mo <zmo@chromium.org>
Cr-Commit-Position: refs/heads/master@{#554876}
[modify] https://crrev.com/0531101913581ee291a66463133de9233d3e26f3/chrome/browser/sessions/session_restore_browsertest.cc
[modify] https://crrev.com/0531101913581ee291a66463133de9233d3e26f3/chrome/browser/sessions/tab_restore_browsertest.cc
[modify] https://crrev.com/0531101913581ee291a66463133de9233d3e26f3/content/browser/renderer_host/render_process_host_impl.cc

Comment 18 by zmo@chromium.org, Apr 30 2018

Status: Fixed (was: Started)

Comment 19 by zmo@chromium.org, Apr 30 2018

Labels: Merge-Request-67 M-67
Is it too late to merge to M67?
Pls apply appropriate OSs label. Thank you.

Comment 21 by zmo@chromium.org, Apr 30 2018

Labels: OS-Android OS-Chrome OS-Fuchsia OS-Linux OS-Mac OS-Windows
Project Member

Comment 22 by sheriffbot@chromium.org, Apr 30 2018

Labels: -Merge-Request-67 Merge-Review-67 Hotlist-Merge-Review
This bug requires manual review: M67 has already been promoted to the beta branch, so this requires manual review
Please contact the milestone owner if you have questions.
Owners: cmasso@(Android), cmasso@(iOS), kbleicher@(ChromeOS), govind@(Desktop)

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Which Cls you're requesting a merge for? How safe they're? And why we need to merge?

Note: CL listed at #17 landed in trunk 2 hrs back so not in canary yet.

Comment 24 by zmo@chromium.org, Apr 30 2018

The first CL is disabling the failing test -> no need to merge.

The second CL fixes the issue, re-enabling the test -> we need to merge the fix.

This is a memory use-after-free case, so a potential security risk. From my analysis, this can happen to real users, so I highly recommend merging it.

The fix is very localized, so I consider it pretty safe.
Cc: awhalley@chromium.org
NextAction: 2018-05-01
Thank you zmo@. Pls update the bug with canary resul tomorrow for second CL listed at #17.

+awhalley@ for M67 merge review (PTAL comment #24).
Labels: -Type-Bug Security_Impact-Head Security_Severity-High Type-Bug-Security
Thanks for flagging, setting labels given UAF.

We should wait to have this in Canary for a few days before merging, but I agree we should then take it in 67.
The NextAction date has arrived: 2018-05-01
Project Member

Comment 28 by sheriffbot@chromium.org, May 1 2018

Labels: Restrict-View-SecurityNotify

Comment 29 by zmo@chromium.org, May 1 2018

My Canary on Windows updated to include the fix. I visited a bunch of sites, things seem working fine to me.
govind - good for 67
Labels: -Merge-Review-67 Merge-Approved-67
Approving merge for CL listed at #17 to M67 branch 3396 based on comments #29 and #30. Please merge ASAP so we can pick it up for tomorrow's Beta release. Thank you.

Comment 32 by zmo@chromium.org, May 1 2018

Per offline discussion with govind@, I'll merge CLs in #8 and #17.

#8 is simply disabling two flaky tests, and #17 is negating #8 plus bug fix.

Merging both can avoid manual editing of CL in #17.
Project Member

Comment 33 by bugdroid1@chromium.org, May 1 2018

Labels: -merge-approved-67 merge-merged-3396
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/a556e346add4a254d4affc0584adcf46a37c0fd8

commit a556e346add4a254d4affc0584adcf46a37c0fd8
Author: Lukasz Anforowicz <lukasza@chromium.org>
Date: Tue May 01 17:17:50 2018

Disable tests that flakily repro a UaF when run with --site-per-process.

Bug:  835577 ,  835578 
Change-Id: I4461cec21b31ee9945afe4e2b1df7c89d9905168
Reviewed-on: https://chromium-review.googlesource.com/1024401
Commit-Queue: Łukasz Anforowicz <lukasza@chromium.org>
Reviewed-by: Scott Violet <sky@chromium.org>
Cr-Original-Commit-Position: refs/heads/master@{#552949}(cherry picked from commit 4572fe082970e7106968514059358385f0d60b58)
Reviewed-on: https://chromium-review.googlesource.com/1037603
Reviewed-by: Zhenyao Mo <zmo@chromium.org>
Cr-Commit-Position: refs/branch-heads/3396@{#408}
Cr-Branched-From: 9ef2aa869bc7bc0c089e255d698cca6e47d6b038-refs/heads/master@{#550428}
[modify] https://crrev.com/a556e346add4a254d4affc0584adcf46a37c0fd8/chrome/browser/sessions/session_restore_browsertest.cc
[modify] https://crrev.com/a556e346add4a254d4affc0584adcf46a37c0fd8/chrome/browser/sessions/tab_restore_browsertest.cc

Project Member

Comment 34 by bugdroid1@chromium.org, May 1 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/c04550754b1f1ef8900d8471302d6591f5dbe113

commit c04550754b1f1ef8900d8471302d6591f5dbe113
Author: Zhenyao Mo <zmo@chromium.org>
Date: Tue May 01 17:18:29 2018

Always remove previous connection filter in RenderProcessHostImpl before creating a new one.

Otherwise the previous one will outlive the RenderProcessHostImpl and
may trigger flaky crashes in some browser_tests.

BUG= 835577 
TEST=asan bots
R=​rockot@chromium.org,piman@chromium.org
TBR=sky@chromium.org

Change-Id: Ie4c186866a7ada2dff523e3d9168f44b97503472
Reviewed-on: https://chromium-review.googlesource.com/1034017
Reviewed-by: Zhenyao Mo <zmo@chromium.org>
Reviewed-by: Ken Rockot <rockot@chromium.org>
Reviewed-by: Antoine Labour <piman@chromium.org>
Commit-Queue: Zhenyao Mo <zmo@chromium.org>
Cr-Original-Commit-Position: refs/heads/master@{#554876}(cherry picked from commit 0531101913581ee291a66463133de9233d3e26f3)
Reviewed-on: https://chromium-review.googlesource.com/1037643
Cr-Commit-Position: refs/branch-heads/3396@{#409}
Cr-Branched-From: 9ef2aa869bc7bc0c089e255d698cca6e47d6b038-refs/heads/master@{#550428}
[modify] https://crrev.com/c04550754b1f1ef8900d8471302d6591f5dbe113/chrome/browser/sessions/session_restore_browsertest.cc
[modify] https://crrev.com/c04550754b1f1ef8900d8471302d6591f5dbe113/chrome/browser/sessions/tab_restore_browsertest.cc
[modify] https://crrev.com/c04550754b1f1ef8900d8471302d6591f5dbe113/content/browser/renderer_host/render_process_host_impl.cc

Project Member

Comment 35 by sheriffbot@chromium.org, Aug 7

Labels: -Restrict-View-SecurityNotify allpublic
This bug has been closed for more than 14 weeks. Removing security view restrictions.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot

Sign in to add a comment