New issue
Advanced search Search tips

Issue 646649 link

Starred by 1 user

Issue metadata

Status: Assigned
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug



Sign in to add a comment

Browser deadlock in tracing observer net::NetLog::DeprecatedRemoveObserver

Project Member Reported by alph@chromium.org, Sep 13 2016

Issue description

I've got browser deadlock while stopping trace from trace viewer. Here are IO and main threads.

#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007fcaed1cc649 in _L_lock_909 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2  0x00007fcaed1cc470 in __GI___pthread_mutex_lock (mutex=0xf5f2c7a8388) at ../nptl/pthread_mutex_lock.c:79
#3  0x00007fcaef7d85bd in base::internal::LockImpl::Lock() ()
#4  0x00007fcaefbf4d5b in net::NetLog::DeprecatedRemoveObserver(net::NetLog::ThreadSafeObserver*) ()
#5  0x00007fcaef7fd20a in base::trace_event::TraceLog::SetDisabledWhileLocked() ()
#6  0x00007fcaef7fab54 in base::trace_event::TraceLog::ThreadLocalEventBuffer::AddTraceEvent(base::trace_event::TraceEventHandle*) ()
#7  0x00007fcaef7ff226 in base::trace_event::TraceLog::AddTraceEventWithThreadIdAndTimestamp(char, unsigned char const*, char const*, char const*, unsigned long long, unsigned long long, int, base::TimeTicks const&, int, char const**, unsigned char const*, unsigned long long const*, std::unique_ptr<base::trace_event::ConvertableToTraceFormat, std::default_delete<base::trace_event::ConvertableToTraceFormat> >*, unsigned int) ()
#8  0x00007fcaf243e6a1 in net::TraceNetLogObserver::OnAddEntry(net::NetLog::Entry const&) ()
#9  0x00007fcaefbf4b42 in net::NetLog::AddEntry(net::NetLogEventType, net::NetLog::Source const&, net::NetLogEventPhase, base::Callback<std::unique_ptr<base::Value, std::default_delete<base::Value> > (net::NetLogCaptureMode), (base::internal::CopyMode)1, (base::internal::RepeatMode)1> const*) ()
#10 0x00007fcaefdabb6d in net::internal::ClientSocketPoolBaseHelper::HandOutSocket(std::unique_ptr<net::StreamSocket, std::default_delete<net::StreamSocket> >, net::ClientSocketHandle::SocketReuseType, net::LoadTimingInfo::ConnectTiming const&, net::ClientSocketHandle*, base::TimeDelta, net::internal::ClientSocketPoolBaseHelper::Group*, net::BoundNetLog const&) ()
#11 0x00007fcaefdad5e9 in net::internal::ClientSocketPoolBaseHelper::OnConnectJobComplete(int, net::ConnectJob*) ()
#12 0x00007fcaefda9f7a in net::ConnectJob::NotifyDelegateOfCompletion(int) ()
#13 0x00007fcaefbfa7cb in net::SSLClientSocketImpl::OnHandshakeIOComplete(int) ()
#14 0x00007fcaefc176e3 in net::CachingCertVerifier::OnRequestFinished(net::CertVerifier::RequestParams const&, base::Time, base::Callback<void (int), (base::internal::CopyMode)1, (base::internal::RepeatMode)1> const&, net::CertVerifyResult*, int) ()
#15 0x00007fcaefc21a36 in net::CertVerifierJob::OnJobCompleted(std::unique_ptr<net::(anonymous namespace)::ResultHelper, std::default_delete<net::(anonymous namespace)::ResultHelper> >) ()
#16 0x00007fcaefc21f18 in base::internal::Invoker<base::internal::BindState<void (net::CertVerifierJob::*)(std::unique_ptr<net::(anonymous namespace)::ResultHelper, std::default_delete<net::(anonymous namespace)::ResultHelper> >), base::WeakPtr<net::CertVerifierJob>, base::internal::PassedWrapper<std::unique_ptr<net::(anonymous namespace)::ResultHelper, std::default_delete<net::(anonymous namespace)::ResultHelper> > > >, void ()>::Run(base::internal::BindStateBase*) ()
#17 0x00007fcaef7e3cc3 in base::(anonymous namespace)::PostTaskAndReplyRelay::RunReplyAndSelfDestruct() ()
#18 0x00007fcaef80b9f1 in base::debug::TaskAnnotator::RunTask(char const*, base::PendingTask const&) ()
#19 0x00007fcaef7aff05 in base::MessageLoop::RunTask(base::PendingTask const&) ()
#20 0x00007fcaef7b0228 in base::MessageLoop::DeferOrRunPendingTask(base::PendingTask) ()
#21 0x00007fcaef7b056b in base::MessageLoop::DoWork() ()
#22 0x00007fcaef7b20a9 in base::MessagePumpLibevent::Run(base::MessagePump::Delegate*) ()
#23 0x00007fcaef7c946e in base::RunLoop::Run() ()
#24 0x00007fcaeec71366 in content::BrowserThreadImpl::IOThreadRun(base::RunLoop*) ()
#25 0x00007fcaeec71519 in content::BrowserThreadImpl::Run(base::RunLoop*) ()
#26 0x00007fcaef7e758d in base::Thread::ThreadMain() ()
#27 0x00007fcaef7e3aa3 in base::(anonymous namespace)::ThreadFunc(void*) ()
#28 0x00007fcaed1ca184 in start_thread (arg=0x7fcad6359700) at pthread_create.c:312
#29 0x00007fcae6d0537d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111



#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007fcaf03c5878 in mojo::edk::Waiter::Wait(unsigned long, unsigned long*) ()
#2  0x00007fcaf03aee8f in mojo::edk::Core::WaitManyInternal(unsigned int const*, unsigned int const*, unsigned int, unsigned long, unsigned int*, mojo::edk::HandleSignalsState*) ()
#3  0x00007fcaf03aeba7 in mojo::edk::Core::Wait(unsigned int, unsigned int, unsigned long, MojoHandleSignalsState*) ()
#4  0x00007fcaefaeaeb9 in mojo::SyncHandleRegistry::WatchAllHandles(bool const**, unsigned long) ()
#5  0x00007fcaeffcf820 in IPC::SyncChannel::WaitForReply(mojo::SyncHandleRegistry*, IPC::SyncChannel::SyncContext*, bool) ()
#6  0x00007fcaeffcf339 in IPC::SyncChannel::Send(IPC::Message*) ()
#7  0x00007fcaee2a6204 in gpu::GpuChannelHost::Send(IPC::Message*) ()
#8  0x00007fcaee2a1dc3 in gpu::CommandBufferProxyImpl::Initialize(scoped_refptr<gpu::GpuChannelHost>, GPUCreateCommandBufferConfig const&, scoped_refptr<base::SingleThreadTaskRunner>) ()
#9  0x00007fcaee2a1931 in gpu::CommandBufferProxyImpl::Create(scoped_refptr<gpu::GpuChannelHost>, unsigned long, gpu::CommandBufferProxyImpl*, int, gpu::GpuStreamPriority, gpu::gles2::ContextCreationAttribHelper const&, GURL const&, scoped_refptr<base::SingleThreadTaskRunner>) ()
#10 0x00007fcaee201a7a in content::ContextProviderCommandBuffer::BindToCurrentThread() ()
#11 0x00007fcaeef98cc8 in content::GpuProcessTransportFactory::EstablishedGpuChannel(base::WeakPtr<ui::Compositor>, bool, int, scoped_refptr<gpu::GpuChannelHost>) ()
#12 0x00007fcaeef9aa2e in void base::internal::FunctorTraits<void (content::GpuProcessTransportFactory::*)(base::WeakPtr<ui::Compositor>, bool, int, scoped_refptr<gpu::GpuChannelHost>), void>::Invoke<base::WeakPtr<content::GpuProcessTransportFactory> const&, base::WeakPtr<ui::Compositor> const&, bool const&, int const&, scoped_refptr<gpu::GpuChannelHost> >(void (content::GpuProcessTransportFactory::*)(base::WeakPtr<ui::Compositor>, bool, int, scoped_refptr<gpu::GpuChannelHost>), base::WeakPtr<content::GpuProcessTransportFactory> const&, base::WeakPtr<ui::Compositor> const&, bool const&, int const&, scoped_refptr<gpu::GpuChannelHost>&&) ()
#13 0x00007fcaeef9a996 in base::internal::Invoker<base::internal::BindState<void (content::GpuProcessTransportFactory::*)(base::WeakPtr<ui::Compositor>, bool, int, scoped_refptr<gpu::GpuChannelHost>), base::WeakPtr<content::GpuProcessTransportFactory>, base::WeakPtr<ui::Compositor>, bool, int>, void (scoped_refptr<gpu::GpuChannelHost>)>::Run(base::internal::BindStateBase*, scoped_refptr<gpu::GpuChannelHost>&&) ()
#14 0x00007fcaeed46ddc in content::BrowserGpuChannelHostFactory::EstablishGpuChannel(base::Callback<void (scoped_refptr<gpu::GpuChannelHost>), (base::internal::CopyMode)1, (base::internal::RepeatMode)1> const&) ()
#15 0x00007fcaeef97f4b in content::GpuProcessTransportFactory::CreateOutputSurface(base::WeakPtr<ui::Compositor>) ()
#16 0x00007fcaf042352c in ui::Compositor::RequestNewOutputSurface() ()
#17 0x00007fcaef80b9f1 in base::debug::TaskAnnotator::RunTask(char const*, base::PendingTask const&) ()
#18 0x00007fcaef7aff05 in base::MessageLoop::RunTask(base::PendingTask const&) ()
#19 0x00007fcaef7b0228 in base::MessageLoop::DeferOrRunPendingTask(base::PendingTask) ()
#20 0x00007fcaef7b056b in base::MessageLoop::DoWork() ()
#21 0x00007fcaef7b1899 in base::MessagePumpGlib::Run(base::MessagePump::Delegate*) ()
#22 0x00007fcaef7c946e in base::RunLoop::Run() ()
#23 0x00007fcaef68643e in ChromeBrowserMainParts::MainMessageLoopRun(int*) ()
#24 0x00007fcaeec664e9 in content::BrowserMainLoop::RunMainMessageLoopParts() ()
#25 0x00007fcaeec6957d in content::BrowserMainRunnerImpl::Run() ()
#26 0x00007fcaeec61c0e in content::BrowserMain(content::MainFunctionParams const&) ()
#27 0x00007fcaef468df7 in content::ContentMainRunnerImpl::Run() ()
#28 0x00007fcaef4678e0 in content::ContentMain(content::ContentMainParams const&) ()
#29 0x00007fcaee13036b in ChromeMain ()
#30 0x00007fcae6c2cf45 in __libc_start_main (main=0x7fcaee130320 <main>, argc=4, argv=0x7ffce7514f08, init=<optimized out>, fini=<optimized out>, 
    rtld_fini=<optimized out>, stack_end=0x7ffce7514ef8) at libc-start.c:287
#31 0x00007fcaee13023d in _start ()


 

Comment 1 by mmenke@chromium.org, Sep 14 2016

The NetLog contract is that observers can't be added or removed while it's informing them of an event.  Looks like the TraceLog::CheckIfBufferIsFullWhileLocked call can violate this.
My reading is that the following happen:
- you are recording in "record until full" mode.
- at some point something tries to inject an event from net/
- that event is the last drop which fills the buffer. At this point we (tracing)  disable tracing while creating the event.
- in turn, disabling tracing calls inline (i.e. in the same call stack) the EnabledStateObserver list.
- But now we are re-entering the network stack, as we are in the current situation:

(some net/ action) -> (tracing::addEvent) -> (tracing::disable) -> (tracing::notifyObservers) -> ( TraceNetLogObserver::OnTraceLogDisabled()) -> (net::RemoveObserver)

The flaw here is in TraceNetLogObserver::OnTraceLogDisabled(). That code is assuming that OnTraceLogDisabled cannot be called from an event (so from within the net stack itself), which is not true.
Honestly, this is also the reason why I hate the current EnableStateObserver design, it's a pretty dangerous footgun.

I wonder if the new AddAsyncEnabledStateObserver could help here (the callbacks are posted, are not called inline). Maybe we should think a bit more and switch to that. 

Re #1 the problem here is not in TraceLog::CheckIfBufferIsFullWhileLocked violating any contract (TraceLog is in base/, how can it possibly violate the contract of something in /net? It doesn't even know about the existance of that).
The problem is in the wrong assumption made in TraceNetLogObserver::OnTraceLogDisabled().

On the other side the design of the current EnabledStateObserver is horrendous and very easy to get wrong. Like in this case :) That's why I want to rip it out
More context in  Issue 610629 

Comment 4 by mmenke@chromium.org, Sep 14 2016

Actually, I just meant that's the first call that clearly causes the violation of the contract - everything above that doesn't make that behavior obvious.

The re-entrancy behavior of TraceLog::AddTraceEventWithThreadIdAndTimestamp should probably also be documented (Or removed).
Re #4, ah sorry my misunderstanding.

>The re-entrancy behavior of TraceLog::AddTraceEventWithThreadIdAndTimestamp should probably also be documented (Or removed).
I'd really go for removed.

kouhei@ do you (or anybody else in this bug) have spare cycles? I think the solution here might turn out to be to just switch to  AsyncEnabledStateObserver, but requires some thinking to check if there any other assumption on the time where the callback is dispatched. I just don't have bandwidth for thinking this week (and will be OOO half of next one) :/

Comment 6 by kouhei@chromium.org, Sep 14 2016

Cc: kouhei@chromium.org
Owner: ----
Sorry I'm traveling + Japan is holiday week next week. On what occasion do we hit this bug?

Sign in to add a comment