New issue
Advanced search Search tips

Issue 898597 link

Starred by 1 user

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug



Sign in to add a comment

Deadlock while tracing toplevel.flow category

Project Member Reported by gab@google.com, Oct 24

Issue description

Revision: r597922

Reliable repro:
 1) Launch chrome://tracing with "toplevel.flow" category
 2) Browse normally until tracing buffers are full => deadlock occurs then.


What's happening:

TaskAnnotator::WillQueueTask has a TRACE_EVENT if "toplevel.flow" is enabled.

It's possible for this TRACE_EVENT to end up calling TraceEventSystemStatsMonitor::OnTraceLogDisabled() which posts a task (&TraceEventSystemStatsMonitor::StopProfiling).

If the TaskRunner it's posting to is the same that was being posted to (and initially called TaskAnnotator::WillQueueTask()), the second post task will try to reacquire MessageLoopTaskRunner::incoming_task_queue_lock_ which this thread already owns. This will deadlock (base::Lock isn't reentrant).

Deadlock stack:

   ntdll!NtWaitForAlertByThreadId+0xc
   base!base::internal::LockImpl::Lock+0x40 [base\synchronization\lock_impl_win.cc @ 37] 
*  base!base::internal::MessageLoopTaskRunner::AddToIncomingQueue+0x2da [base\message_loop\message_loop_task_runner.cc @ 152] 
   base!base::internal::MessageLoopTaskRunner::PostDelayedTask+0x44 [base\message_loop\message_loop_task_runner.cc @ 55] 
   base!base::TaskRunner::PostTask+0x40 [base\task_runner.cc @ 44] 
   base!base::trace_event::TraceEventSystemStatsMonitor::OnTraceLogDisabled+0xd0 [base\trace_event\trace_event_system_stats_monitor.cc @ 87] 
   base!base::trace_event::TraceLog::SetDisabledWhileLocked+0x1d7 [base\trace_event\trace_log.cc @ 741] 
   base!base::trace_event::TraceLog::ThreadLocalEventBuffer::AddTraceEvent+0x1ba [base\trace_event\trace_log.cc @ 288] 
   base!base::trace_event::TraceLog::AddTraceEventWithThreadIdAndTimestamp+0x5fb [base\trace_event\trace_log.cc @ 1338] 
   base!base::debug::TaskAnnotator::WillQueueTask+0x159 [base\debug\task_annotator.cc @ 45] 
   base!base::MessageLoop::Controller::WillQueueTask+0x13 [base\message_loop\message_loop.cc @ 102] 
*  base!base::internal::MessageLoopTaskRunner::AddToIncomingQueue+0x2ff [base\message_loop\message_loop_task_runner.cc @ 161] 
   base!base::internal::MessageLoopTaskRunner::PostDelayedTask+0x44 [base\message_loop\message_loop_task_runner.cc @ 55] 
   content!content::`anonymous namespace'::BrowserThreadTaskRunner::PostDelayedTask+0x55 [content\browser\scheduler\browser_task_executor.cc @ 32] 
   content!content::BrowserTaskExecutor::PostDelayedTaskWithTraits+0xbb [content\browser\scheduler\browser_task_executor.cc @ 87] 
   base!base::PostDelayedTaskWithTraits+0xa9 [base\task\post_task.cc @ 85] 
   base!base::PostTaskWithTraits+0x47 [base\task\post_task.cc @ 76] 
** chrome!`anonymous namespace'::UpdateMetricsUsagePrefsOnUIThread+0x8d [chrome\browser\io_thread.cc @ 182] 
   chrome!base::internal::Invoker<...>::Run+0x14 [base\bind_internal.h @ 671] 
   chrome!data_use_measurement::DataUseMeasurement::UpdateDataUsePrefs+0x95 [components\data_use_measurement\core\data_use_measurement.cc @ 281] 
   chrome!data_use_measurement::DataUseMeasurement::OnCompleted+0x10 [components\data_use_measurement\core\data_use_measurement.cc @ 180] 
   chrome!data_use_measurement::DataUseNetworkDelegate::OnCompletedInternal+0x25 [components\data_use_measurement\core\data_use_network_delegate.cc @ 70] 
   net!net::LayeredNetworkDelegate::OnCompleted+0x1b [net\base\layered_network_delegate.cc @ 142] 
   net!net::NetworkDelegate::NotifyCompleted+0xc4 [net\base\network_delegate.cc @ 118] 
   net!net::NetworkDelegate::NotifyCompleted+0xc4 [net\base\network_delegate.cc @ 118] 
   net!net::NetworkDelegate::NotifyCompleted+0xc4 [net\base\network_delegate.cc @ 118] 
   net!net::URLRequest::NotifyRequestCompleted+0x3d [net\url_request\url_request.cc @ 1168] 
   net!net::URLRequest::Read+0xb4 [net\url_request\url_request.cc @ 786] 
   content!content::ResourceLoader::ReadMore+0xe8 [content\browser\loader\resource_loader.cc @ 757] 
   content!content::ResourceLoader::PrepareToReadMore+0xe0 [content\browser\loader\resource_loader.cc @ 744] 
   content!content::ResourceLoader::Resume+0x399 [content\browser\loader\resource_loader.cc @ 575] 
   content!content::ResourceLoader::ScopedDeferral::~ScopedDeferral+0x83 [content\browser\loader\resource_loader.cc @ 217] 
   content!content::ResourceLoader::CompleteRead+0x137 [content\browser\loader\resource_loader.cc @ 808] 
   content!content::ResourceLoader::OnReadCompleted+0x15a [content\browser\loader\resource_loader.cc @ 485] 
   net!net::URLRequest::NotifyReadCompleted+0x3a [net\url_request\url_request.cc @ 1129] 
   net!net::URLRequestJob::SourceStreamReadComplete+0x11e [net\url_request\url_request_job.cc @ 661] 
   net!base::internal::Invoker<...>::Run+0x37 [base\bind_internal.h @ 674] 
   net!base::OnceCallback<...>::Run+0x31 [base\callback.h @ 99] 
   net!net::URLRequestJob::ReadRawDataComplete+0x12e [net\url_request\url_request_job.cc @ 520] 
   net!net::URLRequestHttpJob::OnReadCompleted+0xc0 [net\url_request\url_request_http_job.cc @ 1074] 
   net!base::internal::Invoker<...>::Run+0xf [base\bind_internal.h @ 674] 
   net!base::OnceCallback<...>::Run+0x31 [base\callback.h @ 99] 
   net!net::HttpCache::Transaction::DoLoop+0xcc3 [net\http\http_cache_transaction.cc @ 1034] 
   net!base::internal::Invoker<...>::Run+0x30 [base\bind_internal.h @ 674] 
   net!base::OnceCallback<...>::Run+0x31 [base\callback.h @ 99] 
   net!net::HttpCache::Writers::DoLoop+0x297 [net\http\http_cache_writers.cc @ 369] 
   net!base::internal::Invoker<...>::RunOnce+0x30 [base\bind_internal.h @ 662] 
   net!base::OnceCallback<...>::Run+0x31 [base\callback.h @ 99] 
   net!disk_cache::InFlightBackendIO::OnOperationComplete+0x3a [net\disk_cache\blockfile\in_flight_backend_io.cc @ 579] 
   net!disk_cache::InFlightIO::InvokeCallback+0x173 [net\disk_cache\blockfile\in_flight_io.cc @ 101] 
   net!disk_cache::BackgroundIO::OnIOSignalled+0x14 [net\disk_cache\blockfile\in_flight_io.cc @ 26] 
   base!base::debug::TaskAnnotator::RunTask+0x12a [base\debug\task_annotator.cc @ 99] 
   base!base::MessageLoop::RunTask+0xf6 [base\message_loop\message_loop.cc @ 436] 
   base!base::MessageLoop::DeferOrRunPendingTask+0x53 [base\message_loop\message_loop.cc @ 445] 
   base!base::MessageLoop::DoWork+0xcf [base\message_loop\message_loop.cc @ 517] 
   base!base::MessagePumpForIO::DoRunLoop+0x85 [base\message_loop\message_pump_win.cc @ 513] 
   base!base::MessagePumpWin::Run+0x41 [base\message_loop\message_pump_win.cc @ 54] 
   base!base::MessageLoop::Run+0x77 [base\message_loop\message_loop.cc @ 388] 
   base!base::RunLoop::Run+0xcc [base\run_loop.cc @ 105] 
   base!base::Thread::Run+0xa4 [base\threading\thread.cc @ 262] 
   content!content::BrowserProcessSubThread::IOThreadRun+0x25 [content\browser\browser_process_sub_thread.cc @ 175] 
   content!content::BrowserProcessSubThread::Run+0xb0 [content\browser\browser_process_sub_thread.cc @ 132] 
   base!base::Thread::ThreadMain+0x283 [base\threading\thread.cc @ 357] 
   base!base::`anonymous namespace'::ThreadFunc+0xa5 [base\threading\platform_thread_win.cc @ 103] 
   KERNEL32!BaseThreadInitThunk+0x24
   ntdll!RtlGetAppContainerNamedObjectPath+0x137
   ntdll!RtlGetAppContainerNamedObjectPath+0x107

*  Can't PostTask within a PostTask
** In every repro the bottom of the stack is different, but the key part is a task posting from thread T to the main thread and tracing doing an problematic nested post task to the main thread as a result

 
Description: Show this description
Components: Speed>Tracing
We've got a similar problem with the Perfetto backend with a not so great workaround in place[1], so it seems like we should come up with a general solution to this. Disallow tracing in non-reentrant APIs used by tracing -- or at least inside locked sections? Or some kind of deferred call mechanism inside tracing that knows to avoid this kind of re-entrancy? Other ideas?

[1] https://cs.chromium.org/chromium/src/services/tracing/public/cpp/perfetto/trace_event_data_source.cc?rcl=d133f2ce1ec9deaf918b150a7ebdf1b084479673&l=173
Status: Assigned (was: Untriaged)
This issue has an owner, a component and a priority, but is still listed as untriaged or unconfirmed. By definition, this bug is triaged. Changing status to "assigned". Please reach out to me if you disagree with how I've done this.

Sign in to add a comment