Perfetto: calling the TraceWriter dtor in the TLS dtor causes a CHECK() |
|||
Issue descriptionJust debugged this crash with Bruce on Windows. The problem is the following: 1) We have one TraceWriter instance per thread in a TLS. We register the TLS slot dtor to destroy the TraceWrite once done. 2) When the thread exists, TraceWriter dtor is invoked 3) TraceWriter dtor will try to do a Flush() (see https://android.googlesource.com/platform/external/perfetto/+/master/src/tracing/core/trace_writer_impl.cc#59) 4) Flush will try to send an IPC, which will end up in mojo, which will posttask something Now, the problem is that, when the TLS destructor runs, the order on which things are destroyed is unpredictable. On Windows I think we got into the state where TaskRunnerHandle::Get() was hitting a check, because the TLS that contain the current TaskRunner has been destroyed before. Cannot tell right now whether this is: A) A bug in Perfetto, and we shouldn't try to flush() on TraceWriter dtor. 2) A bug in the way chrome embeds it. Let's think a bit more on this.
,
May 18 2018
Hmm. Maybe rather than destroying the TW in the slot destructor, we should just post a DeleteSoon() to the taskrunner we're running Perfetto on?
,
May 21 2018
I think disabling the flush is causing the unittest to fail, because they test precisely that we don't lose the leftover in the buffer! > Maybe rather than destroying the TW in the slot destructor, we should just post a DeleteSoon() to the taskrunner we're running Perfetto on? Yeah the more I think about this the more I lean towards this. When we do that we should have a way to reset the threadchecker, to say "don't worry you are going to be moved "
,
May 24 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/2c2411fbf9f831b9f7f47f21a7c6b9dd82bbb37f commit 2c2411fbf9f831b9f7f47f21a7c6b9dd82bbb37f Author: Oystein Eftevaag <oysteine@google.com> Date: Thu May 24 01:53:36 2018 Perfetto: Fix ProducerClient threading/sequencing issues * TraceWriters will no longer be destroyed when their TLS slot is cleaned up on thread shutdown as this can trigger a CommitData() call from Perfetto and cause asserts when Mojo does a TaskRunnerHandle::Get() for a PostTask; instead we leave the deletion to the taskrunner Perfetto is running on (if it's getting shut down, the task should get dropped). * Run ProducerClients on their own sequence. Bug: 844379 Change-Id: Ie45b36a4e463674122555dbec2105953a882f37c Reviewed-on: https://chromium-review.googlesource.com/1069528 Reviewed-by: Primiano Tucci <primiano@chromium.org> Commit-Queue: oysteine <oysteine@chromium.org> Cr-Commit-Position: refs/heads/master@{#561356} [modify] https://crrev.com/2c2411fbf9f831b9f7f47f21a7c6b9dd82bbb37f/base/trace_event/trace_log.cc [modify] https://crrev.com/2c2411fbf9f831b9f7f47f21a7c6b9dd82bbb37f/base/trace_event/trace_log.h [modify] https://crrev.com/2c2411fbf9f831b9f7f47f21a7c6b9dd82bbb37f/services/tracing/perfetto/perfetto_integration_unittest.cc [modify] https://crrev.com/2c2411fbf9f831b9f7f47f21a7c6b9dd82bbb37f/services/tracing/perfetto/perfetto_service.h [modify] https://crrev.com/2c2411fbf9f831b9f7f47f21a7c6b9dd82bbb37f/services/tracing/perfetto/producer_host.cc [modify] https://crrev.com/2c2411fbf9f831b9f7f47f21a7c6b9dd82bbb37f/services/tracing/public/cpp/perfetto/producer_client.cc [modify] https://crrev.com/2c2411fbf9f831b9f7f47f21a7c6b9dd82bbb37f/services/tracing/public/cpp/perfetto/producer_client.h [modify] https://crrev.com/2c2411fbf9f831b9f7f47f21a7c6b9dd82bbb37f/services/tracing/public/cpp/perfetto/task_runner.cc [modify] https://crrev.com/2c2411fbf9f831b9f7f47f21a7c6b9dd82bbb37f/services/tracing/public/cpp/perfetto/task_runner.h [modify] https://crrev.com/2c2411fbf9f831b9f7f47f21a7c6b9dd82bbb37f/services/tracing/public/cpp/perfetto/trace_event_data_source.cc [modify] https://crrev.com/2c2411fbf9f831b9f7f47f21a7c6b9dd82bbb37f/services/tracing/public/cpp/perfetto/trace_event_data_source_unittest.cc [modify] https://crrev.com/2c2411fbf9f831b9f7f47f21a7c6b9dd82bbb37f/services/tracing/public/cpp/trace_event_agent.cc
,
May 25 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/5bf292009901cbaecf0b685582a9d0c8c1628ef9 commit 5bf292009901cbaecf0b685582a9d0c8c1628ef9 Author: Olga Sharonova <olka@chromium.org> Date: Fri May 25 10:31:23 2018 Revert "Perfetto: Fix ProducerClient threading/sequencing issues" This reverts commit 2c2411fbf9f831b9f7f47f21a7c6b9dd82bbb37f. Reason for revert: Suspected cause of Issue 846609 Original change's description: > Perfetto: Fix ProducerClient threading/sequencing issues > > * TraceWriters will no longer be destroyed when their TLS slot is cleaned > up on thread shutdown as this can trigger a CommitData() call from > Perfetto and cause asserts when Mojo does a TaskRunnerHandle::Get() > for a PostTask; instead we leave the deletion to the taskrunner > Perfetto is running on (if it's getting shut down, the task should > get dropped). > * Run ProducerClients on their own sequence. > > Bug: 844379 > Change-Id: Ie45b36a4e463674122555dbec2105953a882f37c > Reviewed-on: https://chromium-review.googlesource.com/1069528 > Reviewed-by: Primiano Tucci <primiano@chromium.org> > Commit-Queue: oysteine <oysteine@chromium.org> > Cr-Commit-Position: refs/heads/master@{#561356} TBR=primiano@chromium.org,oysteine@chromium.org # Not skipping CQ checks because original CL landed > 1 day ago. Bug: 844379 Change-Id: I82eb4e335521ef8f39119bdd2a984f4dc81ac88c Reviewed-on: https://chromium-review.googlesource.com/1073247 Reviewed-by: Olga Sharonova <olka@chromium.org> Commit-Queue: Olga Sharonova <olka@chromium.org> Cr-Commit-Position: refs/heads/master@{#561823} [modify] https://crrev.com/5bf292009901cbaecf0b685582a9d0c8c1628ef9/base/trace_event/trace_log.cc [modify] https://crrev.com/5bf292009901cbaecf0b685582a9d0c8c1628ef9/base/trace_event/trace_log.h [modify] https://crrev.com/5bf292009901cbaecf0b685582a9d0c8c1628ef9/services/tracing/perfetto/perfetto_integration_unittest.cc [modify] https://crrev.com/5bf292009901cbaecf0b685582a9d0c8c1628ef9/services/tracing/perfetto/perfetto_service.h [modify] https://crrev.com/5bf292009901cbaecf0b685582a9d0c8c1628ef9/services/tracing/perfetto/producer_host.cc [modify] https://crrev.com/5bf292009901cbaecf0b685582a9d0c8c1628ef9/services/tracing/public/cpp/perfetto/producer_client.cc [modify] https://crrev.com/5bf292009901cbaecf0b685582a9d0c8c1628ef9/services/tracing/public/cpp/perfetto/producer_client.h [modify] https://crrev.com/5bf292009901cbaecf0b685582a9d0c8c1628ef9/services/tracing/public/cpp/perfetto/task_runner.cc [modify] https://crrev.com/5bf292009901cbaecf0b685582a9d0c8c1628ef9/services/tracing/public/cpp/perfetto/task_runner.h [modify] https://crrev.com/5bf292009901cbaecf0b685582a9d0c8c1628ef9/services/tracing/public/cpp/perfetto/trace_event_data_source.cc [modify] https://crrev.com/5bf292009901cbaecf0b685582a9d0c8c1628ef9/services/tracing/public/cpp/perfetto/trace_event_data_source_unittest.cc [modify] https://crrev.com/5bf292009901cbaecf0b685582a9d0c8c1628ef9/services/tracing/public/cpp/trace_event_agent.cc
,
May 28 2018
Issue 846891 has been merged into this issue. Issue 846901 has been merged into this issue. Issue 846914 has been merged into this issue.
,
May 28 2018
,
Jun 1 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/6bf61e1fb162b839f4220b1806459cd2d9c77450 commit 6bf61e1fb162b839f4220b1806459cd2d9c77450 Author: Oystein Eftevaag <oysteine@google.com> Date: Fri Jun 01 19:03:22 2018 Reland "Perfetto: Fix ProducerClient threading/sequencing issues" This is a reland of 2c2411fbf9f831b9f7f47f21a7c6b9dd82bbb37f Relanding with an LSan fix for TraceEventDataSourceTest.EventWithPointerArgs Original change's description: > Perfetto: Fix ProducerClient threading/sequencing issues > > * TraceWriters will no longer be destroyed when their TLS slot is cleaned > up on thread shutdown as this can trigger a CommitData() call from > Perfetto and cause asserts when Mojo does a TaskRunnerHandle::Get() > for a PostTask; instead we leave the deletion to the taskrunner > Perfetto is running on (if it's getting shut down, the task should > get dropped). > * Run ProducerClients on their own sequence. > > Bug: 844379 > Change-Id: Ie45b36a4e463674122555dbec2105953a882f37c > Reviewed-on: https://chromium-review.googlesource.com/1069528 > Reviewed-by: Primiano Tucci <primiano@chromium.org> > Commit-Queue: oysteine <oysteine@chromium.org> > Cr-Commit-Position: refs/heads/master@{#561356} TBR=primiano@chromium.org Bug: 844379 Change-Id: Icad6996fd708045d98587c8d47e180145a43960a Reviewed-on: https://chromium-review.googlesource.com/1081834 Reviewed-by: oysteine <oysteine@chromium.org> Commit-Queue: oysteine <oysteine@chromium.org> Cr-Commit-Position: refs/heads/master@{#563746} [modify] https://crrev.com/6bf61e1fb162b839f4220b1806459cd2d9c77450/base/trace_event/trace_log.cc [modify] https://crrev.com/6bf61e1fb162b839f4220b1806459cd2d9c77450/base/trace_event/trace_log.h [modify] https://crrev.com/6bf61e1fb162b839f4220b1806459cd2d9c77450/services/tracing/perfetto/perfetto_integration_unittest.cc [modify] https://crrev.com/6bf61e1fb162b839f4220b1806459cd2d9c77450/services/tracing/perfetto/perfetto_service.h [modify] https://crrev.com/6bf61e1fb162b839f4220b1806459cd2d9c77450/services/tracing/perfetto/producer_host.cc [modify] https://crrev.com/6bf61e1fb162b839f4220b1806459cd2d9c77450/services/tracing/public/cpp/perfetto/producer_client.cc [modify] https://crrev.com/6bf61e1fb162b839f4220b1806459cd2d9c77450/services/tracing/public/cpp/perfetto/producer_client.h [modify] https://crrev.com/6bf61e1fb162b839f4220b1806459cd2d9c77450/services/tracing/public/cpp/perfetto/task_runner.cc [modify] https://crrev.com/6bf61e1fb162b839f4220b1806459cd2d9c77450/services/tracing/public/cpp/perfetto/task_runner.h [modify] https://crrev.com/6bf61e1fb162b839f4220b1806459cd2d9c77450/services/tracing/public/cpp/perfetto/trace_event_data_source.cc [modify] https://crrev.com/6bf61e1fb162b839f4220b1806459cd2d9c77450/services/tracing/public/cpp/perfetto/trace_event_data_source_unittest.cc [modify] https://crrev.com/6bf61e1fb162b839f4220b1806459cd2d9c77450/services/tracing/public/cpp/trace_event_agent.cc
,
Jun 7 2018
When I try to land crrev.com/c/1090756 and test tracing with this command: > chrome.exe --trace-startup= --trace-startup-file=test.json --trace-startup-duration=7 --enable-features=TracingPerfettoBackend I'm getting truncated .json files and crashes. The crash call stack varies. Here's one: 00 MSVCP140D!mtx_do_lock 01 MSVCP140D!_Mtx_lock 02 libperfetto!std::_Mtx_lockX 03 libperfetto!std::_Mutex_base::lock 04 libperfetto!std::lock_guard<std::mutex>::lock_guard 05 libperfetto!perfetto::SharedMemoryArbiterImpl::ReturnCompletedChunk 06 libperfetto!perfetto::TraceWriterImpl::Flush 07 libperfetto!perfetto::TraceWriterImpl::~TraceWriterImpl 08 libperfetto!perfetto::TraceWriterImpl::~TraceWriterImpl 09 tracing_cpp!base::DeleteHelper<perfetto::TraceWriter>::DoDelete 0a base!base::internal::FunctorTraits<void (*)(const void *),void>::Invoke<void (*)(const void *),const void *> 0b base!base::internal::InvokeHelper<0,void>::MakeItSo<void (*)(const void *),const void *> 0c base!base::internal::Invoker<base::internal::BindState<void (*)(const void *),const void *>,void ()>::RunImpl<void (*)(const void *),std::tuple<const void *>,0> 0d base!base::internal::Invoker<base::internal::BindState<void (*)(const void *),const void *>,void ()>::RunOnce 0e base!base::OnceCallback<void ()>::Run 0f base!base::debug::TaskAnnotator::RunTask 10 base!base::internal::TaskTracker::RunOrSkipTask 11 base!base::internal::TaskTracker::RunAndPopNextTask 12 base!base::internal::SchedulerWorker::RunWorker 13 base!base::internal::SchedulerWorker::RunBackgroundPooledWorker 14 base!base::internal::SchedulerWorker::ThreadMain 15 base!base::`anonymous namespace'::ThreadFunc 16 KERNEL32!BaseThreadInitThunk 17 ntdll!__RtlUserThreadStart 18 ntdll!_RtlUserThreadStart I've also seen this stack: 00 VCRUNTIME140D!memset 01 libperfetto!protozero::ScatteredStreamWriter::ReserveBytes 02 libperfetto!perfetto::TraceWriterImpl::NewTracePacket 03 tracing_cpp!tracing::TraceEventDataSource::ThreadLocalEventSink::AddTraceEvent 04 tracing_cpp!tracing::TraceEventDataSource::OnAddTraceEvent 05 base!base::trace_event::TraceLog::UpdateTraceEventDurationExplicit 06 base!base::trace_event::TraceLog::UpdateTraceEventDuration 07 base!trace_event_internal::ScopedTracer::~ScopedTracer 08 base!base::internal::TaskTracker::RunOrSkipTask 09 base!base::internal::TaskTracker::RunAndPopNextTask 0a base!base::internal::SchedulerWorker::RunWorker 0b base!base::internal::SchedulerWorker::RunBackgroundPooledWorker 0c base!base::internal::SchedulerWorker::ThreadMain 0d base!base::`anonymous namespace'::ThreadFunc 0e KERNEL32!BaseThreadInitThunk 0f ntdll!__RtlUserThreadStart 10 ntdll!_RtlUserThreadStart I'm not sure if this is because of this bug or a different one. The angle assert ( crbug.com/844421 ) is also still in there, FWIW.
,
Jun 27 2018
Looks like this is a different issue; it happens when the ProducerClient gets destroyed which in turn deletes the SharedMemoryArbiterImpl and unmaps the shared memory. The first callstack there is easy enough to solve (TraceWriter being destroyed after that point), but the second one (trace event still being written) means that I have to turn ProducerClient into a leaky singleton I think.
,
Jun 28 2018
,
Jun 28 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/d99ae1252121b1f92aaaed90c595ae4b63b1321e commit d99ae1252121b1f92aaaed90c595ae4b63b1321e Author: Oystein Eftevaag <oysteine@chromium.org> Date: Thu Jun 28 15:47:20 2018 Perfetto: Make ProducerClient a leaky singleton This is needed as the TraceWriters we create and keep in TLS keep a reference to the SharedMemoryArbiter object that the ProducerClient keeps, and during shutdown may still be used after the ProducerClient previously got deleted. R=hjd@chromium.org BUG= 844379 Change-Id: Ib27b59b48ce50f88f954618e7eee4f046b3029b7 Reviewed-on: https://chromium-review.googlesource.com/1117776 Reviewed-by: Hector Dearman <hjd@chromium.org> Commit-Queue: oysteine <oysteine@chromium.org> Cr-Commit-Position: refs/heads/master@{#571140} [modify] https://crrev.com/d99ae1252121b1f92aaaed90c595ae4b63b1321e/services/tracing/perfetto/perfetto_integration_unittest.cc [modify] https://crrev.com/d99ae1252121b1f92aaaed90c595ae4b63b1321e/services/tracing/public/cpp/perfetto/producer_client.cc [modify] https://crrev.com/d99ae1252121b1f92aaaed90c595ae4b63b1321e/services/tracing/public/cpp/perfetto/producer_client.h [modify] https://crrev.com/d99ae1252121b1f92aaaed90c595ae4b63b1321e/services/tracing/public/cpp/trace_event_agent.cc
,
Jun 29 2018
bruce: This seems to work for me now when I build&test it on a Windows VM, can you confirm? (Modulo the Angle assert; I'll take care of that next).
,
Jun 30 2018
Yep, seems to work after commenting out the angle assert. I got a bad trace file on my first test but I think I just didn't wait long enough for the tracing to finish. The next two worked perfectly. This was my test command line: out\debug_component\chrome.exe --trace-startup= --trace-startup-file=test.json --trace-startup-duration=7 --enable-features=TracingPerfettoBackend I tested on a release build and found that landing crrev.com/c1090756 (thus making Perfetto backend available on Windows in Chrome) adds 166,912 bytes to the size of chrome*.dll, 138752 in chrome.dll and 28160 in chrome_child.dll. That's not huge, but it's enough to make me wonder if we should enable it in official builds - maybe debug or unofficial builds only for now, given that it won't be used for a while?
,
Jul 17
#14: Current OKRs is to switch over to Perfetto instead of TraceLog by default at the end of this quarter, so I think we might as well eat that size cost now rather than wait. We'll gain some of it back when we delete the old tracing implementation (though that won't help for the diff updates, of course).
,
Jul 17
,
Jul 19
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/d9ef9da999d3bc199c848754e648e8fec912556f commit d9ef9da999d3bc199c848754e648e8fec912556f Author: Bruce Dawson <brucedawson@chromium.org> Date: Thu Jul 19 19:39:39 2018 Adjust gn and ifdefs to make Perfetto available This change adjusts some ifdefs and also updates two .gn files to enable Perfetto on Windows in Chromium. This change makes no difference unless the command-line option --enable-features=TracingPerfettoBackend is specified, which can be tested with: chrome.exe --trace-startup= --trace-startup-file=test.json ^ --trace-startup-duration=7 --enable-features=TracingPerfettoBackend This was originally landed in crrev.com/c/1064371 Bug: 844421 , 844379 Change-Id: I2f4049b8623003d6c64ae942f0c334b1fc83875d Reviewed-on: https://chromium-review.googlesource.com/1090756 Commit-Queue: oysteine <oysteine@chromium.org> Reviewed-by: oysteine <oysteine@chromium.org> Cr-Commit-Position: refs/heads/master@{#576590} [modify] https://crrev.com/d9ef9da999d3bc199c848754e648e8fec912556f/services/tracing/BUILD.gn [modify] https://crrev.com/d9ef9da999d3bc199c848754e648e8fec912556f/services/tracing/public/cpp/BUILD.gn [modify] https://crrev.com/d9ef9da999d3bc199c848754e648e8fec912556f/services/tracing/public/cpp/trace_event_agent.cc [modify] https://crrev.com/d9ef9da999d3bc199c848754e648e8fec912556f/services/tracing/tracing_service.h |
|||
►
Sign in to add a comment |
|||
Comment 1 by brucedaw...@chromium.org
, May 18 2018