New issue
Advanced search Search tips

Issue 844379 link

Starred by 3 users

Issue metadata

Status: Fixed
Owner:
Closed: Jul 17
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux , Android , Windows , Chrome , Mac
Pri: 3
Type: Bug



Sign in to add a comment

Perfetto: calling the TraceWriter dtor in the TLS dtor causes a CHECK()

Project Member Reported by primiano@chromium.org, May 18 2018

Issue description

Just debugged this crash with Bruce on Windows.
The problem is the following:

1) We have one TraceWriter instance per thread in a TLS. We register the TLS slot dtor to destroy the TraceWrite once done.
2) When the thread exists, TraceWriter dtor is invoked
3) TraceWriter dtor will try to do a Flush() (see https://android.googlesource.com/platform/external/perfetto/+/master/src/tracing/core/trace_writer_impl.cc#59)
4) Flush will try to send an IPC, which will end up in mojo, which will posttask something

Now, the problem is that, when the TLS destructor runs, the order on which things are destroyed is unpredictable. On Windows I think we got into the state where TaskRunnerHandle::Get() was hitting a check, because the TLS that contain the current TaskRunner has been destroyed before.

Cannot tell right now whether this is:
A) A bug in Perfetto, and we shouldn't try to flush() on TraceWriter dtor.
2) A bug in the way chrome embeds it.


Let's think a bit more on this.
 
I'm going to conditionally disable the Flush() call for now to allow forward progress, see http://aosp/689941.

Hmm. Maybe rather than destroying the TW in the slot destructor, we should just post a DeleteSoon() to the taskrunner we're running Perfetto on?
I think disabling the flush is causing the unittest to fail, because they test precisely that we don't lose the leftover in the buffer!

> Maybe rather than destroying the TW in the slot destructor, we should just post a DeleteSoon() to the taskrunner we're running Perfetto on?

Yeah the more I think about this the more I lean towards this.
When we do that we should have a way to reset the threadchecker, to say "don't worry you are going to be moved "
Project Member

Comment 4 by bugdroid1@chromium.org, May 24 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/2c2411fbf9f831b9f7f47f21a7c6b9dd82bbb37f

commit 2c2411fbf9f831b9f7f47f21a7c6b9dd82bbb37f
Author: Oystein Eftevaag <oysteine@google.com>
Date: Thu May 24 01:53:36 2018

Perfetto: Fix ProducerClient threading/sequencing issues

* TraceWriters will no longer be destroyed when their TLS slot is cleaned
  up on thread shutdown as this can trigger a CommitData() call from
  Perfetto and cause asserts when Mojo does a TaskRunnerHandle::Get()
  for a PostTask; instead we leave the deletion to the taskrunner
  Perfetto is running on (if it's getting shut down, the task should
  get dropped).
* Run ProducerClients on their own sequence.

Bug:  844379 
Change-Id: Ie45b36a4e463674122555dbec2105953a882f37c
Reviewed-on: https://chromium-review.googlesource.com/1069528
Reviewed-by: Primiano Tucci <primiano@chromium.org>
Commit-Queue: oysteine <oysteine@chromium.org>
Cr-Commit-Position: refs/heads/master@{#561356}
[modify] https://crrev.com/2c2411fbf9f831b9f7f47f21a7c6b9dd82bbb37f/base/trace_event/trace_log.cc
[modify] https://crrev.com/2c2411fbf9f831b9f7f47f21a7c6b9dd82bbb37f/base/trace_event/trace_log.h
[modify] https://crrev.com/2c2411fbf9f831b9f7f47f21a7c6b9dd82bbb37f/services/tracing/perfetto/perfetto_integration_unittest.cc
[modify] https://crrev.com/2c2411fbf9f831b9f7f47f21a7c6b9dd82bbb37f/services/tracing/perfetto/perfetto_service.h
[modify] https://crrev.com/2c2411fbf9f831b9f7f47f21a7c6b9dd82bbb37f/services/tracing/perfetto/producer_host.cc
[modify] https://crrev.com/2c2411fbf9f831b9f7f47f21a7c6b9dd82bbb37f/services/tracing/public/cpp/perfetto/producer_client.cc
[modify] https://crrev.com/2c2411fbf9f831b9f7f47f21a7c6b9dd82bbb37f/services/tracing/public/cpp/perfetto/producer_client.h
[modify] https://crrev.com/2c2411fbf9f831b9f7f47f21a7c6b9dd82bbb37f/services/tracing/public/cpp/perfetto/task_runner.cc
[modify] https://crrev.com/2c2411fbf9f831b9f7f47f21a7c6b9dd82bbb37f/services/tracing/public/cpp/perfetto/task_runner.h
[modify] https://crrev.com/2c2411fbf9f831b9f7f47f21a7c6b9dd82bbb37f/services/tracing/public/cpp/perfetto/trace_event_data_source.cc
[modify] https://crrev.com/2c2411fbf9f831b9f7f47f21a7c6b9dd82bbb37f/services/tracing/public/cpp/perfetto/trace_event_data_source_unittest.cc
[modify] https://crrev.com/2c2411fbf9f831b9f7f47f21a7c6b9dd82bbb37f/services/tracing/public/cpp/trace_event_agent.cc

Project Member

Comment 5 by bugdroid1@chromium.org, May 25 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/5bf292009901cbaecf0b685582a9d0c8c1628ef9

commit 5bf292009901cbaecf0b685582a9d0c8c1628ef9
Author: Olga Sharonova <olka@chromium.org>
Date: Fri May 25 10:31:23 2018

Revert "Perfetto: Fix ProducerClient threading/sequencing issues"

This reverts commit 2c2411fbf9f831b9f7f47f21a7c6b9dd82bbb37f.

Reason for revert: Suspected cause of  Issue 846609 

Original change's description:
> Perfetto: Fix ProducerClient threading/sequencing issues
> 
> * TraceWriters will no longer be destroyed when their TLS slot is cleaned
>   up on thread shutdown as this can trigger a CommitData() call from
>   Perfetto and cause asserts when Mojo does a TaskRunnerHandle::Get()
>   for a PostTask; instead we leave the deletion to the taskrunner
>   Perfetto is running on (if it's getting shut down, the task should
>   get dropped).
> * Run ProducerClients on their own sequence.
> 
> Bug:  844379 
> Change-Id: Ie45b36a4e463674122555dbec2105953a882f37c
> Reviewed-on: https://chromium-review.googlesource.com/1069528
> Reviewed-by: Primiano Tucci <primiano@chromium.org>
> Commit-Queue: oysteine <oysteine@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#561356}

TBR=primiano@chromium.org,oysteine@chromium.org

# Not skipping CQ checks because original CL landed > 1 day ago.

Bug:  844379 
Change-Id: I82eb4e335521ef8f39119bdd2a984f4dc81ac88c
Reviewed-on: https://chromium-review.googlesource.com/1073247
Reviewed-by: Olga Sharonova <olka@chromium.org>
Commit-Queue: Olga Sharonova <olka@chromium.org>
Cr-Commit-Position: refs/heads/master@{#561823}
[modify] https://crrev.com/5bf292009901cbaecf0b685582a9d0c8c1628ef9/base/trace_event/trace_log.cc
[modify] https://crrev.com/5bf292009901cbaecf0b685582a9d0c8c1628ef9/base/trace_event/trace_log.h
[modify] https://crrev.com/5bf292009901cbaecf0b685582a9d0c8c1628ef9/services/tracing/perfetto/perfetto_integration_unittest.cc
[modify] https://crrev.com/5bf292009901cbaecf0b685582a9d0c8c1628ef9/services/tracing/perfetto/perfetto_service.h
[modify] https://crrev.com/5bf292009901cbaecf0b685582a9d0c8c1628ef9/services/tracing/perfetto/producer_host.cc
[modify] https://crrev.com/5bf292009901cbaecf0b685582a9d0c8c1628ef9/services/tracing/public/cpp/perfetto/producer_client.cc
[modify] https://crrev.com/5bf292009901cbaecf0b685582a9d0c8c1628ef9/services/tracing/public/cpp/perfetto/producer_client.h
[modify] https://crrev.com/5bf292009901cbaecf0b685582a9d0c8c1628ef9/services/tracing/public/cpp/perfetto/task_runner.cc
[modify] https://crrev.com/5bf292009901cbaecf0b685582a9d0c8c1628ef9/services/tracing/public/cpp/perfetto/task_runner.h
[modify] https://crrev.com/5bf292009901cbaecf0b685582a9d0c8c1628ef9/services/tracing/public/cpp/perfetto/trace_event_data_source.cc
[modify] https://crrev.com/5bf292009901cbaecf0b685582a9d0c8c1628ef9/services/tracing/public/cpp/perfetto/trace_event_data_source_unittest.cc
[modify] https://crrev.com/5bf292009901cbaecf0b685582a9d0c8c1628ef9/services/tracing/public/cpp/trace_event_agent.cc

Comment 6 by tapted@chromium.org, May 28 2018

 Issue 846891  has been merged into this issue.
 Issue 846901  has been merged into this issue.
 Issue 846914  has been merged into this issue.

Comment 7 by tapted@chromium.org, May 28 2018

Cc: olka@chromium.org oysteine@chromium.org
 Issue 846609  has been merged into this issue.
Project Member

Comment 8 by bugdroid1@chromium.org, Jun 1 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/6bf61e1fb162b839f4220b1806459cd2d9c77450

commit 6bf61e1fb162b839f4220b1806459cd2d9c77450
Author: Oystein Eftevaag <oysteine@google.com>
Date: Fri Jun 01 19:03:22 2018

Reland "Perfetto: Fix ProducerClient threading/sequencing issues"

This is a reland of 2c2411fbf9f831b9f7f47f21a7c6b9dd82bbb37f

Relanding with an LSan fix for TraceEventDataSourceTest.EventWithPointerArgs

Original change's description:
> Perfetto: Fix ProducerClient threading/sequencing issues
>
> * TraceWriters will no longer be destroyed when their TLS slot is cleaned
>   up on thread shutdown as this can trigger a CommitData() call from
>   Perfetto and cause asserts when Mojo does a TaskRunnerHandle::Get()
>   for a PostTask; instead we leave the deletion to the taskrunner
>   Perfetto is running on (if it's getting shut down, the task should
>   get dropped).
> * Run ProducerClients on their own sequence.
>
> Bug:  844379 
> Change-Id: Ie45b36a4e463674122555dbec2105953a882f37c
> Reviewed-on: https://chromium-review.googlesource.com/1069528
> Reviewed-by: Primiano Tucci <primiano@chromium.org>
> Commit-Queue: oysteine <oysteine@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#561356}

TBR=primiano@chromium.org

Bug:  844379 
Change-Id: Icad6996fd708045d98587c8d47e180145a43960a
Reviewed-on: https://chromium-review.googlesource.com/1081834
Reviewed-by: oysteine <oysteine@chromium.org>
Commit-Queue: oysteine <oysteine@chromium.org>
Cr-Commit-Position: refs/heads/master@{#563746}
[modify] https://crrev.com/6bf61e1fb162b839f4220b1806459cd2d9c77450/base/trace_event/trace_log.cc
[modify] https://crrev.com/6bf61e1fb162b839f4220b1806459cd2d9c77450/base/trace_event/trace_log.h
[modify] https://crrev.com/6bf61e1fb162b839f4220b1806459cd2d9c77450/services/tracing/perfetto/perfetto_integration_unittest.cc
[modify] https://crrev.com/6bf61e1fb162b839f4220b1806459cd2d9c77450/services/tracing/perfetto/perfetto_service.h
[modify] https://crrev.com/6bf61e1fb162b839f4220b1806459cd2d9c77450/services/tracing/perfetto/producer_host.cc
[modify] https://crrev.com/6bf61e1fb162b839f4220b1806459cd2d9c77450/services/tracing/public/cpp/perfetto/producer_client.cc
[modify] https://crrev.com/6bf61e1fb162b839f4220b1806459cd2d9c77450/services/tracing/public/cpp/perfetto/producer_client.h
[modify] https://crrev.com/6bf61e1fb162b839f4220b1806459cd2d9c77450/services/tracing/public/cpp/perfetto/task_runner.cc
[modify] https://crrev.com/6bf61e1fb162b839f4220b1806459cd2d9c77450/services/tracing/public/cpp/perfetto/task_runner.h
[modify] https://crrev.com/6bf61e1fb162b839f4220b1806459cd2d9c77450/services/tracing/public/cpp/perfetto/trace_event_data_source.cc
[modify] https://crrev.com/6bf61e1fb162b839f4220b1806459cd2d9c77450/services/tracing/public/cpp/perfetto/trace_event_data_source_unittest.cc
[modify] https://crrev.com/6bf61e1fb162b839f4220b1806459cd2d9c77450/services/tracing/public/cpp/trace_event_agent.cc

When I try to land crrev.com/c/1090756 and test tracing with this command:

> chrome.exe --trace-startup= --trace-startup-file=test.json --trace-startup-duration=7 --enable-features=TracingPerfettoBackend

I'm getting truncated .json files and crashes. The crash call stack varies. Here's one:

00 MSVCP140D!mtx_do_lock
01 MSVCP140D!_Mtx_lock
02 libperfetto!std::_Mtx_lockX
03 libperfetto!std::_Mutex_base::lock
04 libperfetto!std::lock_guard<std::mutex>::lock_guard
05 libperfetto!perfetto::SharedMemoryArbiterImpl::ReturnCompletedChunk
06 libperfetto!perfetto::TraceWriterImpl::Flush
07 libperfetto!perfetto::TraceWriterImpl::~TraceWriterImpl
08 libperfetto!perfetto::TraceWriterImpl::~TraceWriterImpl
09 tracing_cpp!base::DeleteHelper<perfetto::TraceWriter>::DoDelete
0a base!base::internal::FunctorTraits<void (*)(const void *),void>::Invoke<void (*)(const void *),const void *>
0b base!base::internal::InvokeHelper<0,void>::MakeItSo<void (*)(const void *),const void *>
0c base!base::internal::Invoker<base::internal::BindState<void (*)(const void *),const void *>,void ()>::RunImpl<void (*)(const void *),std::tuple<const void *>,0>
0d base!base::internal::Invoker<base::internal::BindState<void (*)(const void *),const void *>,void ()>::RunOnce
0e base!base::OnceCallback<void ()>::Run
0f base!base::debug::TaskAnnotator::RunTask
10 base!base::internal::TaskTracker::RunOrSkipTask
11 base!base::internal::TaskTracker::RunAndPopNextTask
12 base!base::internal::SchedulerWorker::RunWorker
13 base!base::internal::SchedulerWorker::RunBackgroundPooledWorker
14 base!base::internal::SchedulerWorker::ThreadMain
15 base!base::`anonymous namespace'::ThreadFunc
16 KERNEL32!BaseThreadInitThunk
17 ntdll!__RtlUserThreadStart
18 ntdll!_RtlUserThreadStart

I've also seen this stack:

00 VCRUNTIME140D!memset
01 libperfetto!protozero::ScatteredStreamWriter::ReserveBytes
02 libperfetto!perfetto::TraceWriterImpl::NewTracePacket
03 tracing_cpp!tracing::TraceEventDataSource::ThreadLocalEventSink::AddTraceEvent
04 tracing_cpp!tracing::TraceEventDataSource::OnAddTraceEvent
05 base!base::trace_event::TraceLog::UpdateTraceEventDurationExplicit
06 base!base::trace_event::TraceLog::UpdateTraceEventDuration
07 base!trace_event_internal::ScopedTracer::~ScopedTracer
08 base!base::internal::TaskTracker::RunOrSkipTask
09 base!base::internal::TaskTracker::RunAndPopNextTask
0a base!base::internal::SchedulerWorker::RunWorker
0b base!base::internal::SchedulerWorker::RunBackgroundPooledWorker
0c base!base::internal::SchedulerWorker::ThreadMain
0d base!base::`anonymous namespace'::ThreadFunc
0e KERNEL32!BaseThreadInitThunk
0f ntdll!__RtlUserThreadStart
10 ntdll!_RtlUserThreadStart

I'm not sure if this is because of this bug or a different one. The angle assert ( crbug.com/844421 ) is also still in there, FWIW.

Looks like this is a different issue; it happens when the ProducerClient gets destroyed which in turn deletes the SharedMemoryArbiterImpl and unmaps the shared memory. The first callstack there is easy enough to solve (TraceWriter being destroyed after that point), but the second one (trace event still being written) means that I have to turn ProducerClient into a leaky singleton I think.

Comment 11 by olka@chromium.org, Jun 28 2018

Cc: -olka@chromium.org
Project Member

Comment 12 by bugdroid1@chromium.org, Jun 28 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/d99ae1252121b1f92aaaed90c595ae4b63b1321e

commit d99ae1252121b1f92aaaed90c595ae4b63b1321e
Author: Oystein Eftevaag <oysteine@chromium.org>
Date: Thu Jun 28 15:47:20 2018

Perfetto: Make ProducerClient a leaky singleton

This is needed as the TraceWriters we create and keep in TLS
keep a reference to the SharedMemoryArbiter object that the
ProducerClient keeps, and during shutdown may still be
used after the ProducerClient previously got deleted.

R=hjd@chromium.org
BUG= 844379 

Change-Id: Ib27b59b48ce50f88f954618e7eee4f046b3029b7
Reviewed-on: https://chromium-review.googlesource.com/1117776
Reviewed-by: Hector Dearman <hjd@chromium.org>
Commit-Queue: oysteine <oysteine@chromium.org>
Cr-Commit-Position: refs/heads/master@{#571140}
[modify] https://crrev.com/d99ae1252121b1f92aaaed90c595ae4b63b1321e/services/tracing/perfetto/perfetto_integration_unittest.cc
[modify] https://crrev.com/d99ae1252121b1f92aaaed90c595ae4b63b1321e/services/tracing/public/cpp/perfetto/producer_client.cc
[modify] https://crrev.com/d99ae1252121b1f92aaaed90c595ae4b63b1321e/services/tracing/public/cpp/perfetto/producer_client.h
[modify] https://crrev.com/d99ae1252121b1f92aaaed90c595ae4b63b1321e/services/tracing/public/cpp/trace_event_agent.cc

bruce: This seems to work for me now when I build&test it on a Windows VM, can you confirm? (Modulo the Angle assert; I'll take care of that next).
Yep, seems to work after commenting out the angle assert. I got a bad trace file on my first test but I think I just didn't wait long enough for the tracing to finish. The next two worked perfectly. This was my test command line:

out\debug_component\chrome.exe --trace-startup= --trace-startup-file=test.json --trace-startup-duration=7 --enable-features=TracingPerfettoBackend

I tested on a release build and found that landing crrev.com/c1090756 (thus making Perfetto backend available on Windows in Chrome) adds 166,912 bytes to the size of chrome*.dll, 138752 in chrome.dll and 28160 in chrome_child.dll. That's not huge, but it's enough to make me wonder if we should enable it in official builds - maybe debug or unofficial builds only for now, given that it won't be used for a while?

#14: Current OKRs is to switch over to Perfetto instead of TraceLog by default at the end of this quarter, so I think we might as well eat that size cost now rather than wait. We'll gain some of it back when we delete the old tracing implementation (though that won't help for the diff updates, of course).
Status: Fixed (was: Untriaged)
Project Member

Comment 17 by bugdroid1@chromium.org, Jul 19

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/d9ef9da999d3bc199c848754e648e8fec912556f

commit d9ef9da999d3bc199c848754e648e8fec912556f
Author: Bruce Dawson <brucedawson@chromium.org>
Date: Thu Jul 19 19:39:39 2018

Adjust gn and ifdefs to make Perfetto available

This change adjusts some ifdefs and also updates two .gn files to enable
Perfetto on Windows in Chromium.

This change makes no difference unless the command-line option
--enable-features=TracingPerfettoBackend is specified, which can be
tested with:

  chrome.exe --trace-startup= --trace-startup-file=test.json ^
    --trace-startup-duration=7 --enable-features=TracingPerfettoBackend

This was originally landed in crrev.com/c/1064371

Bug:  844421 ,  844379 
Change-Id: I2f4049b8623003d6c64ae942f0c334b1fc83875d
Reviewed-on: https://chromium-review.googlesource.com/1090756
Commit-Queue: oysteine <oysteine@chromium.org>
Reviewed-by: oysteine <oysteine@chromium.org>
Cr-Commit-Position: refs/heads/master@{#576590}
[modify] https://crrev.com/d9ef9da999d3bc199c848754e648e8fec912556f/services/tracing/BUILD.gn
[modify] https://crrev.com/d9ef9da999d3bc199c848754e648e8fec912556f/services/tracing/public/cpp/BUILD.gn
[modify] https://crrev.com/d9ef9da999d3bc199c848754e648e8fec912556f/services/tracing/public/cpp/trace_event_agent.cc
[modify] https://crrev.com/d9ef9da999d3bc199c848754e648e8fec912556f/services/tracing/tracing_service.h

Sign in to add a comment