[0919/191218:FATAL:persistent_histogram_allocator.cc(188)] Check failed: !user_. |
|||
Issue descriptionIn trying to roll UMA-using-Crashpad into Chrome (https://codereview.chromium.org/2350943003/) I got this on the Windows bot: https://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_rel_ng/builds/295922/steps/context_lost_tests%20on%20NVIDIA%20GPU%20on%20Windows%20%28with%20patch%29%20on%20Windows-2008ServerR2-SP1/logs/stdio A crash was triggered in the GPU process, and it looks like crashpad_handler DCHECK'd due to PersistentHistogramAllocator. [0919/191218:FATAL:persistent_histogram_allocator.cc(188)] Check failed: !user_. Backtrace: base::debug::StackTrace::StackTrace [0x00ECEAF7+23] logging::LogMessage::~LogMessage [0x00EBFDE1+49] base::PersistentSparseHistogramDataManager::UseSampleMapRecords [0x00EDCAD6+118] base::PersistentHistogramAllocator::UseSampleMapRecords [0x00EDCA54+20] base::PersistentSampleMap::ImportSamples [0x00EDDB5C+44] base::PersistentSampleMap::GetOrCreateSampleCountStorage [0x00EDDA38+88] base::PersistentSampleMap::Accumulate [0x00EDD5BF+15] base::SparseHistogram::AddCount [0x00ED1F28+120] base::SparseHistogram::Add [0x00ED4EBD+13] crashpad::Metrics::ExceptionCode [0x01177EA3+83] crashpad::CrashReportExceptionHandler::ExceptionHandlerServerException [0x00EEAD3D+218] crashpad::ExceptionHandlerServer::OnCrashDumpEvent [0x011745FB+52] RtlRegisterWait [0x77A01C6A+788] From looking at that DCHECK, I'm not quite clear on what it's expressing. Brian, do you have any insight? From looking at the log there, I would guess that it'd be maybe the first or second time that UMA_ macro would have been used for that particular key, but I didn't dig into the details of that GPU test. That UMA is using UMA_HISTOGRAM_SPARSE_SLOWLY whereas there was a UMA that rolled in last week in https://codereview.chromium.org/2349083002/ that's a plain (?) UMA_HISTOGRAM_CUSTOM_COUNTS, in case that might be relevant.
,
Sep 21 2016
I'm looking at the trace. The DCHECK indicates that there are two instances of this histogram using the same PersistentHistogramAllocator object. That can happen if you iterate over histograms in that allocator within the same process that created them in there. I'm trying to understand the stack. Where is crashpad::Metrics::ExceptionCode?
,
Sep 21 2016
crashpad::Metrics::ExceptionCode is implemented in third_party/crashpad/crashpad/util/misc/metrics.cc in the CL linked above https://codereview.chromium.org/2350943003/diff/120001/third_party/crashpad/crashpad/util/misc/metrics.cc . I'm not sure what 'two instances of the histogram' means here (i.e. I'm not sure how a histogram instance gets created I guess)... The UMA macro It could be getting called from different threads; the exception is handled via a thread pool https://cs.chromium.org/chromium/src/third_party/crashpad/crashpad/util/win/exception_handler_server.cc?q=ExceptionHandlerServer::OnCrashDumpEvent&sq=package:chromium&l=565&dr=CSs . Is that a problem?
,
Sep 21 2016
After some offline discussion, one possible explanation for this could be that the ExceptionHandlerServer gets created before the GlobalHistogramAllocator in main(). https://cs.chromium.org/chromium/src/third_party/crashpad/crashpad/handler/handler_main.cc?q=ExceptionHandlerServer+exception_handler_server&sq=package:chromium&l=350&dr=C If there's a crash right away, the UMA macros get used which cause a default GlobalHistogramAllocator to get created, and then main comes along and creates another one. (maybe? sound plausible?) We didn't really want to move the GlobalHistogramAllocator creation earlier because on Windows that would mean delaying the handshake response (which is on browser startup path) until after the file operations the create the persistent histogram file(s).
,
Sep 21 2016
Or not! I'm going to see if I can get context_lost to run locally now and try reproduce, maybe that'll make it more obvious what's wrong.
,
Sep 21 2016
One other random thought (haven't looked at this very deeply), is it possible that StatisticsRecorder::Initialize() doesn't get called in crashpad before the first UMA metric is logged? I think that could cause the histogram object to not be cached and so two of them could exist...
,
Sep 21 2016
There's no usage of StatisticsRecorder in the crashpad_handler process. Is it required to work correctly, even if there's only one place where GlobalHistogramAllocator gets created (in main())? But that definitely seems like the sort of thing it could be.
,
Sep 21 2016
If both the histogram AND the GlobalHistogramAllocator were created before Initialize() then that could cause the problem. Perhaps having the GlobalHistogramAllocator call StatisticsRecorder::Initialize() as part of the former's ctor would be a safe thing to do. You could try it and see if it makes the test stop failing.
,
Sep 21 2016
Fortunately, I can repro locally. StatisticsRecorder::Initialize() is never used in the crashpad_handler process. It seems to be on the second call to UMA_HISTOGRAM_SPARSE_SLOWLY, but at least while in the debugger (where it still reproduces) they're both called much after GlobalHistogramAllocator has been initialized as expected in main. Debugging more...
,
Sep 21 2016
Great! Try adding a call to StatisticsRecorder::Initialize() in the GlobalHistogramAllocator constructor. That'll make it safe and multiple calls to Initialize() are okay.
,
Sep 21 2016
OK, that works, I'll send a patch to do that separately from the roll. Thanks! I debugged a bit more, because given that "simple" explanation, I didn't understand why the other UMAs that are called earlier and more frequently didn't crash. However, the expansion of the other macros e.g. UMA_HISTOGRAM_COUNTS have a static atomic word that early outs before relooking up the histogram, so never get to their factory again. UMA_HISTOGRAM_SPARSE_SLOWLY however doesn't do this and so always loads from FactoryGet(), so causing the DCHECK. (Should/could it do the same? The sparseness of UMA_HISTOGRAM_SPARSE_SLOWLY values seems unrelated to the frequency with which values are tracked.)
,
Sep 21 2016
Only sparse histograms have this DCHECK. Regular histograms will just end up creating duplicates with only the last one (the one in the StatisticsRecorder against which any further creations are de-dup'd) getting uploaded.
,
Sep 21 2016
Oh, yes... Sparse histograms could keep a local pointer for speed but since sparse histograms are already "slow", this wasn't deemed necessary. Typically, they're rarely called. Note that even with such, the problem would continue to exist if the same histogram (by name) were updated in another place since that other place would have it's own cached reference to the histogram that starts null.
,
Sep 22 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/0ad9ed389930c020268e251c454e004f5a9538be commit 0ad9ed389930c020268e251c454e004f5a9538be Author: bcwhite <bcwhite@chromium.org> Date: Thu Sep 22 18:26:17 2016 The GHA won't function properly if the SR hasn't been initialized so do that. Also: Fix problem with initialization of the global SR when a local SR is already active for testing purposes. BUG= 648494 , crashpad:100 Review-Url: https://codereview.chromium.org/2358023003 Cr-Commit-Position: refs/heads/master@{#420402} [modify] https://crrev.com/0ad9ed389930c020268e251c454e004f5a9538be/base/metrics/persistent_histogram_allocator.cc [modify] https://crrev.com/0ad9ed389930c020268e251c454e004f5a9538be/base/metrics/statistics_recorder.cc
,
Sep 22 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/0ad9ed389930c020268e251c454e004f5a9538be commit 0ad9ed389930c020268e251c454e004f5a9538be Author: bcwhite <bcwhite@chromium.org> Date: Thu Sep 22 18:26:17 2016 The GHA won't function properly if the SR hasn't been initialized so do that. Also: Fix problem with initialization of the global SR when a local SR is already active for testing purposes. BUG= 648494 , crashpad:100 Review-Url: https://codereview.chromium.org/2358023003 Cr-Commit-Position: refs/heads/master@{#420402} [modify] https://crrev.com/0ad9ed389930c020268e251c454e004f5a9538be/base/metrics/persistent_histogram_allocator.cc [modify] https://crrev.com/0ad9ed389930c020268e251c454e004f5a9538be/base/metrics/statistics_recorder.cc
,
Sep 22 2016
|
|||
►
Sign in to add a comment |
|||
Comment 1 by scottmg@chromium.org
, Sep 20 2016