
Issue 771818

Starred by 1 user

Issue metadata

Status: Untriaged
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug
Hotlist-MemoryInfra




Measuring error rate of memory-infra.

Project Member Reported by erikc...@chromium.org, Oct 4 2017

Issue description

I recall that hjd added the metrics:

Memory.Experimental.Debug.FailedProcessDumpsPerGlobalDump
Memory.Experimental.Debug.GlobalDumpDuration
Memory.Experimental.Debug.GlobalDumpQueueLength

Given that I ran into a memory-infra error, I figured I'd take a look at these metrics to see if they caught the problem: https://bugs.chromium.org/p/chromium/issues/detail?id=771805

It turns out that only Memory.Experimental.Debug.GlobalDumpQueueLength is emitted. The other metrics are not emitted if the memory-infra process is hung. Looking at this metric on beta, the queue length is >0 ~1% of the time:
https://uma.googleplex.com/p/chrome/histograms/?endDate=20171003&dayCount=1&histograms=Memory.Experimental.Debug.GlobalDumpQueueLength&fixupData=true&showMax=true&filters=platform%2Ceq%2CW%2Cchannel%2Ceq%2C3%2Cisofficial%2Ceq%2CTrue&implicitFilters=isofficial

Given that there are currently only 2 users of queue length [tracing, memory-uma], it seems likely that every instance of queue length > 0 is indicative of something gone wrong. Since this stat is emitted every time we go from 0->1 queue length, the number of 1-length queues is indicative of the prevalence of the problem: ~0.1%.
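
For anyone who wants to redo this arithmetic from an exported histogram, here is a minimal sketch. It assumes the histogram is available as a plain mapping from queue length to sample count; the helper names and the counts below are made up for illustration and are not taken from UMA.

```python
def fraction_nonzero(bucket_counts):
    """Fraction of samples whose queue length is greater than 0."""
    total = sum(bucket_counts.values())
    if total == 0:
        return 0.0
    nonzero = sum(count for length, count in bucket_counts.items() if length > 0)
    return nonzero / total


def fraction_at(bucket_counts, length):
    """Fraction of samples recorded at exactly the given queue length."""
    total = sum(bucket_counts.values())
    if total == 0:
        return 0.0
    return bucket_counts.get(length, 0) / total


# Hypothetical bucket counts (queue length -> number of samples).
example_counts = {0: 9900, 1: 90, 2: 10}

print(f"queue length > 0: {fraction_nonzero(example_counts):.2%}")
print(f"queue length == 1: {fraction_at(example_counts, 1):.2%}")
```

Splitting out the length-1 bucket separately makes it easy to compare the "anything queued" rate against the rate for a single bucket.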
 
