New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 754850 link

Starred by 1 user

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: 2017-10-23
OS: Chrome
Pri: 3
Type: Bug

Blocking:
issue 193614



Sign in to add a comment

Logging.CrashCounter metric is out-of-date

Project Member Reported by ddavenp...@chromium.org, Aug 11 2017

Issue description

The enumeration defined in crash_reporter.cc is inconsistent with the enumeration in UMA histograms.xml, and not all CrashTypes are sending histogram data.
 
Cc: semenzato@chromium.org

Comment 2 by vapier@chromium.org, Aug 11 2017

as i mentioned in the thread, i'm not sure we want to expand the granularity of this metric (short of adding an "other" bucket), so i would hold off on doing any work along those lines

certainly if the reporter & the xml file are out of sync, that should be fixed now
Currently the ones that get reported are:
 unclean shutdown
 ec crash
 user crash
 kernel crash

Are you thinking something like:
 user
 kernel
 other
?

Of course, if this metric is obsolete and no one intends on using it, then let's just remove the code. Given the state of it, I'd be pretty surprised anyone's using it at the moment. I'll hold off any work until there's a clear consensus.
Cc: abodenha@chromium.org vapier@chromium.org
I am OK with either fixing or removing this metric.

Note that there are several other crash-related
metrics sent by the metrics daemon, in the Platform.* group.  Most of
those have also been reported incorrectly until recently, because of a
race at start up.

One benefit of keeping the redundant metric is that, if maintained
correctly, it can help detect crash reporter failures.

The run-time overhead of keeping it is tiny.  If the current issues
are easy to fix, then keeping it shouldn't hurt (unless it is prone to
diverging again).

The metrics daemon sends samples for these:

Platform.DailyUseTime
Platform.KernelCrashInterval
Platform.UncleanShutdownInterval
Platform.UserCrashInterval

Platform.AnyCrashesDaily
Platform.AnyCrashesWeekly
Platform.UserCrashesDaily
Platform.UserCrashesWeekly
Platform.KernelCrashesDaily
Platform.KernelCrashesWeekly
Platform.KernelCrashesSinceUpdate
Platform.UncleanShutdownsDaily
Platform.UncleanShutdownsWeekly


If we chose to keep this, I think the risk of diverging can be mitigated by improving documentation about what the enumeration is used for and keeping it in sync with the corresponding .xml file. It took a bit of digging on my part to track it down (though someone more familiar with the system may have figured it out more quickly).
No one has piped up about removal of Logging.CrashCounter, so I'm going to go ahead with that. I will simply remove it rather than neuter it and wait for the next release.
Project Member

Comment 8 by bugdroid1@chromium.org, Aug 25 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform2/+/314ebe1a8384305bf69c375ea1d8aae351009239

commit 314ebe1a8384305bf69c375ea1d8aae351009239
Author: Drew Davenport <ddavenport@chromium.org>
Date: Fri Aug 25 19:09:05 2017

crash: Remove Logging.CrashCounter

This UMA histogram is no longer correct and seems to be no
longer in use. Disable it for now, and remove the related
code completely if no one complains about it.

BUG=chromium:754850
TEST=FEATURES=test emerge-snappy crash-reporter

Change-Id: Ic2a8eb9d2825375d9bb45d68e729776056a30542
Reviewed-on: https://chromium-review.googlesource.com/621247
Commit-Ready: Drew Davenport <ddavenport@chromium.org>
Tested-by: Drew Davenport <ddavenport@chromium.org>
Reviewed-by: Mike Frysinger <vapier@chromium.org>

[modify] https://crrev.com/314ebe1a8384305bf69c375ea1d8aae351009239/crash-reporter/crash_reporter.cc

I've stubbed out the function that sends the UMA histogram data for now. If no one notices that it's gone, the next step is to remove the supporting code as well.
NextAction: 2017-10-23
you probably don't want to hear this, but i think this sort of change requires waiting a few releases before we purge it completely.  basically, wait until the current release (R62) hits stable, as then we'll have confidence that TPMs aren't using these metrics to determine when to do canary->dev->beta->stable promotion.

looking at the current CrOS release cal, R62 is roughly slated to hit stable mid-October.  the NextAction field should generate a reminder for us.
I kind of expected we'd have to wait a while :)

Thanks for pointing out the NextAction field. I'll just try to forget about this in the meantime, then.
The NextAction date has arrived: 2017-10-23

Comment 13 by oka@chromium.org, Dec 18 2017

Cc: oka@chromium.org
Blocking: 193614
Components: -Internals>CrashReporting -Internals>Logging OS>Systems>CrashReporting

Sign in to add a comment