
Issue 631138

Starred by 2 users

Issue metadata

Status: Untriaged
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: All
Pri: 2
Type: Bug

Blocking:
issue 631114




Pseudocrashes: Teams using crash/ to upload non user-crash reports

Reported by mimee@chromium.org, Jul 25 2016

Issue description

Each platform uploads some crash reports that are not real user crashes, sometimes intentionally, because the teams want to gather data.

For instance, Clank uploads StrictMode violations to crash. There they go through analysis and eventually surface in the crash front end, on the Fracas dashboard, and finally in the sheriff queue after TEs file bugs. The sheriffs then triage them.

It would be nice if the sheriffs didn't need to triage them, since they are not real user crashes.

Potential solutions:

1) Don't upload them. They are not real user crashes, so it makes sense not to upload them to crash and to use a UMA metric instead.

Yet not uploading them may prevent more information from being gathered, since they may be used for more ambiguous cases that can benefit from full crash-like reports. If the pseudo crashes trigger the crash logic, then it is also more convenient for the teams to just upload them.

2) Have sheriffs not triage them. This is easy because each team has only a small group of sheriffs who are familiar with the platform-specific issues.

However, this does not solve the problem that the bugs still get assigned to people after they are filed. By that point, a lot of attention, engineering hours, and compute resources have already been spent.

3) Similarly, having TEs not look at them, or having Fracas filter them out, requires the intermediate parties to maintain a hard-coded list of things to filter out.

This is not particularly maintainable. It is the teams that know which signatures or types of reports to filter out, and these may change from release to release as new features are implemented.

4) Upload them with a bit that indicates whether they are pseudo crashes. That way the teams get their full crash reports, but downstream analysis and tools can choose to handle them differently without having to maintain a list.

This does, however, require changes across the whole bug triage and sheriffing pipeline, hopefully only as a one-time overhead.
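The sketch below shows how downstream tools could consume the bit from option 4. It is a minimal sketch that assumes a hypothetical `is_pseudo_crash` field on each uploaded report. The issue defines no such field, and the destination labels are illustrative only. The point is that routing depends only on the uploaded bit, so no intermediate party has to maintain a signature list:

```python
from dataclasses import dataclass


@dataclass
class CrashReport:
    signature: str
    # Hypothetical bit set by the client at upload time (option 4).
    is_pseudo_crash: bool = False


def route(report: CrashReport) -> str:
    """Return a hypothetical destination based only on the uploaded bit."""
    if report.is_pseudo_crash:
        # Still stored for the owning team's queries, but never sent to sheriffs.
        return "team-only"
    return "sheriff-triage"


def sheriff_queue(reports: list[CrashReport]) -> list[CrashReport]:
    """Keep only the reports that should reach the sheriff queue."""
    return [r for r in reports if route(r) == "sheriff-triage"]


reports = [
    CrashReport("StrictModeDiskReadViolation", is_pseudo_crash=True),
    CrashReport("blink::Node::remove"),
]
print([r.signature for r in sheriff_queue(reports)])  # ['blink::Node::remove']
```

The one-time overhead mentioned above corresponds to teaching each downstream stage to call something like `route` instead of consulting its own list.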

5) Have crash backend or Magic Signature analysis filter them out. Magic signature code already has a list of magic signatures.
 
Cc: wnwen@chromium.org yfried...@chromium.org amineer@chromium.org

Comment 2 by mimee@chromium.org, Jul 25 2016

Blocking: 631114
Option 4 is a good approach, but I do envision some gray areas: non-user crashes that still need to be surfaced (forced crashes, for example), and vice versa, user crashes that we don't want to surface.

So, IMO, option 3 could ideally be configuration-based, so that teams could manage it themselves based on platform-specific issues. I'd rather keep the Magic Signature analysis (the Chrome custom processor) unaware of the triage logic, so it does need to be part of Fracas/Cracas.
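Here is one way to read this suggestion, sketched under assumptions. Each team owns a small per-platform rule list that Fracas/Cracas evaluates. Each rule can override the option 4 bit in either direction, which covers both gray areas above. The platform keys, glob patterns (such as `*ForcedCrash*`), and the `should_triage` helper are all hypothetical and do not reflect any existing Fracas/Cracas configuration format:

```python
import fnmatch
from dataclasses import dataclass


@dataclass(frozen=True)
class TriageRule:
    pattern: str   # glob matched against the crash signature
    surface: bool  # True: always triage; False: never triage


# Hypothetical team-owned configuration, one rule list per platform.
# Rules are checked in order and the first match wins.
TEAM_CONFIG: dict[str, list[TriageRule]] = {
    "android": [
        TriageRule("StrictMode*", surface=False),   # known pseudo crashes
        TriageRule("*ForcedCrash*", surface=True),  # non-user, but still wanted
    ],
}


def should_triage(platform: str, signature: str, is_pseudo_crash: bool) -> bool:
    for rule in TEAM_CONFIG.get(platform, []):
        if fnmatch.fnmatchcase(signature, rule.pattern):
            return rule.surface
    # No team rule matched: fall back to the client-provided bit.
    return not is_pseudo_crash


print(should_triage("android", "base::debug::ForcedCrash", is_pseudo_crash=True))  # True
print(should_triage("android", "StrictModeDiskRead", is_pseudo_crash=False))       # False
print(should_triage("linux", "foo::Bar", is_pseudo_crash=False))                   # True
```

The rules are kept as data rather than code. That lets teams change them from release to release without touching the Magic Signature processor.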

Comment 4 by mimee@chromium.org, Jul 25 2016

Option 1 would be nice, too. Sometimes pseudo crashes really skew metrics even when the teams only want a tiny bit of information, such as a count. In those cases, teams should be encouraged not to overuse crash resources and to use UMA instead.
Is it possible to upload the same quantity of data to UMA that we're able to upload as part of a crash report?  IIRC you can get loads more system state with a crash report than you can with a (giant) UMA histogram.

To make filtering easier, it seems like we ought to know when we are uploading something that doesn't necessarily need to be triaged (e.g. StrictMode violations, DumpWithoutCrashing). Perhaps we could set a bit in the report prior to upload indicating that it's not something we should triage?
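If the client already knows where a report came from, setting the bit can be a simple lookup at upload time. The sketch below assumes that crash reports carry string key/value annotations. It uses hypothetical `report_source` and `pseudo_crash` keys, which are not existing Breakpad/Crashpad keys:

```python
from enum import Enum


class ReportSource(Enum):
    CRASH = "crash"                                  # the process actually crashed
    DUMP_WITHOUT_CRASHING = "dump_without_crashing"  # deliberate dump; process keeps running
    STRICT_MODE = "strict_mode"                      # Android StrictMode violation


# Sources this thread identifies as not being real user crashes.
NON_USER_SOURCES = frozenset(
    {ReportSource.DUMP_WITHOUT_CRASHING, ReportSource.STRICT_MODE}
)


def upload_annotations(source: ReportSource) -> dict[str, str]:
    """Build the (hypothetical) annotations attached to a report before upload."""
    return {
        "report_source": source.value,
        "pseudo_crash": "1" if source in NON_USER_SOURCES else "0",
    }


print(upload_annotations(ReportSource.STRICT_MODE))
# {'report_source': 'strict_mode', 'pseudo_crash': '1'}
```

This lookup only covers sources that are known up front. Reports whose status depends on the signature would still need downstream rules.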
StrictMode reports come with stack traces, so (unless those stack traces are useless) I don't think they can be replaced with a histogram.

Setting a bit in the uploaded report is what Mimee proposed in solution 4 in the original comment. I'm not convinced that client is the right place for deciding whether the crash report needs to be triaged, but I can be wrong.
I clearly didn't skim well enough - consider that a +1 to mimee@'s idea then.

Based on what I can think of off the top of my head, we ought to know up front what types of crashes we do and don't care about, but I could be wrong - do we have any types of crashes that aren't DumpWithoutCrashing / Strict Mode that we've identified as not a real user crash?
Components: Internals>CrashReporting
Cc: -amineer@chromium.org
No longer on the Chrome team. E-mail me @google.com if any attention is still required from me here; otherwise, good luck!
