[SOM] Automatic issue merging is too unreliable (need more manual confirmation/override)
Issue description

The general problem is that failures in SOM are automatically bundled together into "issues" using a useful but imperfect heuristic. And it is so aggressive that it often defeats any manual management. Here are a few practical problems with this:

1. When a new issue comes up, it will try to link it to existing issues and find bugs that are related. That's helpful, but if I've manually triaged an issue (set its bug number to a bug that precisely describes that issue), I don't want new bug IDs being added that are unrelated.
2. The comment thread on an issue often includes unrelated comments from the last time SOM thought this was the same issue (but actually isn't).
3. The heuristic seems generally pretty bad about WebKit layout tests; often it just groups all layout test failures together.

Here is a screenshot of what I'm currently dealing with. The top issue there had 5 bugs associated with it. I had to open each of them and they were all various different WebKit layout test failures from the past week that were irrelevant, except the last one which was exactly this issue. The previous sheriff had manually tagged this with the right bug ID, and then when the issue (a flake) resurfaced, the system went and undid his work by tying in a bunch of unrelated bugs.

The bottom issue there has 3 messages on it (in the orange speech bubble icon). They are from about a week ago with old sheriffs who fixed old problems, but because SOM thinks this is the same issue, I am seeing these old comments that say "I reverted this CL". That's confusing because I have to check whether they're reverting for my bug, or a different bug.

My proposal: There should be a more persistent/reliable concept of an "issue". A "confirmed issue" must be created by a real human and linked to a SINGLE crbug (if there are multiple relevant bug numbers, they should be duped or linked in crbug to a single master bug). The system should NEVER add new bug IDs to a confirmed issue, or consolidate confirmed issues together. Confirmed issues would automatically go away when the linked bug is marked Fixed.

Then there are "potential issues" which is what SOM currently calls an "issue". SOM should automatically create a potential issue if it sees a new failure, speculatively link bugs to it, etc. Potential issues should not have comments attached to them. SOM could offer suggestions like "This looks like <an existing confirmed issue>... link?" to allow a human to manually link them in. Of course there would also be a "This is a new issue" button which converts the potential issue into a confirmed issue.

Basically the sheriff's job would be to:

1. Look at any potential issues and either snooze them, combine them into an existing confirmed issue, or upgrade them to a new confirmed issue (perhaps filing a new bug report to link).
2. Deal with the confirmed issues. Chat with other sheriffs using the conversation attached to the confirmed issue.
3. Mark the bug as fixed to make the confirmed issue go away.

This would mean the annoying jumping around of issues and constantly having random spurious bugs attached to issues doesn't affect the set of "confirmed issues" that I am working on.
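To make the proposed split concrete, here is a minimal sketch of how "potential" and "confirmed" issues might be modeled. It is only an illustration of the rules described above, not SOM's actual data model; the names `Issue`, `Kind`, `suggest_bug`, `confirm`, `add_comment`, and `merge_into` are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Kind(Enum):
    POTENTIAL = "potential"   # created automatically when a new failure is seen
    CONFIRMED = "confirmed"   # created only by a human, tied to one crbug


@dataclass
class Issue:
    title: str
    kind: Kind = Kind.POTENTIAL
    suggested_bugs: List[int] = field(default_factory=list)  # speculative links
    bug_id: Optional[int] = None                              # the single linked crbug
    comments: List[str] = field(default_factory=list)
    failures: List[str] = field(default_factory=list)

    def suggest_bug(self, bug_id: int) -> bool:
        """Automatic linking: allowed on potential issues only."""
        if self.kind is Kind.CONFIRMED:
            return False  # the system never adds bug IDs to a confirmed issue
        if bug_id not in self.suggested_bugs:
            self.suggested_bugs.append(bug_id)
        return True

    def confirm(self, bug_id: int) -> None:
        """The "This is a new issue" button: a human picks exactly one bug."""
        if self.kind is Kind.CONFIRMED:
            raise ValueError("issue is already confirmed")
        self.kind = Kind.CONFIRMED
        self.bug_id = bug_id
        self.suggested_bugs.clear()  # speculative links do not carry over

    def add_comment(self, text: str) -> None:
        """Comments live only on confirmed issues."""
        if self.kind is not Kind.CONFIRMED:
            raise ValueError("potential issues do not carry comments")
        self.comments.append(text)


def merge_into(potential: Issue, confirmed: Issue) -> None:
    """Human-triggered link of a potential issue into a confirmed one."""
    if potential.kind is not Kind.POTENTIAL or confirmed.kind is not Kind.CONFIRMED:
        raise ValueError("can only merge a potential issue into a confirmed issue")
    confirmed.failures.extend(potential.failures)
```

The point of the sketch is that every path that changes a confirmed issue goes through a human action (`confirm`, `merge_into`, `add_comment`), while the only automatic path (`suggest_bug`) refuses to touch confirmed issues.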
Feb 16 2017
Oh yeah -- the issue I was dealing with was a flake. It passed a few times so it's either disappeared now or moved into the flakes section. Once a sheriff has tagged an issue as a real problem, it shouldn't disappear if it happens to pass a few times. Under my proposal, "confirmed issues" would not disappear until the bug is marked fixed. Also confirmed issues would be titled after the bug, not the automatically assigned name. (I am just so sick of not having any proper handle on SOM issues as they bounce around while I am trying to look at them. I'm trying to get some stability in there.)
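As a rough sketch of the visibility rule described in this comment: a confirmed issue stays on the list until its bug is marked fixed, no matter how many recent runs passed, and it is titled after the bug. The `TrackedIssue` type, its fields, and the rule used for unconfirmed issues (hide once all recent runs pass) are assumptions for illustration, not how SOM actually decides.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TrackedIssue:
    auto_name: str                    # name assigned automatically by the system
    confirmed: bool = False           # set by a sheriff, never by the system
    bug_title: Optional[str] = None   # title of the linked bug, if any
    bug_fixed: bool = False           # True once the linked bug is marked Fixed


def display_title(issue: TrackedIssue) -> str:
    """Confirmed issues are titled after the bug, not the automatic name."""
    if issue.confirmed and issue.bug_title:
        return issue.bug_title
    return issue.auto_name


def is_visible(issue: TrackedIssue, recent_runs_passed: List[bool]) -> bool:
    """Decide whether the issue should still be shown to the sheriff."""
    if issue.confirmed:
        # Passing runs are ignored: only fixing the bug makes it go away.
        return not issue.bug_fixed
    # Assumed rule for unconfirmed issues: hide once every recent run passed.
    all_passing = bool(recent_runs_passed) and all(recent_runs_passed)
    return not all_passing
```

With this rule, a flake that a sheriff has confirmed keeps its place in the list even after a streak of passing runs.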
Feb 21 2017
cc-ing main SOM people. There's been some work on deleting old annotations, like the bugs and comments already attached to the issues, which you're finding annoying. I think zhangtiff@ is working on that?
Feb 21 2017
Comments are currently set to be hidden if they are over 10 days old. I could lower that to something like 4 days if too many old comments are still showing up. I could try to do the same for bugs as well.

I'm not actively working on a more complete solution to clearing old annotations beyond just hiding them. That would require some changes to how alerts are represented (improved keys for alerts, a better representation of where one "instance" of an alert ends), which is a lot of what you are suggesting.

Thanks for the bug! I appreciate the feedback and believe you are right; Sheriff-o-Matic could be greatly improved by changing the way we model what an alert is.

One semi-related thing I wanted to bring up here: Chrome OS is using Sheriff-o-Matic and wanted to change the way alerts are modeled to fit their specific use case as well. Design doc here: https://docs.google.com/document/d/1m5uyrxVZcy-t0sCynSwpBpMjdvSzfqAHNFq2JlB_Yxk/edit#heading=h.dlyxh8a9erib

Some of their ideas, such as marking an alert as "Investigated" or "Resolved", seem similar to the proposal here of marking an alert as "Confirmed". Their proposal requires changing the Datastore representation of alerts so that every alert is its own entity, and I think the things proposed here would require that as well. I think this is a good thing that will allow us to do more complex interactions with alerts, and I will think about ways we can redesign the way alerts are modeled to better resolve different user pain points (instead of just creating quick bandaid fixes on top of the overall problem, something I have been extremely guilty of doing).
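For reference, the age-based hiding mentioned at the start of this comment amounts to something like the sketch below. This is not the actual Sheriff-o-Matic code; the function name `visible_comments`, the `(posted_at, text)` tuple shape, and the `max_age_days` parameter are assumptions made for illustration.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Comment = Tuple[datetime, str]  # (posted_at, text); hypothetical shape


def visible_comments(comments: List[Comment], now: datetime,
                     max_age_days: int = 10) -> List[Comment]:
    """Return only comments that are at most max_age_days old."""
    cutoff = now - timedelta(days=max_age_days)
    return [c for c in comments if c[0] >= cutoff]
```

Lowering the threshold to 4 days would just mean calling it with `max_age_days=4`. Note that this only hides old comments; it does not know whether a comment belongs to the current "instance" of an alert, which is the part that needs the remodeling discussed above.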
Feb 21 2017
Yeah, having "Investigated" (or "Investigating") and "Resolved" sounds like a good step. I think SOM doesn't need to get *smarter*; it needs to become more of a tool that assists you in sheriffing semi-automatically.
,
Jun 29 2017
Comment 1 by mgiuca@chromium.org, Feb 16 2017