Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Closed: Jan 22
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: ----



No recent Chrome OS alerts in Sheriff-o-Matic

Project Member Reported by bmgordon@chromium.org, Jan 22

Issue description

Problem with Sheriff-o-Matic: go/som doesn't show any new alerts for Chrome OS, even though builds have been failing all night. Example: https://luci-milo.appspot.com/buildbot/chromeos/master-paladin/17547

The header shows "Last updated: Unknown".

 
Cc: akes...@chromium.org dgarr...@chromium.org
Owner: davidri...@chromium.org
The dispatcher is failing: https://logs.chromium.org/v/?s=chromeos%2Fbb%2Fchromeos%2Fsom-dispatcher%2F23979%2F%2B%2Frecipes%2Fsteps%2FSomDispatcher__chromeos_%2F0%2Fstdout
Cc: bhthompson@chromium.org
Change that causes the problem:
5465002ac926216c4bb9b46288912627704e8367 is the first bad commit
commit 5465002ac926216c4bb9b46288912627704e8367
Author: Bernie Thompson <bhthompson@google.com>
Date:   Fri Jan 12 10:03:20 2018 -0800

    Add mst Android PFQ Configuration for master-arc-dev Android branch

    BUG=b:71722810
    TEST=preupload passes
    chromeos_config_unittest

    Change-Id: Ic8c73aea4178c0a41a46cf21a35121219b1ad776
    Reviewed-on: https://chromium-review.googlesource.com/865003
    Commit-Ready: ChromeOS CL Exonerator Bot <chromiumos-cl-exonerator@appspot.gserviceaccount.com>
    Tested-by: Bernie Thompson <bhthompson@chromium.org>
    Reviewed-by: Bernie Thompson <bhthompson@chromium.org>

:040000 040000 c39aac25bb97b64acbc32e8ea01297ad026e78ca a514ae1018b230612be0a9d4a21bcb56ac1732cd M	cbuildbot
:040000 040000 8465981e0a92c15bea4591dd6cf455ec5a4b2f83 eb161e1c54a41cf8982ca950f1e61e5f3a14ebd4 M	lib
:040000 040000 d287b11c3a90816797ae5f04d638e817542a36b1 724017184bb5805be1c9a165719a7918d1a6758f M	scripts
Potential fix: http://crrev.com/c/879107
Status: Started (was: Untriaged)
Labels: Milestone-Reliability
Status: Fixed (was: Started)
From code review dgarrett wrote:
> I'm not sure that's right.
> 
> This will cause it to not crash, but we should really update the list of builds to watch, wherever that is.
> 
> And do we want crashes to force us to keep that list up to date in future? And/or convince us to automate its generation somehow?

An outage of an important tool is not the proper way to ensure this, especially when we can (and should) do it by other means and still have partial coverage.

In this case, the build needs a waterfall restart, so we had a bit of a race. Maybe the change should have been split in two: one to add the new builds, and one to add them to SoM once that's done.

I think the proper fix is to have SoM do the best it can, and also attempt to generate a self-alert on such a failure, or generate some metrics which turn into alerts.
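
For illustration only, a minimal sketch of that "do the best it can, plus self-alert" idea. This is not the actual som_alerts_dispatcher code: `collect_alerts`, `fetch_build`, the alert dict fields, and the `dispatcher-missing-builds` key are all hypothetical names chosen for the example.

```python
import logging


def collect_alerts(watched_builds, fetch_build):
    """Best-effort alert collection (hypothetical sketch).

    watched_builds: iterable of build config names to watch.
    fetch_build: callable that returns a build dict, or None when the
        build cannot be found (e.g. it has not run since being added).
    """
    alerts = []
    missing = []
    for name in watched_builds:
        build = fetch_build(name)
        if build is None:
            # Skip the missing build instead of crashing the whole run,
            # so the remaining builds still get coverage.
            logging.warning('Unable to find build for %s', name)
            missing.append(name)
            continue
        if build.get('status') == 'fail':
            alerts.append({'key': name, 'title': '%s failed' % name})
    if missing:
        # Self-alert: surface the gap alongside the regular alerts so the
        # watch list gets fixed rather than silently papered over.
        alerts.append({
            'key': 'dispatcher-missing-builds',
            'title': 'Dispatcher could not find: %s'
                     % ', '.join(sorted(missing)),
        })
    return alerts
```

With something like this, one unknown build config would cost a single extra alert rather than the whole dashboard.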
I agree that's a good approach, but we need someone to make that happen. My fear is that we paper over the crash and never add the alert.
I'm not sure how to respond to the staffing issues.

Currently we don't have anyone staffed to make SoM improvements, and SoM work is not high on my list of current priorities (and there's a long list of more important SoM features I would prioritize higher than this). I'm not sure when that is going to change.

The best I can offer is to open a bug and add it to the queue.


Also, in practice, this should only happen when:
- new builds are added to SoM
- there's some major CIDB issue resulting in builds not being returned

For the former, I think the contributor of the code should be on the lookout for failures. For the latter, I don't think this is the ideal way to detect CIDB issues.
Project Member

Comment 11 by bugdroid1@chromium.org, Jan 23

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/5e98a2c5d1cd7e1eff2c7b5e797f46097c444eb6

commit 5e98a2c5d1cd7e1eff2c7b5e797f46097c444eb6
Author: David Riley <davidriley@chromium.org>
Date: Tue Jan 23 09:48:48 2018

som_alerts_dispatcher: Fix logs when unable to find build.

The previous fix was rushed and logged the wrong thing.

BUG=chromium:804372
TEST=som_alerts_dispatcher

Change-Id: I2557ff3f4c75c188c4bde09fa27e1e837be4255f
Reviewed-on: https://chromium-review.googlesource.com/879221
Commit-Ready: ChromeOS CL Exonerator Bot <chromiumos-cl-exonerator@appspot.gserviceaccount.com>
Tested-by: David Riley <davidriley@chromium.org>
Reviewed-by: Jacob Kopczynski <jkop@chromium.org>

[modify] https://crrev.com/5e98a2c5d1cd7e1eff2c7b5e797f46097c444eb6/scripts/som_alerts_dispatcher.py