
Issue 644594

Starred by 3 users

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug




Buildbot emails cry wolf

Project Member Reported by lgar...@chromium.org, Sep 7 2016

Issue description

I often get buildbot emails, even if there is practically no chance I've caused the issue.

dnj@ told me a while back (on the bus to Monterey) that I should report them so Infra knows of some concrete issues.

I've attached some recent examples from my inbox.

## android_fed504f8.eml (2016-09-06)

Compile error on Android buildbot.
My change only modified existing .png icons that are only used on desktop: https://crrev.com/2269513003

## chromium_e5a4a06e.eml (2016-09-02)

Compile error on Chrome OS buildbot. I received 5 similar emails about various Chromium OS bots for this breakage; this is just one of them.
My change only modified a single file that is only used on OSX: https://crrev.com/2283753003

## chromium_fed4d2d9.eml (2016-08-25)

Compile error on Chrome OS buildbot. There were only four CLs in the range, which I guess is nice.
But my change literally just added 4 characters to a comment: https://codereview.chromium.org/2140103002

##

These are examples of why buildbot emails are rarely worth my time, and why I'm unlikely to pay attention even if the regression range is small and my CL has any chance of causing the issue. I haven't yet reached the step of filtering them out of my inbox, though I'm tempted to do it soon. But I don't want to give up hope. ;-)

Assigning to dnj@ for triage or whatever you'd like to do with these reports. :-)
 
android_fed504f8.eml (85.3 KB)
chromium_e5a4a06e.eml (45.1 KB)
chromium_fed4d2d9.eml (23.0 KB)

Comment 1 by d...@chromium.org, Sep 7 2016

Components: Tools>Test>FindIt
Owner: ----
Status: Untriaged (was: Assigned)
I'm going to remove myself as owner lest I give the impression that I'm working on this right now. Thanks for the post!

Adding FindIt here too, since it's their mission to help with this.
Which list are these going to? Just chromium-reviews? 
Components: -Infra>Platform>Buildbot
Status: Available (was: Untriaged)
This seems to be flaky builds/tests and not particular to buildbot. Findit folks, is there anything that can be done to help here?

Comment 4 by st...@chromium.org, Apr 25 2017

Cc: st...@chromium.org
IIUC, those email notifications are sent out immediately after a compile failure or other critical failure. Since Findit takes time (6+ minutes) to run a trybot job to identify the real culprit, it might not work well here at the moment.

We plan to use fine-grained deps from ninja to filter out unrelated CLs for compile failures. Since this approach is based on a heuristic, it would be much faster (less than 1 minute, I would expect). However, this needs substantial work and might need changes to ninja/recipes besides Findit. Once that is done, I will revisit this.
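
To make the idea concrete, here is a minimal sketch of that kind of filter. It is not Findit's implementation: the `Change` structure and the `possibly_related` function are hypothetical names, and the set of dependency paths is assumed to have been extracted from ninja's dependency data by some other step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    """A CL in the regression range (hypothetical structure)."""
    revision: str
    touched_files: frozenset


def possibly_related(changes, failed_edge_deps):
    """Return the CLs that touch at least one input of a failed build edge.

    failed_edge_deps: source paths that the failed build edges depend on,
    assumed to come from ninja's dependency data (not computed here).
    """
    deps = frozenset(failed_edge_deps)
    return [c for c in changes if c.touched_files & deps]


# Usage: a CL that only touches an icon is dropped, one that touches
# an input of the failed edge is kept.
changes = [
    Change("a1", frozenset({"ui/icons/foo.png"})),
    Change("b2", frozenset({"base/bar.cc"})),
]
suspects = possibly_related(changes, {"base/bar.cc", "base/bar.h"})
print([c.revision for c in suspects])
```

A filter like this is only a heuristic: a CL can break a build without touching any file listed as a direct input of the failed edge, for example by editing build configuration.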
Another one. I made a 1-line routine config change, and the error seems to bear no possible relation to it.
buildbot warning in chromium.webkit on WebKit Win x64 Builder (dbg), revision f238a1a8f2f73d226bdb38394a6e085a59cf0e41.eml (52.9 KB)

Comment 6 by st...@chromium.org, Aug 28 2017

Cc: dpranke@chromium.org
Update on this: we have implemented a basic version of the ninja-dependency-based heuristic analysis, but it is not accurate enough to say with 100% certainty that a CL is unrelated to a compile failure, and for now it only supports failures in CXX/CC build edges.

Is it possible to use gn analyze to sort this out? With the ninja wrapper script, we already know which build edges fail and what their output nodes are.
Any thoughts on this direction?
Cc: seanmccullough@chromium.org
Owner: st...@chromium.org
The buildbot blame logic is very simple: it only knows which CLs are new since the last build, and has no knowledge of what's actually in those CLs or whether they'd affect the build.
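
As a rough illustration of that behavior (not buildbot's actual code; `blamed_authors` and the `(revision, author)` tuples are hypothetical), the blame list amounts to "everyone who landed something after the last built revision":

```python
def blamed_authors(commits, last_built_revision):
    """Return authors of all commits that landed after the last build.

    commits: list of (revision, author) tuples, oldest first.
    No commit contents are inspected, which is why an icon-only or
    comment-only CL is blamed the same as any other.
    """
    revisions = [rev for rev, _ in commits]
    start = revisions.index(last_built_revision) + 1
    authors = []
    for _, author in commits[start:]:
        if author not in authors:  # keep order, drop duplicates
            authors.append(author)
    return authors


# Usage: every author after revision "r1" gets an email.
commits = [("r1", "alice"), ("r2", "bob"), ("r3", "carol"), ("r4", "bob")]
print(blamed_authors(commits, "r1"))
```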

It is possible that we could use either or both of the ninja-dependency stuff or gn analyze to narrow down possible culprits, but trying to integrate that directly into the email-generating code paths would be difficult at best and would need to change completely in the LUCI world as well.

I think a better question is: are these emails even worth it? They date from a time before we had most of the tools we have now, like sheriff-o-matic, findit, etc.

I would suggest that we actually poll chromium-dev and see what people think. As part of that, you'd want to focus specifically on the main waterfalls (not fyi, where the emails are usually configured differently).

stgao@, do you feel like tackling this?

Comment 8 by st...@chromium.org, Aug 31 2017

Status: Assigned (was: Available)
Yes, this is culprit finding, which is what Findit is for.
I will try to figure out possible options to reduce noise here and then poll chromium-dev@ for feedback.
