Buildbot emails cry wolf
Issue description

I often get buildbot emails, even if there is practically no chance I've caused the issue. dnj@ told me a while back (on the bus to Monterey) that I should report them so Infra knows of some concrete issues. I've attached some recent examples from my inbox.

## android_fed504f8.eml (2016-09-06)
Compile error on Android buildbot. My change only modified existing .png icons that are only used on desktop: https://crrev.com/2269513003

## chromium_e5a4a06e.eml (2016-09-02)
Compile error on Chrome OS buildbot. I received 5 similar emails about various Chromium OS bots for this breakage; this is just one of them. My change only modified a single file that is only used on OSX: https://crrev.com/2283753003

## chromium_fed4d2d9.eml (2016-08-25)
Compile error on Chrome OS buildbot. There were only four CLs in the range, which I guess is nice. But my change literally just added 4 characters to a comment: https://codereview.chromium.org/2140103002

These are examples of why buildbot emails are rarely worth my time, and why I'm unlikely to pay attention even if the regression range is small and my CL has any chance of causing the issue. I haven't yet reached the step of filtering them out of my inbox, though I'm tempted to do it soon. But I don't want to give up hope. ;-)

Assigning to dnj@ for triage or whatever you'd like to do with these reports. :-)
,
Sep 27 2016
Which list are these going to? Just chromium-reviews?
,
Feb 1 2017
This seems to be flaky builds/tests and not particular to buildbot. Findit folks, is there anything that can be done to help here?
,
Apr 25 2017
IIUC, those email notifications are sent out immediately after a compile failure or other critical failure. Since Findit takes time (6+ minutes) to run a trybot job to identify the real culprit, it might not work well here at the moment. We plan to use fine-grained deps from ninja to filter out unrelated CLs for compile failures. Since this approach is based on a heuristic, it would be much faster (less than 1 minute, I would expect). However, this needs substantial work and might need changes to ninja/recipes besides Findit. Once that is done, I will revisit this.
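As a rough illustration of the dependency-based filtering idea described above (not Findit's actual implementation), a CL could be kept as a suspect only if it touches a file that a failed build edge depends on. The function name, the input shapes, and the sample file paths below are all hypothetical; a real version would have to obtain the dependency set from ninja.

```python
from typing import Dict, List, Set


def filter_suspect_cls(failed_edge_deps: Set[str],
                       cl_files: Dict[str, Set[str]]) -> List[str]:
    """Return the CLs that touch at least one dependency of a failed build edge.

    failed_edge_deps: files the failed compile steps depend on (assumed to
        come from ninja's dependency information).
    cl_files: mapping from a CL identifier to the set of files it modified.
    """
    # A CL stays on the suspect list only if its files intersect the deps.
    return sorted(cl for cl, files in cl_files.items()
                  if files & failed_edge_deps)


# Hypothetical example: only cl_a touches a file the failed edge depends on.
suspects = filter_suspect_cls(
    {"base/foo.cc", "base/foo.h"},
    {"cl_a": {"base/foo.h"}, "cl_b": {"ui/icons/close.png"}},
)
```

A filter like this can only rule CLs out heuristically; a CL that affects the build through generated files or build configuration would need separate handling.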
,
Aug 28 2017
Another one. I made a 1-line routine config change, and the error seems to bear no possible relation to it.
,
Aug 28 2017
Update on this: We have implemented a basic version of the ninja-dependency-based heuristic analysis, but it is not good enough to say with 100% accuracy that a CL is unrelated to a compile failure, and it only supports failures in CXX/CC build edges for now. Is it possible to use gn analyze to sort this out? With the ninja wrapper script, we already know which build edges fail and what their output nodes are. Any thoughts on this direction?
,
Aug 31 2017
The buildbot blame logic is very simple: it only knows which CLs are new since the last build, and has no knowledge of what's actually in those CLs or whether they'd affect the build. It is possible that we could use the ninja-dependency stuff, gn analyze, or both to narrow down possible culprits, but trying to integrate that directly into the email-generating code paths would be difficult at best, and it would need to change completely in the LUCI world as well. I think a better question is: are these emails even worth it? They date from a time before we had most of the tools we have now, like sheriff-o-matic, findit, etc. I would suggest that we actually poll chromium-dev and see what people think. As part of that, you'd want to focus specifically on the main waterfalls (not fyi, where the emails are usually configured differently). stgao@, do you feel like tackling this?
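To make the "new since the last build" behavior concrete, here is a minimal sketch of that kind of blame list. It is not buildbot's code; the function name and the revision labels are made up for illustration.

```python
from typing import List


def blamelist(commits: List[str], last_built: str, current: str) -> List[str]:
    """Return every commit after last_built, up to and including current.

    commits is assumed to be ordered oldest first. Nothing here looks at
    what a commit changed, so every author in the range gets blamed.
    """
    start = commits.index(last_built) + 1
    end = commits.index(current) + 1
    return commits[start:end]


# Hypothetical history: the last build was at r1, the failing build at r3.
blamed = blamelist(["r1", "r2", "r3", "r4"], "r1", "r3")
```

Every commit in that range receives the email, which is why a change to a comment or a desktop-only icon can be blamed for an unrelated compile failure.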
,
Aug 31 2017
Yes, this is culprit finding, which is what Findit is for. I will try to figure out possible options to reduce noise here and then poll chromium-dev@ for feedback.
Comment 1 by d...@chromium.org, Sep 7 2016
Owner: ----
Status: Untriaged (was: Assigned)