
Issue 702423

Starred by 2 users

Issue metadata

Status: WontFix
Owner:
Closed: Apr 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug




Don't mark cancelled builds as "infrastructure failure"

Project Member Reported by philwright@google.com, Mar 16 2017

Issue description

(Inspired by http://o/e/m250d93bec8000152, with corresponding discussion in troopers weekly meeting - https://docs.google.com/a/google.com/document/d/12SNu0lSgijd5PkwOzaKZ7jNmoUV74MD5yvbT3s17aUc/edit?disco=AAAABCB8lQU)

Cancelled builds shouldn't count as "infrastructure failures".

The suggestion is to create a new build status, such as "Cancelled" or "Aborted" (if possible).

Note that builds can be cancelled via Buildbucket as well as through Buildbot WebUI.
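A minimal sketch of the proposal. It assumes a hypothetical status enum and plain status values (none of these names come from Buildbot or Buildbucket), and shows an infra-failure count that treats cancellation as its own terminal status, so cancelled builds cannot inflate the alert:

```python
from enum import Enum


class BuildStatus(Enum):
    # Hypothetical terminal statuses; CANCELLED is the proposed addition.
    SUCCESS = "success"
    FAILURE = "failure"
    INFRA_FAILURE = "infra_failure"
    CANCELLED = "cancelled"


def infra_failure_count(statuses):
    """Count builds that should feed an infra-failure alert.

    CANCELLED is a distinct status, so cancelled builds are never counted.
    """
    return sum(1 for s in statuses if s is BuildStatus.INFRA_FAILURE)


if __name__ == "__main__":
    builds = [
        BuildStatus.SUCCESS,
        BuildStatus.CANCELLED,
        BuildStatus.CANCELLED,
        BuildStatus.INFRA_FAILURE,
    ]
    print(infra_failure_count(builds))  # 1
```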
 
Components: Infra>Platform>Buildbot
Labels: -Pri-3 Pri-2
Labels: -OS-Mac
Status: Untriaged (was: Unconfirmed)

Comment 3 by estaab@chromium.org, Mar 20 2017

Components: Infra>Platform>Milo
This issue just triggered a page - http://o/e/m2513081f9000002e

Comment 5 by no...@chromium.org, Mar 27 2017

Components: -Infra>Platform>Milo Infra>Platform>Milo>Buildbot
Another page caused by this issue: http://o/e/m25141021c8000001

Comment 7 by no...@chromium.org, Mar 29 2017

Labels: -Pri-2 Pri-1
Owner: no...@chromium.org
Status: Assigned (was: Untriaged)
I'm not sure how easy this is to solve with buildbot. Nodir, since you increased the priority, do you have an idea? Should we do this now or once buildbot is out of the code path?

Comment 9 by no...@chromium.org, Apr 7 2017

Blockedon: 708395
Status: Started (was: Assigned)

Comment 10 by no...@chromium.org, Apr 19 2017

Components: -Infra>Platform>Milo>Buildbot
Owner: ----
Status: Available (was: Started)
I've misinterpreted this bug. Buildbot does not distinguish builds that failed due to an EXCEPTION from cancelled builds, because cancelling is implemented via an exception. The root cause has nothing to do with Milo or Swarmbucket. The metric is implemented as a part of mastermon, I think.
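The collision described in comment 10 can be illustrated with a toy runner. This is not Buildbot code: `BuildCancelled`, `run_build`, and the result strings are hypothetical stand-ins. The sketch assumes only what the comment states, namely that a cancel is delivered as an exception and that an unhandled exception ends the build with result EXCEPTION.

```python
class BuildCancelled(Exception):
    """Hypothetical exception that a cancel request raises inside a build."""


def run_build(steps):
    """Run step callables in order and return a Buildbot-style result string.

    Every unexpected exception, including a cancellation, reaches the same
    handler, so the caller only ever sees 'exception'.
    """
    try:
        for step in steps:
            step()
    except Exception:
        return "exception"
    return "success"


def crashing_step():
    raise RuntimeError("bot disconnected")


def cancelled_step():
    raise BuildCancelled("cancelled via Buildbucket")


if __name__ == "__main__":
    print(run_build([crashing_step]))   # exception
    print(run_build([cancelled_step]))  # exception (indistinguishable)
```

Once both paths return the same value, anything downstream (such as master monitoring) needs a separate signal to tell them apart.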
Another page triggered:

http://o/e/m2525183748000009

Comment 12 by no...@chromium.org, May 11 2017

Blockedon: -708395
Owner: dpranke@chromium.org
Status: Assigned (was: Available)
Hi Dirk, can you see if you can find someone to look into this?
Cc: dpranke@chromium.org
Owner: estaab@chromium.org
Erik, I'll let you triage / prioritize this.
Cc: efoo@chromium.org
Components: -Infra>Platform>Buildbot Infra>Platform>Buildbucket
Adding this as a concept to buildbot will be significant work, given that cancellations are implemented as exceptions (purple). We can do this much more easily in buildbucket and should make sure we support it when we port monitoring to kitchen.

I'm going to put this under buildbucket since I think that's most appropriate.

Eric, how do we want to handle incoming bugs that we want to add to our schedule?

Comment 17 by no...@chromium.org, Jun 14 2017

There is nothing to do in the swarmbucket case. The bug is specific to buildbot; this defect does not exist outside of buildbot.
Is that because you can't cancel a swarmbucket build, or because we have a different way of reporting the build as cancelled?

Comment 19 by no...@chromium.org, Jun 14 2017

Neither.
You can cancel a swarmbucket build.
The way of reporting a build as cancelled is the same for swarmbucket and buildbucket builds that are executed by buildbot.

Let me rephrase/correct myself: this bug is in master monitoring. It does not distinguish a cancelled build from a true status=EXCEPTION build. This monitoring code runs on master machines, so changes to buildbucket, swarmbucket or any other part of LUCI won't help here.
The LOC in question is
https://chromium.googlesource.com/chromium/tools/build/+/6fbefca00bc22d0e950f92a0e0cb945d16e6ebf4/scripts/master/status_logger.py#540
Currently the value of 'result' is 'exception' for cancelled builds. This is incorrect and should be fixed.

How do we determine that a build with result=exception is actually a cancelled build? Something like this:
https://chromium.googlesource.com/chromium/tools/build/+/6fbefca00bc22d0e950f92a0e0cb945d16e6ebf4/scripts/master/buildbucket/integration.py#529
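A minimal sketch of the fix proposed above, assuming the monitoring code can obtain a boolean saying whether the build was cancelled (for example via a check like the one linked above). `metric_result` and `was_cancelled` are hypothetical names; this does not reproduce the code in status_logger.py or integration.py.

```python
def metric_result(result, was_cancelled):
    """Return the 'result' value to report for a finished build.

    result: Buildbot result string such as 'success', 'failure' or 'exception'.
    was_cancelled: hypothetical flag, derived from Buildbucket, that says the
        build was cancelled rather than crashed.
    """
    # Only 'exception' results can be hidden cancellations, so only those
    # are reclassified; genuine failures are reported unchanged.
    if result == "exception" and was_cancelled:
        return "cancelled"
    return result


if __name__ == "__main__":
    print(metric_result("exception", was_cancelled=True))   # cancelled
    print(metric_result("exception", was_cancelled=False))  # exception
    print(metric_result("failure", was_cancelled=True))     # failure
```

Restricting the rewrite to 'exception' keeps the change narrow: the infra-failure metric stops counting cancellations while every other result passes through as before.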

In my opinion, whoever owns buildbot master monitoring should own this bug.
Sorry, I realize I could have been more clear in comment 16. I wanted to make sure cancelled builds are a concept in the LUCI/buildbucket world. We're going to have a similar alert after monitoring is implemented and we should make sure this problem is addressed.

I'm refocussing this bug to LUCI from buildbot, since I don't think it's worth building these concepts into buildbot at this point, but I also don't want us to keep having this problem going forward.

Comment 21 by sheriffbot@chromium.org, Jul 25 2017

Labels: Hotlist-Google

Comment 22 by no...@chromium.org, Apr 30 2018

Status: WontFix (was: Assigned)
This is becoming increasingly irrelevant due to LUCI.
