
Issue 903494 link

Starred by 3 users

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: 2
NextAction: ----
OS: Chrome
Pri: 1
Type: Feature

Blocking:
issue 903488



Hotlists containing this issue:
CrOSParallelCQ



Mark the builders that are tree-closers in the console

Project Member Reported by jclinton@chromium.org, Nov 8

Issue description

CrOS go/legoland currently “grays out” builders that are non-critical. We don’t expect LUCI to know or care about that status. However, it would be nice if Milo offered this gray-out feature based on a Buildbucket property/tag. Then, we can export the tag that says “non-critical” and have the builder/build rendered as grayed out.
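
A minimal sketch of the rendering side, assuming the criticality is exported as a Buildbucket-style `key:value` tag. The tag `criticality:non-critical`, the builder name, and the CSS class names are hypothetical and chosen only for illustration; they are not existing Milo or Buildbucket conventions.

```python
from typing import Iterable

# Hypothetical tag; the real key/value would be whatever CrOS exports.
NON_CRITICAL_TAG = "criticality:non-critical"


def css_classes_for_build(status: str, tags: Iterable[str]) -> str:
    """Return the CSS classes a console cell might use for one build."""
    classes = ["build", f"status-{status.lower()}"]
    # Gray out the cell only when the build carries the non-critical tag.
    if NON_CRITICAL_TAG in set(tags):
        classes.append("grayed-out")
    return " ".join(classes)


if __name__ == "__main__":
    # "example-builder" is a placeholder name, not a real CrOS builder.
    print(css_classes_for_build("FAILURE", ["builder:example-builder", NON_CRITICAL_TAG]))
```

With this shape the renderer never decides what is critical; it only reflects whatever tag the build already carries.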
 
Blocking: 903488

Comment 2 Deleted

Cc: jbudorick@chromium.org
What is "critical builder"?
This might be generalizable outside of CrOS as Browser also has a concept of "tree closers", which might be the same thing. John, how does CCI solve this problem for Browser?
Yea, it's the same thing as a tree closer except that we decided that we're willing to accept reduced coverage and allow the CQ to keep working even though one of the 88 builders is down/broken.
Labels: Disable-Nags
I believe Browser solves this problem by having separate consoles for critical and non-critical builders. Consoles with "fyi" in the name are not critical https://ci.chromium.org/p/chromium
Does this solve the problem well enough?

John, how does this work for you?
That could work. Would be interested to hear CCI/John perspective.
My only concern is that, from the DD, it seems that the criticality/stability of a builder is quite dynamic. Updating consoles all the time (via CLs) wouldn't be easy, but if we mark a builder as non-critical only temporarily, during an outage, then perhaps consoles do not have to be updated.
#4: I'm not sure what distinction you're intending to draw between typical tree closers and your concept; can you clarify that a bit?

Browser has three different priority tiers: primary (i.e., tree closers), secondary, and fyi. These are currently loosely and inconsistently configured: 
 - the gatekeeper config (https://chromium.googlesource.com/chromium/tools/build/+/master/scripts/slave/gatekeeper.json) distinguishes between tree closers (primary) and otherwise (secondary + fyi), and determines what closes the tree (and thus blocks the CQ)
 - the Milo consoles generally distinguish between fyi and not-fyi. The gatekeeper config does this to some extent, too, with sheriff classes. The former are basically just aesthetic, while the latter determines what shows up on SoM.

Browser's builders are of relatively static criticality/stability; the most frequent change would be to start a builder on FYI and graduate it to secondary, but something like that would typically happen once in the lifetime of a builder.

Encoding priority/importance/criticality as a buildbucket property or tag would be pretty useful from the browser side. I suspect it'd let us clean up the muddled configuration story above.
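
As a rough illustration of what a single encoding could look like, the three tiers described above could be modeled as one enumeration from which both the tree-closing decision and the console distinction are derived. The `tier:<value>` tag, the function names, and the FYI default are assumptions made for this sketch; they do not correspond to an existing Buildbucket or gatekeeper schema.

```python
from enum import Enum
from typing import Iterable


class Tier(Enum):
    PRIMARY = "primary"      # tree closers
    SECONDARY = "secondary"
    FYI = "fyi"


def closes_tree(tier: Tier) -> bool:
    """Only primary builders close the tree (and thus block the CQ)."""
    return tier is Tier.PRIMARY


def is_fyi(tier: Tier) -> bool:
    """Console-level distinction: fyi vs. not-fyi."""
    return tier is Tier.FYI


def tier_from_tags(tags: Iterable[str], default: Tier = Tier.FYI) -> Tier:
    """Read a hypothetical 'tier:<value>' tag, falling back to a default.

    An unrecognized value raises ValueError from the Tier constructor.
    """
    for tag in tags:
        key, _, value = tag.partition(":")
        if key == "tier":
            return Tier(value)
    return default
```

Defaulting to FYI mirrors the usual lifecycle in which a builder starts on FYI and is graduated later.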

#8: is there a DD that I'm missing...?
oh, thanks. I'll read through the rest of that one.
Cc: no...@chromium.org
Owner: hinoka@chromium.org
Status: Assigned (was: Available)
Milo currently understands the "experimental" tag (which we use for the buildbot migration).  Maybe we can reuse it.
“experimental” in this context has a different meaning. It is closer to whether a builder is a tree closer
Summary: Mark the builders that are tree-closers in the console (was: Implement rendering of Buildbucket property/tag)
EstimatedDays: 2
So just to clarify, there are two problems this could potentially solve:
* Marking FYI builders.  In browser we tend to graduate FYI -> Tree Closer as a one-time event in a builder's life.  We usually use a CL change to indicate this graduation.
* Snoozing builder failures.  This is generally dynamic, and in SoM we have this feature.  This is done via a button press.

Which of these two features would you say is important for ChromeOS?
Labels: Pri-1

Comment 19 by jclinton@chromium.org, Jan 16 (6 days ago)

Just to ensure that we're on the same page, this bug is tracking the Milo rendering of this status only.

Snoozing (first time I've heard this term) is tracked in an SoM feature request in issue 903428. The idea is that SoM would submit a config fragment and that fragment would trigger a regeneration of LUCI configuration. We can have that LUCI config have anything you want in it based on that config fragment. What property can Milo render?

With regard to FYI vs Tree Closer (we call this non-critical vs critical), we can use the property for the SoM config fragment or we can use a different one if you think that you can support rendering the subtle difference between the two.

You can see how we handle this now at go/legoland-cq. Builders are slightly faded if they are currently marked as non-critical. We don't currently have a way to render the snoozed status but would like one.

So, in short, we need both.

Comment 20 by no...@chromium.org, Jan 18 (5 days ago)

Jason, can you remind me why we need this bit to propagate through the config system? Wouldn’t it be simpler to make an RPC to SoM to ask whether a builder is critical?

Comment 21 by jclinton@chromium.org, Jan 18 (4 days ago)

Cc: la...@chromium.org aga...@chromium.org seanmccullough@chromium.org
Do Milo and SoM support making/receiving such an RPC? I must have missed that in the discussions.

All builders are added to CrOS in a non-critical state as they go through the process of being "brought up" and proving that they are stable.

So, I guess the 3 design options are:

* Nothing related to critical/non-critical or Snoozed stored in config: everything is assumed to be non-critical unless SoM says that it is critical.
* Only critical/non-critical stored in config. Milo, Orchestrator, Post-submit all read this config AND make an RPC to SoM to get Snoozed status.
* Everything is stored in config and SoM submits a config fragment per feature request above. (Current plan.)

Which are you proposing?

Comment 22 by aga...@chromium.org, Jan 18 (4 days ago)

I believe that SoM should not be the arbiter of whether builders are "critical" or not.

It makes sense for it to have the ability to "snooze" builders which are currently misbehaving for (lower-level) reasons that are outside the purview of the sheriffs, but doing so should be inherently temporary (and therefore probably not a CL).

It does not make sense for SoM to constantly contain data saying "these builders are critical". It should not be a repository of configuration information. It exists -- and should continue to exist -- primarily as a store of data generated *by* the build system *for* humans, not the other way around.

My first-pass proposal would be:
* criticality is stored in a LUCI config somewhere. (Long term, I believe that both CrOS and Browser should converge to use this same system, deprecating the current gatekeeper configs.)
* criticality can be temporarily overridden by a human clicking a button in SoM -- this would set a bit in SoM snoozing alerts related to that builder, and SoM would expose this bit over an API interface
* systems which happen to care about criticality would primarily read the luci configs, and if they want to get a more nuanced view, could fire off an RPC to SoM to see if anything is temporarily suppressed.

I believe this coincides most closely with the second bullet-pointed proposal in jclinton's Comment #21.
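
A minimal sketch of how a consumer might combine the two sources under this proposal, assuming criticality comes from config and the snooze bit from an optional SoM lookup. The function name, the dictionary standing in for LUCI config, and the `is_snoozed` callback standing in for an SoM RPC are all hypothetical; no such API is confirmed in this thread.

```python
from typing import Callable, Dict, Optional


def effective_criticality(
    builder: str,
    config_critical: Dict[str, bool],
    is_snoozed: Optional[Callable[[str], bool]] = None,
) -> str:
    """Combine static config with an optional temporary override.

    config_critical: builder name -> bool, standing in for LUCI config.
    is_snoozed: stand-in for an RPC to SoM; None means "do not ask".
    """
    # Builders missing from config are treated as non-critical, matching
    # the CrOS practice of bringing builders up in a non-critical state.
    if not config_critical.get(builder, False):
        return "non-critical"
    # Only critical builders can be temporarily suppressed.
    if is_snoozed is not None and is_snoozed(builder):
        return "snoozed"
    return "critical"
```

A system that only needs the static view can omit `is_snoozed` entirely, which keeps the SoM call optional as the third bullet suggests.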
