Mark the builders that are tree-closers in the console
Issue description
CrOS go/legoland currently “grays out” builders that are non-critical. We don’t expect LUCI to know or care about that status. However, it would be nice if Milo offered this gray-out feature based on a Buildbucket property/tag. Then, we can export the tag that says “non-critical” and have the builder/build rendered as grayed out.
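To make the request concrete, here is a minimal sketch of the rendering decision, assuming Buildbucket tags arrive as "key:value" strings. The tag key `criticality`, the value `non-critical`, and the CSS class names are hypothetical placeholders; the thread does not settle on an actual tag, and this is not Milo's real rendering code.

```python
from typing import Iterable

# Hypothetical tag key and value; the issue does not name the real tag.
CRITICALITY_TAG_KEY = "criticality"
NON_CRITICAL_VALUE = "non-critical"


def is_grayed_out(tags: Iterable[str]) -> bool:
    """Return True if any "key:value" tag marks the build as non-critical."""
    for tag in tags:
        key, sep, value = tag.partition(":")
        # Ignore malformed tags that have no ":" separator.
        if sep and key == CRITICALITY_TAG_KEY and value == NON_CRITICAL_VALUE:
            return True
    return False


def css_class_for_build(status: str, tags: Iterable[str]) -> str:
    """Build the CSS class string for a console cell (class names are illustrative)."""
    classes = [f"status-{status.lower()}"]
    if is_grayed_out(tags):
        classes.append("grayed-out")
    return " ".join(classes)
```

With this shape, LUCI itself never interprets the tag; only the console renderer looks at it, which matches the intent that LUCI need not know or care about the status.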
,
Nov 9
What is "critical builder"? This might be generalizable outside of CrOS as Browser also has a concept of "tree closers", which might be the same thing. John, how does CCI solve this problem for Browser?
,
Nov 9
Yea, it's the same thing as a tree closer except that we decided that we're willing to accept reduced coverage and allow the CQ to keep working even though one of the 88 builders is down/broken.
,
Nov 13
,
Nov 15
I believe Browser solves this problem by having separate consoles for critical and non-critical builders. Consoles with "fyi" in the name are not critical (see https://ci.chromium.org/p/chromium). Does this solve the problem well enough? John, how does this work for you?
,
Nov 16
That could work. Would be interested to hear CCI/John perspective.
,
Nov 16
My only concern is that, from the DD, it seems that the criticality/stability of a builder is quite dynamic. Updating consoles all the time (via CLs) wouldn't be easy, but if we mark a builder as non-critical only temporarily, during an outage, then perhaps consoles do not have to be updated.
,
Nov 19
#4: I'm not sure what distinction you're intending to draw between typical tree closers and your concept; can you clarify that a bit?
Browser has three different priority tiers: primary (i.e., tree closers), secondary, and fyi. These are currently loosely and inconsistently configured:
- the gatekeeper config (https://chromium.googlesource.com/chromium/tools/build/+/master/scripts/slave/gatekeeper.json) distinguishes between tree closers (primary) and otherwise (secondary + fyi), and determines what closes the tree (and thus blocks the CQ)
- the Milo consoles generally distinguish between fyi and not-fyi. The gatekeeper config does this to some extent, too, with sheriff classes. The former are basically just aesthetic, while the latter determines what shows up on SoM.
Browser's builders are of relatively static criticality/stability; the most frequent change would be to start a builder on FYI and graduate it to secondary, but something like that would typically happen once in the lifetime of a builder.
Encoding priority/importance/criticality as a buildbucket property or tag would be pretty useful from the browser side. I suspect it'd let us clean up the muddled configuration story above.
#8: is there a DD that I'm missing...?
,
Nov 19
oh, thanks. I'll read through the rest of that one.
,
Dec 19
,
Jan 14
Milo currently understands the "experimental" tag (which we use for the buildbot migration). Maybe we can reuse it.
,
Jan 15
“experimental” in this context has a different meaning. It is closer to whether a builder is a tree closer
,
Jan 15
,
Jan 15
,
Jan 15
So just to clarify, there are two problems this could potentially solve:
* Marking FYI builders. In browser we tend to graduate FYI -> Tree Closer as a one-time event in a builder's life. We usually use a CL change to indicate this graduation.
* Snoozing builder failures. This is generally dynamic, and in SoM we have this feature. This is done via a button press.
Which of these two features would you say is important for ChromeOS?
,
Jan 15
,
Jan 16
Just to ensure that we're on the same page, this bug is tracking the Milo rendering of this status only. Snoozing (first time I've heard this term) is tracked in an SoM feature request in issue 903428. The idea is that SoM would submit a config fragment and that fragment would trigger a regeneration of LUCI configuration. We can have that LUCI config have anything you want in it based on that config fragment. What property can Milo render?
With regard to FYI vs Tree Closer (we call this non-critical vs critical), we can use the property for the SoM config fragment or we can use a different one if you think that you can support rendering the subtle difference between the two. You can see how we handle this now at go/legoland-cq. Builders are slightly faded if they are currently marked as non-critical. We don't currently have a way to render the snoozed status but would like one.
So, in short, we need both.
,
Jan 18
Jason, can you remind me why we need this bit to propagate through the config system? Wouldn’t it be simpler to make an RPC to SoM to ask whether a builder is critical?
,
Jan 18
Do Milo and SoM support making/receiving such an RPC? I must have missed that in the discussions. All builders are added to CrOS in a non-critical state as they go through the process of being "brought up" and proving that they are stable. So, I guess that there are 3 design options:
* Nothing related to critical/non-critical or Snoozed stored in config: everything is assumed to be non-critical unless SoM says that it is critical.
* Only critical/non-critical stored in config. Milo, Orchestrator, Post-submit all read this config AND make an RPC to SoM to get Snoozed status.
* Everything is stored in config and SoM submits a config fragment per feature request above. (Current plan.)
Which are you proposing?
,
Jan 18
I believe that SoM should not be the arbiter of whether builders are "critical" or not. It makes sense for it to have the ability to "snooze" builders which are currently misbehaving for (lower-level) reasons that are outside the purview of the sheriffs, but doing so should be inherently temporary (and therefore probably not a CL). It does not make sense for SoM to constantly contain data saying "these builders are critical". It should not be a repository of configuration information. It exists -- and should continue to exist -- primarily as a store of data generated *by* the build system *for* humans, not the other way around.
My first-pass proposal would be:
* criticality is stored in a LUCI config somewhere. (Long term, I believe that both CrOS and Browser should converge to use this same system, deprecating the current gatekeeper configs.)
* criticality can be temporarily overridden by a human clicking a button in SoM -- this would set a bit in SoM snoozing alerts related to that builder, and SoM would expose this bit over an API interface
* systems which happen to care about criticality would primarily read the luci configs, and if they want to get a more nuanced view, could fire off an RPC to SoM to see if anything is temporarily suppressed.
I believe this coincides most closely with the second bullet-pointed proposal in jclinton's Comment #21.
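The lookup order in this proposal could be sketched roughly as follows. Everything named here is hypothetical: `BuilderConfig` stands in for whatever the LUCI config would expose, and `snooze_lookup` stands in for an RPC to SoM, whose real interface is not described in this thread.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class BuilderConfig:
    """Static per-builder data, as it might be read from a LUCI config (hypothetical shape)."""
    name: str
    critical: bool


def effective_criticality(
    config: BuilderConfig,
    snooze_lookup: Optional[Callable[[str], bool]] = None,
) -> str:
    """Combine static config with an optional temporary snooze bit.

    snooze_lookup is a placeholder for an RPC to SoM; callers that only
    care about the static view can omit it.
    """
    if not config.critical:
        # Non-critical builders stay non-critical; a snooze adds nothing.
        return "non-critical"
    if snooze_lookup is not None and snooze_lookup(config.name):
        # Temporarily suppressed by a human in SoM; the config is unchanged.
        return "snoozed"
    return "critical"
```

A renderer could call `effective_criticality(cfg)` for the config-only view, or pass a lookup function when it wants the more nuanced view. Keeping the snooze bit out of the config is what makes it temporary and CL-free in this sketch.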
Comment 1 by jclinton@chromium.org
, Nov 8