Mojo FYI builders still aren't running tests at the same priority as main builders
Issue description

Back in bug 842940, we last had the discussion as to whether the mojo team's builders on FYI should be seeing tasks expire before tasks were expiring on the main waterfall, i.e., what "FYI" actually meant.

That bug ended up having a couple of possible resolutions, one of which was to deprioritize the LUCI migration traffic so that we'd have tasks expiring there before the Mojo bots. Another was to make sure that the Mojo tasks ran at the same priority as the main waterfall bots. Apparently I fixed the first thing, but forgot to fix the second thing.

The net result is that https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Linux%20Tests/71907 was green but https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Mojo%20Linux/17772 was not; the latter had expired tasks and was affecting the mojo sheriffing rotation.

Unfortunately, we also didn't get alerting for this, and perhaps didn't notice because we didn't have tasks expiring on the main waterfall (so I'm told, I haven't yet verified this). We need to fix the monitoring / alerting so this doesn't happen; we'll track that in other bugs like bug 722533.

This bug is intended to track fixing the priorities of the Mojo bots and any other "important but not main" bots to have the same priority as the main bots, task-scheduling-wise.
,
Aug 22
> Another was to make sure that the Mojo tasks ran at the same priority as the main waterfall bots.

Are you referring to https://chromium-review.googlesource.com/c/chromium/src/+/1097600? Note that that CL works for its trybot mirror but not for the actual CI bot. See my comment at https://bugs.chromium.org/p/chromium/issues/detail?id=869114#c7
,
Aug 22
No, I'm not referring to that CL, though that CL *also* helped with bug 842940, so maybe there were three possible resolutions, not two.
,
Aug 22
Maybe? PM 91 is kind of a bundle of stuff. The shortage in MP is the same symptom, but I'm not sure it's the same cause. I think the extra cores configuration we were discussing in the LUCI hangout may be the real cause, which is different from PM 91.
,
Aug 22
Good point, it's more likely due to a misconfiguration on our side. Okay, separate incident :).
,
Aug 22
,
Oct 18
,
Oct 18
,
Nov 28
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/tools/build/+/5641c8b6567eb9aba1b81dd78838012ef5681c5c

commit 5641c8b6567eb9aba1b81dd78838012ef5681c5c
Author: Dirk Pranke <dpranke@chromium.org>
Date: Wed Nov 28 02:39:59 2018

Change the Mojo chromium.fyi bots to run swarming tasks at pri=25.

Currently the Mojo bots on chromium.fyi run tasks at the default swarming priority for .fyi bots (35); this means that they are lower priority than main waterfall bots (25) and optional CQ bots (30), and may get starved before the others.

We don't want that, we want them to have the same priority as the main waterfall bots.

Longer-term, we should move them to a new "master" so that we don't have multiple builders on the same master with different default priorities, but this is a simple short term fix that doesn't require coordinating changes across multiple repos.

Bug: 876570
Change-Id: Iac34f03f8fff2f9201529dda8943f58ae7eead46
Reviewed-on: https://chromium-review.googlesource.com/c/1184381
Reviewed-by: John Budorick <jbudorick@chromium.org>
Commit-Queue: Dirk Pranke <dpranke@chromium.org>

[modify] https://crrev.com/5641c8b6567eb9aba1b81dd78838012ef5681c5c/scripts/slave/README.recipes.md
[modify] https://crrev.com/5641c8b6567eb9aba1b81dd78838012ef5681c5c/scripts/slave/recipe_modules/chromium_swarming/api.py
[modify] https://crrev.com/5641c8b6567eb9aba1b81dd78838012ef5681c5c/scripts/slave/recipe_modules/chromium_tests/chromium_fyi.py
[modify] https://crrev.com/5641c8b6567eb9aba1b81dd78838012ef5681c5c/scripts/slave/recipe_modules/chromium_tests/api.py
[modify] https://crrev.com/5641c8b6567eb9aba1b81dd78838012ef5681c5c/scripts/slave/recipe_modules/chromium_swarming/tests/configure_swarming.py
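For readers unfamiliar with the numbers in that commit message: in Swarming a lower priority value is served first, so a bot at 35 loses out to bots at 25 and 30 when capacity is short. The sketch below is only an illustration of that idea, not the actual recipe code. The priority values (25, 30, 35) and the builder names "Mojo Linux" and "Linux Tests" come from this bug; the function names, the group labels, the override table, and "Some FYI Bot" are hypothetical.

```python
# Illustrative sketch only; none of these names come from the real recipe modules.

# Default swarming priority per builder group (values quoted in the commit message).
# Lower number = scheduled earlier.
DEFAULT_PRIORITY_BY_GROUP = {
    "main_waterfall": 25,
    "optional_cq": 30,
    "chromium.fyi": 35,
}

# Hypothetical per-builder override table: the short-term fix amounts to
# giving the Mojo FYI bots the main waterfall priority.
PRIORITY_OVERRIDES = {
    "Mojo Linux": 25,
}


def swarming_priority(builder, group):
    """Return the override for this builder if one exists, else the group default."""
    return PRIORITY_OVERRIDES.get(builder, DEFAULT_PRIORITY_BY_GROUP[group])


def dispatch_order(tasks):
    """Order (builder, group) pairs by priority value.

    sorted() is stable, so builders with equal priority keep submission order.
    """
    ordered = sorted(tasks, key=lambda task: swarming_priority(*task))
    return [builder for builder, _group in ordered]
```

With the override in place, "Mojo Linux" competes on equal terms with "Linux Tests", while an FYI bot without an override still runs at 35 and is the first to be starved.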
,
Nov 29
It looks like this is done now unless there are other builders we need to increase the priority on. Dirk, want to mark this as fixed?
,
Nov 29
Yup, I had just left it open until I had verified that it was working and it had stuck for a while. I think we're at that point now. Sorry for the very long delay on this :(.
Comment 1 by dpranke@chromium.org, Aug 22
Labels: cit-pm-91