Issue 876570 link

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Closed: Nov 29
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug




Mojo FYI builders still aren't running tests at the same priority as main builders

Project Member Reported by dpranke@chromium.org, Aug 22

Issue description

Back in bug 842940, we last discussed whether the Mojo team's builders on FYI should see tasks expire before tasks expire on the main waterfall, i.e., what "FYI" actually means.

That bug ended up having a couple of possible resolutions, one of which was to deprioritize the LUCI migration traffic so that we'd have tasks expiring there before the Mojo bots. Another was to make sure that the Mojo tasks ran at the same priority as the main waterfall bots.

Apparently I fixed the first thing, but forgot to fix the second thing.

The net result is that 

https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Linux%20Tests/71907

was green but

https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Mojo%20Linux/17772

was not; the latter had expired tasks and was affecting the mojo sheriffing rotation.

Unfortunately, we also didn't get alerted to this, and perhaps didn't notice because no tasks were expiring on the main waterfall (so I'm told; I haven't verified this yet).

We need to fix the monitoring / alerting so this doesn't happen; we'll track that in other bugs like bug 722533.

This bug is intended to track fixing the priorities of the Mojo bots and any other "important but not main" bots to have the same priority as the main bots, task-scheduling-wise.
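
To illustrate why the lower-priority Mojo tasks can expire while the main waterfall stays green, here is a toy model of a priority-ordered task queue with limited bot capacity. It is only a sketch: the `Task` class, the `simulate` function, the tick-based timing, and the task names are all hypothetical and are not Swarming's actual scheduler or API. The one assumption taken from this bug is that a lower priority number is scheduled first.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Task:
    # Ordering uses (priority, submitted_at); a lower number runs first.
    priority: int
    submitted_at: int
    name: str = field(compare=False)
    expires_after: int = field(compare=False)  # ticks a task may wait


def simulate(tasks, bots, ticks):
    """Toy scheduler: each tick, up to `bots` tasks run, best priority first.

    Returns (ran, expired) as lists of task names. Tasks that have waited
    `expires_after` ticks without being picked up are marked expired.
    """
    queue = list(tasks)
    heapq.heapify(queue)
    ran, expired = [], []
    for now in range(ticks):
        # Drop tasks that have waited too long before scheduling this tick.
        waiting = []
        for task in queue:
            if now - task.submitted_at >= task.expires_after:
                expired.append(task.name)
            else:
                waiting.append(task)
        queue = waiting
        heapq.heapify(queue)
        # Hand the available bots to the best-priority waiting tasks.
        for _ in range(min(bots, len(queue))):
            ran.append(heapq.heappop(queue).name)
    return ran, expired


if __name__ == "__main__":
    # Hypothetical load: four main-waterfall tasks and two Mojo tasks,
    # two bots, and every task expires after waiting two ticks.
    load = [Task(25, 0, f"main-{i}", 2) for i in range(4)]
    load += [Task(35, 0, f"mojo-{i}", 2) for i in range(2)]
    ran, expired = simulate(load, bots=2, ticks=3)
    print("ran:", sorted(ran))
    print("expired:", sorted(expired))
```

In this model the main tasks consume all of the capacity for the first two ticks, so the priority-35 tasks are the only ones that expire. If the Mojo tasks were submitted at priority 25 instead, they would be ordered by submission time alongside the main tasks rather than always being served last.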

 
Cc: iannucci@chromium.org
Labels: cit-pm-91
@iannucci - I think the outage this morning might've been due to cit-pm-91 as well, but not sure?
> Another was to make sure that the Mojo tasks ran at the same priority as the main waterfall bots.

Are you referring to https://chromium-review.googlesource.com/c/chromium/src/+/1097600?

Note that that works for its trybot mirror but not for the actual CI bot. See my comment at https://bugs.chromium.org/p/chromium/issues/detail?id=869114#c7
No, I'm not referring to that CL, though that CL *also* helped with bug 842940, so maybe there were three possible resolutions, not two.
Maybe? PM 91 is kind of a bundle of stuff. The shortage in MP is the same symptom, but I'm not sure it's the same cause. I think the extra-cores configuration we were discussing in the LUCI hangout may be the real cause, which is different from PM 91.
Good point, it's more likely due to a misconfiguration on our side. Okay, separate incident :).
Labels: -cit-pm-91 chops-pm-93
Cc: iannu...@google.com
Cc: -iannucci@chromium.org
Project Member

Comment 9 by bugdroid1@chromium.org, Nov 28

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/tools/build/+/5641c8b6567eb9aba1b81dd78838012ef5681c5c

commit 5641c8b6567eb9aba1b81dd78838012ef5681c5c
Author: Dirk Pranke <dpranke@chromium.org>
Date: Wed Nov 28 02:39:59 2018

Change the Mojo chromium.fyi bots to run swarming tasks at pri=25.

Currently the Mojo bots on chromium.fyi run tasks at the default
swarming priority for .fyi bots (35); this means that they are
lower priority than main waterfall bots (25) and optional CQ
bots (30), and may get starved before the others. We don't want that,
we want them to have the same priority as the main waterfall bots.

Longer-term, we should move them to a new "master" so that we don't
have multiple builders on the same master with different default
priorities, but this is a simple short term fix that doesn't require
coordinating changes across multiple repos.

Bug:  876570 
Change-Id: Iac34f03f8fff2f9201529dda8943f58ae7eead46
Reviewed-on: https://chromium-review.googlesource.com/c/1184381
Reviewed-by: John Budorick <jbudorick@chromium.org>
Commit-Queue: Dirk Pranke <dpranke@chromium.org>

[modify] https://crrev.com/5641c8b6567eb9aba1b81dd78838012ef5681c5c/scripts/slave/README.recipes.md
[modify] https://crrev.com/5641c8b6567eb9aba1b81dd78838012ef5681c5c/scripts/slave/recipe_modules/chromium_swarming/api.py
[modify] https://crrev.com/5641c8b6567eb9aba1b81dd78838012ef5681c5c/scripts/slave/recipe_modules/chromium_tests/chromium_fyi.py
[modify] https://crrev.com/5641c8b6567eb9aba1b81dd78838012ef5681c5c/scripts/slave/recipe_modules/chromium_tests/api.py
[modify] https://crrev.com/5641c8b6567eb9aba1b81dd78838012ef5681c5c/scripts/slave/recipe_modules/chromium_swarming/tests/configure_swarming.py
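
The priority scheme described in the commit message above can be sketched roughly as follows. This is not the code from the CL: the dictionary names, the group labels other than `chromium.fyi`, and the `swarming_priority` function are hypothetical, chosen only to show a per-builder override layered on top of a per-group default. The numeric values (35, 30, 25) are the ones quoted in the commit message, and the builder names come from the URLs in the issue description.

```python
# Default swarming priority per builder group; a lower number is scheduled
# first. Values are those quoted in the commit message. The "main-waterfall"
# and "optional-cq" keys are hypothetical labels, not real group names.
DEFAULT_PRIORITY_BY_GROUP = {
    "main-waterfall": 25,
    "optional-cq": 30,
    "chromium.fyi": 35,
}

# Hypothetical per-builder overrides: builders that live in a lower-priority
# group but should be scheduled like main waterfall builders.
PRIORITY_OVERRIDE_BY_BUILDER = {
    "Mojo Linux": 25,
}


def swarming_priority(group, builder):
    """Return the override for `builder` if one exists, else the group default."""
    if builder in PRIORITY_OVERRIDE_BY_BUILDER:
        return PRIORITY_OVERRIDE_BY_BUILDER[builder]
    return DEFAULT_PRIORITY_BY_GROUP[group]


if __name__ == "__main__":
    print(swarming_priority("chromium.fyi", "Mojo Linux"))
    print(swarming_priority("main-waterfall", "Linux Tests"))
```

An override table like this is the kind of short-term fix the commit message describes: it avoids moving builders between groups, at the cost of having builders in one group run with different priorities.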

It looks like this is done now unless there are other builders we need to increase the priority on. Dirk, want to mark this as fixed?
Status: Fixed (was: Started)
Yup, I had just left it open until I had verified that it was working and it had stuck for a while. I think we're at that point now.

Sorry for the very long delay on this :(.
