
Issue 904903

Issue metadata

Status: Available
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug


Participants' hotlists:
chrome-client-infra-backlog


Need better UI presentation for infra failing builds due to lack of capacity in isolated tests

Project Member Reported by erikc...@chromium.org, Nov 13

Issue description

As per go/top-cq-flakes:
https://datastudio.google.com/c/reporting/12dYEpcepJ5_6ZOhprbd5GpDNooiUJONV/page/AYfX

Looking at a sample build:
https://ci.chromium.org/p/chromium/builders/luci.chromium.try/cast_shell_linux/178938

We see many errors of the form:
"""
Invalid Swarming task state: NO_RESOURCE
"""

Is this due to insufficient swarming capacity? If so, I would expect a clearer message, and perhaps even a different presentation color, just so we can distinguish that from other problems we need to investigate.

+ sergeyberezin, current CCI trooper
+ jbudorick, maruel, stgao
 
NO_RESOURCE means there are 0 bots that can possibly run this task; typically it's due to inconsistent dimensions (misconfiguration). However, a sample task https://chromium-swarm.appspot.com/task?id=411e18ab894edc10 shows 2420 bots that can run it... I wonder if it's a swarming issue?
I have a strong suspicion that it was due to the network maintenance at that time: https://groups.google.com/a/google.com/forum/#!topic/chrome-infrastructure-announce/d_4zm3cY6ls

It affected GCE bots, and therefore may have affected large pools of bots after they were respawned by the Machine Provider (which happens every 24h).
Cc: sergeybe...@chromium.org
Components: Infra>Client>Chrome
Labels: -Infra-Troopers
Owner: ----
Status: Available (was: Assigned)
Summary: Need better UI presentation for infra failing builds due to lack of capacity in isolated tests (was: Spike in instances of INVALID_TEST_RESULTS)
The actual outage should now be over, and I don't see any suspicious failures in the recent history of the builder.

Since the bug talks about UX more than the actual outage, I'll keep it open and rephrase the title. It's not really a trooper issue - adding it to chrome-client-infra-backlog list for tracking.
#2 is correct. See go/chops-pm-105 (internal) for context.

I would definitely agree about the UI needing to be better.
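
To make the request concrete, here is a minimal sketch of the kind of mapping being asked for: NO_RESOURCE gets its own category and a readable message, and every other state keeps today's generic text. This is only an illustration, not the actual recipe or Milo code. The names `Presentation` and `present_task_state` and the category strings are hypothetical, and the wording of the message is an assumption based on the explanation of NO_RESOURCE in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Presentation:
    """How a failed Swarming task might be shown in the build UI (hypothetical)."""
    category: str  # hypothetical label a UI could map to a distinct color
    message: str


def present_task_state(state: str) -> Presentation:
    """Map a Swarming task state string to a category and a readable message.

    Only NO_RESOURCE is special-cased, since it is the only state discussed
    in this issue. Everything else keeps the generic message seen today.
    """
    if state == "NO_RESOURCE":
        return Presentation(
            category="infra-capacity",
            message=(
                "No bot was available to run this task (NO_RESOURCE). "
                "Possible causes: mismatched task dimensions or bots in "
                "the pool being offline."
            ),
        )
    # Fallback: same text the build page shows today.
    return Presentation(
        category="infra-failure",
        message=f"Invalid Swarming task state: {state}",
    )


if __name__ == "__main__":
    print(present_task_state("NO_RESOURCE"))
    print(present_task_state("BOT_DIED"))
```

With a separate category, the UI could pick a different color for capacity or availability problems, so they stand apart from failures that need investigation.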