
Issue 908989


Issue metadata

Status: Duplicate
Owner: ----
Closed: Nov 28
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug

Blocking:
issue 908515




Intermittent BOT_DIED happening to mac-10_13_laptop_high_end-perf

Project Member Reported by crouleau@chromium.org, Nov 27

Issue description

See https://ci.chromium.org/p/chrome/builders/luci.chrome.ci/mac-10_13_laptop_high_end-perf

In many builds, such as this one:
https://ci.chromium.org/p/chrome/builders/luci.chrome.ci/mac-10_13_laptop_high_end-perf/1954

the swarming bot running one of the performance_test_suite shards dies:

"shard #0 had an internal swarming failure"

https://chrome-swarming.appspot.com/task?id=416f28179a594e10&refresh=10&show_raw=1

The swarming bot page says "BOT_DIED", but I can't find any other detail.

It seems to happen on either shard 0 or shard 16, for example:
https://ci.chromium.org/p/chrome/builders/luci.chrome.ci/mac-10_13_laptop_high_end-perf/1949

Affected bots: build140-a7, build153-a7

bot for shard 0: https://chrome-swarming.appspot.com/bot?id=build153-a7&sort_stats=total%3Adesc
bot for shard 16: https://chrome-swarming.appspot.com/bot?id=build140-a7&sort_stats=total%3Adesc

See issue 908515 for the initial investigation. I suspected that one of the recently added test cases was to blame, but the bots kept dying after I reverted that change.


 
Cc: -nednguyen@chromium.org nedngu...@google.com jbudorick@chromium.org
Labels: Foundation-Troopers
If a trooper could quarantine build140-a7 and build153-a7 so that Telemetry's soft affinity selects other bots instead, that would quickly solve the issue.
This is likely bug 894421. (See vadim's comment in #6 for a summary of what's going on.)
Mergedinto: 894421
Status: Duplicate (was: Untriaged)
