
Issue 820190

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Closed: Apr 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux, Mac
Pri: 1
Type: Bug

Blocking:
issue 627636
issue 815092




Some GPU FYI LUCI bots timeout after 3 hours

Project Member Reported by cwallez@chromium.org, Mar 8 2018

Issue description

The following three bots fail with "Infra failure" after 3 hours, in what looks like a global timeout.
 - https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Linux%20FYI%20Release%20(AMD%20R7%20240)
 - https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Mac%20Pro%20FYI%20Release%20(AMD)
 - https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Mac%20FYI%20Experimental%20Retina%20Release%20(NVIDIA)

In this run, https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Linux%20FYI%20Release%20%28AMD%20R7%20240%29/5, the WebGL 2 CTS tests stop midway with some tasks still pending: https://chromium-swarm.appspot.com/task?id=3c1fab0e53ac7510&refresh=10&show_raw=1

What's weird is that the context_lost_tests took almost two hours, but looking at the task, all but a minute of that was in the "pending" state. https://chromium-swarm.appspot.com/task?id=3c1f2f3e3e17fd10&refresh=10

I'm not sure I understand everything that's happening, but would it be possible to exclude the time tasks spend in the "pending" state from the global LUCI timeout?
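
To make the arithmetic concrete, here is a rough Python sketch of how pending time can push a build past a 3-hour limit even when the tests themselves are quick. It is a simplified model, not how LUCI computes anything: it assumes all tasks are triggered when the build starts and ignores compile time. The `Task` class, the helper names, and the durations are hypothetical (the context_lost_tests numbers approximate "almost two hours, all but a minute pending"; the second task is made up).

```python
from dataclasses import dataclass

BUILD_TIMEOUT_SECS = 3 * 60 * 60  # the 3-hour limit observed in this issue


@dataclass
class Task:
    name: str
    pending_secs: int  # time spent waiting for a free machine
    running_secs: int  # time spent actually executing


def wall_secs(task: Task) -> int:
    """Wall-clock time the build waits on this task."""
    return task.pending_secs + task.running_secs


def build_times_out(tasks, timeout_secs=BUILD_TIMEOUT_SECS, count_pending=True):
    """Return True if the slowest task exceeds the build-level timeout.

    With count_pending=False, only execution time is compared against the
    limit, which is the behaviour asked about above.
    """
    if count_pending:
        longest = max(wall_secs(t) for t in tasks)
    else:
        longest = max(t.running_secs for t in tasks)
    return longest > timeout_secs


# Hypothetical durations, for illustration only.
EXAMPLE_TASKS = [
    Task("context_lost_tests", pending_secs=119 * 60, running_secs=1 * 60),
    Task("webgl2_conformance_tests", pending_secs=150 * 60, running_secs=40 * 60),
]

if __name__ == "__main__":
    print(build_times_out(EXAMPLE_TASKS))                       # pending counted
    print(build_times_out(EXAMPLE_TASKS, count_pending=False))  # pending ignored
```

Under these made-up numbers the build exceeds the limit only because of queueing: the longest task runs for 40 minutes but waits 150 minutes for a machine.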
 

Comment 1 by kbr@chromium.org, Mar 8 2018

Blocking: 815092
I think this is happening because we have very little hardware for these machines (1-2 machines each) and they're being used by both the Buildbot and LUCI versions of the bots right now.

The Buildbot versions were OK with the current timeouts, so I think that if they're flipped to LUCI in prod (they aren't part of any tryserver) and we decommission the Buildbot versions, they should go green.

The timeout that hits your **builds** (!= triggered tests on swarming) is the one set here: https://chromium.googlesource.com/chromium/src/+blame/master/infra/config/global/cr-buildbucket.cfg#241

Consider increasing it per your builder(s), similar to this bot: https://chromium.googlesource.com/chromium/src/+blame/master/infra/config/global/cr-buildbucket.cfg#264
Thanks for the pointers, I'll increase it on all the GPU FYI bots for now via the mixins.

Comment 4 by kbr@chromium.org, Mar 9 2018

Owner: cwallez@chromium.org
Status: Assigned (was: Available)
Thanks Corentin for picking this up!

I think it would be better to make a new mixin like "gpu-bot-slow" or similar which has the larger execution_timeout_secs, and use that mixin only on the affected builders, with a comment.
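
A rough Python sketch of the idea, for illustration only. It models mixins as plain dictionaries applied in order, with the builder's own fields winning last. That merge order is an assumption of this sketch, not a statement of how Buildbucket resolves mixins. The "gpu-fyi-ci" mixin name, the `resolve_builder` helper, and the second builder name are hypothetical; the 3-hour base comes from this issue and the doubled value is only an example of "larger".

```python
DEFAULT_TIMEOUT_SECS = 3 * 60 * 60  # 3-hour limit reported in this issue

# Hypothetical mixin table; only "gpu-bot-slow" is a name suggested above.
MIXINS = {
    "gpu-fyi-ci": {"execution_timeout_secs": DEFAULT_TIMEOUT_SECS},
    # Slow bots with very little hardware: allow more time for pending tasks.
    "gpu-bot-slow": {"execution_timeout_secs": 2 * DEFAULT_TIMEOUT_SECS},
}


def resolve_builder(builder, mixins=MIXINS):
    """Apply mixins in listed order, then the builder's own fields."""
    resolved = {}
    for mixin_name in builder.get("mixins", []):
        resolved.update(mixins[mixin_name])
    resolved.update({k: v for k, v in builder.items() if k != "mixins"})
    return resolved


# Only the affected builder opts in to the longer timeout.
slow_builder = resolve_builder({
    "name": "Linux FYI Release (AMD R7 240)",
    "mixins": ["gpu-fyi-ci", "gpu-bot-slow"],
})
normal_builder = resolve_builder({
    "name": "Some Other GPU FYI Bot",  # hypothetical builder name
    "mixins": ["gpu-fyi-ci"],
})

if __name__ == "__main__":
    print(slow_builder["execution_timeout_secs"])
    print(normal_builder["execution_timeout_secs"])
```

Keeping the override in its own mixin means unaffected builders keep the default, and the longer timeout is easy to find and remove once the Buildbot versions are decommissioned.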


Comment 5 by bugdroid1@chromium.org, Mar 9 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/de7e8361b0c2973e4d624f2425ca9431b8f88fcd

commit de7e8361b0c2973e4d624f2425ca9431b8f88fcd
Author: Corentin Wallez <cwallez@chromium.org>
Date: Fri Mar 09 15:04:14 2018

Double the timeout on some GPU FYI bots during LUCI migration

BUG= 820190 

Change-Id: I8a84d1bc2d8635b7d76b438c938eb1c70874e8a2
Reviewed-on: https://chromium-review.googlesource.com/956923
Reviewed-by: Andrii Shyshkalov <tandrii@chromium.org>
Reviewed-by: John Budorick <jbudorick@chromium.org>
Commit-Queue: Corentin Wallez <cwallez@chromium.org>
Cr-Commit-Position: refs/heads/master@{#542110}
[modify] https://crrev.com/de7e8361b0c2973e4d624f2425ca9431b8f88fcd/infra/config/global/cr-buildbucket.cfg

Comment 6 by kbr@chromium.org, Mar 10 2018

Blocking: 627636

Comment 7 by bugdroid1@chromium.org, Mar 15 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/50678d17cb35525633c11239d801bbd077382ff3

commit 50678d17cb35525633c11239d801bbd077382ff3
Author: Geoff Lang <geofflang@chromium.org>
Date: Thu Mar 15 19:27:10 2018

Increase the timeout for the Win10 FYI Debug (NVIDIA) bot.

TBR=iannucci@chromium.org
NOTRY=true

BUG= 820190 

Change-Id: I926d479b6c2051ba93842174d1dc468524f0e158
Reviewed-on: https://chromium-review.googlesource.com/964784
Reviewed-by: Geoff Lang <geofflang@chromium.org>
Commit-Queue: Geoff Lang <geofflang@chromium.org>
Cr-Commit-Position: refs/heads/master@{#543469}
[modify] https://crrev.com/50678d17cb35525633c11239d801bbd077382ff3/infra/config/global/cr-buildbucket.cfg


Comment 8 by bugdroid1@chromium.org, Mar 16 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/e50caa802c62e5672f9f099b37a60a72533442b7

commit e50caa802c62e5672f9f099b37a60a72533442b7
Author: Geoff Lang <geofflang@chromium.org>
Date: Fri Mar 16 15:55:12 2018

Increase the timeout of all LUCI Windows GPU FYI bots.

Timeouts are seen across most of the builders.

TBR=iannucci@chromium.org
NOTRY=true

BUG= 820190 

Change-Id: Id7122242be4fcfe34893a305c32fa4d05f49a80b
Reviewed-on: https://chromium-review.googlesource.com/966588
Reviewed-by: Geoff Lang <geofflang@chromium.org>
Commit-Queue: Geoff Lang <geofflang@chromium.org>
Cr-Commit-Position: refs/heads/master@{#543711}
[modify] https://crrev.com/e50caa802c62e5672f9f099b37a60a72533442b7/infra/config/global/cr-buildbucket.cfg

Status: Fixed (was: Assigned)
