Issue 920423

Starred by 2 users

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Windows
Pri: 2
Type: Bug


Participants' hotlists:
chrome-client-infra-backlog


win10_chromium_x64_rel_ng swarming tests pool is over capacity

Project Member Reported by steve...@chromium.org, Jan 9

Issue description

Build:
https://ci.chromium.org/p/chromium/builders/luci.chromium.try/win10_chromium_x64_rel_ng/171845

Lots of output spam like:
[D2019-01-09T14:53:02.605678-08:00 11940 0 collect.go:343] Waiting task_id: 424d77e426afdd10
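
To gauge how long each task has been stuck, the repeated "Waiting" lines can be collapsed per task ID. The following is a minimal sketch, not part of the swarming tooling: it assumes every line follows the shape of the sample above, and the names `WAITING_RE` and `summarize_waiting` are hypothetical. The two numeric fields after the timestamp are skipped because their meaning is not stated in this report.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed line shape, inferred only from the single sample in this report:
# [<level><ISO timestamp> <number> <number> <file>:<line>] Waiting task_id: <hex id>
WAITING_RE = re.compile(
    r"^\[(?P<level>[A-Z])(?P<ts>\S+) \d+ \d+ (?P<src>[\w.]+:\d+)\] "
    r"Waiting task_id: (?P<task_id>[0-9a-f]+)$"
)


def summarize_waiting(lines):
    """Return {task_id: (line_count, seconds_between_first_and_last_line)}."""
    counts = Counter()
    first_seen = {}
    last_seen = {}
    for line in lines:
        match = WAITING_RE.match(line.strip())
        if match is None:
            continue  # Not a "Waiting" line; ignore it.
        task_id = match.group("task_id")
        ts = datetime.fromisoformat(match.group("ts"))
        counts[task_id] += 1
        first_seen.setdefault(task_id, ts)
        last_seen[task_id] = ts
    return {
        task_id: (
            counts[task_id],
            (last_seen[task_id] - first_seen[task_id]).total_seconds(),
        )
        for task_id in counts
    }


if __name__ == "__main__":
    sample = [
        # First line is copied from the report; the second is a made-up repeat.
        "[D2019-01-09T14:53:02.605678-08:00 11940 0 collect.go:343] Waiting task_id: 424d77e426afdd10",
        "[D2019-01-09T14:53:32.605678-08:00 11940 0 collect.go:343] Waiting task_id: 424d77e426afdd10",
    ]
    for task_id, (count, waited) in summarize_waiting(sample).items():
        print(f"{task_id}: {count} lines over {waited:.0f}s")
```

A task whose first and last "Waiting" lines are far apart has probably been pending in the pool rather than running.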

 
Maybe related to issue 917813?

Cc: kbr@chromium.org
Components: -Build Infra>Client>Chrome
Cc: -kbr@chromium.org
Components: Infra>Platform>Swarming
Owner: sergeybe...@chromium.org
Status: Assigned (was: Untriaged)
Taking as a trooper. The problem seems to lie not with this particular test but with the swarming pool running the tests: many other isolated tests are also pending for a long time (just like this one), which adds to the builder's runtime and ultimately makes it time out at 3h. Either the pool is too small, or some tests are consuming too many resources and exhausting the pool.
Labels: -Pri-3 Pri-2
Summary: win10_chromium_x64_rel_ng swarming tests pool is over capacity (was: network_service_browser_tests took 47 minutes on win10_chromium_x64_rel_ng)
Confirmed the capacity problem: http://shortn/_BYeyA1oRKg (internal link).

I'll see if we have the resources to add capacity. Raising priority, since this issue paged me today.
Components: -Infra>Platform>Swarming
I'll continue the internal investigation in http://irm/incidents/a.e9HzQJiAMQfdkVp9zKGQ

Is this also causing issues for bots in the luci.flex.try pool, or is that a separate issue? I have https://ci.chromium.org/p/pdfium/builders/luci.pdfium.try/win/12840 here that's been pending for half an hour.
Issue 921827 has been merged into this issue.
#9 - no, it shouldn't interfere with the luci.flex.try pool - that is a completely separate pool of machines.
Issue 921943 has been merged into this issue.
From issue 921943 and CL https://crrev.com/i/776610 - we should justify any further capacity increase. Chances are, some tests were added over time without capacity considerations. We should check if those tests are indeed absolutely needed.
Who will approve the capacity increase?
Note: network_service_browser_tests and browser_tests are the top two resource consumers.
More than half of the win10 pool seems to be used by these two tests.
https://plx.corp.google.com/scripts2/script_5c._3cf21e_0000_2bcb_926c_089e0832afdc
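
For readers without access to the internal script, the kind of breakdown described above can be sketched as follows. This is only an illustration under stated assumptions: the function `usage_share` is hypothetical, and the bot-hour figures in the example are invented placeholders, not measurements from the win10 pool.

```python
def usage_share(bot_hours_by_suite):
    """Return (suite, fraction_of_total) pairs, largest consumer first."""
    total = sum(bot_hours_by_suite.values())
    if total <= 0:
        return []
    shares = [(suite, hours / total) for suite, hours in bot_hours_by_suite.items()]
    return sorted(shares, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Placeholder numbers for illustration only; real values would come
    # from swarming task statistics for the pool.
    example = {
        "network_service_browser_tests": 30.0,
        "browser_tests": 25.0,
        "all_other_suites": 45.0,
    }
    ranked = usage_share(example)
    for suite, share in ranked:
        print(f"{suite}: {share:.0%}")
    top_two = sum(share for _, share in ranked[:2])
    print(f"top two suites: {top_two:.0%} of pool usage")
```

Ranking suites by their share of pool time makes it easier to decide which tests to examine before requesting more machines.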

Cc: dpranke@chromium.org jbudorick@chromium.org s...@google.com bradhall@chromium.org hinoka@chromium.org
Cc: -bradhall@chromium.org bradhall@google.com

Comment 18 by sergeybe...@chromium.org, Jan 17 (6 days ago)

Issue 922336 has been merged into this issue.
Project Member

Comment 19 by bugdroid1@chromium.org, Yesterday (42 hours ago)

The following revision refers to this bug:
  https://chrome-internal.googlesource.com/infradata/config/+/5744e7432530c5f09e50bd8f2839227f60018dfc

commit 5744e7432530c5f09e50bd8f2839227f60018dfc
Author: Takuto Ikuta <tikuta@google.com>
Date: Mon Jan 21 11:52:27 2019
