
Issue 914104


Issue metadata

Status: Available
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Windows
Pri: 1
Type: Bug

Blocked on:
issue 916770




swarming test pending time of win7_chromium_rel_ng builder exceeds 20 mins sometimes

Reported by tikuta@chromium.org, Dec 11

Issue description

The pending time of the win7_chromium_rel_ng builder's swarming tests sometimes exceeds 20 minutes at peak times.
http://shortn/_UYoRjmdsBs
Can we add more capacity for win7_chromium_rel_ng's swarming test pool?


There are other builders with long pending times, but win7_chromium_rel_ng has higher priority because it tends to be the slowest builder in the CQ.
http://shortn/_p7OZQUVVjf
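A minimal sketch of how a task's pending time can be measured and checked against the 20-minute threshold discussed here. It assumes each task record carries a creation timestamp and a start timestamp; the dict shape, the field names (loosely modeled on Swarming's `created_ts`/`started_ts` task result fields), and the sample values are hypothetical, not taken from the linked dashboards.

```python
from datetime import datetime, timedelta

# Threshold from this issue: pending times above 20 minutes are a problem.
PENDING_THRESHOLD = timedelta(minutes=20)


def pending_time(task):
    """Return how long a task waited before a bot picked it up.

    Assumes a hypothetical dict with ISO-8601 'created_ts' and 'started_ts'
    strings, loosely modeled on Swarming task result fields.
    """
    created = datetime.fromisoformat(task["created_ts"])
    started = datetime.fromisoformat(task["started_ts"])
    return started - created


def slow_tasks(tasks, threshold=PENDING_THRESHOLD):
    """Return (task_id, pending) pairs whose pending time exceeds threshold."""
    result = []
    for task in tasks:
        pending = pending_time(task)
        if pending > threshold:
            result.append((task["task_id"], pending))
    return result


if __name__ == "__main__":
    # Hypothetical sample records, not real task data.
    tasks = [
        {"task_id": "a", "created_ts": "2018-12-11T17:00:00",
         "started_ts": "2018-12-11T17:03:00"},
        {"task_id": "b", "created_ts": "2018-12-11T17:00:00",
         "started_ts": "2018-12-11T17:24:30"},
    ]
    for task_id, pending in slow_tasks(tasks):
        print(task_id, pending)  # prints: b 0:24:30
```

Tracking the maximum (or a high percentile) of these values per hour is what makes the peak-time pattern visible.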
 
Owner: sergeybe...@chromium.org
Status: Assigned (was: Untriaged)
Let me take a look.
It seems the isolated tests run on Golo-based VMs, which are usually harder to provision than GCE VMs (e.g. for Win10).
Sample task: https://chromium-swarm.appspot.com/task?id=41bc877bc523ad10

Seeing 20-minute pending times in:
https://ci.chromium.org/p/chromium/builders/luci.chromium.try/win7_chromium_rel_ng/151674
https://ci.chromium.org/p/chromium/builders/luci.chromium.try/win7_chromium_rel_ng/151675

If we cannot add more capacity for Golo VMs, is it possible to run some time-consuming tests (e.g. webkit_layout_tests, browser_tests) only on GCE-based VMs?
#1/2 - I'll point out that while it takes more time to deploy Win7 VMs, we do have the capacity to do this if that turns out to be the right course of action.
Labels: -Pri-2 Pri-1
Dirk, John, what do you think?
I think adding capacity to win7_chromium_rel_ng is a reasonable thing to do if we can.

Ping?
The max pending time of the win7_chromium_rel_ng builder always exceeds 20 minutes during MTV business hours.
http://shortn/_TB9J3ij0nd

Adding capacity seems worth doing.
Cc: bradhall@chromium.org
Adding more capacity here seems reasonable, though I'm not immediately sure how many we should add.

#2: win7 isn't supported on gce-based VMs.

#3: how many win7 VMs can we deploy?
> how many win7 VMs can we deploy?

How many would you estimate are needed?
> How many would you estimate are needed?

https://plx.corp.google.com/scripts2/script_5b._720d13_0000_240a_a6f4_001a11c04a1c

Considering peak swarming task usage, I expect that adding 100 VMs (496 -> around 600) will improve the situation in most cases.
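The estimate above amounts to sizing the pool for peak concurrent demand: 496 + 100 = 596, roughly 600 bots, about a 20% increase. A sketch of that reasoning, assuming a series of sampled demand counts (tasks running plus pending per interval). The sample numbers and the 95th-percentile target are hypothetical choices for illustration, not values from the linked query.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list (pct in (0, 100])."""
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[rank - 1]


def bots_to_add(demand_samples, current_bots, pct=95):
    """Bots needed so the pool covers the pct-th percentile of demand.

    demand_samples: hypothetical per-interval counts of tasks that wanted a
    bot (running + pending). Returns 0 if the pool already covers it.
    """
    target = percentile(demand_samples, pct)
    return max(0, target - current_bots)


if __name__ == "__main__":
    current = 496  # pool size quoted in this issue
    samples = [300, 420, 480, 510, 560, 590, 600, 450]  # hypothetical
    print(bots_to_add(samples, current))  # prints: 104
```

Sizing to a high percentile rather than the absolute maximum accepts some pending time at the very worst peaks in exchange for fewer idle bots the rest of the day.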

Agree w/ #8, I think something in the 50-100 range would be good here if possible.
Owner: d...@chromium.org
I'll verify we have the capacity for the 100 and, if not, what's the max we can deploy this week.
We do have the capacity to meet all 100. It will most likely take more than a week to deliver all of them, so we'll give updates as chunks are delivered.

Comment 13 by bugdroid1@chromium.org, Dec 19

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/478d8287debc37a83dfcb9f7c960eee39289bf78

commit 478d8287debc37a83dfcb9f7c960eee39289bf78
Author: Garrett Beaty <gbeaty@chromium.org>
Date: Wed Dec 19 01:14:37 2018

Increase the timeout for win7_chromium_rel_ng to mitigate timeouts.

Bug: 914104
Change-Id: I65d262ad1ee57be59d9e7c79efe95a30c379b0c5
Reviewed-on: https://chromium-review.googlesource.com/c/1383531
Reviewed-by: John Budorick <jbudorick@chromium.org>
Commit-Queue: Garrett Beaty <gbeaty@chromium.org>
Cr-Commit-Position: refs/heads/master@{#617700}
[modify] https://crrev.com/478d8287debc37a83dfcb9f7c960eee39289bf78/infra/config/global/cr-buildbucket.cfg


Comment 14 by bugdroid1@chromium.org, Dec 19

The following revision refers to this bug:
  https://chrome-internal.googlesource.com/chrome-golo/chrome-golo/+/f81df0f688e4f9aa29a3ef718dfeaba7270be565

commit f81df0f688e4f9aa29a3ef718dfeaba7270be565
Author: Bryce Albritton <dba@google.com>
Date: Wed Dec 19 21:35:20 2018


Comment 15 by bugdroid1@chromium.org, Dec 19

The following revision refers to this bug:
  https://chrome-internal.googlesource.com/infradata/config/+/5014e112f146383905281d0ee56d11f26a82422b

commit 5014e112f146383905281d0ee56d11f26a82422b
Author: Bryce Albritton <dba@google.com>
Date: Wed Dec 19 22:17:07 2018

49 bots added to Pool: Chrome (2 of the bots added were already in the config, #14 delivers the remaining 47).

Will continue with the remaining 51 to give a total of 100.
Blockedon: 916770

Comment 18 by bugdroid1@chromium.org, Dec 20

The following revision refers to this bug:
  https://chrome-internal.googlesource.com/infradata/config/+/18d41718d0e0b5e3719d16141075c2e5e2f03a55

commit 18d41718d0e0b5e3719d16141075c2e5e2f03a55
Author: Bryce Albritton <dba@google.com>
Date: Thu Dec 20 22:28:07 2018


Comment 19 by bugdroid1@chromium.org, Dec 20

The following revision refers to this bug:
  https://chrome-internal.googlesource.com/chrome-golo/chrome-golo/+/bef758817b216c71240af29b679565383c55216d

commit bef758817b216c71240af29b679565383c55216d
Author: Bryce Albritton <dba@google.com>
Date: Thu Dec 20 22:28:18 2018

51 more bots are now in Pool: Chrome.
NextAction: 2019-01-14
Status: Verified (was: Assigned)
Thanks, dba!

Four-week stats show a good improvement in max pending time.
http://shortn/_j2KIM2a2wl
https://screenshot.googleplex.com/KVKht3gtvLw

I will mark this bug fixed if this week has similar stats.
Status: Assigned (was: Verified)
Hmm, 600 bots may not be sufficient yet.
http://shortn/_HZGdC6uchh
That's pretty crazy that 600 bots isn't enough.
The NextAction date has arrived: 2019-01-14
Cc: sergeybe...@google.com
NextAction: ----
Hmm, adding capacity has not improved pending time much.
http://shortn/_KoHGgL6PWI

But the time-consuming tasks are sent from the 'Win7 Tests (dbg)(1)' bots.
It seems more than 50% of the win7 pool is consumed by non-optimized dbg tests.
https://plx.corp.google.com/scripts2/script_5c._3cf21a_0000_2bcb_926c_089e0832afdc

Can we enable some optimization for the dbg tester?
I think one of the biggest roles of the dbg tester is to confirm the behavior of component build binaries rather than the behavior of non-optimized binaries.
Sorry, the query in #26 was wrong; I've updated it.

browser_tests and webkit_layout_tests on the win7_chromium_rel_ng builder are the most time-consuming tests in the pool.

I have some ideas for this:
* add more capacity if possible
* move some tests to the win10 GCE pool, since GPU-related tests already run on win10, and we can resize the GCE pool dynamically, so we can somewhat control the cost
* do nothing and put up with it at peak times
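Findings such as "more than 50% of the win7 pool is consumed by dbg tests" and "browser_tests and webkit_layout_tests are the most time-consuming tests" come from aggregating bot time per task source. A minimal sketch of that aggregation, assuming hypothetical records of ((builder, test), duration in seconds); the durations below are made up for illustration, and the linked plx query may compute this differently.

```python
from collections import defaultdict


def bot_time_share(records):
    """Return each key's share of total bot time, largest first.

    records: iterable of (key, duration_seconds), where key identifies the
    task source, e.g. a (builder, test) pair. Hypothetical input shape.
    """
    totals = defaultdict(float)
    for key, seconds in records:
        totals[key] += seconds
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []
    shares = [(key, total / grand_total) for key, total in totals.items()]
    shares.sort(key=lambda item: item[1], reverse=True)
    return shares


if __name__ == "__main__":
    records = [  # hypothetical durations, not real pool data
        (("Win7 Tests (dbg)(1)", "browser_tests"), 5400),
        (("win7_chromium_rel_ng", "browser_tests"), 1800),
        (("win7_chromium_rel_ng", "webkit_layout_tests"), 1500),
        (("Win7 Tests (dbg)(1)", "webkit_layout_tests"), 1300),
    ]
    for key, share in bot_time_share(records):
        print(key, f"{share:.0%}")
```

Grouping by builder alone instead of (builder, test) would answer the dbg-versus-rel question; grouping by test alone ranks the most expensive suites.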

Comment 29 by bradhall@google.com, Jan 16

How hard would it be to move tests to the win10 gce pool?

Comment 30 by tikuta@chromium.org, Jan 16

Technically, it can be done like this:
https://chromium-review.googlesource.com/c/chromium/src/+/1404907

We would also need to increase win10 pool capacity in this case.

Comment 31 by bpastene@chromium.org, Jan 17

Cc: kbr@chromium.org
> move some test to win10 gce pool as gpu related tests already run on win10, and we can resize gce pool dynamically, so we can somewhat control the cost.

I think it would make more sense to move the win10 GPU tests on win7_chromium_rel_ng to win10_chromium_x64_rel_ng. It doesn't decrease load at all; it just makes the naming scheme more consistent and less surprising.

Comment 32 by bpastene@chromium.org, Jan 17

Owner: ----
Status: Available (was: Assigned)
And I don't think we have any deployments currently in progress, so I'm removing Bryce as owner until we have more concrete plans.

Comment 33 by kbr@chromium.org, Jan 17

It'd be fine to move the Win10 GPU tests off of win7_chromium_rel_ng -- the only difference being that they'd be tested in 64-bit rather than 32-bit builds. Is there capacity on the win10_chromium_x64_rel_ng bot to simply switch these tests over?

Note that unfortunately this mirroring is currently done in the tools/build workspace:
https://cs.chromium.org/chromium/build/scripts/slave/recipe_modules/chromium_tests/trybots.py?q=trybots.py&sq=package:chromium&dr

so the switchover would have to be done without any tryjobs.

Also, the GPU Win Builder is currently a 32-bit builder, so we would need to make it build 64-bit first.
