Issue 772335

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Nov 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Mac
Pri: 2
Type: Bug

Blocked on:
issue 774161




WPT Import: mac*_blink_rel trybots need more shards

Project Member Reported by raphael....@intel.com, Oct 6 2017

Issue description

(Part of the KR to reduce WPT import latency in Q4)

Right now, when we have new WPT changes to import, we spend a large amount of time on the first stage, where we run the changes through the *_blink_rel trybots. Surprisingly, the Mac bots tend to take longer than the Windows ones.

For example, mac10.12_blink_rel takes around 40min to run the layout tests. If one or more tests fail, it will run everything again without the imported changes, so that's another 40min running tests before we even get to the point where the import process rebaselines the tests and runs the CQ.
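To make the arithmetic above concrete, here is a minimal sketch of the time a single trybot spends in layout tests before rebaselining can start. The function name is hypothetical and the 40-minute figure is just the approximate value quoted above; other steps (bot_update, archiving results) are ignored.

```python
def layout_test_minutes_before_rebaseline(run_minutes, tests_failed):
    """Rough layout-test time on one trybot before the import can rebaseline.

    Assumption: a failing run triggers exactly one extra full run without
    the imported changes, as described in the issue text.
    """
    runs = 2 if tests_failed else 1
    return run_minutes * runs


print(layout_test_minutes_before_rebaseline(40, tests_failed=False))  # 40
print(layout_test_minutes_before_rebaseline(40, tests_failed=True))   # 80
```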

The non-Mac trybots aren't the fastest either, but they tend to spend more time on other steps (bot_update and archive_webkit_tests_results are very slow on Windows, for example).
 
Ping? Is there a procedure to ask for more mac*_blink_rel builders?
Blockedon: 774161
Labels: OS-Mac
Ah, sorry for the delay!

Yep, the procedure is:

 1. File an issue with component Infra>Labs to request new slaves. Filed bug 774161 for this.
 2. Commit a change to master.tryserver.blink/slaves.cfg in the build repo (https://cs.chromium.org/chromium/build/masters/master.tryserver.blink/slaves.cfg) to add those slaves to the relevant pools.
 3. File an issue for a master restart, following https://g.co/bugatrooper.
Got it, thanks!
Cc: -qyears...@chromium.org
Owner: qyears...@chromium.org
Status: Started (was: Available)
CL for all except 10.11 retina: https://crrev.com/c/721626
Project Member

Comment 6 by bugdroid1@chromium.org, Oct 16 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/tools/build/+/5415e5e60aa335ed9433c8e3de9e1c1ea90373da

commit 5415e5e60aa335ed9433c8e3de9e1c1ea90373da
Author: Quinten Yearsley <qyearsley@chromium.org>
Date: Mon Oct 16 23:17:13 2017

Add slaves to mac10.{10,11,12}_blink_rel

Bug:  772335 
Change-Id: I69c9fa15063dff2eb6bc7c187ed80e1130a27863
Reviewed-on: https://chromium-review.googlesource.com/721626
Reviewed-by: Dirk Pranke <dpranke@chromium.org>
Commit-Queue: Quinten Yearsley <qyearsley@chromium.org>

[modify] https://crrev.com/5415e5e60aa335ed9433c8e3de9e1c1ea90373da/masters/master.tryserver.blink/slaves.cfg

Looking at issue 774161, it looks like we're still waiting for the Retina machine(s). Should we wait for all slaves before asking for the master to be restarted or can we do that now for the existing ones?
We can do that now with existing ones; filed bug 777887 for restart today.
Update: now the non-retina mac bots have 3 slaves each.
Thank you. Does that also mean the layout tests will run with more shards or does it only mean we have more slaves available to pick up new builds?
It just means more slaves are available to pick up new builds (reducing scheduled/pending time, but not build run time). The number of shards for layout tests is configured separately.
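A toy model of the distinction above, assuming all builds are queued at the same moment and every build takes the same time. The function name and the numbers are hypothetical, not measurements from the actual bots.

```python
def worst_pending_and_run_minutes(queued_builds, bots, run_minutes):
    """Return (worst-case pending time, per-build run time) in minutes.

    Builds are handed out in waves of `bots`; the last build has to wait
    for every earlier wave to finish. More bots shrink the pending time
    only; the run time of each build is unchanged.
    """
    waves_before_last = (queued_builds - 1) // bots
    return waves_before_last * run_minutes, run_minutes


print(worst_pending_and_run_minutes(6, bots=2, run_minutes=40))  # (80, 40)
print(worst_pending_and_run_minutes(6, bots=3, run_minutes=40))  # (40, 40)
```

Going from 2 to 3 bots halves the worst-case wait in this example, while each build still runs for 40 minutes. Only a higher shard count would change the second number.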
Ah, maybe I misinterpreted this bug from the start! Is the time spent waiting to start a build an issue, or just the run time? Increasing the number of shards for layout tests should be a change to //testing/buildbot/chromium.webkit.json.
I see. What's the process for increasing the number of shards? In addition to having few slaves, the time it takes to run the layout tests is quite long as well (as I said in the bug description, it can take 40*2 + epsilon minutes for mac10.12_blink_rel to run before we even get to the stage of rebaselining and triggering the CQ bots).
How does one pick the right value for the builders in chromium.webkit.json? The slowest Mac bots all have |shards| set to 2. Should I raise it to 5? 10?
Cc: dpranke@chromium.org
I believe that 2 was just an initial value to see whether it works OK (relevant CL: https://chromium-review.googlesource.com/c/chromium/src/+/616483).

I think that the setting is flexible to some extent, but the best number probably depends on the number of swarming bots available per platform, as well as the total cumulative test time and the desired clock time.

For linux, it's set to 6, but there are a lot more linux swarming bots available to run tasks (https://chromium-swarm.appspot.com/botlist?c=id&c=task&c=status&f=os%3AUbuntu-14.04&l=100&s=os%3Aasc). 6 would probably be acceptable, but then you'd more often run out of swarming bots and the jobs would be waiting on that anyway, perhaps.

Maybe 4 would be OK?
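One way to think about the trade-off described above is the following sketch. It is not how the value was actually chosen; the function, its parameters, and all numbers are hypothetical. It takes the smallest shard count that meets the desired clock time, capped by how many swarming bots each concurrent build can realistically get.

```python
import math


def suggest_shards(total_test_minutes, target_minutes, available_bots,
                   concurrent_builds=1):
    """Suggest a shard count from cumulative test time and bot capacity.

    Assumptions: tests split evenly across shards, there is no per-shard
    overhead, and the bots are shared equally among concurrent builds.
    """
    # Shards needed to reach the desired wall-clock time.
    wanted = math.ceil(total_test_minutes / target_minutes)
    # Shards each build can get without queueing for swarming bots.
    affordable = max(1, available_bots // concurrent_builds)
    return min(wanted, affordable)


# 80 cumulative minutes, aiming for 20 minutes, 12 bots shared by 3 builds.
print(suggest_shards(80, 20, available_bots=12, concurrent_builds=3))  # 4
# A more aggressive 10-minute target is capped by bot availability.
print(suggest_shards(80, 10, available_bots=12, concurrent_builds=3))  # 4
```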
My access to the swarming pages is rather limited, so I'll trust you and send a CL bumping the number to 4 :)
Project Member

Comment 17 by bugdroid1@chromium.org, Oct 28 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/23c1566fcff5ea60c0d3fc2de833e4af09ebaaa0

commit 23c1566fcff5ea60c0d3fc2de833e4af09ebaaa0
Author: Quinten Yearsley <qyearsley@chromium.org>
Date: Sat Oct 28 10:25:08 2017

Increase shard count for Mac Blink builders

Reason: Currently of the blink_rel try bots, the mac bots are the slowest.

Bug:  772335 
Change-Id: I087fbd652c6e11d2d884905859f13e1601d0dbe5
Reviewed-on: https://chromium-review.googlesource.com/737415
Reviewed-by: Raphael Kubo da Costa (rakuco) <raphael.kubo.da.costa@intel.com>
Reviewed-by: Dirk Pranke <dpranke@chromium.org>
Commit-Queue: Raphael Kubo da Costa (rakuco) <raphael.kubo.da.costa@intel.com>
Cr-Commit-Position: refs/heads/master@{#512397}
[modify] https://crrev.com/23c1566fcff5ea60c0d3fc2de833e4af09ebaaa0/testing/buildbot/chromium.webkit.json

Update: Now mac10.{10,11,12}_blink_rel are using 4 shards for layout test runs, and the step duration seems to be closer to 25 minutes, e.g.:
https://build.chromium.org/p/tryserver.blink/builders/mac10.12_blink_rel/builds/2371
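As a back-of-the-envelope check on these numbers, the two data points mentioned in this thread (about 40 minutes with 2 shards, about 25 minutes with 4) can be fitted to a simple model, minutes = overhead + work / shards. The model itself is an assumption, and both timings are approximate, so any extrapolation is only indicative.

```python
def fit_overhead_and_work(shards_a, minutes_a, shards_b, minutes_b):
    """Fit minutes = overhead + work / shards through two observations."""
    work = (minutes_a - minutes_b) / (1 / shards_a - 1 / shards_b)
    overhead = minutes_a - work / shards_a
    return overhead, work


def predict_minutes(overhead, work, shards):
    """Predicted step duration for a given shard count under the same model."""
    return overhead + work / shards


overhead, work = fit_overhead_and_work(2, 40, 4, 25)
print(overhead, work)                       # 10.0 60.0
print(predict_minutes(overhead, work, 6))   # 20.0
```

Under this model roughly 10 minutes do not shrink with more shards, so going from 4 to 6 shards would save only about 5 minutes per run.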

mac10.11_retina_blink_rel is still not using swarming, I think, and is still slower, e.g.:
https://build.chromium.org/p/tryserver.blink/builders/mac10.11_retina_blink_rel/builds/4579

Still left to do:
If mac10.11_retina_blink_rel is now the bottleneck, then that platform should also use swarming and should have enough shards.
Right now, a job needs to wait 3~4hrs to get a chance to run on mac10.{10,11,12}_blink_rel. Perhaps we still need more shards...
I am planning to add more bots, which will help. We can probably add more shards for the tests as well.
BTW, I've noticed that each of the mac10.{10,11_retina,12}_blink_rel builders has one bot that's actually disconnected for at least a few days, which brings the number of available slaves back to 2 again.
Project Member

Comment 22 by bugdroid1@chromium.org, Nov 8 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/tools/build/+/82538a9a747bb968b4b0b5443e894c46fb196a64

commit 82538a9a747bb968b4b0b5443e894c46fb196a64
Author: Dirk Pranke <dpranke@chromium.org>
Date: Wed Nov 08 02:59:22 2017

Re-add mac bots to tryserver.blink.

Now that the six bots have been upgraded from 10.10 and 10.11
to 10.12, we can re-add them to the pool giving us hopefully
plenty of capacity.

TBR=qyearsley@chromium.org
BUG= 772335 , 780950

Change-Id: Iaec59adc18c850179b58973a57445a7ffb80ac95
Reviewed-on: https://chromium-review.googlesource.com/757981
Reviewed-by: Dirk Pranke <dpranke@chromium.org>
Commit-Queue: Dirk Pranke <dpranke@chromium.org>

[modify] https://crrev.com/82538a9a747bb968b4b0b5443e894c46fb196a64/masters/master.tryserver.blink/slaves.cfg

Status: Fixed (was: Started)
We should have enough capacity now (buildbot-bot-wise). Let me know if you see pending builds with any frequency going forward. We can also add more shards to the tests to decrease cycle time if need be.


Actually, they might not be fully provisioned yet, but we're working on it. See crbug.com/780950 for additional details.
