
Issue 706653

Starred by 4 users

Issue metadata

Status: Fixed
Owner:
Closed: Apr 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: iOS
Pri: 1
Type: Bug

Blocking:
issue 706949




Huge pending queue spikes on ios-simulator try bot

Project Member Reported by st...@chromium.org, Mar 30 2017

Issue description

For this patch https://codereview.chromium.org/2787633002/, in https://build.chromium.org/p/tryserver.chromium.mac/builders/ios-simulator/builds/182738, of all the tests run with the patch, only "ios_net_unittests (ipad air 2 ios 10.0)" failed, due to a Swarming task timeout. During the retry, however, all the tests were run again without the patch, including "components_unittests (iphone 5 ios 9.0)", which passed with the patch but failed without it.

In my understanding, only the failed tests should be retried without the patch, even if rerunning the others would be quick because of cached results on Swarming.

I might have missed some context on why all tests are rerun without the patch.
But at least in this case, the rerun of "components_unittests (iphone 5 ios 9.0)" gave a false signal.
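
The expected behaviour described above can be sketched as follows. This is only an illustration, not the actual ios/try recipe code: the function name `steps_to_retry_without_patch` and the dict-of-booleans result format are hypothetical.

```python
from typing import Dict, List


def steps_to_retry_without_patch(with_patch_results: Dict[str, bool]) -> List[str]:
    """Return only the steps that failed with the patch applied.

    Hypothetical helper: `with_patch_results` maps a step name to True if the
    step passed with the patch. Steps that passed are never rerun, so a
    failure without the patch cannot produce a misleading signal for them.
    """
    return [step for step, passed in with_patch_results.items() if not passed]


# Results from the build described in this issue (with patch).
results = {
    "ios_net_unittests (ipad air 2 ios 10.0)": False,  # Swarming task timeout
    "components_unittests (iphone 5 ios 9.0)": True,
}
print(steps_to_retry_without_patch(results))
```

With this selection, "components_unittests (iphone 5 ios 9.0)" would not have been rerun at all.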
 

Comment 1 by st...@chromium.org, Mar 30 2017

For comparison, in https://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_rel_ng/builds/419614, only the failed step browser_side_navigation_browser_tests was rerun without the patch.
Cc: smut@chromium.org
+ Sana, since she reverted a CL to help with the net unittest flake.

Comment 3 by st...@chromium.org, Mar 30 2017

Cc: thakis@chromium.org
Labels: -Pri-2 Pri-0
Try jobs on this trybot are queuing up: https://screenshot.googleplex.com/08wK2PC2cA3.png
10 out of 28 slaves are retrying all test steps instead of just the failed one.

https://build.chromium.org/p/tryserver.chromium.mac/builders/ios-simulator

Comment 4 by st...@chromium.org, Mar 30 2017

Cc: -smut@chromium.org
Components: Infra>Client>iOS
Owner: smut@chromium.org
Status: Assigned (was: Untriaged)
I believe this change to the ios/try recipe introduced the retry of tests:
https://chromium-review.googlesource.com/c/452808/

Assign to smut@.

Is there a quick fix, or should we revert first?
I think we should revert the change now and fix later.

Comment 6 by st...@chromium.org, Mar 30 2017

Owner: s...@google.com

Comment 8 by st...@chromium.org, Mar 30 2017

Labels: -Pri-0 Pri-1
Revert landed; lowering the priority to P1.

Comment 9 by thakis@chromium.org, Mar 31 2017

The bot still has many pending builds: https://build.chromium.org/p/tryserver.chromium.mac/builders/ios-simulator Maybe the revert didn't help?

Comment 10 by s...@google.com, Mar 31 2017

Though a few builds are still taking 45+ minutes (when they have to recompile everything), many are taking 5-8 minutes, which isn't too bad. Maybe we just need more capacity here.

Comment 11 by s...@google.com, Mar 31 2017

Orange is pending builds:
https://screenshot.googleplex.com/pHiREc5n2cs.png

First major spike was the 22nd, around 5pm pacific.

Comment 12 by s...@google.com, Mar 31 2017

Er, 21st not 22nd.

Comment 13 by jam@chromium.org, Mar 31 2017

Issue 706673 has been merged into this issue.

Comment 14 by s...@google.com, Mar 31 2017

I was going to suggest temporarily swiping capacity from the other iOS try bots that don't run tests or anything and have much shorter cycle times, but for some reason ios-device (compile-only) spiked to 26 pending on 3/28 at 3pm (it has 16 configured slaves).

Still, in the last 30 days the more typical peak for ios-device is 8 concurrent builds. 26 was likely an anomaly. I think there was an issue with buildbucket scheduling builds at that time? So I think we can discount that spike, which means we could take 6 VMs from ios-device and still have 10, which should be enough.

ios-device-xcode-clang also usually peaks at 8-9 judging by the last 30 days, and ios-simulator-xcode-clang generally peaks at 8-9 based on the last 30 days, however in the last 3 days it jumped to 12 concurrent builds twice. Both are compile-only.

I suspect we could steal 6 VMs each from ios-device and ios-device-xcode-clang, and maybe 4 from ios-simulator-xcode-clang, which would increase ios-simulator's capacity by about half.

In the sgirt term this would hopefully help the throughput on ios-simulator while we figure out what the problem is.

Comment 15 by s...@google.com, Mar 31 2017

s/sgirt/short/

I forgot about ios-simulator-eg, which typically seems to peak at 6, but has spiked to 10 pending. It's not even part of the CQ so we can definitely steal some VMs from it.

Comment 16 by bugdroid1@chromium.org, Mar 31 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/tools/build/+/cdc4eaec35e92e088e9ff010c3c8a966809e8eeb

commit cdc4eaec35e92e088e9ff010c3c8a966809e8eeb
Author: smut <smut@google.com>
Date: Fri Mar 31 03:25:52 2017

Shuffle iOS try slaves to give more capacity to ios-simulator

ios-device:                16 -> 10
ios-device-xcode-clang:    16 -> 10
ios-simulator:             30 -> 52
ios-simulator-cronet:       2 ->  2
ios-simulator-eg:          16 -> 10
ios-simulator-xcode-clang: 16 -> 12

BUG= 706653 

Change-Id: I0446d3542c2b6abf8e8aeb52d658a76bf9e6200f
Reviewed-on: https://chromium-review.googlesource.com/464466
Reviewed-by: smut <smut@chromium.org>

[modify] https://crrev.com/cdc4eaec35e92e088e9ff010c3c8a966809e8eeb/masters/master.tryserver.chromium.mac/slaves.cfg


Comment 17 by bugdroid1@chromium.org, Mar 31 2017

The following revision refers to this bug:
  https://chrome-internal.googlesource.com/infradata/master-manager/+/0c8bb20d15b9f4754807177482ef97c13e72500f

commit 0c8bb20d15b9f4754807177482ef97c13e72500f
Author: smut <smut@google.com>
Date: Fri Mar 31 03:30:52 2017

Comment 18 by s...@google.com, Mar 31 2017

Labels: Infra-Troopers
Added 22 slaves, increasing capacity by 73%. It's possible that additional capacity may be needed on Swarming to handle the task load of 52 concurrent ios-simulator builds.
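
For reference, the arithmetic behind these numbers can be checked with a short sketch. The slave counts are copied from the commit message above; the helper name `capacity_change` is hypothetical and not part of any infra tooling.

```python
# Slave counts per builder, copied from the commit message above.
BEFORE = {
    "ios-device": 16,
    "ios-device-xcode-clang": 16,
    "ios-simulator": 30,
    "ios-simulator-cronet": 2,
    "ios-simulator-eg": 16,
    "ios-simulator-xcode-clang": 16,
}
AFTER = {
    "ios-device": 10,
    "ios-device-xcode-clang": 10,
    "ios-simulator": 52,
    "ios-simulator-cronet": 2,
    "ios-simulator-eg": 10,
    "ios-simulator-xcode-clang": 12,
}


def capacity_change(before, after, builder):
    """Return (slaves added, rounded percent increase) for one builder."""
    added = after[builder] - before[builder]
    return added, round(100 * added / before[builder])


# The pool is only reshuffled, so the total number of slaves must not change.
assert sum(BEFORE.values()) == sum(AFTER.values())

added, percent = capacity_change(BEFORE, AFTER, "ios-simulator")
print(f"ios-simulator: +{added} slaves ({percent}% more capacity)")
```

The 22 slaves come from the other builders: 6 each from ios-device, ios-device-xcode-clang and ios-simulator-eg, and 4 from ios-simulator-xcode-clang.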

Comment 19 by s...@google.com, Mar 31 2017

Summary: Huge pending queue spikes on ios-simulator try bot (was: On ios-simulator trybot, successful steps were also retried while only one step failed)
Now more than 50% of trybots on ios-simulator fail base_unittests: https://build.chromium.org/p/tryserver.chromium.mac/builders/ios-simulator?numbuilds=200 (looks like a recent development; might be unrelated to the changes in this bug, but it's related to the broader "ios-simulator has been very unreliable the last few days" theme)

Comment 21 by st...@chromium.org, Mar 31 2017

Those base_unittests failures were due to a recipe change which has been reverted https://chromium.googlesource.com/chromium/tools/build/+/328305ad806f3bc88fe190f754463ee007d0c040

Comment 22 by s...@google.com, Mar 31 2017

Status: Started (was: Assigned)
I'm not seeing any pending queues on the graphs today.

Comment 23 by s...@google.com, Apr 4 2017

Status: Fixed (was: Started)
No pending queues seen since the capacity increase.

Comment 24 by s...@google.com, Apr 4 2017

Blocking: 706949

Comment 25 by s...@google.com, Jun 23 2017

Owner: smut@chromium.org
