
Issue 718524


Issue metadata

Status: Fixed
Owner: ----
Closed: Jan 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: iOS
Pri: 2
Type: Bug

Blocking:
issue 722583




Fix capacity problems with ios-simulator

Project Member Reported by baxley@chromium.org, May 4 2017

Issue description

ios-simulator is operating at pretty close to max capacity during peak load.

The current load may be sustainable, but it will prevent adding EarlGrey tests to the CQ.

Some options:
- See if anything changed making the tests run slower or more frequently and fix it.
- Run fewer configurations
- Add more capacity
 
Capacity graphs show that we jumped from ~50% peak capacity to close to 100% on Apr 17:

https://goto.google.com/wzhmy

IMHO it's worth investigating.

Comment 2 by mmoss@chromium.org, May 4 2017

Components: -Infra Infra>CQ Infra>Platform>Swarming
Project Member

Comment 3 by bugdroid1@chromium.org, May 5 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/2f3ad3b7b77d3a36922e42d3d3e5b1ad08a2a074

commit 2f3ad3b7b77d3a36922e42d3d3e5b1ad08a2a074
Author: baxley <baxley@chromium.org>
Date: Fri May 05 19:06:19 2017

Remove some iOS 9 CQ configurations to reduce load.

During peak load, maximum capacity is reached. Remove redundant
iOS 9 configurations from the CQ and main waterfall. This still
runs tablet, iPhone, 32-bit, and 64-bit on the CQ, it just doesn't
run every combination.

BUG= 718524 

Review-Url: https://codereview.chromium.org/2855423004
Cr-Commit-Position: refs/heads/master@{#469744}

[modify] https://crrev.com/2f3ad3b7b77d3a36922e42d3d3e5b1ad08a2a074/ios/build/bots/chromium.mac/ios-simulator.json

Components: -Infra>CQ Infra>Client>iOS
Status: (was: Available)
Status: Available
Pending queues are much more sane today, but utilization is still in the 80% range, which is too high. It looks like we've been quite a bit over capacity.

Did anyone look at what happened around Apr 17? I think we should still investigate what caused the usage to spike.
How can we go about investigating the 4/17 change?  Can you point me in the right direction to find graphs that would show:
- The number of jobs run per task per day
- The total running time of each type of task
I'd start with the graph in #1: https://goto.google.com/wzhmy

It links to the console for the largest pool, which in turn lists recent jobs (also with links to the corresponding consoles). I don't think we have runtimes in this dashboard, but we do report them as the /chrome/infra/jobs/durations metric. E.g.: http://shortn/_DxnFF3Rajv
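For reference, the two numbers asked about above (jobs run per task per day, and total running time per task) could be computed from exported job records roughly as follows. This is only a sketch: the record shape `(day, task, duration)` and the function name `summarize` are assumptions made for illustration, not the actual schema of the durations metric, and the durations are in whatever unit the export uses.

```python
from collections import defaultdict


def summarize(records):
    """Aggregate hypothetical job records.

    records: iterable of (day, task, duration) tuples. This shape is an
    assumption for illustration, not the real metric format.
    Returns (job count per (day, task), total duration per task).
    """
    jobs_per_task_per_day = defaultdict(int)
    total_runtime_per_task = defaultdict(float)
    for day, task, duration in records:
        jobs_per_task_per_day[(day, task)] += 1
        total_runtime_per_task[task] += duration
    return dict(jobs_per_task_per_day), dict(total_runtime_per_task)
```

Comparing these two outputs before and after the date of interest would help separate "more jobs are running" from "each job is taking longer".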
I'm looking through logs from recent runs to see if anything has started to take longer.
We landed the change to re-run XCTests on failure on April 12: https://codereview.chromium.org/2814453007/. It was a few days after the fact. April 17 was Monday, so if we're sure that is when the problem started, it likely landed between April 13 and 17.

What could be going wrong?
- tests are taking longer
- more CQ jobs are running on ios-simulator
- capacity is reduced
I looked at Sergey's second link. There was a definite jump in runtime for net_unittests, from about 650 to 1000, starting at about 12:00 on 2017-04-13.
Cc: thestig@chromium.org
+cc thestig for https://codereview.chromium.org/2816893003

I diffed the bot output and it looks like 600 SpdyFramerTests went from ~2ms to ~200ms. I haven't looked at the tests themselves to see why that might be.

I'll also try looking at a Mac bot, to see if there was any jump in test times there.
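
For anyone repeating this kind of comparison, a minimal sketch of diffing per-test times between two bot logs is below. It assumes the logs contain standard gtest result lines of the form `[       OK ] Suite.Test (N ms)`; the function names and the 50 ms threshold are made up for illustration, and this is not the script actually used here.

```python
import re

# Matches passing gtest result lines, e.g. "[       OK ] Suite.Test (2 ms)".
RESULT_RE = re.compile(r"^\[\s+OK\s+\]\s+(\S+)\s+\((\d+) ms\)$")


def parse_times(log_text):
    """Return {test name: elapsed ms} for every passing test in a log."""
    times = {}
    for line in log_text.splitlines():
        match = RESULT_RE.match(line.strip())
        if match:
            times[match.group(1)] = int(match.group(2))
    return times


def slowdowns(before_log, after_log, min_delta_ms=50):
    """List (test, before ms, after ms) for tests that got slower by at
    least min_delta_ms, largest regression first."""
    before = parse_times(before_log)
    after = parse_times(after_log)
    rows = [
        (name, before[name], after_ms)
        for name, after_ms in after.items()
        if name in before and after_ms - before[name] >= min_delta_ms
    ]
    rows.sort(key=lambda row: row[2] - row[1], reverse=True)
    return rows
```

Tests that appear in only one of the two logs are ignored, so added or removed tests do not show up as regressions.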
Will take a quick look and see if I can eyeball this one.
I locally reverted the change from char[] to std::vector for header_buffer_, and that seems to be the cause of the slowdown.
Uploaded https://codereview.chromium.org/2861393005 - does someone want to try that on iOS and see if it helps? I currently don't have a Mac.
One thing to add regarding capacity: XCTest retries were added on April 12, earlier in the same week that we saw the regression, so this could have had an effect.

Based on analysis from rohitrao@, the CL in comment 15 looks to have a large positive impact on our tests.
Project Member

Comment 18 by bugdroid1@chromium.org, May 10 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/4b3c5489ed733182391be9f8cfb9cae5af1f1b3f

commit 4b3c5489ed733182391be9f8cfb9cae5af1f1b3f
Author: thestig <thestig@chromium.org>
Date: Wed May 10 05:04:40 2017

Partial revert of a spdy_framer_test.cc clean up.

This reverts part of r464481. While std::vectors are easier to manage
than C arrays, they can also be much slower in debug mode. When used
intensively in test-only code, speed becomes more important than
manageability.

BUG= 718524 

Review-Url: https://codereview.chromium.org/2861393005
Cr-Commit-Position: refs/heads/master@{#470466}

[modify] https://crrev.com/4b3c5489ed733182391be9f8cfb9cae5af1f1b3f/net/spdy/core/spdy_framer_test.cc

Blocking: 722583
Components: Infra>Client>Chrome
Components: -Infra>Client>iOS
Moved all Infra>Client>iOS bugs to Infra>Client>Chrome + OS-iOS.
Status: Fixed (was: Available)
I believe this should be fixed by #c18 - closing. Please reopen if it's still an issue.
