Fix capacity problems with ios-simulator
Issue description
ios-simulator is operating at pretty close to max capacity during peak load. The current load may be sustainable, but it will prevent adding EarlGrey tests to the CQ. Some options:
- See if anything changed that makes the tests run slower or more frequently, and fix it.
- Run fewer configurations
- Add more capacity
May 4 2017
May 5 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/2f3ad3b7b77d3a36922e42d3d3e5b1ad08a2a074

commit 2f3ad3b7b77d3a36922e42d3d3e5b1ad08a2a074
Author: baxley <baxley@chromium.org>
Date: Fri May 05 19:06:19 2017

Remove some iOS 9 CQ configurations to reduce load.

During peak load, maximum capacity is reached. Remove redundant iOS 9 configurations from the CQ and main waterfall. This still runs tablet, iPhone, 32-bit, and 64-bit on the CQ, it just doesn't run every combination.

BUG= 718524
Review-Url: https://codereview.chromium.org/2855423004
Cr-Commit-Position: refs/heads/master@{#469744}

[modify] https://crrev.com/2f3ad3b7b77d3a36922e42d3d3e5b1ad08a2a074/ios/build/bots/chromium.mac/ios-simulator.json
May 7 2017
May 7 2017
May 8 2017
Pending queues are much more sane today, but utilization is still in the 80%s, which is too high. Looks like we've been quite a bit over capacity. Did anyone look at what happened around Apr 17? I think we should still investigate what caused the usage to spike.
May 8 2017
How can we go about investigating the 4/17 change? Can you point me in the right direction to find graphs that would show:
- The number of jobs run per task per day
- The total running time of each type of task
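The two graphs requested above boil down to a per-task, per-day aggregation. A minimal sketch, assuming job records are available as `(task, day, duration_seconds)` tuples; that record shape and the sample values are hypothetical and do not come from the dashboards mentioned in this thread:

```python
from collections import defaultdict


def summarize_jobs(records):
    """Aggregate (task, day, duration_s) records into per-task, per-day totals."""
    summary = defaultdict(lambda: {"jobs": 0, "total_seconds": 0})
    for task, day, duration_s in records:
        entry = summary[(task, day)]
        entry["jobs"] += 1
        entry["total_seconds"] += duration_s
    return dict(summary)


# Made-up sample records, for illustration only.
sample_records = [
    ("net_unittests", "2017-04-12", 250),
    ("net_unittests", "2017-04-12", 260),
    ("net_unittests", "2017-04-13", 550),
]

for (task, day), stats in sorted(summarize_jobs(sample_records).items()):
    print(task, day, stats["jobs"], stats["total_seconds"])
```

Plotting job counts and total seconds per day side by side would help separate "more jobs are running" from "each job takes longer".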
May 8 2017
I'd start with the graph in #1: https://goto.google.com/wzhmy
It links to the console for the largest pool, which in turn lists recent jobs (also with links to the corresponding consoles). I don't think we have runtimes in this dashboard, but we do report them as the /chrome/infra/jobs/durations metric. E.g.: http://shortn/_DxnFF3Rajv
May 8 2017
I'm looking through logs from recent runs to see if anything has started to take longer. We landed the change to re-run XCTests on failure on April 12: https://codereview.chromium.org/2814453007/. It was a few days after the fact. April 17 was Monday, so if we're sure that is when the problem started, it likely landed between April 13 and 17.
What could be going wrong?
- tests are taking longer
- more CQ jobs are running on ios-simulator
- capacity is reduced
May 9 2017
I looked at Sergey's second link. There was a definite jump in runtime for net_unittests, from about 650 to about 1000, starting at about 12:00 on 2017-04-13.
May 9 2017
You can see this in ios-simulator logs as well, starting at about 12:00 on 2017-04-13:
Before: ~250s
https://build.chromium.org/p/chromium.mac/builders/ios-simulator/builds/16570
https://luci-logdog.appspot.com/v/?s=chromium%2Fbb%2Fchromium.mac%2Fios-simulator%2F16570%2F%2B%2Frecipes%2Fsteps%2Fnet_unittests__iPhone_6s_Plus_iOS_10.0_%2F0%2Fstdout
After: ~550s
https://build.chromium.org/p/chromium.mac/builders/ios-simulator/builds/16571
https://luci-logdog.appspot.com/v/?s=chromium%2Fbb%2Fchromium.mac%2Fios-simulator%2F16571%2F%2B%2Frecipes%2Fsteps%2Fnet_unittests__iPhone_6s_Plus_iOS_10.0_%2F0%2Fstdout
May 9 2017
+cc thestig for https://codereview.chromium.org/2816893003
I diffed the bot output and it looks like 600 SpdyFramerTests went from ~2ms to ~200ms. I haven't looked at the tests themselves to see why that might be. I'll also try looking at a Mac bot, to see if there was any jump in test times there.
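A diff like the one described above can be automated. This is a rough sketch, not the script actually used: it assumes both logs contain standard gtest result lines of the form `[       OK ] Suite.Test (N ms)`, and the function names, the 50 ms threshold, and the sample test names are all made up for illustration.

```python
import re

# Matches gtest result lines such as "[       OK ] SpdyFramerTest.Foo (2 ms)".
RESULT_LINE = re.compile(r"\[\s+(?:OK|FAILED)\s+\]\s+(\S+)\s+\((\d+) ms\)")


def parse_test_times(log_text):
    """Return {test_name: elapsed_ms} for each per-test result line."""
    times = {}
    for line in log_text.splitlines():
        match = RESULT_LINE.search(line)
        if match:
            times[match.group(1)] = int(match.group(2))
    return times


def slowest_regressions(before_log, after_log, min_delta_ms=50):
    """List (test, before_ms, after_ms) for tests that slowed by at least min_delta_ms."""
    before = parse_test_times(before_log)
    after = parse_test_times(after_log)
    regressions = [
        (name, before[name], after[name])
        for name in before.keys() & after.keys()
        if after[name] - before[name] >= min_delta_ms
    ]
    # Largest slowdown first.
    return sorted(regressions, key=lambda r: r[2] - r[1], reverse=True)


# Hypothetical log excerpts; the test names are placeholders.
before_log = "[       OK ] SpdyFramerTest.A (2 ms)\n[       OK ] SpdyFramerTest.B (3 ms)\n"
after_log = "[       OK ] SpdyFramerTest.A (200 ms)\n[       OK ] SpdyFramerTest.B (4 ms)\n"

for name, before_ms, after_ms in slowest_regressions(before_log, after_log):
    print(f"{name}: {before_ms} ms -> {after_ms} ms")
```

Only tests present in both logs are compared, so added or removed tests do not show up as regressions.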
May 9 2017
Will take a quick look and see if I can eyeball this one.
May 9 2017
I locally reverted the change from char[] to std::vector for header_buffer_ and that seems to be it.
May 9 2017
Uploaded https://codereview.chromium.org/2861393005 - does someone want to try that on iOS and see if it helps? I currently don't have a Mac.
May 9 2017
One thing to add regarding capacity: xctest retries were added on April 12, earlier in the week in which we saw the regression, so this could have had an effect. Based on analysis from rohitrao@, the CL in comment 15 looks to have a large positive impact on our tests.
May 9 2017
From CQ dry run, net_unittests seem to run faster now:
https://chromium-swarm.appspot.com/task?id=360480d16ade7910 - 5m22s
https://chromium-swarm.appspot.com/task?id=360480d53ff62410 - 5m19s
https://chromium-swarm.appspot.com/task?id=360480d6fcad5c10 - 11m33s
https://chromium-swarm.appspot.com/task?id=360480d8d4b15b10 - 5m26s
https://chromium-swarm.appspot.com/task?id=360480dab5abe210 - 11m30s
https://chromium-swarm.appspot.com/task?id=360480dca5b34d10 - 10m46s
https://chromium-swarm.appspot.com/task?id=360480de6c678110 - 4m48s
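To compare these task times with one another, it helps to convert them to seconds. A small sketch using the seven durations listed above; the helper name `to_seconds` is made up, and whether these task times measure the same thing as the step times quoted earlier in the thread is not confirmed here.

```python
import re
import statistics

# Accepts strings such as "5m22s" or "48s".
DURATION = re.compile(r"(?:(\d+)m)?(\d+)s")


def to_seconds(text):
    """Convert a duration like '5m22s' to a number of seconds."""
    match = DURATION.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    minutes = int(match.group(1) or 0)
    return minutes * 60 + int(match.group(2))


# Durations copied from the CQ dry run list above.
durations = ["5m22s", "5m19s", "11m33s", "5m26s", "11m30s", "10m46s", "4m48s"]
seconds = [to_seconds(d) for d in durations]

print(sorted(seconds))
print("median:", statistics.median(seconds))
```

Sorting the values makes the two groups visible: four tasks finish in roughly five minutes and three take about eleven.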
May 10 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/4b3c5489ed733182391be9f8cfb9cae5af1f1b3f

commit 4b3c5489ed733182391be9f8cfb9cae5af1f1b3f
Author: thestig <thestig@chromium.org>
Date: Wed May 10 05:04:40 2017

Partial revert of a spdy_framer_test.cc clean up.

This reverts part of r464481. While std::vectors are easier to manage than C arrays, they can also be much slower in debug mode. When used intensively in test-only code, speed becomes more important than manageability.

BUG= 718524
Review-Url: https://codereview.chromium.org/2861393005
Cr-Commit-Position: refs/heads/master@{#470466}

[modify] https://crrev.com/4b3c5489ed733182391be9f8cfb9cae5af1f1b3f/net/spdy/core/spdy_framer_test.cc
May 15 2017
Dec 22 2017
Dec 22 2017
Moved all Infra>Client>iOS bugs to Infra>Client>Chrome + OS-iOS.
Jan 10 2018
I believe this should be fixed by #c18 - closing. Please reopen if it's still an issue.
Comment 1 by sergeybe...@chromium.org, May 4 2017