webrtc.peerconnection.reference fails because of not enough capacity
Issue description

Example failure: https://build.chromium.org/p/chromium.perf/builders/Linux%20Perf/builds/264

Step: webrtc.peerconnection.reference on (102b) GPU on Linux on Ubuntu-14.04 (0 secs)
Bot id: 'build150-m1'
Run on OS: 'Ubuntu-14.04'

shard #0 expired, not enough capacity
Comment 1 by skyos...@chromium.org, Jan 19 2017
Stephen, can you take a look? Is there documentation about what sheriffs should do here?
Comment 2, Jan 19 2017
These bugs are sadly hard to diagnose. Usually the cause is that an earlier test started failing or taking longer. In this case, the triggered task (https://chromium-swarm.appspot.com/task?id=33cd772be045dc10&refresh=10&show_raw=1) just barely hit the expiration time set for it. It's unclear why this task never got run; you have to look at the previous tasks to figure out why.

I want to make a little script that grabs run times from Swarming, so we can see which tests get worse over time. I'll bring this up in the speed/infra sync in about an hour. I'm probably the correct owner for this.
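A minimal sketch of what such a script could look like, assuming the Swarming v1 REST endpoint /_ah/api/swarming/v1/tasks/list and 'name'/'duration' fields in the response; the tag filter below is only an illustration:

    # Sketch: pull task run times from Swarming so we can watch which
    # tests get slower over time. The endpoint path, tag names, and
    # response fields are assumptions about the v1 REST API; adjust to
    # whatever the server actually exposes.
    import json
    import urllib.parse
    import urllib.request

    SWARMING = 'https://chromium-swarm.appspot.com'

    def list_tasks(tags, limit=50):
        # tasks/list returns recent tasks matching the given tags.
        params = [('tags', t) for t in tags] + [('limit', str(limit))]
        url = '%s/_ah/api/swarming/v1/tasks/list?%s' % (
            SWARMING, urllib.parse.urlencode(params))
        with urllib.request.urlopen(url) as resp:
            return json.load(resp).get('items', [])

    def report(tags):
        for task in list_tasks(tags):
            # 'duration' is wall-clock run time in seconds; tasks that
            # never ran (e.g. expired ones) may not have it at all.
            print('%-60s duration=%ss' % (task.get('name'),
                                          task.get('duration')))

    if __name__ == '__main__':
        # Hypothetical tag filter; real tasks are tagged by the recipe
        # that triggers them.
        report(['buildername:Linux Perf'])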
Comment 3, Jan 20 2017
I did some spelunking today, actually a lot of spelunking. The root problem is that the tests on the bot this test needs to run on are taking too long. This is a problem because we trigger all tests at the beginning of the run and give each triggered task an expiration timeout of about 6 hours. That is usually sufficient, but sometimes, as in this case, it is not.

You can see that the test runtime for this bot has gone up over time. Compare the pending time of the task run directly before this one in two different builds: https://chromium-swarm.appspot.com/task?id=3378cd2375e7aa10&refresh=10&show_raw=1, from about two weeks ago, had a pending time of 5h 14m 4s, while https://chromium-swarm.appspot.com/task?id=33d02ad39455a010&refresh=10&show_raw=1, from today, had a pending time of 5h 52m 17s. The total pending time has therefore gone up by roughly 40 minutes in two weeks, and if it grows by a similar amount again, that task will start expiring as well.

I'm trying to gather data about the times for all tests so we can track which ones are getting slower over time. There is some viceroy data available, but it is fairly hard to use (http://shortn/_T8OEtp1pFQ is an example for Linux Perf; I can't see any obvious cause of the time increase there). I also made a spreadsheet (https://docs.google.com/spreadsheets/d/1S-bt-2XhmbLlCEtYS9wKJbwhKKzzpQq6gm8WWLGqpA8/edit#gid=1804831563) with times for all the tests. Right now it includes disabled benchmarks in its mean calculations, so it isn't very useful yet, but I'm gathering more accurate data and hope it will be by tomorrow.

Generally, this problem needs to be solved on the Swarming side. Increasing the test timeout would make these failures go away, but it would also increase builder cycle time, which causes other problems for us. The real solution is to monitor these test times, alert when they get longer, and keep working to make the tests faster.
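To make the remaining headroom concrete, here is a small sketch that computes a task's pending time and warns when it approaches the 6-hour expiration budget; the 'created_ts'/'started_ts' field names and the timestamp format are assumptions about what Swarming returns:

    # Sketch: compute pending time (time spent waiting for a bot) from
    # a task's timestamps and warn when it nears the expiration budget.
    # The ~6 hour budget comes from the comment above; the field names
    # and timestamp format are assumptions.
    from datetime import datetime, timedelta

    EXPIRATION_BUDGET = timedelta(hours=6)  # trigger timeout noted above
    WARN_FRACTION = 0.9                     # warn at 90% of the budget

    def parse_ts(ts):
        # Assumed timestamp format, e.g. '2017-01-20T05:52:17.000'.
        return datetime.strptime(ts, '%Y-%m-%dT%H:%M:%S.%f')

    def pending_time(task):
        # Pending time = queue wait between creation and a bot
        # picking the task up.
        return parse_ts(task['started_ts']) - parse_ts(task['created_ts'])

    def check(task):
        pending = pending_time(task)
        if pending > WARN_FRACTION * EXPIRATION_BUDGET:
            print('WARNING: %s pending for %s, close to the %s expiration'
                  % (task['name'], pending, EXPIRATION_BUDGET))

    if __name__ == '__main__':
        # The newer build above: 5h 52m 17s pending, only about 8
        # minutes short of the 6 hour budget, so this trips the warning.
        check({'name': '33d02ad39455a010',
               'created_ts': '2017-01-20T00:00:00.000',
               'started_ts': '2017-01-20T05:52:17.000'})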
Comment 4, Feb 13 2017
Comment 5, Mar 10 2017
Noticed a few more tests running into this pretty regularly:

https://uberchromegw.corp.google.com/i/chromium.perf/builders/Linux%20Perf/builds/446/steps/v8.runtime_stats.top_25.reference%20on%20%28102b%29%20GPU%20on%20Linux
https://uberchromegw.corp.google.com/i/chromium.perf/builders/Linux%20Perf/builds/446/steps/webrtc.datachannel%20on%20%28102b%29%20GPU%20on%20Linux
https://uberchromegw.corp.google.com/i/chromium.perf/builders/Linux%20Perf/builds/446/steps/webrtc.datachannel.reference%20on%20%28102b%29%20GPU%20on%20Linux
https://uberchromegw.corp.google.com/i/chromium.perf/builders/Linux%20Perf/builds/446/steps/v8.todomvc%20on%20%28102b%29%20GPU%20on%20Linux
https://uberchromegw.corp.google.com/i/chromium.perf/builders/Linux%20Perf/builds/446/steps/v8.todomvc.reference%20on%20%28102b%29%20GPU%20on%20Linux
Comment 6, Mar 24 2017
This is passing after the timeouts were increased.