net_unittests failing on multiple builders
Issue description
Filed by sheriff-o-matic@appspot.gserviceaccount.com on behalf of rogerm@chromium.org

net_unittests failing on multiple builders

Builders failed on:
- linux-chromeos-dbg: https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/linux-chromeos-dbg
,
Jun 12 2018
net_unittests alone are indeed taking >1h to run on swarming, and therefore time out (by design).
Example build: https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/linux-chromeos-dbg/6254
Task: https://chromium-swarm.appspot.com/task?id=3e0d609706986410
(See the purple "net_unittests" step, link "shard #0 timed out, took too much time to complete").
This usually happens when something bad gets committed, causing the test to run very long. As such, I don't think it's an infra failure.
,
Jun 12 2018
,
Jun 12 2018
That was of course my first thought, but the output logs do not suggest that. Here is a spreadsheet that I put together showing that the total, average, and max times of the tests run were similar: https://docs.google.com/spreadsheets/d/183U1bxbIOgkhrGxpSp7gWKNZ8KDE4qce8ZM5VuWNqcY/edit#gid=0
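For anyone wanting to reproduce those numbers from a raw log, here is a minimal sketch of the kind of summary described above. It assumes the log contains standard googletest result lines such as `[       OK ] Suite.Test (12 ms)`; the actual format of the swarming output and the function name `summarize_test_times` are assumptions for illustration, not how the spreadsheet was actually produced.

```python
import re

# Assumed log format: standard googletest result lines, e.g.
# "[       OK ] HttpCacheTest.Basic (12 ms)" or "[  FAILED  ] Foo.Bar (5 ms)".
RESULT_LINE = re.compile(r"^\[\s+(OK|FAILED)\s+\]\s+(\S+)\s+\((\d+) ms\)")


def summarize_test_times(log_text):
    """Return count, total, average, and max per-test time in milliseconds."""
    times = []
    for line in log_text.splitlines():
        match = RESULT_LINE.match(line)
        if match:
            times.append(int(match.group(3)))
    if not times:
        return {"count": 0, "total_ms": 0, "avg_ms": 0.0, "max_ms": 0}
    return {
        "count": len(times),
        "total_ms": sum(times),
        "avg_ms": sum(times) / len(times),
        "max_ms": max(times),
    }
```

A summary like this only adds up the per-test times that the log reports, so any time spent between tests (process start-up, teardown, retries) would not show up in the total.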
,
Jun 12 2018
Re #c4: Were the runtimes taken directly from the log? Are we sure any overhead between the tests is accounted for? Two points: 1) 26 min is still a long time for a swarming shard; CQ would be much happier if it were closer to 10-15 min tops; 2) A Swarming task has 2 timeouts: one on total runtime (1h), another on 20m without any log output. If the tests indeed finished in 26 min, this would mean the task was stuck for the remaining 34 min (apparently without output, since there is basically nothing interesting printed after the tests). This in turn suggests that it should've been killed sooner, at 46 min into the task. Since it timed out at 1h, it's more likely that the tests actually took longer to run, even if they reported the same runtimes in the logs. This is why I'm wondering about the test overhead that the logs might not be reporting.
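The timeout reasoning above can be sketched as a small calculation. This is only a model of the two limits named in the comment (1h total, 20m without output); the constant and function names are hypothetical and not part of any Swarming API.

```python
from datetime import timedelta

# Limits as stated in the comment above; the names are illustrative only.
HARD_TIMEOUT = timedelta(hours=1)    # limit on total task runtime
IO_TIMEOUT = timedelta(minutes=20)   # limit on time without any log output


def expected_kill_time(last_output_at):
    """Time into the task at which it would be killed if output stops at last_output_at."""
    return min(HARD_TIMEOUT, last_output_at + IO_TIMEOUT)


def kill_reason(last_output_at):
    """Which of the two limits would fire first under this model."""
    if last_output_at + IO_TIMEOUT < HARD_TIMEOUT:
        return "io_timeout"
    return "hard_timeout"
```

Under this model, a task whose last output came at 26 min would be killed at 46 min by the no-output limit. A kill at the full 1h therefore implies the task was still producing output after the 40 min mark.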
,
Jun 12 2018
It looks like we are having the same problem on non-CrOS Linux builders as well: https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Linux%20Tests%20%28dbg%29%281%29%2832%29/50605 I agree that it seems likely that the test is taking longer to run than the sum of the output times, which is why I suspected something with the builder, but the fact that it is also failing on Linux is suspicious. I'll try to look at the Chrome changes in the respective initial failures.
,
Jun 12 2018
Looking more closely at the Linux builder, net_unittests have been timing out very frequently for a while, just more often recently.
,
Jun 12 2018
Sampling a few tasks over the course of the month, you can see how the total number of test cases being run has grown:
https://chromium-swarm.appspot.com/task?id=3db52ebf4c9eb010 37k
https://chromium-swarm.appspot.com/task?id=3e0cf0b7f7263410 40k
Probably just natural growth. We really shouldn't be running such a large test in a single shard. The least we can do is double shard it, which should cut its runtime roughly in half.
,
Jun 12 2018
That sounds reasonable to me. As I mentioned, it's been flaky on linux(dbg) for a very long time, so we should do the same there.
,
Jun 12 2018
stevenjb: would you take the bug then to increase sharding, or triage it to someone who owns the builder or the test? If there is anything else a trooper can help with here, please let us know; otherwise let's remove the Infra-Troopers label and component:Infra. Thanks!
,
Jun 12 2018
We own the testing specs (https://chromium.googlesource.com/chromium/src/+/master/testing/buildbot/OWNERS), so I'll handle it.
,
Jun 12 2018
Thanks Ben! I have no idea how to increase sharding :)
,
Jun 13 2018
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/8bf205b836e9e218987cbd9aa9f5f91ea37bab24

commit 8bf205b836e9e218987cbd9aa9f5f91ea37bab24
Author: Ben Pastene <bpastene@chromium.org>
Date: Wed Jun 13 04:00:33 2018

Double shard dbg builds of net_unittests on linux(-chromos) testers.

Bug: 851996
Change-Id: Id1b01007c65fe417dc9f3beac1aac158e5c97bd0
Reviewed-on: https://chromium-review.googlesource.com/1097701
Reviewed-by: John Budorick <jbudorick@chromium.org>
Reviewed-by: Dirk Pranke <dpranke@chromium.org>
Cr-Commit-Position: refs/heads/master@{#566712}

[modify] https://crrev.com/8bf205b836e9e218987cbd9aa9f5f91ea37bab24/testing/buildbot/chromium.chromiumos.json
[modify] https://crrev.com/8bf205b836e9e218987cbd9aa9f5f91ea37bab24/testing/buildbot/chromium.linux.json
[modify] https://crrev.com/8bf205b836e9e218987cbd9aa9f5f91ea37bab24/testing/buildbot/test_suite_exceptions.pyl
,
Jun 13 2018
linux-chromeos-dbg and Linux Tests (dbg) both appear happier since #13. Anything else to do here?
,
Jun 13 2018
Looks great to me, thanks! Overall test times are better also!
,
Jun 14 2018
Closing for now. Feel free to reopen if I missed a bot.
Comment 1 by steve...@chromium.org, Jun 12 2018
Components: Infra
Labels: -Pri-2 OS-Chrome Pri-1
Status: Untriaged (was: Available)