Host scheduler is running extremely slow
Issue description
We are observing timeouts across many canary suite jobs. The root cause is related to the host scheduler running slowly. Take chromeos4-row4-rack6-host18 as an example:
./site_utils/host_history.py -l 10 -v --hosts chromeos4-row4-rack6-host18
2016-03-17 05:58:42 : 2016-03-17 05:59:12 Running 56966194 29s
2016-03-17 05:59:12 : 2016-03-17 06:03:40 Ready 56966194 267s
2016-03-17 06:03:40 : 2016-03-17 06:04:52 Resetting 52335608 72s
2016-03-17 06:04:52 : 2016-03-17 06:05:32 Pending 56965990 39s
2016-03-17 06:05:32 : 2016-03-17 06:07:15 Running 56965990 103s
2016-03-17 06:07:15 : 2016-03-17 06:07:58 Running 56965990 42s
2016-03-17 06:07:58 : 2016-03-17 06:14:12 Ready 56965990 374s
2016-03-17 06:14:12 : 2016-03-17 06:18:11 Resetting 52336032 238s
2016-03-17 06:18:11 : 2016-03-17 06:21:30 Pending 56966052 199s
2016-03-17 06:21:30 : 2016-03-17 06:24:39 Running 56966052 188s
2016-03-17 06:24:39 : 2016-03-17 06:27:54 Running 56966052 195s
2016-03-17 06:27:54 : 2016-03-17 06:55:00 Ready 56966052 1625s
2016-03-17 06:55:00 : 2016-03-17 07:16:05 Provisioning 52337865 1264s
2016-03-17 07:16:05 : 2016-03-17 07:16:43 Pending 56985279 38s
2016-03-17 07:16:43 : 2016-03-17 07:24:39 Running 56985279 475s
2016-03-17 07:24:39 : 2016-03-17 07:25:23 Running 56985279 43s
2016-03-17 07:25:23 : 2016-03-17 07:27:21 Ready 56985279 118s
2016-03-17 07:27:21 : 2016-03-17 07:28:50 Resetting 52339250 88s
2016-03-17 07:28:50 : 2016-03-17 07:33:22 Pending 56985561 272s
2016-03-17 07:33:22 : 2016-03-17 08:02:32 Running 56985561 1750s
2016-03-17 08:02:32 : 2016-03-17 08:03:16 Running 56985561 43s
2016-03-17 08:03:16 : 2016-03-17 08:05:52 Ready 56985561 155s
2016-03-17 08:05:52 : 2016-03-17 08:12:29 Resetting 52340559 397s
2016-03-17 08:12:29 : 2016-03-17 08:13:16 Pending 56985311 46s
./site_utils/host_history.py -l 10 -v --hosts chromeos4-row4-rack6-host18 | grep Ready
2016-03-17 03:43:21 : 2016-03-17 03:45:24 Ready 56965869 123s
2016-03-17 03:51:50 : 2016-03-17 03:53:46 Ready 56965916 116s
2016-03-17 04:16:15 : 2016-03-17 04:18:27 Ready 56968163 131s
2016-03-17 04:26:43 : 2016-03-17 04:30:51 Ready 56965975 247s
2016-03-17 04:38:05 : 2016-03-17 04:39:39 Ready 56966037 94s
2016-03-17 04:46:51 : 2016-03-17 04:49:01 Ready 56966099 130s
2016-03-17 04:58:48 : 2016-03-17 05:00:35 Ready 56965930 107s
2016-03-17 05:14:35 : 2016-03-17 05:18:55 Ready 56966080 259s
2016-03-17 05:42:19 : 2016-03-17 05:44:38 Ready 56966175 139s
2016-03-17 05:59:12 : 2016-03-17 06:03:40 Ready 56966194 267s
2016-03-17 06:07:58 : 2016-03-17 06:18:11 Ready 56965990 613s
2016-03-17 06:27:54 : 2016-03-17 07:16:05 Ready 56966052 2890s (this is likely in between suite jobs)
2016-03-17 07:25:23 : 2016-03-17 07:27:21 Ready 56985279 118s
2016-03-17 08:03:16 : 2016-03-17 08:05:52 Ready 56985561 155s
2016-03-17 08:15:54 : 2016-03-17 08:20:45 Ready 56985311 291s
2016-03-17 08:27:40 : 2016-03-17 08:29:42 Ready 56985321 121s
Note that there is a gap of around 2-4 minutes between jobs. To make matters worse, there is also a gap between the reset and the test job.
The strange thing is that the host scheduler tick is always below 1 minute:
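To quantify these gaps rather than eyeball them, the host_history output can be summarized per status. The following is only a rough sketch, not part of autotest: it assumes the text format shown in the output above (`<start> : <end> <Status> <id> <N>s`), and the helper names `parse_intervals` and `summarize_status` are made up for illustration. Durations are recomputed from the timestamps, so they may differ by a second from the printed values.

```python
import re
from datetime import datetime

# Assumed line shape, taken from the host_history output pasted above:
#   "<start> : <end> <Status> <id> <N>s"
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) : "
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(\w+) (\d+) (\d+)s"
)
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_intervals(text):
    """Return a list of (status, id, seconds) tuples, one per matching line.

    Lines that do not match the assumed shape are skipped.
    """
    intervals = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        start, end, status, entry_id, _printed = match.groups()
        start_dt = datetime.strptime(start, TIME_FORMAT)
        end_dt = datetime.strptime(end, TIME_FORMAT)
        seconds = (end_dt - start_dt).total_seconds()
        intervals.append((status, entry_id, seconds))
    return intervals


def summarize_status(intervals, status="Ready"):
    """Return count/min/max/mean of durations for one status, or None."""
    durations = [secs for s, _id, secs in intervals if s == status]
    if not durations:
        return None
    return {
        "count": len(durations),
        "min": min(durations),
        "max": max(durations),
        "mean": sum(durations) / len(durations),
    }


if __name__ == "__main__":
    import sys
    # Read host_history output from stdin and summarize the idle time.
    print(summarize_status(parse_intervals(sys.stdin.read()), "Ready"))
```

Piping the host_history.py output into this script prints the summary for Ready. Passing "Pending" instead of "Ready" to `summarize_status` would give the same summary for the wait between the reset and the test job.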
http://104.154.79.237/grafana/#/dashboard/db/autotest-deputy-view?panelId=4&fullscreen
Mar 19 2016
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/chromite/+/132297632ee94d99a445a2ca57825609fec03f8f

commit 132297632ee94d99a445a2ca57825609fec03f8f
Author: Dan Shi <dshi@google.com>
Date: Fri Mar 18 20:11:20 2016

Increase timeout for bvt-inline suite and priority for au suite

This is a temporary fix before the lab performance issue is fixed (crbug.com/595833).

BUG=chromium:595833
TEST=None

Change-Id: I5765583149c2c0791f613481f2dc766cbb0f2aae
Reviewed-on: https://chromium-review.googlesource.com/333964
Commit-Ready: Dan Shi <dshi@google.com>
Tested-by: Dan Shi <dshi@google.com>
Reviewed-by: Aviv Keshet <akeshet@chromium.org>

[modify] https://crrev.com/132297632ee94d99a445a2ca57825609fec03f8f/cbuildbot/config_dump.json
[modify] https://crrev.com/132297632ee94d99a445a2ca57825609fec03f8f/cbuildbot/chromeos_config.py
Apr 26 2016
Aug 15 2016
Shards have resolved this issue.
Comment 1 Deleted