Issue 595833

Issue metadata

Status: WontFix
Owner:
Closed: Aug 2016
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug



Host scheduler is running extremely slow

Project Member Reported by dshi@chromium.org, Mar 17 2016

Issue description

We are observing timeouts across many canary suite jobs. The root cause appears to be the host scheduler running slowly. Take chromeos4-row4-rack6-host18 as an example:

./site_utils/host_history.py -l 10  -v --hosts chromeos4-row4-rack6-host18

    2016-03-17 05:58:42  :  2016-03-17 05:59:12 Running         56966194   29s
    2016-03-17 05:59:12  :  2016-03-17 06:03:40 Ready           56966194   267s
    2016-03-17 06:03:40  :  2016-03-17 06:04:52 Resetting       52335608   72s
    2016-03-17 06:04:52  :  2016-03-17 06:05:32 Pending         56965990   39s
    2016-03-17 06:05:32  :  2016-03-17 06:07:15 Running         56965990   103s
    2016-03-17 06:07:15  :  2016-03-17 06:07:58 Running         56965990   42s
    2016-03-17 06:07:58  :  2016-03-17 06:14:12 Ready           56965990   374s
    2016-03-17 06:14:12  :  2016-03-17 06:18:11 Resetting       52336032   238s
    2016-03-17 06:18:11  :  2016-03-17 06:21:30 Pending         56966052   199s
    2016-03-17 06:21:30  :  2016-03-17 06:24:39 Running         56966052   188s
    2016-03-17 06:24:39  :  2016-03-17 06:27:54 Running         56966052   195s
    2016-03-17 06:27:54  :  2016-03-17 06:55:00 Ready           56966052   1625s
    2016-03-17 06:55:00  :  2016-03-17 07:16:05 Provisioning    52337865   1264s
    2016-03-17 07:16:05  :  2016-03-17 07:16:43 Pending         56985279   38s
    2016-03-17 07:16:43  :  2016-03-17 07:24:39 Running         56985279   475s
    2016-03-17 07:24:39  :  2016-03-17 07:25:23 Running         56985279   43s
    2016-03-17 07:25:23  :  2016-03-17 07:27:21 Ready           56985279   118s
    2016-03-17 07:27:21  :  2016-03-17 07:28:50 Resetting       52339250   88s
    2016-03-17 07:28:50  :  2016-03-17 07:33:22 Pending         56985561   272s
    2016-03-17 07:33:22  :  2016-03-17 08:02:32 Running         56985561   1750s
    2016-03-17 08:02:32  :  2016-03-17 08:03:16 Running         56985561   43s
    2016-03-17 08:03:16  :  2016-03-17 08:05:52 Ready           56985561   155s
    2016-03-17 08:05:52  :  2016-03-17 08:12:29 Resetting       52340559   397s
    2016-03-17 08:12:29  :  2016-03-17 08:13:16 Pending         56985311   46s

./site_utils/host_history.py -l 10  -v --hosts chromeos4-row4-rack6-host18 | grep Ready
    2016-03-17 03:43:21  :  2016-03-17 03:45:24 Ready           56965869   123s
    2016-03-17 03:51:50  :  2016-03-17 03:53:46 Ready           56965916   116s
    2016-03-17 04:16:15  :  2016-03-17 04:18:27 Ready           56968163   131s
    2016-03-17 04:26:43  :  2016-03-17 04:30:51 Ready           56965975   247s
    2016-03-17 04:38:05  :  2016-03-17 04:39:39 Ready           56966037   94s
    2016-03-17 04:46:51  :  2016-03-17 04:49:01 Ready           56966099   130s
    2016-03-17 04:58:48  :  2016-03-17 05:00:35 Ready           56965930   107s
    2016-03-17 05:14:35  :  2016-03-17 05:18:55 Ready           56966080   259s
    2016-03-17 05:42:19  :  2016-03-17 05:44:38 Ready           56966175   139s
    2016-03-17 05:59:12  :  2016-03-17 06:03:40 Ready           56966194   267s
    2016-03-17 06:07:58  :  2016-03-17 06:18:11 Ready           56965990   613s
    2016-03-17 06:27:54  :  2016-03-17 07:16:05 Ready           56966052   2890s (this is likely in between suite jobs)
    2016-03-17 07:25:23  :  2016-03-17 07:27:21 Ready           56985279   118s
    2016-03-17 08:03:16  :  2016-03-17 08:05:52 Ready           56985561   155s
    2016-03-17 08:15:54  :  2016-03-17 08:20:45 Ready           56985311   291s
    2016-03-17 08:27:40  :  2016-03-17 08:29:42 Ready           56985321   121s

Note that the host sits in Ready for roughly 2-4 minutes between consecutive jobs. To make matters worse, there is also a gap (the Pending state) between the reset and the test job.
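
To put numbers on these gaps, the pasted output can be totalled per status. The following is only a rough sketch: the row layout is inferred from the output shown in this report, not from the host_history.py source, and the helper names (parse_rows, seconds_per_status) are hypothetical. It makes no assumption about what the ID column means.

```python
import re
from collections import defaultdict

# Row layout inferred from the output pasted in this report (an assumption,
# not taken from the host_history.py source):
#   <start>  :  <end> <status> <id> <seconds>s
ROW = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+:\s+"
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+"
    r"(\w+)\s+(\d+)\s+(\d+)s"
)


def parse_rows(text):
    """Return (status, id, seconds) tuples for every line that looks like a row."""
    rows = []
    for line in text.splitlines():
        match = ROW.search(line)
        if match:
            _start, _end, status, entry_id, seconds = match.groups()
            rows.append((status, int(entry_id), int(seconds)))
    return rows


def seconds_per_status(rows):
    """Sum the reported duration column for each status."""
    totals = defaultdict(int)
    for status, _entry_id, seconds in rows:
        totals[status] += seconds
    return dict(totals)


# First four rows of the output above, copied verbatim.
SAMPLE = """\
    2016-03-17 05:58:42  :  2016-03-17 05:59:12 Running         56966194   29s
    2016-03-17 05:59:12  :  2016-03-17 06:03:40 Ready           56966194   267s
    2016-03-17 06:03:40  :  2016-03-17 06:04:52 Resetting       52335608   72s
    2016-03-17 06:04:52  :  2016-03-17 06:05:32 Pending         56965990   39s
"""

if __name__ == "__main__":
    totals = seconds_per_status(parse_rows(SAMPLE))
    for status, seconds in sorted(totals.items()):
        print(f"{status:<12} {seconds:>6}s")
```

On these four rows alone, the host spends 267s in Ready and 39s in Pending against 29s in Running.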

Strangely, the host scheduler tick is always below 1 minute:
http://104.154.79.237/grafana/#/dashboard/db/autotest-deputy-view?panelId=4&fullscreen
 

Comment 1 Deleted

Project Member

Comment 2 by bugdroid1@chromium.org, Mar 19 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/132297632ee94d99a445a2ca57825609fec03f8f

commit 132297632ee94d99a445a2ca57825609fec03f8f
Author: Dan Shi <dshi@google.com>
Date: Fri Mar 18 20:11:20 2016

Increase timeout for bvt-inline suite and priority for au suite

This is a temporary fix before the lab performance issue is fixed
( crbug.com/595833 ).

BUG= chromium:595833 
TEST=None

Change-Id: I5765583149c2c0791f613481f2dc766cbb0f2aae
Reviewed-on: https://chromium-review.googlesource.com/333964
Commit-Ready: Dan Shi <dshi@google.com>
Tested-by: Dan Shi <dshi@google.com>
Reviewed-by: Aviv Keshet <akeshet@chromium.org>

[modify] https://crrev.com/132297632ee94d99a445a2ca57825609fec03f8f/cbuildbot/config_dump.json
[modify] https://crrev.com/132297632ee94d99a445a2ca57825609fec03f8f/cbuildbot/chromeos_config.py

Comment 3 by benhenry@google.com, Apr 26 2016

Components: Infra>Client>ChromeOS
Labels: -Infra-ChromeOS

Comment 4 by dshi@chromium.org, Aug 15 2016

Status: WontFix (was: Assigned)
Shards have resolved this issue.
