Issue 593084

Starred by 2 users

Issue metadata

Status: Verified
Owner:
Closed: Mar 2016
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 0
Type: Bug

Blocked on:
issue 593104




Lab load issue caused by too many suite jobs

Project Member Reported by dshi@chromium.org, Mar 8 2016

Issue description

We experienced a system-wide outage between 1 AM and 4 AM due to heavy load across the lab. The root cause is that a group of weekly tasks was scheduled for last night, and it created 1096 suite jobs to be run within 4 hours, for the following suites running on the suites pool:
control.experimental
control.kernel_per-build_benchmarks
control.kernel_per-build_regression
control.network3g_pseudomodem
control.network_ui
control.regression

The lab (mostly the devservers) was overloaded between 1 AM and 4 AM, which led to many job failures.
The devserver load can be tracked in this dashboard:
http://104.154.79.237/grafana/#/dashboard/db/autotest-devserver-load

We are working on several approaches to fix this issue:
1. Schedule weekly suite jobs more evenly (CL 331441).
2. Add more devservers (b/27047069).
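
To illustrate approach 1, here is a minimal sketch of spreading the weekly suites over the days of the week instead of running them all on one night. It is not the code from CL 331441: the function name assign_days and the round-robin rule are assumptions made for illustration, and only the suite names come from this issue.

```python
from collections import defaultdict

# Suite names as listed in this issue's description.
WEEKLY_SUITES = [
    "control.experimental",
    "control.kernel_per-build_benchmarks",
    "control.kernel_per-build_regression",
    "control.network3g_pseudomodem",
    "control.network_ui",
    "control.regression",
]


def assign_days(suites, days_in_week=7):
    """Hypothetical helper: give each suite a day of the week, round-robin.

    Day 0 is Monday. Suites are sorted first so the assignment is stable
    between runs.
    """
    schedule = defaultdict(list)
    for index, suite in enumerate(sorted(suites)):
        schedule[index % days_in_week].append(suite)
    return dict(schedule)


if __name__ == "__main__":
    for day, suites in sorted(assign_days(WEEKLY_SUITES).items()):
        print(day, suites)
```

With the suites listed above, each one lands on its own day, so no single night carries the whole weekly load.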

 

Comment 1 by bugdroid1@chromium.org, Mar 8 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/f1774ee0ca65fb7a1782230611fb48912bc98767

commit f1774ee0ca65fb7a1782230611fb48912bc98767
Author: Dan Shi <dshi@google.com>
Date: Tue Mar 08 18:35:38 2016

[autotest] Disctribute weekly runs across the week.

This change distributes all weekly runs for pool:suites to each day
across the week.

Also, change MAX_DELAY_MINUTES from 4 hours to 24 hours. This allows the
test jobs for weekly suites to be distributed more evenly.

BUG= chromium:593084 
TEST=suite_scheduler --sanity

Change-Id: Id96368ab0eb3ec797de6667202a98d3eb6391573
Reviewed-on: https://chromium-review.googlesource.com/331441
Commit-Queue: Dan Shi <dshi@google.com>
Tested-by: Dan Shi <dshi@google.com>
Reviewed-by: Shuqian Zhao <shuqianz@chromium.org>

[modify] https://crrev.com/f1774ee0ca65fb7a1782230611fb48912bc98767/site_utils/suite_scheduler/deduping_scheduler.py
[modify] https://crrev.com/f1774ee0ca65fb7a1782230611fb48912bc98767/suite_scheduler.ini
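
As a rough illustration of why a longer delay window helps, the sketch below picks a random start delay per job and compares the average job rate for a 4-hour and a 24-hour window. This is an assumption about how a setting like MAX_DELAY_MINUTES might be used, not the logic in deduping_scheduler.py; the function names are hypothetical, and the 1096 figure from the issue description is used only as an example job count.

```python
import random


def pick_delay_minutes(max_delay_minutes, rng=random):
    """Hypothetical helper: choose a start delay in [0, max_delay_minutes]."""
    return rng.randint(0, max_delay_minutes)


def average_jobs_per_hour(job_count, max_delay_minutes):
    """Average start rate if jobs are spread uniformly over the window."""
    return job_count / (max_delay_minutes / 60.0)


if __name__ == "__main__":
    job_count = 1096  # example count taken from the issue description
    for window_minutes in (4 * 60, 24 * 60):
        rate = average_jobs_per_hour(job_count, window_minutes)
        print("%d-minute window: %.1f jobs/hour" % (window_minutes, rate))
```

Under this uniform-spread assumption, the same number of jobs starts at one sixth of the hourly rate when the window grows from 4 hours to 24 hours.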

Comment 2 by dshi@chromium.org, Mar 8 2016

Blockedon: 593104

Comment 3 by dshi@chromium.org, Mar 11 2016

Status: Fixed (was: Assigned)

Comment 4 by benhenry@google.com, Apr 27 2016

Components: Infra>Client>ChromeOS
Labels: -Infra-ChromeOS
Status: Verified (was: Fixed)
Closing. Please reopen if it's not fixed.
