
Issue 760652


Issue metadata

Status: WontFix
Owner:
Closed: Jun 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug

Blocked on:
issue 767171
issue 767211
issue 774597
issue 774627

Blocking:
issue 751788




Provision throttling caused a large spike in provision failures

Reported by jrbarnette@chromium.org, Aug 30 2017

Issue description

When provision payload copy throttling rolled out to the lab, there was
a dramatic spike in the provision failure rate:
    https://viceroy.corp.google.com/chromeos/provision?duration=6h&utc_end=1504058400#_VG_huYBJmlb

Looking at the throttling workqueue logs, the cause seems
to have been principally a large number of aborts in copy
requests, presumably caused by long queue wait times.
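To illustrate how long queue waits could turn into aborts, here is a
toy model only. It assumes a FIFO queue, a fixed copy duration, and a
fixed wait limit after which a request gives up. None of these names
or numbers come from the actual workqueue code or its logs.

```python
import heapq


def count_aborts(arrival_times, copy_seconds, max_concurrent, abort_after):
    """Count requests that would wait in the queue longer than abort_after.

    Toy FIFO model (all parameters are hypothetical): at most
    max_concurrent copies run at once, and every copy takes copy_seconds.
    """
    # Min-heap holding the time at which each copy slot becomes free.
    free_at = [0.0] * max_concurrent
    heapq.heapify(free_at)
    aborts = 0
    for arrival in sorted(arrival_times):
        slot_free = heapq.heappop(free_at)
        start = max(arrival, slot_free)
        if start - arrival > abort_after:
            # The request gives up while still queued, so the slot
            # is returned unchanged for the next request.
            aborts += 1
            heapq.heappush(free_at, slot_free)
        else:
            heapq.heappush(free_at, start + copy_seconds)
    return aborts
```

In this model, a burst of requests larger than the throttle limit
aborts as soon as the backlog takes longer to drain than the wait
limit allows, even though every individual copy succeeds.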

The exact cause needs to be identified and fixed so that
we can try again to deploy the feature.

 
Blockedon: 767171
Blockedon: 767211
Some issues were discovered by going through the logs from the time
of the failure; see the two blocking bugs (issue 767171 and
issue 767211).  However, not all anomalies in the logs were
satisfactorily explained.  Unfortunately, the logs that would show
the history were wiped out because of bug 774597.  So, further
debugging depends on reproducing the failures.

The current strategy for moving this forward:
  * Early debugging indicated that devservers with only 2x1000 Ethernet
    interfaces are too slow to be useful, so all such servers need
    to be upgraded.
  * Fix the other known bugs.
  * Implement (somehow) the ability to selectively enable the
    throttling feature on some devservers, but not all of them.
  * Enable throttling on selected servers, and watch for failures.
  * Debug and fix the failures as necessary.
  * If the system fails to reproduce the problem, gradually increase
    the number of enabled devservers, until the problem is reproducible,
    or all servers are throttling without failures.
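One possible way to do the selective enablement and gradual ramp-up
is sketched below. This is an assumption, not a decided design: the
plan above leaves the mechanism open, and the function names and the
idea of hashing the hostname are hypothetical. Each devserver is
mapped to a stable bucket from 0 to 99, and throttling is enabled
when the bucket falls below the configured percentage.

```python
import hashlib


def throttling_enabled(hostname, percent):
    """Return True if this devserver falls inside the enabled percentage.

    Hypothetical helper: the bucket is derived from a hash of the
    hostname, so a given server always lands in the same bucket.
    """
    digest = hashlib.sha256(hostname.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < percent


def enabled_servers(hostnames, percent):
    """Return the subset of hostnames that should have throttling on."""
    return [h for h in hostnames if throttling_enabled(h, percent)]
```

Because the bucket depends only on the hostname, raising the
percentage only adds servers to the enabled set and never removes
one, so a server that has been throttling without failures stays
enabled as the rollout widens.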

Blockedon: 774597
Blockedon: 774627
Cc: johndhong@chromium.org
Labels: -Pri-1 Pri-2
Status: WontFix (was: Assigned)
We're not doing bug 751788, so this is moot.
