New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 628701 link

Starred by 2 users

Issue metadata

Status: Verified
Owner:
Last visit > 30 days ago
Closed: Aug 2016
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug



Sign in to add a comment

Provide a metric to track test job completion rates in the lab

Reported by jrbarnette@chromium.org, Jul 15 2016

Issue description

We need a metric that will allow us to count how many jobs
complete in the lab, and what percentage of them are being
aborted.

 
https://chromium-review.googlesource.com/#/c/360227/

The metric counts HQE completions, distinguished by the
termination status.  HQE -> "Host Queue Entry", the
database table that tracks work assigned to specific DUTs.
Jobs that abort are recorded with an associated HQE that
has a final status of "Aborted".

The metric can reliably indicate the abort rate, and can
distinguish job aborts by the master/shard that was doing
the work.  So, we'll be able to know reliably whether a
given server is heavily impacted.

It should be noted that the metric has some limitations:
  * It doesn't directly distinguish the reason for abort.
    However, in some cases, the reason can be inferred.
  * The metric can report the board and pool for jobs, but
    for most abort events, that information isn't available.
  * There is no information distinguishing suite jobs from
    other jobs, although some information could be added.

Project Member

Comment 2 by bugdroid1@chromium.org, Jul 18 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/a31ab77bb3c746833d1ea5a44b9ece1dffe68ca9

commit a31ab77bb3c746833d1ea5a44b9ece1dffe68ca9
Author: Richard Barnette <jrbarnette@chromium.org>
Date: Thu Jul 14 00:50:17 2016

[autotest] Report HQE completions with a metric.

This adds adds a counter metric to report HQE completions.
This only captures completions reported through set_status();
transitions directly to STOPPED state via Job.stop_if_necessary()
are not captured.

BUG= chromium:628701 
TEST=run push_to_prod suite on a local instance

Change-Id: I2585dd47655b7e8eaa7db6f383de8d21db6630d9
Reviewed-on: https://chromium-review.googlesource.com/360227
Commit-Ready: Richard Barnette <jrbarnette@chromium.org>
Tested-by: Richard Barnette <jrbarnette@chromium.org>
Reviewed-by: Paul Hobbs <phobbs@google.com>
Reviewed-by: Aviv Keshet <akeshet@chromium.org>

[modify] https://crrev.com/a31ab77bb3c746833d1ea5a44b9ece1dffe68ca9/scheduler/scheduler_models.py

The metric is in, and generating data:
    http://shortn/_Yl9OUMz638

Status: Fixed (was: Started)
Labels: VerifyIn-54
Status: Verified (was: Fixed)
bulk verified

Sign in to add a comment