
Issue 884375


Issue metadata

Status: Available
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug




[Findit] Flake Analyzer - Robust surfacing of representative swarming task

Project Member Reported by lijeffrey@chromium.org, Sep 14

Issue description

Need a mechanism to identify, within a data point, the swarming task that best represents that data point's pass/failure rate, or at least one that surfaces test failures, so developers have something to reference when debugging.

The current approach is to surface the last-run swarming task of each data point. This works in the majority of cases but is not sufficient in some edge cases.

For example, a task can fail consistently due to a misconfigured bot, and a subsequent task is then run against a different, correctly configured bot. The last-run task in that case shows no failures even though the data point includes them. See https://bugs.chromium.org/p/pdfium/issues/detail?id=1151 for a historical example.

3 possible approaches:
1. As tasks come in, maintain a field on each data point that tracks a representative swarming task with at least some failures.
2. When surfacing a representative task, check each task's output and surface one that has failures.
3. When generating data points, store per-task metadata rather than aggregating the results and throwing the individual task results away.

Method 1 is the easiest to implement, but it introduces another field that serves only a single purpose and may clutter the data model.
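A minimal sketch of what method 1 could look like, to make the trade-off concrete. The `DataPoint` class, its field names, and `add_task_result` are hypothetical stand-ins, not Findit's actual model, and keeping the first failing task (rather than the most recent) is an arbitrary choice for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataPoint:
    """Hypothetical stand-in for a Flake Analyzer data point."""
    commit_position: int
    pass_count: int = 0
    fail_count: int = 0
    last_task_id: Optional[str] = None
    # The extra single-purpose field that method 1 would add.
    representative_task_id: Optional[str] = None

    def add_task_result(self, task_id: str, passes: int, failures: int) -> None:
        """Aggregates one swarming task's results into this data point."""
        self.pass_count += passes
        self.fail_count += failures
        self.last_task_id = task_id
        # Remember the first task that showed any failures (assumed policy).
        if failures > 0 and self.representative_task_id is None:
            self.representative_task_id = task_id

    def task_to_surface(self) -> Optional[str]:
        """Prefers a task with failures, falling back to the last-run task."""
        return self.representative_task_id or self.last_task_id
```

With a failing task followed by a passing one, `task_to_surface()` returns the failing task's ID, whereas the current last-run approach would return the passing one.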

Method 2 requires querying the swarming server and recomputing each task's pass/fail count, which was already computed at runtime but thrown away. Swarming could also be unreachable or down, so this approach would be susceptible to network errors.
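A rough sketch of method 2, assuming the data point keeps the IDs of all its tasks. The `fetch` callable is a hypothetical placeholder for whatever request retrieves a task's pass/fail counts from swarming; it is not a real swarming client API. Skipping unreachable tasks and falling back to the last-run task is one possible way to handle the network errors mentioned above.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical fetcher: takes a task ID and returns (passes, failures).
# It stands in for a request to the swarming server and may raise OSError.
TaskFetcher = Callable[[str], Tuple[int, int]]


def find_task_to_surface(task_ids: Sequence[str],
                         fetch: TaskFetcher) -> Optional[str]:
    """Returns the newest task with failures, else the last-run task."""
    if not task_ids:
        return None
    for task_id in reversed(task_ids):
        try:
            _, failures = fetch(task_id)
        except OSError:
            # Swarming unreachable for this task: skip it and keep looking.
            continue
        if failures > 0:
            return task_id
    # No task with known failures: fall back to the current behavior.
    return task_ids[-1]
```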

Method 3 is perhaps the most reliable, but it seems like overkill unless other metadata about each task proves to be useful.
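For comparison, a sketch of method 3 under the assumption that each data point stores one record per task and derives its aggregates from them. `TaskResult`, `bot_id`, and the other names are hypothetical; `bot_id` is only an example of extra metadata that might turn out to be useful.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TaskResult:
    """Hypothetical per-task record kept instead of being thrown away."""
    task_id: str
    bot_id: str  # Example of additional metadata; an assumption.
    passes: int
    failures: int


@dataclass
class DataPoint:
    """Hypothetical data point that keeps its tasks and derives aggregates."""
    commit_position: int
    tasks: List[TaskResult] = field(default_factory=list)

    @property
    def pass_rate(self) -> Optional[float]:
        """Pass rate over all stored tasks, or None if nothing has run."""
        total = sum(t.passes + t.failures for t in self.tasks)
        if total == 0:
            return None
        return sum(t.passes for t in self.tasks) / total

    def representative_task(self) -> Optional[TaskResult]:
        """Newest task with failures, else the last-run task, else None."""
        failing = [t for t in self.tasks if t.failures > 0]
        candidates = failing or self.tasks
        return candidates[-1] if candidates else None
```

Because nothing is discarded, the representative task can be chosen at display time without another field or a round trip to swarming, at the cost of storing more data per data point.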
 
