We need a mechanism to identify, within each data point, a swarming task that best represents that data point's pass/fail rate, or that surfaces test failures so developers have something concrete to reference when debugging.
The current approach surfaces the last-run swarming task of each data point, which works in the majority of cases, but in some edge cases it is not sufficient.
For example, a task can fail consistently due to a misconfigured bot, and then a subsequent task runs on a different, correctly configured bot and passes; surfacing only the last-run task would hide the failures. See https://bugs.chromium.org/p/pdfium/issues/detail?id=1151 for a historical example.
Three possible approaches:
1. As task results come in, maintain a field on each data point that records a representative swarming task with at least one failure.
2. When surfacing a representative task, check each task's output and surface one that has failures.
3. When generating data points, track and store per-task metadata rather than aggregating task results and throwing them away.
Method 1 is the easiest to implement, but it introduces another field that serves only a single purpose and may clutter the data model.
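A minimal sketch of Method 1, assuming data points are plain dicts; the field name `representative_task_id` and the update hook are hypothetical, not existing code:

```python
def update_representative_task(data_point, task_id, failures):
    """Record the first failing swarming task as the data point's representative.

    Called as each task's results come in (hypothetical hook). Once a failing
    task has been recorded, later failures do not overwrite it.
    """
    if failures and data_point.get("representative_task_id") is None:
        data_point["representative_task_id"] = task_id
    return data_point
```

The single-purpose field is the clutter concern: it exists only so that a later UI lookup can avoid re-querying swarming.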
Method 2 requires querying the swarming server and recomputing each task's pass/fail count, work that was already done at data-point generation time but discarded. It also depends on swarming being reachable at lookup time, so it is susceptible to network errors and outages.
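Method 2 could be sketched as below; `fetch_result` stands in for the swarming query and is a hypothetical callable, not a real client API:

```python
def pick_failing_task(task_ids, fetch_result):
    """Return the first task whose re-fetched output contains failures.

    fetch_result(task_id) is a hypothetical callable returning a dict with a
    'failures' count; it may raise IOError if swarming is unreachable.
    """
    for task_id in task_ids:
        try:
            result = fetch_result(task_id)
        except IOError:
            # Swarming unreachable for this task; skip rather than abort.
            continue
        if result.get("failures", 0) > 0:
            return task_id
    # No failing task found (or none reachable): fall back to the last-run
    # task, matching the current behavior.
    return task_ids[-1] if task_ids else None
```

Note that every lookup pays one network round trip per task until a failing one is found, which is the cost Method 1 avoids.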
Method 3 is perhaps the most reliable, but it seems like overkill unless other per-task metadata proves useful.
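A sketch of what Method 3's data model might look like; the class and field names are illustrative assumptions, not the existing schema:

```python
from dataclasses import dataclass, field

@dataclass
class TaskMetadata:
    """Per-task results kept instead of being aggregated away."""
    task_id: str
    passes: int
    failures: int

@dataclass
class DataPoint:
    # Per-task metadata list; aggregates are derived on demand.
    tasks: list = field(default_factory=list)

    @property
    def pass_rate(self):
        total = sum(t.passes + t.failures for t in self.tasks)
        return sum(t.passes for t in self.tasks) / total if total else 0.0

    def representative_task(self):
        # Any failing task can be surfaced without re-querying swarming.
        failing = next((t for t in self.tasks if t.failures), None)
        return failing or (self.tasks[-1] if self.tasks else None)
```

Because the per-task counts survive, both the aggregate pass rate and a representative failing task can be derived locally, with no swarming round trip; the cost is a larger stored record per data point.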