
Issue 822078

Starred by 3 users

Issue metadata

Status: Fixed
Owner:
Closed: Sep 21
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug




Add a "is_regression" field to the JSON test results format

Project Member Reported by robertma@chromium.org, Mar 15 2018

Issue description

Background: this is the output format used by Blink layout tests (and perhaps others) https://chromium.googlesource.com/chromium/src/+/master/docs/testing/json_test_results_format.md

There are many kinds of results for a test (especially for a layout test), and it's not immediately clear which of them count as "failures" in the common sense (the fact that we allow, or expect, some failures further complicates the situation).

For layout tests at least, we usually only care about "regressions", in other words "unexpected failures". Although each result has a field called is_unexpected, that field is a necessary but not sufficient condition for an "unexpected failure". The full conditions are repeated in a few different places, e.g.:

* in webkitpy: https://cs.chromium.org/chromium/src/third_party/WebKit/Tools/Scripts/webkitpy/layout_tests/models/test_run_results.py?l=235&rcl=ec24a4c2485eee13e1f66f59f67361fd7ba168bd
* in results.html: https://cs.chromium.org/chromium/src/third_party/WebKit/LayoutTests/fast/harness/results.html?l=939&rcl=1d6a6dfd7130802e20ceee6bc323ed270ab213fb
* in recipe modules: https://cs.chromium.org/chromium/build/scripts/slave/recipe_modules/test_utils/util.py?l=116&rcl=546a13ec5959964fda7750bac483bc5efd891f85

(Ignore this, I was wrong:) <del> They are all slightly different, but considering when is_unexpected is true, I believe the most concise boolean expression for "regression" is: is_unexpected && !(PASS || SLOW) </del>


For better code health, we can add a new field to the JSON format called is_regression to make the decision in one central place. Since test_run_results.py is already calculating the number of regressions, it makes a lot of sense to just set that field for each test there.
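
For concreteness, here is a sketch (illustrative only; the exact encoding would be up to the spec change) of what a single per-test leaf entry could look like with the new field, shown as a Python dict. The other field names follow the existing format doc; the values are invented:

```
# Illustrative sketch only: a per-test leaf entry with the proposed field.
# The surrounding field names come from the existing format doc; the values
# here are invented for the example.
example_leaf_entry = {
    'expected': 'PASS',        # space-separated expected results
    'actual': 'FAIL FAIL',     # ordered results of the initial run and retries
    'is_unexpected': True,     # the actual results are not in the expected set
    'is_regression': True,     # proposed: unexpected and did not end in a pass
}
```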
 
Brandon, you asked me about this earlier.

Could you use the condition `is_unexpected && !(PASS || SLOW)` in FindIt for now, so that this code-health refactor doesn't block your new feature?

Dirk, WDYT? Am I missing anything? (Also, I recall you mentioned the intention to use this JSON format more broadly; this might be something worth improving before broader use.)

Comment 2 by wylieb@chromium.org, Mar 15 2018

For Findit, we need to establish a PASS or a FAIL for each test run. For many webkit layout tests, the results can't be easily bucketed into just pass/fail (at least from what I can tell).

See this (takes about 4 seconds to run): https://bigquery.cloud.google.com/savedquery/407861147276:5a57da8619fe4b17ab11b90cba1a5780

These are test runs compared to their expected results. One example has an actual of {IMAGE, TIMEOUT, IMAGE, PASS} against an expected of {PASS}. Would your boolean expression consider this three fails followed by a pass?

(I can't run that BigQuery; permission denied. In fact, I've never looked at ChOps data, so I'm not sure if there's any setup I missed.)

So yes, run-webkit-tests may retry the failures a few times, and the "actual results" would then look like that. Let me try harder with the boolean expressions...

Based on test_utils in recipe modules (which is the code ultimately responsible for deciding the state of the step), with some simplifications (see the notes below for details, and the sketch after them):

actual_results := actual.split(' ')
unexpected_failures (a.k.a. regressions; shown as red FAILURE) := is_unexpected && actual_results[-1] != PASS
unexpected_flakes (shown as yellow WARNING) := is_unexpected && len(actual_results) > 1 && actual_results[-1] == PASS

Notes:
1. The major takeaway is that actual_results is a list (not a set) of retry results, and we stop retrying once we have a PASS, so it suffices to check the last result.
2. test_utils/util.py checks if a result is passing by testing if it is among a set of "passing statuses" (https://cs.chromium.org/chromium/build/scripts/slave/recipe_modules/test_utils/util.py?type=cs&sq=package:chromium&l=70), which isn't really necessary because the other passing statuses are only used in expectations ("expected" results). Even the SLOW type I mentioned earlier won't appear in actual results.
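
Putting the definitions and notes above together, here is a minimal runnable sketch (the function and variable names are mine, not from the codebase), assuming a per-test result dict with a space-separated `actual` string of retry results and an `is_unexpected` flag:

```
# Minimal sketch of the regression/flake conditions above; names are
# illustrative only.
def classify(result):
    actual_results = result['actual'].split(' ')
    is_unexpected = result.get('is_unexpected', False)
    # Retrying stops at the first PASS, so the last entry is decisive.
    last_passed = actual_results[-1] == 'PASS'
    unexpected_failure = is_unexpected and not last_passed  # regression
    unexpected_flake = is_unexpected and last_passed and len(actual_results) > 1
    return unexpected_failure, unexpected_flake

# The {IMAGE, TIMEOUT, IMAGE, PASS} example from comment 2 comes out as an
# unexpected flake, not a regression:
print(classify({'actual': 'IMAGE TIMEOUT IMAGE PASS', 'is_unexpected': True}))
# -> (False, True)
```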

Cc: chanli@chromium.org nednguyen@chromium.org jbudorick@chromium.org seanmccullough@chromium.org martiniss@chromium.org st...@chromium.org
I think it depends on what you mean by "care" and "common sense", i.e., how you'll actually use this :).

Tests can pass or fail, and test results can be expected (one of the expected types) or not. If you want to ignore unexpected passes, then `is_unexpected and not actual.endswith('PASS')` (the logic you see in util.py) is what you want, though it's probably a bit more conservative than you really need these days (it checks for codes you won't see).

I would be fine w/ adding this to the format. 

+ a few others for additional thoughts / comments.
I feel a bit worried about adding extra fields that are derived from existing fields. One important aspect of this format is that it should be minimal, to save space.

For code sharing, can we make something like a JSON test results parser library that takes a JSON test results dict and returns a Python object with many utility fields?
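
A rough sketch of what such a library could look like (hypothetical class and property names; nothing here exists today), wrapping a parsed results dict and exposing derived fields:

```
# Hypothetical sketch of a results-parsing helper; the class and property
# names are invented for illustration.
class TestResult(object):
    def __init__(self, name, leaf):
        self.name = name
        self.actual = leaf.get('actual', '').split()
        self.is_unexpected = leaf.get('is_unexpected', False)

    @property
    def is_regression(self):
        # Unexpected, and the last retry did not pass.
        return bool(self.is_unexpected and self.actual
                    and self.actual[-1] != 'PASS')

# Usage:
#   TestResult('fast/foo.html',
#              {'actual': 'FAIL FAIL', 'is_unexpected': True}).is_regression
#   -> True
```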
Re c#4:

> I think it depends on what you mean by "care" and "common sense", i.e., how you'll actually use this :).

Right, I should be more specific: I myself do care about non-regression failures/flakes, since they reflect the health of the suite. But when an engineer can't submit a CL because layout tests fail (perhaps irrelevantly), they care about the failures that make the bot red, which is therefore also what tools like FindIt are interested in.

> though it's probably a bit more conservative than you really need these days (it checks for codes you won't see).

I went through webkitpy and am pretty sure those extra statuses won't appear in actual results, so quite a few places were written conservatively (or defensively). Also, for the sake of healthier code, we can update the document to differentiate actual vs. expected result types, clarifying that some types won't appear in actual results, and then simplify the code accordingly. I'll put it on my code-health TODO list :)

Re c#5:

That's a fair concern, especially now that I know we don't need to check against many different pass statuses and the boolean expression isn't overly complicated. Writing a Python util lib sounds great, but we'd need to make it minimal and hopefully self-contained. Otherwise, it'd be quite messy to use the lib in webkitpy, recipes, and other infra Python codebases...
This format is far from minimal now, and (apart from the trie structure) has never been meant to be compact.

In addition, I prefer having clearer formats over relying on common libraries for this, because we'd have to make sure we have a library in every language (Go, Python, JS at least) and that users only use those libraries to parse things.

The `is_unexpected` field is specced to be optional (missing=False). We can make `is_regression` be the same, in which case the overhead will only be O(failing tests) and not O(all tests), which shouldn't be too bad.
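
A minimal sketch of that optional-field convention (the helper names are hypothetical, not real API): writers only emit the field when it is true, and readers treat a missing field as False, just like `is_unexpected`:

```
# Hypothetical sketch of the missing=False convention; names are illustrative.
def write_is_regression(result, is_regression):
    # Writer side: omit the field for non-regressions, so the size overhead
    # is O(failing tests) rather than O(all tests).
    if is_regression:
        result['is_regression'] = True

def read_is_regression(result):
    # Reader side: a missing field means False, same as `is_unexpected`.
    return result.get('is_regression', False)
```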

We should also update the spec to change the definition of `is_unexpected`, because it's currently quite misleading, i.e., it's not necessarily a regression.
We should move all of the result-handling code from webkitpy to typ at some point Real Soon Now, which would at least consolidate the Python code, but that doesn't help us with the Go and JS parsing.
Components: Infra>Client>Chrome
Labels: Build-Tools-TYP
Dirk: the fact that multiple languages are involved makes the library approach cumbersome; that's a fair point.

Making these computed utility fields optional SGTM. I worried about these types of use cases being unbounded and blowing up the complexity of the format, but I guess we can wait and see how it evolves.
Components: -Infra>Client>Chrome Build
Owner: dpranke@chromium.org
Status: Assigned (was: Available)
Status: Started (was: Assigned)
Project Member

Comment 14 by bugdroid1@chromium.org, Jun 12 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/b5f530d4fbf2083d511467906156a4891c38cf5c

commit b5f530d4fbf2083d511467906156a4891c38cf5c
Author: Dirk Pranke <dpranke@chromium.org>
Date: Tue Jun 12 22:51:32 2018

Update json_test_results_format docs.

This CL adds the `is_regression` per-test field (though it won't
yet be set by anything), and adds a bunch of text clarifying the
differences between failures, unexpected results, and regressions.

It also does some general editorial cleanup, fixing typos and
re-sorting some of the enumerated values.

BUG=822078, 837047

Change-Id: If78f6ef66c01d7c2924bfd98589832fe9299968f
Reviewed-on: https://chromium-review.googlesource.com/1096486
Commit-Queue: Dirk Pranke <dpranke@chromium.org>
Reviewed-by: Robert Ma <robertma@chromium.org>
Reviewed-by: Ned Nguyen <nednguyen@google.com>
Reviewed-by: Chan Li <chanli@chromium.org>
Reviewed-by: Stephen Martinis <martiniss@chromium.org>
Cr-Commit-Position: refs/heads/master@{#566615}
[modify] https://crrev.com/b5f530d4fbf2083d511467906156a4891c38cf5c/docs/testing/json_test_results_format.md

Labels: -Pri-3 Pri-1
Project Member

Comment 16 by bugdroid1@chromium.org, Jun 22 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/53235652f36d355532c65674a91968cedc197e56

commit 53235652f36d355532c65674a91968cedc197e56
Author: Dirk Pranke <dpranke@chromium.org>
Date: Fri Jun 22 00:17:36 2018

Update json test results for webkit_layout_tests.

This brings the test results more into compliance with the
current test result standard in bit.ly/chromium-json-test-results-format.

Notable changes:
- If a test is run multiple times, we explicitly return every
  `actual` result. Previously, if the test produced the same result every
  time, we'd only return a single value for `actual`
- If a test is skipped unexpectedly, that will be considered a regression
  and an unexpected result.
- The test results will contain `is_unexpected`, `is_flaky`, and
  `is_regression` fields for the matching conditions.

Bug: 837047, 822078
Change-Id: I4896e61469d3b576ea9e7dbbe16fac709f74b6b9
Reviewed-on: https://chromium-review.googlesource.com/1103611
Commit-Queue: Dirk Pranke <dpranke@chromium.org>
Reviewed-by: Robert Ma <robertma@chromium.org>
Reviewed-by: Quinten Yearsley <qyearsley@chromium.org>
Cr-Commit-Position: refs/heads/master@{#569466}
[modify] https://crrev.com/53235652f36d355532c65674a91968cedc197e56/third_party/blink/tools/blinkpy/web_tests/models/test_expectations.py
[modify] https://crrev.com/53235652f36d355532c65674a91968cedc197e56/third_party/blink/tools/blinkpy/web_tests/models/test_expectations_unittest.py
[modify] https://crrev.com/53235652f36d355532c65674a91968cedc197e56/third_party/blink/tools/blinkpy/web_tests/models/test_run_results.py
[modify] https://crrev.com/53235652f36d355532c65674a91968cedc197e56/third_party/blink/tools/blinkpy/web_tests/models/test_run_results_unittest.py
[modify] https://crrev.com/53235652f36d355532c65674a91968cedc197e56/third_party/blink/tools/blinkpy/web_tests/run_webkit_tests_unittest.py

Project Member

Comment 17 by bugdroid1@chromium.org, Jun 22 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/687768e979b009ac2bb90ebca5d4d1af6777b0d6

commit 687768e979b009ac2bb90ebca5d4d1af6777b0d6
Author: Dirk Pranke <dpranke@chromium.org>
Date: Fri Jun 22 18:27:07 2018

Revert "Update json test results for webkit_layout_tests."

This reverts commit 53235652f36d355532c65674a91968cedc197e56.

Reason for revert: Build is no longer reporting failures correctly.

Original change's description:
> Update json test results for webkit_layout_tests.
> 
> This brings the test results more into compliance with the
> current test result standard in bit.ly/chromium-json-test-results-format.
> 
> Notable changes:
> - If a test is run multiple times, we explicitly return every
>   `actual` result. Previously, if the test produced the same result every
>   time, we'd only return a single value for `actual`
> - If a test is skipped unexpectedly, that will be considered a regression
>   and an unexpected result.
> - The test results will contain `is_unexpected`, `is_flaky`, and
>   `is_regression` fields for the matching conditions.
> 
> Bug: 837047, 822078
> Change-Id: I4896e61469d3b576ea9e7dbbe16fac709f74b6b9
> Reviewed-on: https://chromium-review.googlesource.com/1103611
> Commit-Queue: Dirk Pranke <dpranke@chromium.org>
> Reviewed-by: Robert Ma <robertma@chromium.org>
> Reviewed-by: Quinten Yearsley <qyearsley@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#569466}

TBR=qyearsley@chromium.org,dpranke@chromium.org,seanmccullough@chromium.org,robertma@chromium.org

Change-Id: Icf1882e8eea328b115a458afa6378b35bb11a638
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Bug: 837047, 822078
Reviewed-on: https://chromium-review.googlesource.com/1112178
Reviewed-by: Dirk Pranke <dpranke@chromium.org>
Commit-Queue: Dirk Pranke <dpranke@chromium.org>
Cr-Commit-Position: refs/heads/master@{#569703}
[modify] https://crrev.com/687768e979b009ac2bb90ebca5d4d1af6777b0d6/third_party/blink/tools/blinkpy/web_tests/models/test_expectations.py
[modify] https://crrev.com/687768e979b009ac2bb90ebca5d4d1af6777b0d6/third_party/blink/tools/blinkpy/web_tests/models/test_expectations_unittest.py
[modify] https://crrev.com/687768e979b009ac2bb90ebca5d4d1af6777b0d6/third_party/blink/tools/blinkpy/web_tests/models/test_run_results.py
[modify] https://crrev.com/687768e979b009ac2bb90ebca5d4d1af6777b0d6/third_party/blink/tools/blinkpy/web_tests/models/test_run_results_unittest.py
[modify] https://crrev.com/687768e979b009ac2bb90ebca5d4d1af6777b0d6/third_party/blink/tools/blinkpy/web_tests/run_webkit_tests_unittest.py

Project Member

Comment 18 by bugdroid1@chromium.org, Jul 31

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/6f42c06285f026d8a9a9cf3e4c86c076cf05a5f3

commit 6f42c06285f026d8a9a9cf3e4c86c076cf05a5f3
Author: Dirk Pranke <dpranke@chromium.org>
Date: Tue Jul 31 20:43:13 2018

Re-land "Update json test results for webkit_layout_tests."

This re-lands r569466 (reverting 569703) with the needed fixes.

This brings the test results more into compliance with the
current test result standard in bit.ly/chromium-json-test-results-format.

Notable changes:
 - If a test is run multiple times, we explicitly return every
   `actual` result. Previously, if the test produced the same result every
   time, we'd only return a single value for `actual`
 - If a test is skipped unexpectedly, that will be considered a regression
   and an unexpected result.
 - The test results will contain `is_unexpected`, `is_flaky`, and
   `is_regression` fields for the matching conditions.

Bug: 837047, 822078
Change-Id: I98a80deb2e7e095fe471507a04e80b8b583ed5cc
Reviewed-on: https://chromium-review.googlesource.com/1154107
Commit-Queue: Dirk Pranke <dpranke@chromium.org>
Reviewed-by: Ned Nguyen <nednguyen@google.com>
Reviewed-by: Robert Ma <robertma@chromium.org>
Cr-Commit-Position: refs/heads/master@{#579556}
[modify] https://crrev.com/6f42c06285f026d8a9a9cf3e4c86c076cf05a5f3/third_party/blink/tools/blinkpy/web_tests/models/test_expectations.py
[modify] https://crrev.com/6f42c06285f026d8a9a9cf3e4c86c076cf05a5f3/third_party/blink/tools/blinkpy/web_tests/models/test_expectations_unittest.py
[modify] https://crrev.com/6f42c06285f026d8a9a9cf3e4c86c076cf05a5f3/third_party/blink/tools/blinkpy/web_tests/models/test_run_results.py
[modify] https://crrev.com/6f42c06285f026d8a9a9cf3e4c86c076cf05a5f3/third_party/blink/tools/blinkpy/web_tests/models/test_run_results_unittest.py
[modify] https://crrev.com/6f42c06285f026d8a9a9cf3e4c86c076cf05a5f3/third_party/blink/tools/blinkpy/web_tests/run_webkit_tests_unittest.py

Cc: -wylieb@chromium.org
Status: Fixed (was: Started)
I think this is fixed now.
