Add a "is_regression" field to the JSON test results format |
Issue description

Background: this is the output format used by Blink layout tests (and perhaps others): https://chromium.googlesource.com/chromium/src/+/master/docs/testing/json_test_results_format.md

There are many kinds of results for a test (especially for a layout test), and it's not immediately clear which of them are "failures" in the common sense (the fact that we allow, or even expect, some failures further complicates the situation). For layout tests at least, we usually only care about "regressions", in other words "unexpected failures". Although each result has a field called is_unexpected, that field is a necessary but not sufficient condition for an "unexpected failure". The full conditions are repeated in a few different places, e.g.:
* in webkitpy: https://cs.chromium.org/chromium/src/third_party/WebKit/Tools/Scripts/webkitpy/layout_tests/models/test_run_results.py?l=235&rcl=ec24a4c2485eee13e1f66f59f67361fd7ba168bd
* in results.html: https://cs.chromium.org/chromium/src/third_party/WebKit/LayoutTests/fast/harness/results.html?l=939&rcl=1d6a6dfd7130802e20ceee6bc323ed270ab213fb
* in recipe modules: https://cs.chromium.org/chromium/build/scripts/slave/recipe_modules/test_utils/util.py?l=116&rcl=546a13ec5959964fda7750bac483bc5efd891f85

(Ignore this, I was wrong:) <del>They are all slightly different, but considering when is_unexpected is true, I believe the most concise boolean expression for "regression" is: is_unexpected && !(PASS || SLOW)</del>

For better code health, we can add a new field called is_regression to the JSON format so that the decision is made in one central place. Since test_run_results.py already calculates the number of regressions, it makes a lot of sense to simply set that field for each test there.
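For illustration, here is a rough sketch of what a per-test leaf in the "tests" trie might look like with the proposed field (the test name and result values are made up; the spec linked above is authoritative):

```python
# Illustrative only: a per-test leaf of the "tests" trie from the JSON test
# results format, with the proposed is_regression field added. The test name
# and values are made up for this example.
example_results = {
    "tests": {
        "fast": {
            "harness": {
                "sample-test.html": {
                    "expected": "PASS",
                    # Space-separated results, one per run/retry.
                    "actual": "IMAGE TIMEOUT IMAGE",
                    # Already in the format: necessary but not sufficient
                    # for "this is a regression".
                    "is_unexpected": True,
                    # Proposed: computed once, centrally (e.g. in
                    # test_run_results.py), so consumers don't have to
                    # re-derive it.
                    "is_regression": True,
                }
            }
        }
    }
}
```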
Mar 15 2018
For Findit, we need to establish a PASS or a FAIL for each test run. For many webkit layout tests, the results can't be easily bucketed into just pass/fail (at least from what I can tell). See this query (takes about 4 seconds to run): https://bigquery.cloud.google.com/savedquery/407861147276:5a57da8619fe4b17ab11b90cba1a5780 It compares actual test runs to the expected results. One example is an actual of {IMAGE, TIMEOUT, IMAGE, PASS} where the expected is {PASS}. Would your boolean expression consider this to be 3 failures followed by a pass?
Mar 15 2018
(I can't run that BigQuery; permission denied. In fact, I've never looked at ChOps data, so I'm not sure if there's any setup I missed.)
So yeah run-webkit-tests may retry the failures a few times and the "actual results" would look like that. Let me try harder with the boolean expressions...
Based on test_utils in the recipe modules (which is the code ultimately responsible for deciding the state of the step), with some simplifications (see the notes below for details):
actual_results := actual.split(' ')
unexpected_failures (a.k.a. regressions; shown as red FAILURE) := is_unexpected && actual_results[-1] != PASS
unexpected_flakes (shown as yellow WARNING) := is_unexpected && len(actual_results) > 1 && actual_results[-1] == PASS
Notes:
1. The major takeaway is that actual_results is a list (not a set) of retry results, and we stop retrying once we have a PASS, so it suffices to check the last result.
2. test_utils/util.py checks if a result is passing by testing if it is among a set of "passing statuses" (https://cs.chromium.org/chromium/build/scripts/slave/recipe_modules/test_utils/util.py?type=cs&sq=package:chromium&l=70), which isn't really necessary because the other passing statuses are only used in expectations ("expected" results). Even the SLOW type I mentioned earlier won't appear in actual results.
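Putting this together, a minimal Python sketch of these checks (the helper name and return values are mine, not the actual webkitpy/recipe code):

```python
def classify(actual, is_unexpected):
    """Classify one test result from the JSON results.

    actual: the space-separated 'actual' string, e.g. "IMAGE TIMEOUT IMAGE PASS"
            (one entry per run/retry).
    is_unexpected: the boolean field already present in the format.
    Returns 'regression', 'flake', or 'ok'.
    """
    actual_results = actual.split(' ')
    if not is_unexpected:
        return 'ok'
    # Retries stop at the first PASS, so checking the last entry is enough.
    if actual_results[-1] != 'PASS':
        return 'regression'   # shown as red FAILURE
    if len(actual_results) > 1:
        return 'flake'        # shown as yellow WARNING
    return 'ok'               # an unexpected pass

# The example from the question above: three failures, then a pass on retry.
assert classify('IMAGE TIMEOUT IMAGE PASS', True) == 'flake'
assert classify('TIMEOUT TIMEOUT TIMEOUT', True) == 'regression'
assert classify('PASS', False) == 'ok'
```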
Mar 15 2018
I think it depends on what you mean by "care" and "common sense", i.e., how you'll actually use this :).
Tests can pass or fail, and test results can be expected (one of the expected types) or not. If you want to ignore unexpected passes, then `is_unexpected and not actual.endswith('PASS')` is what you want; that's essentially the logic you see in util.py, though util.py is probably a bit more conservative than you really need these days (it checks for codes you won't see).
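As a quick illustrative check (made-up inputs), this string-based form agrees with the split-based expression from the earlier comment, assuming retries stop at the first PASS and no other status name ends with "PASS":

```python
# Illustrative only; inputs are made up.
for actual, is_unexpected in [('IMAGE TIMEOUT IMAGE PASS', True),
                              ('TIMEOUT CRASH', True),
                              ('PASS', False)]:
    string_form = is_unexpected and not actual.endswith('PASS')
    split_form = is_unexpected and actual.split(' ')[-1] != 'PASS'
    assert string_form == split_form
```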
I would be fine w/ adding this to the format.
+ a few others for additional thoughts / comments.
Mar 15 2018
I feel a bit worried about adding extra fields that are derived from existing fields. One important property of this format is that it should stay minimal to save space. For code sharing, could we instead make something like a JSON test results parser library that takes a JSON test results dict and returns Python objects with the various utility fields?
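Something along these lines, for example (a rough sketch; the class and function names are hypothetical, not an existing library):

```python
# Rough sketch of such a parser library; names are hypothetical.
class TestResult(object):
    def __init__(self, name, leaf):
        self.name = name
        self.actual = leaf.get('actual', '').split(' ')
        self.expected = leaf.get('expected', 'PASS').split(' ')
        self.is_unexpected = leaf.get('is_unexpected', False)

    @property
    def is_regression(self):
        return self.is_unexpected and self.actual[-1] != 'PASS'

    @property
    def is_flaky(self):
        return (self.is_unexpected and len(self.actual) > 1
                and self.actual[-1] == 'PASS')


def iter_results(results_json):
    """Walk the 'tests' trie and yield a TestResult for each leaf."""
    delimiter = results_json.get('path_delimiter', '/')

    def walk(node, path):
        if 'actual' in node:  # a leaf entry
            yield TestResult(delimiter.join(path), node)
            return
        for name, child in node.items():
            for result in walk(child, path + [name]):
                yield result

    return walk(results_json.get('tests', {}), [])
```

Callers would then write something like `[r.name for r in iter_results(data) if r.is_regression]` instead of re-deriving the conditions themselves.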
Mar 15 2018
Re c#4:

> I think it depends on what you mean by "care" and "common sense", i.e., how you'll actually use this :).

Right, I should be more specific. In fact, I myself do care about the non-regression types of failures/flakes, as they represent the health of the suite. But when an engineer can't submit a CL because layout tests fail (perhaps irrelevantly), they care about the failures that make the bot red, which is hence also what tools like FindIt are interested in.

> though it's probably a bit more conservative than you really need these days (it checks for codes you won't see).

I went through webkitpy and am pretty sure those extra statuses won't appear in actual results, so quite a few places were written conservatively (or defensively). Also, for the sake of healthier code, we could change the document to differentiate actual vs. expected result types, to clarify that some types will never appear in actual results, and then simplify the code accordingly. I'll put it on my code-health TODO list :)

Re c#5: That's a fair concern, especially now that I know we don't need to check against many different pass statuses and the boolean expression isn't overly complicated. Writing a Python util lib sounds great, but we'd need to keep it minimal and hopefully self-contained; otherwise, it'd be quite messy to use the lib in webkitpy, recipes, and other infra Python codebases...
Mar 15 2018
This format is far from minimal now, and (apart from the trie structure) has never been meant to be compact. In addition, I prefer having clearer formats over relying on common libraries for this, because we'd have to make sure we have a library in every language (Go, Python, JS at least) and that users only use those libraries to parse things. The `is_unexpected` field is specced to be optional (missing=False). We can make `is_regression` be the same, in which case the overhead will only be O(failing tests) and not O(all tests), which shouldn't be too bad. We should also update the spec to change the definition of `is_unexpected`, because it's currently quite misleading, i.e., it's not necessarily a regression.
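For example, a writer could emit the optional booleans only when they are true, so only failing tests pay the cost (a sketch; the function name is made up):

```python
# Sketch: keep the per-test overhead O(failing tests) by omitting optional
# boolean fields whose value is False (missing == False per the spec).
def make_leaf(expected, actual, is_unexpected, is_regression):
    leaf = {'expected': expected, 'actual': actual}
    if is_unexpected:
        leaf['is_unexpected'] = True
    if is_regression:
        leaf['is_regression'] = True
    return leaf

# A passing test stays as small as before:
assert make_leaf('PASS', 'PASS', False, False) == {
    'expected': 'PASS', 'actual': 'PASS'}
```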
Mar 15 2018
We should move all of the result-handling code from webkitpy to typ at some point Real Soon Now, which would at least consolidate the python code, but that doesn't help us with the Go and JS parsing.
Mar 15 2018
Dirk: the fact that the multiple languages involved make the library approach cumbersome is a fair point. Making these computed utility fields optional SGTM. I worried about this type of use case being unbounded and blowing up the complexity of the format, but I guess we can wait and see how it evolves.
Jun 12 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/b5f530d4fbf2083d511467906156a4891c38cf5c

commit b5f530d4fbf2083d511467906156a4891c38cf5c
Author: Dirk Pranke <dpranke@chromium.org>
Date: Tue Jun 12 22:51:32 2018

Update json_test_results_format docs.

This CL adds the `is_regression` per-test field (though it won't yet be set by anything), and adds a bunch of text clarifying the differences between failures, unexpected results, and regressions. It also does some general editorial cleanup, fixing typos and re-sorting some of the enumerated values.

BUG=822078, 837047
Change-Id: If78f6ef66c01d7c2924bfd98589832fe9299968f
Reviewed-on: https://chromium-review.googlesource.com/1096486
Commit-Queue: Dirk Pranke <dpranke@chromium.org>
Reviewed-by: Robert Ma <robertma@chromium.org>
Reviewed-by: Ned Nguyen <nednguyen@google.com>
Reviewed-by: Chan Li <chanli@chromium.org>
Reviewed-by: Stephen Martinis <martiniss@chromium.org>
Cr-Commit-Position: refs/heads/master@{#566615}

[modify] https://crrev.com/b5f530d4fbf2083d511467906156a4891c38cf5c/docs/testing/json_test_results_format.md
Jun 22 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/53235652f36d355532c65674a91968cedc197e56

commit 53235652f36d355532c65674a91968cedc197e56
Author: Dirk Pranke <dpranke@chromium.org>
Date: Fri Jun 22 00:17:36 2018

Update json test results for webkit_layout_tests.

This brings the test results more into compliance with the current test result standard in bit.ly/chromium-json-test-results-format.

Notable changes:
- If a test is run multiple times, we explicitly return every `actual` result. Previously, if the test produced the same result every time, we'd only return a single value for `actual`.
- If a test is skipped unexpectedly, that will be considered a regression and an unexpected result.
- The test results will contain `is_unexpected`, `is_flaky`, and `is_regression` fields for the matching conditions.

Bug: 837047, 822078
Change-Id: I4896e61469d3b576ea9e7dbbe16fac709f74b6b9
Reviewed-on: https://chromium-review.googlesource.com/1103611
Commit-Queue: Dirk Pranke <dpranke@chromium.org>
Reviewed-by: Robert Ma <robertma@chromium.org>
Reviewed-by: Quinten Yearsley <qyearsley@chromium.org>
Cr-Commit-Position: refs/heads/master@{#569466}

[modify] https://crrev.com/53235652f36d355532c65674a91968cedc197e56/third_party/blink/tools/blinkpy/web_tests/models/test_expectations.py
[modify] https://crrev.com/53235652f36d355532c65674a91968cedc197e56/third_party/blink/tools/blinkpy/web_tests/models/test_expectations_unittest.py
[modify] https://crrev.com/53235652f36d355532c65674a91968cedc197e56/third_party/blink/tools/blinkpy/web_tests/models/test_run_results.py
[modify] https://crrev.com/53235652f36d355532c65674a91968cedc197e56/third_party/blink/tools/blinkpy/web_tests/models/test_run_results_unittest.py
[modify] https://crrev.com/53235652f36d355532c65674a91968cedc197e56/third_party/blink/tools/blinkpy/web_tests/run_webkit_tests_unittest.py
Jun 22 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/687768e979b009ac2bb90ebca5d4d1af6777b0d6

commit 687768e979b009ac2bb90ebca5d4d1af6777b0d6
Author: Dirk Pranke <dpranke@chromium.org>
Date: Fri Jun 22 18:27:07 2018

Revert "Update json test results for webkit_layout_tests."

This reverts commit 53235652f36d355532c65674a91968cedc197e56.

Reason for revert: Build is no longer reporting failures correctly.

Original change's description:
> Update json test results for webkit_layout_tests.
>
> This brings the test results more into compliance with the current test result standard in bit.ly/chromium-json-test-results-format.
>
> Notable changes:
> - If a test is run multiple times, we explicitly return every `actual` result. Previously, if the test produced the same result every time, we'd only return a single value for `actual`.
> - If a test is skipped unexpectedly, that will be considered a regression and an unexpected result.
> - The test results will contain `is_unexpected`, `is_flaky`, and `is_regression` fields for the matching conditions.
>
> Bug: 837047, 822078
> Change-Id: I4896e61469d3b576ea9e7dbbe16fac709f74b6b9
> Reviewed-on: https://chromium-review.googlesource.com/1103611
> Commit-Queue: Dirk Pranke <dpranke@chromium.org>
> Reviewed-by: Robert Ma <robertma@chromium.org>
> Reviewed-by: Quinten Yearsley <qyearsley@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#569466}

TBR=qyearsley@chromium.org,dpranke@chromium.org,seanmccullough@chromium.org,robertma@chromium.org
Change-Id: Icf1882e8eea328b115a458afa6378b35bb11a638
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Bug: 837047, 822078
Reviewed-on: https://chromium-review.googlesource.com/1112178
Reviewed-by: Dirk Pranke <dpranke@chromium.org>
Commit-Queue: Dirk Pranke <dpranke@chromium.org>
Cr-Commit-Position: refs/heads/master@{#569703}

[modify] https://crrev.com/687768e979b009ac2bb90ebca5d4d1af6777b0d6/third_party/blink/tools/blinkpy/web_tests/models/test_expectations.py
[modify] https://crrev.com/687768e979b009ac2bb90ebca5d4d1af6777b0d6/third_party/blink/tools/blinkpy/web_tests/models/test_expectations_unittest.py
[modify] https://crrev.com/687768e979b009ac2bb90ebca5d4d1af6777b0d6/third_party/blink/tools/blinkpy/web_tests/models/test_run_results.py
[modify] https://crrev.com/687768e979b009ac2bb90ebca5d4d1af6777b0d6/third_party/blink/tools/blinkpy/web_tests/models/test_run_results_unittest.py
[modify] https://crrev.com/687768e979b009ac2bb90ebca5d4d1af6777b0d6/third_party/blink/tools/blinkpy/web_tests/run_webkit_tests_unittest.py
Jul 31
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/6f42c06285f026d8a9a9cf3e4c86c076cf05a5f3

commit 6f42c06285f026d8a9a9cf3e4c86c076cf05a5f3
Author: Dirk Pranke <dpranke@chromium.org>
Date: Tue Jul 31 20:43:13 2018

Re-land "Update json test results for webkit_layout_tests."

This re-lands r569466 (reverting 569703) with the needed fixes.

This brings the test results more into compliance with the current test result standard in bit.ly/chromium-json-test-results-format.

Notable changes:
- If a test is run multiple times, we explicitly return every `actual` result. Previously, if the test produced the same result every time, we'd only return a single value for `actual`.
- If a test is skipped unexpectedly, that will be considered a regression and an unexpected result.
- The test results will contain `is_unexpected`, `is_flaky`, and `is_regression` fields for the matching conditions.

Bug: 837047, 822078
Change-Id: I98a80deb2e7e095fe471507a04e80b8b583ed5cc
Reviewed-on: https://chromium-review.googlesource.com/1154107
Commit-Queue: Dirk Pranke <dpranke@chromium.org>
Reviewed-by: Ned Nguyen <nednguyen@google.com>
Reviewed-by: Robert Ma <robertma@chromium.org>
Cr-Commit-Position: refs/heads/master@{#579556}

[modify] https://crrev.com/6f42c06285f026d8a9a9cf3e4c86c076cf05a5f3/third_party/blink/tools/blinkpy/web_tests/models/test_expectations.py
[modify] https://crrev.com/6f42c06285f026d8a9a9cf3e4c86c076cf05a5f3/third_party/blink/tools/blinkpy/web_tests/models/test_expectations_unittest.py
[modify] https://crrev.com/6f42c06285f026d8a9a9cf3e4c86c076cf05a5f3/third_party/blink/tools/blinkpy/web_tests/models/test_run_results.py
[modify] https://crrev.com/6f42c06285f026d8a9a9cf3e4c86c076cf05a5f3/third_party/blink/tools/blinkpy/web_tests/models/test_run_results_unittest.py
[modify] https://crrev.com/6f42c06285f026d8a9a9cf3e4c86c076cf05a5f3/third_party/blink/tools/blinkpy/web_tests/run_webkit_tests_unittest.py
Sep 21
I think this is fixed now.