Clarify the "is_unexpected" field in the JSON test result format |
|||||||
Issue descriptionhttps://chromium.googlesource.com/chromium/src/+/master/docs/testing/json_test_results_format.md says that "is_unexpected" > If present and true, the failure was unexpected (a regression). If false (or if the key is not present at all), the failure was expected and will be ignored. It doesn't cover a bunch of edge cases. 1. Unexpected passes currently also have is_unexpected set to true, which I think is sensible and should be documented. 2. Things are more complicated when we retry. For example, if a test is expected to pass but has a 'FAIL PASS' actual result, should is_unexpected be true? In issue 822078 , I looked at recipe_modules' util.py, especially the logics related to "unexpected flake", and thought is_unexpected would be true in this case, but apparently that's not the case in blinkpy: https://cs.chromium.org/chromium/src/third_party/blink/tools/blinkpy/web_tests/models/test_run_results.py?l=279&rcl=cb6928930fc1fb8e29aebc6dd24af31367719bfb In layout tests, is_unexpected is set to true when *none* of the results is expected. The document is unclear about this edge case, but I think we should set is_unexpected to true when *any* result is unexpected, so that we can conveniently identify unexpected flakes, which will be of great help for FindIt. Based on my understanding of the infra, my proposed change will make the webkit_layout_tests step to be orange more often on the bots because unexpected flakes are displayed as warnings, but won't fail the build. This will also make it easier for people to spot bad (flaky) layout tests.
,
Apr 27 2018
Issue 837052 has been merged into this issue.
,
Apr 27 2018
,
Apr 27 2018
,
Apr 27 2018
Oh sorry for the duplicates. Monorail had some database issues yesterday.
,
Apr 28 2018
> 1. Unexpected passes currently also have is_unexpected set to true, > which I think is sensible and should be documented. You're right, this should read "the result was unexpected" and it isn't necessarily a regression. We've discussed adding a separate "is_regression" field to disambiguate (see bug 822078 ). > 2. Things are more complicated when we retry. The value corresponds to the result of the *last* attempt, so in the 'FAIL PASS' case is_unexpected should be false. This works only because we stop retrying after we get a passing result. The is_unexpected result is not about detecting flakes. You can tell that something is a flake simply by seeing that it ran more than once. Of course, if you are using --repeat-count or something like that to intentionally run a test more than once, a bunch of stuff in this format doesn't work well.
,
Apr 28 2018
,
Apr 28 2018
,
Apr 30 2018
> The value corresponds to the result of the *last* attempt, so in the 'FAIL PASS' case is_unexpected should be false. This works only because we stop retrying after we get a passing result. There are two reasons why I proposed to set is_unexpected to true when any of the results (attempts) fails: 1. This way we can easily tell "unexpected flakes" (pass after retry) apart from other passes. Besides, util.py already knows how to handle this case: https://cs.chromium.org/chromium/build/scripts/slave/recipe_modules/test_utils/util.py?rcl=546a13ec5959964fda7750bac483bc5efd891f85&l=116 2. In issue 828605 , I'm adding results from all repeats/iterations to the "actual" field for FindIt. My proposed new definition of is_unexpected will also be more useful for them. (Besides, in the repeat/iteration use case, we don't stop after we get a passing result.)
,
Apr 30 2018
There are multiple different questions you might ask: - was "overall" (== last) result unexpected? - was any result unexpected? - how many (%age) results were unexpected? The current implementation answers the first question. Your proposed change answers the second question, but breaks the first one, i.e., it's not a backwards-compatible change. That doesn't seem like a strict improvement to me. I prefer to optimize for the "simpler" case rather than the more complicated case, and I would prefer we keep compatibility where possible. It seems fine to me to ask the FindIt code to do more work if it's trying to do something more complicated.
,
Jun 11 2018
Ping robertma@chromium.org. What's the status of this clarification?
,
Jun 11 2018
I plan to land a CL updated the text today.
,
Jun 12 2018
,
Jun 12 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/b5f530d4fbf2083d511467906156a4891c38cf5c commit b5f530d4fbf2083d511467906156a4891c38cf5c Author: Dirk Pranke <dpranke@chromium.org> Date: Tue Jun 12 22:51:32 2018 Update json_test_results_format docs. This CL adds the `is_regression` per-test field (though it won't yet be set by anything), and adds a bunch of text clarifying the differences between failures, unexpected results, and regressions. It also does some general editorial cleanup, fixing typos and re-sorting some of the enumerated values. BUG= 822078 , 837047 Change-Id: If78f6ef66c01d7c2924bfd98589832fe9299968f Reviewed-on: https://chromium-review.googlesource.com/1096486 Commit-Queue: Dirk Pranke <dpranke@chromium.org> Reviewed-by: Robert Ma <robertma@chromium.org> Reviewed-by: Ned Nguyen <nednguyen@google.com> Reviewed-by: Chan Li <chanli@chromium.org> Reviewed-by: Stephen Martinis <martiniss@chromium.org> Cr-Commit-Position: refs/heads/master@{#566615} [modify] https://crrev.com/b5f530d4fbf2083d511467906156a4891c38cf5c/docs/testing/json_test_results_format.md
,
Jun 22 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/53235652f36d355532c65674a91968cedc197e56 commit 53235652f36d355532c65674a91968cedc197e56 Author: Dirk Pranke <dpranke@chromium.org> Date: Fri Jun 22 00:17:36 2018 Update json test results for webkit_layout_tests. This brings the test results more into compliance with the current test result standard in bit.ly/chromium-json-test-results-format. Notable changes: - If a test is run multiple times, we explicitly return every `actual` result. Previously, if the test produced the same result every time, we'd only return a single value for `actual` - If a test is skipped unexpectedly, that will be considered a regression and an unexpected result. - The test results will contain `is_unexpected`, `is_flaky`, and `is_regression` fields for the matching conditions. Bug: 837047 , 822078 Change-Id: I4896e61469d3b576ea9e7dbbe16fac709f74b6b9 Reviewed-on: https://chromium-review.googlesource.com/1103611 Commit-Queue: Dirk Pranke <dpranke@chromium.org> Reviewed-by: Robert Ma <robertma@chromium.org> Reviewed-by: Quinten Yearsley <qyearsley@chromium.org> Cr-Commit-Position: refs/heads/master@{#569466} [modify] https://crrev.com/53235652f36d355532c65674a91968cedc197e56/third_party/blink/tools/blinkpy/web_tests/models/test_expectations.py [modify] https://crrev.com/53235652f36d355532c65674a91968cedc197e56/third_party/blink/tools/blinkpy/web_tests/models/test_expectations_unittest.py [modify] https://crrev.com/53235652f36d355532c65674a91968cedc197e56/third_party/blink/tools/blinkpy/web_tests/models/test_run_results.py [modify] https://crrev.com/53235652f36d355532c65674a91968cedc197e56/third_party/blink/tools/blinkpy/web_tests/models/test_run_results_unittest.py [modify] https://crrev.com/53235652f36d355532c65674a91968cedc197e56/third_party/blink/tools/blinkpy/web_tests/run_webkit_tests_unittest.py
,
Jun 22 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/687768e979b009ac2bb90ebca5d4d1af6777b0d6 commit 687768e979b009ac2bb90ebca5d4d1af6777b0d6 Author: Dirk Pranke <dpranke@chromium.org> Date: Fri Jun 22 18:27:07 2018 Revert "Update json test results for webkit_layout_tests." This reverts commit 53235652f36d355532c65674a91968cedc197e56. Reason for revert: Build is no longer reporting failures correctly. Original change's description: > Update json test results for webkit_layout_tests. > > This brings the test results more into compliance with the > current test result standard in bit.ly/chromium-json-test-results-format. > > Notable changes: > - If a test is run multiple times, we explicitly return every > `actual` result. Previously, if the test produced the same result every > time, we'd only return a single value for `actual` > - If a test is skipped unexpectedly, that will be considered a regression > and an unexpected result. > - The test results will contain `is_unexpected`, `is_flaky`, and > `is_regression` fields for the matching conditions. > > Bug: 837047 , 822078 > Change-Id: I4896e61469d3b576ea9e7dbbe16fac709f74b6b9 > Reviewed-on: https://chromium-review.googlesource.com/1103611 > Commit-Queue: Dirk Pranke <dpranke@chromium.org> > Reviewed-by: Robert Ma <robertma@chromium.org> > Reviewed-by: Quinten Yearsley <qyearsley@chromium.org> > Cr-Commit-Position: refs/heads/master@{#569466} TBR=qyearsley@chromium.org,dpranke@chromium.org,seanmccullough@chromium.org,robertma@chromium.org Change-Id: Icf1882e8eea328b115a458afa6378b35bb11a638 No-Presubmit: true No-Tree-Checks: true No-Try: true Bug: 837047 , 822078 Reviewed-on: https://chromium-review.googlesource.com/1112178 Reviewed-by: Dirk Pranke <dpranke@chromium.org> Commit-Queue: Dirk Pranke <dpranke@chromium.org> Cr-Commit-Position: refs/heads/master@{#569703} [modify] https://crrev.com/687768e979b009ac2bb90ebca5d4d1af6777b0d6/third_party/blink/tools/blinkpy/web_tests/models/test_expectations.py [modify] https://crrev.com/687768e979b009ac2bb90ebca5d4d1af6777b0d6/third_party/blink/tools/blinkpy/web_tests/models/test_expectations_unittest.py [modify] https://crrev.com/687768e979b009ac2bb90ebca5d4d1af6777b0d6/third_party/blink/tools/blinkpy/web_tests/models/test_run_results.py [modify] https://crrev.com/687768e979b009ac2bb90ebca5d4d1af6777b0d6/third_party/blink/tools/blinkpy/web_tests/models/test_run_results_unittest.py [modify] https://crrev.com/687768e979b009ac2bb90ebca5d4d1af6777b0d6/third_party/blink/tools/blinkpy/web_tests/run_webkit_tests_unittest.py
,
Jul 31
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/6f42c06285f026d8a9a9cf3e4c86c076cf05a5f3 commit 6f42c06285f026d8a9a9cf3e4c86c076cf05a5f3 Author: Dirk Pranke <dpranke@chromium.org> Date: Tue Jul 31 20:43:13 2018 Re-land "Update json test results for webkit_layout_tests." This re-lands r569466 (reverting 569703) with the needed fixes. This brings the test results more into compliance with the current test result standard in bit.ly/chromium-json-test-results-format. Notable changes: - If a test is run multiple times, we explicitly return every `actual` result. Previously, if the test produced the same result every time, we'd only return a single value for `actual` - If a test is skipped unexpectedly, that will be considered a regression and an unexpected result. - The test results will contain `is_unexpected`, `is_flaky`, and `is_regression` fields for the matching conditions. Bug: 837047 , 822078 Change-Id: I98a80deb2e7e095fe471507a04e80b8b583ed5cc Reviewed-on: https://chromium-review.googlesource.com/1154107 Commit-Queue: Dirk Pranke <dpranke@chromium.org> Reviewed-by: Ned Nguyen <nednguyen@google.com> Reviewed-by: Robert Ma <robertma@chromium.org> Cr-Commit-Position: refs/heads/master@{#579556} [modify] https://crrev.com/6f42c06285f026d8a9a9cf3e4c86c076cf05a5f3/third_party/blink/tools/blinkpy/web_tests/models/test_expectations.py [modify] https://crrev.com/6f42c06285f026d8a9a9cf3e4c86c076cf05a5f3/third_party/blink/tools/blinkpy/web_tests/models/test_expectations_unittest.py [modify] https://crrev.com/6f42c06285f026d8a9a9cf3e4c86c076cf05a5f3/third_party/blink/tools/blinkpy/web_tests/models/test_run_results.py [modify] https://crrev.com/6f42c06285f026d8a9a9cf3e4c86c076cf05a5f3/third_party/blink/tools/blinkpy/web_tests/models/test_run_results_unittest.py [modify] https://crrev.com/6f42c06285f026d8a9a9cf3e4c86c076cf05a5f3/third_party/blink/tools/blinkpy/web_tests/run_webkit_tests_unittest.py
,
Sep 21
I think this is fixed now. |
|||||||
►
Sign in to add a comment |
|||||||
Comment 1 by foolip@chromium.org
, Apr 27 2018