Buildbot step recipe sometimes hides the extent of failure
Issue description

In https://ci.chromium.org/buildbot/chromium.perf/linux-perf/168, the step description says:

* Total tests: 944
* Passed: 909 (909 expected, 0 unexpected)
* Skipped: 32 (32 expected, 0 unexpected)
* Failed: 3 (0 expected, >>>3 unexpected<<<)
* Flaky: 0 (0 expected, 0 unexpected)

In reality, many more tests failed to run, because the failures happened at the benchmark level. These are the shards that failed all of their benchmark runs in that build:

* Shard 12: https://chrome-swarming.appspot.com/task?id=3de57f24c1e6fe10&refresh=10
* Shard 17: https://chrome-swarming.appspot.com/task?id=3de57f32e1dbf010&refresh=10&show_raw=1

The failure of these two shards meant that around 10 benchmark suites were not able to run at all. Neither sheriff-o-matic nor the buildbot step shows this, so sheriffs fail to prioritize it highly. I am not sure how to solve this problem. Maybe modify Telemetry so that it still outputs JSON test results, with an entry for every test marked FAIL, even when the failure is at the suite level?
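For illustration, that proposal amounts to an entry like this in the JSON Test Results Format (version 3). The benchmark and story names are made up; only the field names come from the format:

  "tests": {
    "some_benchmark": {
      "some_story": {
        "expected": "PASS",
        "actual": "FAIL",
        "is_unexpected": true
      }
    }
  }

With one such entry per story that never ran, the step summary above would have counted them as unexpected failures instead of reporting only 3.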
Jun 7 2018
This is quite a problem. Is there another way to plumb this data through to SOM other than the test results format? Currently, when a benchmark fails, the test results we produce are the same; the step just turns red in buildbot and the step name is the test. How do those failures get plumbed through to SOM, since they aren't in the json test results? Can we follow a pattern there in the recipe?

The problem here is not that we need it to be in the flakiness dashboard; it is that we want it to show up in SOM so our sheriffs know it is a problem. Is producing dummy test results data the right way to go? I guess the test did fail: it tried to run and failed. Is it even possible to catch all failures at the benchmark level and write out the test results? I am wondering if there are corner cases there with pushing this logic into telemetry.
Jun 7 2018
My 2c is that benchmark-level errors should *not* show up in the flakiness dashboard. As far as test results are concerned, the test did not run, and it should be treated as such. I'm not sure what the right way is to plumb these benchmark-level errors through to SOM. If the buildbot step did turn red, shouldn't SOM be able to pick up that failure somehow?
Jun 7 2018
+Sean in case he has opinions about how SOM should surface this
Jun 7 2018
> In reality, many more tests failed to run, because the failures happened at the benchmark level.

I don't fully understand this sentence. Why aren't the benchmark-level failures being reflected as test failures? Did the tests themselves not run?
Jun 7 2018
#5: Telemetry bails out very early if it cannot find any browser. So all the tests were skipped.
The stack trace is:
RunBenchmark at /b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/story_runner.py:368
expectations=expectations, max_num_values=benchmark.MAX_NUM_VALUES)
Run at /b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/story_runner.py:215
test, finder_options.Copy(), story_set)
traced_function at /b/s/w/ir/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py:52
return func(*args, **kwargs)
__init__ at /b/s/w/ir/third_party/catapult/telemetry/telemetry/page/shared_page_state.py:56
self._test, finder_options)
traced_function at /b/s/w/ir/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py:52
return func(*args, **kwargs)
_GetPossibleBrowser at /b/s/w/ir/third_party/catapult/telemetry/telemetry/page/shared_page_state.py:113
possible_browser = self._FindBrowser(finder_options)
traced_function at /b/s/w/ir/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py:52
return func(*args, **kwargs)
_FindBrowser at /b/s/w/ir/third_party/catapult/telemetry/telemetry/page/shared_page_state.py:108
finder_options))))
BrowserFinderException: Cannot find browser of type release.
Available browsers:
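For illustration, the general pattern this points at is to record an explicit per-story result when the harness dies before the stories run. This is a self-contained Python sketch, not the actual story_runner code; run_one_story and the results dict are stand-ins:

  def run_benchmark(stories, run_one_story, results):
    # 'results' maps story name -> result string ('PASS', 'FAIL' or 'SKIP').
    try:
      for story in stories:
        results[story] = run_one_story(story)
    except Exception:
      # Harness-level failure (e.g. the BrowserFinderException above):
      # record an explicit result for every story that never got to run,
      # instead of leaving it out of the JSON output entirely.
      for story in stories:
        results.setdefault(story, 'SKIP')
      raise

Whether that placeholder result should be FAIL or SKIP is what the rest of this thread discusses.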
Jun 8 2018
In that case the best thing to do would indeed be to mark each test as failed. You could also use the 'interrupted' top-level field in the results, but that's probably a bit misleading. The format doesn't currently have a good way to support harness-level errors.
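For context, the top-level fields of the format look roughly like this (the counts shown are just the ones from the build in the description):

  {
    "version": 3,
    "interrupted": false,
    "path_delimiter": "/",
    "num_failures_by_type": {"PASS": 909, "SKIP": 32, "FAIL": 3},
    "tests": { ... }
  }

Setting "interrupted": true flags the whole run rather than naming the tests that were affected, which is why it reads as misleading for this case.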
Jun 11 2018
Sean, can you or someone clarify how results like infra failures get plumbed through to SOM? When a test fails due to infrastructure issues (i.e. a down bot), that doesn't show up in the json test results, does it? How does that get plumbed through to SOM?
Jun 11 2018
After talking offline with Ned: what if we added a json test entry for each test that didn't get run, with the expectation being PASS but the actual result being SKIP? Sean, how will that surface in SOM?
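Concretely, that would be the same kind of entry as the FAIL sketch under the description, just with a different actual value (names again made up):

  "some_story": {
    "expected": "PASS",
    "actual": "SKIP",
    "is_unexpected": true
  }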
Jun 11 2018
Adding Tiffany to comment on how SOM will surface this information, since Sean is out this week.
Jun 11 2018
We'd have to double-check the recipe code at least, since I'm not sure if an unexpected skip will turn the step red. I'm also not sure how it would be displayed in the build results or in SOM.
Jun 11 2018
That said, it might be the right solution, and so we need to make it work.
Jun 11 2018
Looks like the recipe code doesn't do anything with skipped tests; it just assumes that all skips are expected: https://cs.chromium.org/chromium/build/scripts/slave/recipe_modules/chromium_tests/steps.py?l=1170

I think this is the right general solution. We still need to verify what this will look like in SOM.
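A sketch of the kind of check the recipe could add. This is illustrative only; the helper name and the shape of the parsed results here are made up and differ from the real steps.py:

  def unexpected_skips(tests):
    # 'tests' is the flattened 'tests' trie of the JSON Test Results Format,
    # as a dict mapping full test name -> result dict.
    return [
        name for name, result in tests.items()
        if 'SKIP' in result.get('actual', '').split()
        and 'SKIP' not in result.get('expected', 'PASS').split()
    ]

A non-empty list would then turn the step red, the same way unexpected failures do today.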
Jun 11 2018
These are basically unexpected skips, though, right?
Jun 11 2018
Right, they were expected to pass and instead were skipped. Interestingly, we don't show the step as failing when there are unexpectedly passing tests (i.e. we expected them to fail or skip but they passed).
Jun 11 2018
> Interestingly, we don't show the step as failing when there are unexpectedly passing tests (i.e. we expected them to fail or skip but they passed).

That's intentional. We used to treat unexpected passes as "warnings", but people ignored them. People didn't want to treat an unexpected pass as a failure, because it's not actually failing and so would be noise to the sheriffs. Ideally, of course, people would update the expectations so that the pass would be expected.
Jun 12 2018
Sorry for the delayed response here. Looking to see how unexpected skips would show up in SoM. If I understand correctly, tests show up as failures in SoM if they have unexpected results which are not passes: https://cs.chromium.org/chromium/infra/go/src/infra/appengine/sheriff-o-matic/som/analyzer/step/test_step.go?l=314 So I think the unexpected skips might work, at least if they're marked as unexpected at the point they get to SoM.
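Paraphrasing that rule in Python for clarity (a sketch of the behaviour as described in this comment, not the actual Go code in test_step.go):

  def surfaces_as_failure(result):
    # A test shows up in SoM if it has at least one unexpected result and
    # its unexpected results are not all passes.
    expected = set(result.get('expected', 'PASS').split())
    actual = set(result.get('actual', '').split())
    unexpected = actual - expected
    return bool(unexpected) and unexpected != {'PASS'}

Under that rule, an entry with expected PASS and actual SKIP, marked unexpected, would surface.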
Jun 14 2018
The following revision refers to this bug:
https://chromium.googlesource.com/catapult/+/42e9609fc92ef81cf5d093b0d16f0042fd69b4bc

commit 42e9609fc92ef81cf5d093b0d16f0042fd69b4bc
Author: Emily Hanley <eyaich@google.com>
Date: Thu Jun 14 18:44:38 2018

Telemetry tests output SKIP when telemetry fails to run them

Bug: chromium:850503
Change-Id: I560a30243797383eac2e5498682a2039e47b7183
Reviewed-on: https://chromium-review.googlesource.com/1101169
Reviewed-by: Ned Nguyen <nednguyen@google.com>
Commit-Queue: Ned Nguyen <nednguyen@google.com>

[modify] https://crrev.com/42e9609fc92ef81cf5d093b0d16f0042fd69b4bc/telemetry/telemetry/internal/results/page_test_results_unittest.py
[modify] https://crrev.com/42e9609fc92ef81cf5d093b0d16f0042fd69b4bc/telemetry/telemetry/internal/results/page_test_results.py
[modify] https://crrev.com/42e9609fc92ef81cf5d093b0d16f0042fd69b4bc/telemetry/telemetry/internal/results/json_3_output_formatter_unittest.py
[modify] https://crrev.com/42e9609fc92ef81cf5d093b0d16f0042fd69b4bc/telemetry/telemetry/value/skip.py
[modify] https://crrev.com/42e9609fc92ef81cf5d093b0d16f0042fd69b4bc/telemetry/telemetry/internal/story_runner.py
[modify] https://crrev.com/42e9609fc92ef81cf5d093b0d16f0042fd69b4bc/telemetry/telemetry/internal/results/json_3_output_formatter.py
[modify] https://crrev.com/42e9609fc92ef81cf5d093b0d16f0042fd69b4bc/telemetry/telemetry/internal/results/story_run_unittest.py
[modify] https://crrev.com/42e9609fc92ef81cf5d093b0d16f0042fd69b4bc/telemetry/telemetry/internal/story_runner_unittest.py
[modify] https://crrev.com/42e9609fc92ef81cf5d093b0d16f0042fd69b4bc/telemetry/telemetry/internal/results/gtest_progress_reporter_unittest.py
[modify] https://crrev.com/42e9609fc92ef81cf5d093b0d16f0042fd69b4bc/telemetry/telemetry/internal/results/gtest_progress_reporter.py
[modify] https://crrev.com/42e9609fc92ef81cf5d093b0d16f0042fd69b4bc/telemetry/telemetry/internal/results/story_run.py
[modify] https://crrev.com/42e9609fc92ef81cf5d093b0d16f0042fd69b4bc/telemetry/telemetry/value/skip_unittest.py
Jun 14 2018
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/6adc519d1d3340e015e0bcea0c58a61136b390d4

commit 6adc519d1d3340e015e0bcea0c58a61136b390d4
Author: catapult-chromium-autoroll <catapult-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Date: Thu Jun 14 22:59:29 2018

Roll src/third_party/catapult 78a0b0424642..42e9609fc92e (3 commits)

https://chromium.googlesource.com/catapult.git/+log/78a0b0424642..42e9609fc92e

git log 78a0b0424642..42e9609fc92e --date=short --no-merges --format='%ad %ae %s'
2018-06-14 eyaich@google.com Telemetry tests output SKIP when telemetry fails to run them
2018-06-14 benjhayden@chromium.org Update soundwave and benchmark_health_report to use the new /api/alerts.
2018-06-14 dtu@chromium.org [pinpoint] Pull values comparison out to a separate folder.

Created with: gclient setdep -r src/third_party/catapult@42e9609fc92e

The AutoRoll server is located here: https://catapult-roll.skia.org
Documentation for the AutoRoller is here: https://skia.googlesource.com/buildbot/+/master/autoroll/README.md

If the roll is causing failures, please contact the current sheriff, who should be CC'd on the roll, and stop the roller if necessary.

CQ_INCLUDE_TRYBOTS=luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel
BUG=chromium:850503
TBR=sullivan@chromium.org
Change-Id: I6aaa2a1bd056ed30634641d5d314340efdbbe2a1
Reviewed-on: https://chromium-review.googlesource.com/1101658
Reviewed-by: catapult-chromium-autoroll <catapult-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Commit-Queue: catapult-chromium-autoroll <catapult-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Cr-Commit-Position: refs/heads/master@{#567459}

[modify] https://crrev.com/6adc519d1d3340e015e0bcea0c58a61136b390d4/DEPS
Jun 16 2018
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/tools/build/+/d8627bda5502218d9499d7645bbc3d364932edc3

commit d8627bda5502218d9499d7645bbc3d364932edc3
Author: Emily Hanley <eyaich@google.com>
Date: Sat Jun 16 01:05:33 2018

Updating chromium_test recipe to report unexpected skips accurately.

Bug: 850503
Change-Id: Ic40d8c9d4ad83af6c2798912369bc9b0851f849f
Reviewed-on: https://chromium-review.googlesource.com/1102824
Reviewed-by: Dirk Pranke <dpranke@chromium.org>
Commit-Queue: Ned Nguyen <nednguyen@google.com>

[modify] https://crrev.com/d8627bda5502218d9499d7645bbc3d364932edc3/scripts/slave/recipe_modules/test_utils/util.py
[modify] https://crrev.com/d8627bda5502218d9499d7645bbc3d364932edc3/scripts/slave/recipe_modules/chromium_tests/steps.py