INVALID_TEST_RESULTS - Track timeouted/expired Swarming shards separately in build property failure_type |
|||||||||||
Issue descriptionCurrent, failure_type is set to invalid_test_results for failed builds due to timeouted/expired Swarming shards. This makes it hard to track/analyze CQ false rejections in BQ queries. We'd better set different values for them or have a sub_failure_type etc.
,
Jun 7 2018
An example of task timeout: https://ci.chromium.org/p/chromium/builds/b8944578170160750384
,
Jun 7 2018
,
Jun 7 2018
A related case of "Too many tests failing": https://ci.chromium.org/p/chromium/builds/b8944631551060168448
,
Jun 7 2018
I saw NoDevicesError: All devices were blacklisted due to errors
[W2018-06-06T11:11:56.159669Z 1386 0 butler.go:240] Butler Context was cancelled. Initiating shutdown. {"error":"context canceled"}
INFO:devil.utils.timeout_retry:condition 'logdog_stopped' not met
[E2018-06-06T11:11:56.159918Z 1386 0 main.go:205] Butler terminated with error. {"error":"context canceled"}
[E2018-06-06T11:11:56.159948Z 1386 0 main.go:233] Failed to serve.
original error: context canceled
[I2018-06-06T11:11:56.160017Z 1386 0 main.go:349] Terminating. {"returnCode":250}
INFO:devil.utils.timeout_retry:condition 'logdog_stopped' met
in the log from #1. Looks like maybe logdog failed somehow? Also, one task in that build seemed to have a broken phone.
Is there a list of failure types that exist right now?
,
Jun 7 2018
This is not limited to Android device issue 850667. You may check these two spreadsheets: 1) 711 different CQ Luci builds: https://docs.google.com/spreadsheets/d/1MQXJDXW-dB4TjHwBvRiiax26YYXP8i5c1IF9f0Z7ZPU/edit#gid=1356601349 2) 252 different CQ Buildbot builds: https://docs.google.com/spreadsheets/d/1iAdws7P8-wfzAiLGRjgrpOXrlUW6L-0hXbCxLPUJKIw/edit#gid=1229683811 I think there are more failure types, but I don't have the full list.
,
Jun 7 2018
FYI win7_chromium_rel_ng got migrated to LUCI this week.
,
Jun 25 2018
Friendly ping :) It would be great that we could have a breakdown soon so that we could categorize the failure types by root causes automatically via BQ sql query.
,
Jun 26 2018
I'm looking at this now, and the builds in #1 and #2 both are missing information on logdog somehow, so I can't take a look at them. The build in #4 works still, so I'll look at that.
,
Jun 26 2018
I'm looking at https://ci.chromium.org/p/chromium/builders/luci.chromium.try/win_optional_gpu_tests_rel/2959, and I don't think the number of failures matters here. It might, but my suspicion is that there's an issue with parsing the test result format, since the test is outputting gtest results, and not json test results. I'll look into it more.
,
Jun 27 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/tools/build/+/9f2f71b1512fba4a69e04169c2901b42ed1cfb8f commit 9f2f71b1512fba4a69e04169c2901b42ed1cfb8f Author: Stephen Martinis <martiniss@chromium.org> Date: Wed Jun 27 20:28:58 2018 Catch exception while deapplying patch It looks like an exception might be swallowed by a defer_results below it, so catch the exception and print information about it. Bug: 850653 Change-Id: If25b002934db46a51d7039eebccb7d4084fd8988 Reviewed-on: https://chromium-review.googlesource.com/1115874 Commit-Queue: Stephen Martinis <martiniss@chromium.org> Reviewed-by: Dirk Pranke <dpranke@chromium.org> Reviewed-by: John Budorick <jbudorick@chromium.org> Reviewed-by: Robbie Iannucci <iannucci@chromium.org> [modify] https://crrev.com/9f2f71b1512fba4a69e04169c2901b42ed1cfb8f/scripts/slave/README.recipes.md [modify] https://crrev.com/9f2f71b1512fba4a69e04169c2901b42ed1cfb8f/scripts/slave/recipes/chromium_trybot.expected/compile_failure_without_patch_deapply_fn.json [modify] https://crrev.com/9f2f71b1512fba4a69e04169c2901b42ed1cfb8f/scripts/slave/recipe_modules/test_utils/api.py [modify] https://crrev.com/9f2f71b1512fba4a69e04169c2901b42ed1cfb8f/scripts/slave/recipe_modules/test_utils/__init__.py
,
Jun 27 2018
Great, many thanks for looking into this!!!
,
Jun 28 2018
We have an existing test case where we set test results as invalid. https://cs.chromium.org/chromium/build/scripts/slave/recipes/chromium_trybot.expected/swarming_trigger_failure.json?q=swarming_trigger_failure.json&sq=package:chromium&g=0&l=980 Should this be the case? This test case is for when triggering a task fails. I guess it doesn't have any test results to mark as invalid, so this is a broken test technically.
,
Jun 28 2018
Dig some digging today. https://ci.chromium.org/p/chromium/builders/luci.chromium.try/mac_chromium_rel_ng/83433 is a build which has a swarming task which timed-out. It is marked as having invalid test results because of https://cs.chromium.org/chromium/build/scripts/slave/recipe_modules/chromium_tests/steps.py?q=validate_task_results&sq=package:chromium&g=0&l=1648. The test output from that test has this tag, probably because the swarming task got told by swarming to terminate, because it was taking too long. That seems to make sense, but possibly shouldn't be happening? stgao@ if you have an opinion either way please let me know. Still looking around more. Also, FWIW #11 seems to have done nothing :(
,
Jun 29 2018
I keep looking at builds, and almost all of them seem to be marked as having invalid test results for good reasons. Many of them are marked that way because a swarming shard timed out. The build in #4 is marked that way because gtest added the global tag UNRELIABLE_RESULTS, which makes it be marked as invalid test results. This is not at all obvious when just looking at the build page, so we should probably surface this better.
,
Jun 29 2018
I've looked into this, and have a better idea of what caused this. I need some more feedback about what exactly you want to fix. stgao@ is OOO until July 16th... Re-assigning to chanli@, can you triage to the appropriate person?
,
Jun 29 2018
I'm not very familiar about this, but by reading the bug I think what are needed are more specific failure types? For builds failed with failure_type 'INVALID_TEST_RESULTS', maybe we should further classify them to 'shard(s) timeout' or some other types. Reassign to liaoyuke@ for more inputs.
,
Sep 6
martiniss@: many thanks for looking into this! (Sorry, I was on a long vacation and lost track of this bug after coming back.) What I requested here is the same as chanli@ explained above, I need a further breakdown of the build failure type INVALID_TEST_RESULTS for the different causes: 1. Expired Swarming tasks due to VM/device capacity, e.g.: https://ci.chromium.org/p/chromium/builders/luci.chromium.try/android-marshmallow-arm64-rel/67080 2. Timeouted Swarming tasks due to bad test or other reason, e.g.: https://ci.chromium.org/p/chromium/builders/luci.chromium.try/android-marshmallow-arm64-rel/77999 If we want to keep INVALID_TEST_RESULTS as is, my proposal will be to add a new build property SUB_FAILURE_TYPE=EXPIRED_SWARMING_TASK|TIMEOUTED_SWARMING_TASKS|OTHERS And later, we could further break down the OTHERS here. What do you think?
,
Sep 13
I'm fine with the idea of the proposal in #18. I think it'd be worth moving the failure_type information into some sort of json dict, rather than all being set as a plain build property. So, you'd set a build property called 'failure_info' which would have the value
[{
'test': 'webkit_layout_tests',
'type': 'INVALID_TEST_RESULT',
'sub_type': 'EXPIRED_SWARMING_TASK',
}, {...}]
WDYT? Maybe this isn't feasible in the existing build property system, I'm not sure. I want this to be more structured, if this isn't possible for you when you query build information let me know.
,
Sep 13
That sounds good to me. Build property supports json value as a string I think. nodir@ and iannucci@ would know more about that. A small tweak I would request is to change "test" to step name, and it is the same as the step name on Milo build page. I'd like to match that with individual test step. webkit_layout_tests won't work if there is "with patch" and "without patch". wdyt?
,
Sep 13
Sure, I can change that. The value of each dictionary isn't finalized yet, so step_name as a key is fine. Robbie, WDYT of dumping test failure information into a build property named 'failure_info'? I feel like maybe this should actually be dumped to an actual server, since it's getting a bit big, rather than shoving it into the build properties.
,
Sep 13
How big do we anticipate? IMO, failed steps won't be too many in a single build cycle. I was hoping to do data joining in BigQuery. +nodir@: is there a size limitation of a build output property?
,
Sep 13
> Maybe this isn't feasible in the existing build property system, I'm not sure. I want this to be more structured. this is supported. A property, input or output, can be a JSON object (dict in python). No need to encode it as a JSON string. It will make it harder to parse in BQ (you'd have to parse JSON twice because it would be JSON encoded in JSON). --- there is a 1MB limit on Build datastore entity inside buildbucket. That also includes buildbucket-internal stuff, so it is hard to estimate how much of that would be properties. Is the size of the new property O(tests)? Also currently properties are not compressed (although JSON must be very compressible). If we hit this problem, I will probably do that, just LMK
,
Sep 13
The size is of O(failed test steps). So I would expect it is small compared to O(steps) we already have in the build, unless martiniss@ wants to add more generic info into it.
,
Sep 13
The following revision refers to this bug: https://chromium.googlesource.com/chromium/tools/build/+/5965d3363f9745a40c54d9cbda7302470ec0ebab commit 5965d3363f9745a40c54d9cbda7302470ec0ebab Author: erikchen <erikchen@chromium.org> Date: Thu Sep 13 20:04:02 2018 Remove the method test_utils.determine_new_failures. This CL is a refactor with no intended behavior change. The method determine_new_failures is used in exactly two places -- the chromium_tests recipe module and the blink_downstream recipe. There are three problems with determine_new_failures. * The method takes a closure that references global state. Running this closure has significant side effects. This makes it difficult to follow the control flow. * The retry logic for chromium_tests will be changing in the future. It will be difficult to modify determine_new_failures to take on the new behavior, partly due to the closure mentioned above. * The retry logic for chromium_tests will be diverging from that of blink_downstream. This CL removes the method. The logic was copied into blink_downstream with no changes. The functionality in chromium_tests is identical to before, but the new logic is significantly simpler. This CL removes some debugging logic added in https://chromium-review.googlesource.com/c/chromium/tools/build/+/1115874 that is no longer necessary. Bug: 883321 , 850653 Change-Id: Ie575ddcff824007a7543c344b327e1d7d86bb449 Reviewed-on: https://chromium-review.googlesource.com/1222648 Reviewed-by: John Budorick <jbudorick@chromium.org> Commit-Queue: Erik Chen <erikchen@chromium.org> [modify] https://crrev.com/5965d3363f9745a40c54d9cbda7302470ec0ebab/scripts/slave/README.recipes.md [modify] https://crrev.com/5965d3363f9745a40c54d9cbda7302470ec0ebab/scripts/slave/recipes/blink_downstream.py [modify] https://crrev.com/5965d3363f9745a40c54d9cbda7302470ec0ebab/scripts/slave/recipe_modules/chromium_tests/api.py [modify] https://crrev.com/5965d3363f9745a40c54d9cbda7302470ec0ebab/scripts/slave/recipes/chromium_trybot.expected/compile_failure_without_patch_deapply_fn.json [modify] https://crrev.com/5965d3363f9745a40c54d9cbda7302470ec0ebab/scripts/slave/recipe_modules/test_utils/api.py
,
Oct 10
I haven't been actively working on this recently. I'm not sure if I'm going to get to this soon, and I'm not sure how high of a priority this is. If you want to work on this, you'd probably just need to change https://cs.chromium.org/chromium/tools/depot_tools/recipes/recipe_modules/tryserver/api.py?q=tryserver/api.py&sq=package:chromium&dr&l=206 and the callers of it in https://cs.chromium.org/chromium/tools/depot_tools/recipes/recipe_modules/tryserver/api.py?q=tryserver/api.py&sq=package:chromium&dr&l=206.
,
Oct 16
No problem! Thanks for all the investigations! I will take over this bug.
,
Oct 18
,
Oct 18
|
|||||||||||
►
Sign in to add a comment |
|||||||||||
Comment 1 by st...@chromium.org
, Jun 7 2018