New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 850653 link

Starred by 2 users

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug



Sign in to add a comment

INVALID_TEST_RESULTS - Track timeouted/expired Swarming shards separately in build property failure_type

Project Member Reported by st...@chromium.org, Jun 7 2018

Issue description

Current, failure_type is set to invalid_test_results for failed builds due to timeouted/expired Swarming shards. This makes it hard to track/analyze CQ false rejections in BQ queries.

We'd better set different values for them or have a sub_failure_type etc.
 

Comment 1 by st...@chromium.org, Jun 7 2018

I can't tell what went wrong in this build https://ci.chromium.org/p/chromium/builds/b8944473492489044032 which is also INVALID_TEST_RESULTS

Comment 3 by st...@chromium.org, Jun 7 2018

Summary: INVALID_TEST_RESULTS - Track timeouted/expired Swarming shards separately in build property failure_type (was: Track timeouted/expired Swarming shards separately in build property failure_type)

Comment 4 by st...@chromium.org, Jun 7 2018

A related case of "Too many tests failing": https://ci.chromium.org/p/chromium/builds/b8944631551060168448
I saw NoDevicesError: All devices were blacklisted due to errors
[W2018-06-06T11:11:56.159669Z 1386 0 butler.go:240] Butler Context was cancelled. Initiating shutdown. {"error":"context canceled"}
INFO:devil.utils.timeout_retry:condition 'logdog_stopped' not met
[E2018-06-06T11:11:56.159918Z 1386 0 main.go:205] Butler terminated with error.               {"error":"context canceled"}
[E2018-06-06T11:11:56.159948Z 1386 0 main.go:233] Failed to serve.
original error: context canceled
[I2018-06-06T11:11:56.160017Z 1386 0 main.go:349] Terminating.                                {"returnCode":250}
INFO:devil.utils.timeout_retry:condition 'logdog_stopped' met

in the log from #1. Looks like maybe logdog failed somehow? Also, one task in that build seemed to have a broken phone.

Is there a list of failure types that exist right now?

Comment 6 by st...@chromium.org, Jun 7 2018

This is not limited to Android device issue 850667. You may check these two spreadsheets:

1) 711 different CQ Luci builds: https://docs.google.com/spreadsheets/d/1MQXJDXW-dB4TjHwBvRiiax26YYXP8i5c1IF9f0Z7ZPU/edit#gid=1356601349
2) 252 different CQ Buildbot builds: https://docs.google.com/spreadsheets/d/1iAdws7P8-wfzAiLGRjgrpOXrlUW6L-0hXbCxLPUJKIw/edit#gid=1229683811

I think there are more failure types, but I don't have the full list.
FYI win7_chromium_rel_ng got migrated to LUCI this week. 

Comment 8 by st...@chromium.org, Jun 25 2018

Friendly ping :)

It would be great that we could have a breakdown soon so that we could categorize the failure types by root causes automatically via BQ sql query.
I'm looking at this now, and the builds in #1 and #2 both are missing information on logdog somehow, so I can't take a look at them. The build in #4 works still, so I'll look at that.
I'm looking at https://ci.chromium.org/p/chromium/builders/luci.chromium.try/win_optional_gpu_tests_rel/2959, and I don't think the number of failures matters here. It might, but my suspicion is that there's an issue with parsing the test result format, since the test is outputting gtest results, and not json test results. I'll look into it more.
Project Member

Comment 11 by bugdroid1@chromium.org, Jun 27 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/tools/build/+/9f2f71b1512fba4a69e04169c2901b42ed1cfb8f

commit 9f2f71b1512fba4a69e04169c2901b42ed1cfb8f
Author: Stephen Martinis <martiniss@chromium.org>
Date: Wed Jun 27 20:28:58 2018

Catch exception while deapplying patch

It looks like an exception might be swallowed by a defer_results below
it, so catch the exception and print information about it.

Bug: 850653
Change-Id: If25b002934db46a51d7039eebccb7d4084fd8988
Reviewed-on: https://chromium-review.googlesource.com/1115874
Commit-Queue: Stephen Martinis <martiniss@chromium.org>
Reviewed-by: Dirk Pranke <dpranke@chromium.org>
Reviewed-by: John Budorick <jbudorick@chromium.org>
Reviewed-by: Robbie Iannucci <iannucci@chromium.org>

[modify] https://crrev.com/9f2f71b1512fba4a69e04169c2901b42ed1cfb8f/scripts/slave/README.recipes.md
[modify] https://crrev.com/9f2f71b1512fba4a69e04169c2901b42ed1cfb8f/scripts/slave/recipes/chromium_trybot.expected/compile_failure_without_patch_deapply_fn.json
[modify] https://crrev.com/9f2f71b1512fba4a69e04169c2901b42ed1cfb8f/scripts/slave/recipe_modules/test_utils/api.py
[modify] https://crrev.com/9f2f71b1512fba4a69e04169c2901b42ed1cfb8f/scripts/slave/recipe_modules/test_utils/__init__.py

Comment 12 by st...@chromium.org, Jun 27 2018

Great, many thanks for looking into this!!!
We have an existing test case where we set test results as invalid. https://cs.chromium.org/chromium/build/scripts/slave/recipes/chromium_trybot.expected/swarming_trigger_failure.json?q=swarming_trigger_failure.json&sq=package:chromium&g=0&l=980

Should this be the case? This test case is for when triggering a task fails. I guess it doesn't have any test results to mark as invalid, so this is a broken test technically.
Dig some digging today.

https://ci.chromium.org/p/chromium/builders/luci.chromium.try/mac_chromium_rel_ng/83433 is a build which has a swarming task which timed-out. It is marked as having invalid test results because of https://cs.chromium.org/chromium/build/scripts/slave/recipe_modules/chromium_tests/steps.py?q=validate_task_results&sq=package:chromium&g=0&l=1648. The test output from that test has this tag, probably because the swarming task got told by swarming to terminate, because it was taking too long.

That seems to make sense, but possibly shouldn't be happening? stgao@ if you have an opinion either way please let me know. Still looking around more.

Also, FWIW #11 seems to have done nothing :(
I keep looking at builds, and almost all of them seem to be marked as having invalid test results for good reasons. Many of them are marked that way because a swarming shard timed out. The build in #4 is marked that way because gtest added the global tag UNRELIABLE_RESULTS, which makes it be marked as invalid test results.

This is not at all obvious when just looking at the build page, so we should probably surface this better.
Cc: st...@chromium.org
Owner: chanli@chromium.org
I've looked into this, and have a better idea of what caused this. I need some more feedback about what exactly you want to fix.

stgao@ is OOO until July 16th... Re-assigning to chanli@, can you triage to the appropriate person?
Cc: chanli@chromium.org
Owner: liaoyuke@chromium.org
I'm not very familiar about this, but by reading the bug I think what are needed are more specific failure types? For builds failed with failure_type 'INVALID_TEST_RESULTS', maybe we should further classify them to 'shard(s) timeout' or some other types.

Reassign to liaoyuke@ for more inputs.
Owner: martiniss@chromium.org
martiniss@: many thanks for looking into this!
(Sorry, I was on a long vacation and lost track of this bug after coming back.)

What I requested here is the same as chanli@ explained above, I need a further breakdown of the build failure type INVALID_TEST_RESULTS for the different causes:
1. Expired Swarming tasks due to VM/device capacity, e.g.: https://ci.chromium.org/p/chromium/builders/luci.chromium.try/android-marshmallow-arm64-rel/67080
2. Timeouted Swarming tasks due to bad test or other reason, e.g.: https://ci.chromium.org/p/chromium/builders/luci.chromium.try/android-marshmallow-arm64-rel/77999

If we want to keep INVALID_TEST_RESULTS as is, my proposal will be to add a new build property SUB_FAILURE_TYPE=EXPIRED_SWARMING_TASK|TIMEOUTED_SWARMING_TASKS|OTHERS
And later, we could further break down the OTHERS here.
What do you think?
I'm fine with the idea of the proposal in #18. I think it'd be worth moving the failure_type information into some sort of json dict, rather than all being set as a plain build property. So, you'd set a build property called 'failure_info' which would have the value

[{
  'test': 'webkit_layout_tests',
  'type': 'INVALID_TEST_RESULT',
  'sub_type': 'EXPIRED_SWARMING_TASK',
}, {...}]

WDYT? Maybe this isn't feasible in the existing build property system, I'm not sure. I want this to be more structured, if this isn't possible for you when you query build information let me know.
That sounds good to me. Build property supports json value as a string I think. nodir@ and iannucci@ would know more about that.

A small tweak I would request is to change "test" to step name, and it is the same as the step name on Milo build page.

I'd like to match that with individual test step.
webkit_layout_tests won't work if there is "with patch" and "without patch".

wdyt?
Cc: iannucci@chromium.org
Sure, I can change that. The value of each dictionary isn't finalized yet, so step_name as a key is fine. 

Robbie, WDYT of dumping test failure information into a build property named 'failure_info'? I feel like maybe this should actually be dumped to an actual server, since it's getting a bit big, rather than shoving it into the build properties.
Cc: no...@chromium.org
How big do we anticipate? IMO, failed steps won't be too many in a single build cycle. I was hoping to do data joining in BigQuery.

+nodir@: is there a size limitation of a build output property?
> Maybe this isn't feasible in the existing build property system, I'm not sure. I want this to be more structured.

this is supported. A property, input or output, can be a JSON object (dict in python).
No need to encode it as a JSON string. It will make it harder to parse in BQ (you'd have to parse JSON twice because it would be JSON encoded in JSON).

---

there is a 1MB limit on Build datastore entity inside buildbucket. That also includes buildbucket-internal stuff, so it is hard to estimate how much of that would be properties. Is the size of the new property O(tests)? Also currently properties are not compressed (although JSON must be very compressible). If we hit this problem, I will probably do that, just LMK
The size is of O(failed test steps). So I would expect it is small compared to O(steps) we already have in the build, unless martiniss@ wants to add more generic info into it.
Project Member

Comment 25 by bugdroid1@chromium.org, Sep 13

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/tools/build/+/5965d3363f9745a40c54d9cbda7302470ec0ebab

commit 5965d3363f9745a40c54d9cbda7302470ec0ebab
Author: erikchen <erikchen@chromium.org>
Date: Thu Sep 13 20:04:02 2018

Remove the method test_utils.determine_new_failures.

This CL is a refactor with no intended behavior change.

The method determine_new_failures is used in exactly two places -- the
chromium_tests recipe module and the blink_downstream recipe.

There are three problems with determine_new_failures.
  * The method takes a closure that references global state. Running this
    closure has significant side effects. This makes it difficult to follow the
    control flow.
  * The retry logic for chromium_tests will be changing in the future. It will
    be difficult to modify determine_new_failures to take on the new behavior,
    partly due to the closure mentioned above.
  * The retry logic for chromium_tests will be diverging from that of
    blink_downstream.

This CL removes the method. The logic was copied into blink_downstream with no
changes. The functionality in chromium_tests is identical to before, but the new
logic is significantly simpler.

This CL removes some debugging logic added in
https://chromium-review.googlesource.com/c/chromium/tools/build/+/1115874 that
is no longer necessary.

Bug:  883321 , 850653
Change-Id: Ie575ddcff824007a7543c344b327e1d7d86bb449
Reviewed-on: https://chromium-review.googlesource.com/1222648
Reviewed-by: John Budorick <jbudorick@chromium.org>
Commit-Queue: Erik Chen <erikchen@chromium.org>

[modify] https://crrev.com/5965d3363f9745a40c54d9cbda7302470ec0ebab/scripts/slave/README.recipes.md
[modify] https://crrev.com/5965d3363f9745a40c54d9cbda7302470ec0ebab/scripts/slave/recipes/blink_downstream.py
[modify] https://crrev.com/5965d3363f9745a40c54d9cbda7302470ec0ebab/scripts/slave/recipe_modules/chromium_tests/api.py
[modify] https://crrev.com/5965d3363f9745a40c54d9cbda7302470ec0ebab/scripts/slave/recipes/chromium_trybot.expected/compile_failure_without_patch_deapply_fn.json
[modify] https://crrev.com/5965d3363f9745a40c54d9cbda7302470ec0ebab/scripts/slave/recipe_modules/test_utils/api.py

Owner: ----
Status: Available (was: Assigned)
I haven't been actively working on this recently. I'm not sure if I'm going to get to this soon, and I'm not sure how high of a priority this is.

If you want to work on this, you'd probably just need to change https://cs.chromium.org/chromium/tools/depot_tools/recipes/recipe_modules/tryserver/api.py?q=tryserver/api.py&sq=package:chromium&dr&l=206 and the callers of it in https://cs.chromium.org/chromium/tools/depot_tools/recipes/recipe_modules/tryserver/api.py?q=tryserver/api.py&sq=package:chromium&dr&l=206.
Components: Tools>Test>FindIt>Flakiness
Owner: st...@chromium.org
Status: Assigned (was: Available)
No problem! Thanks for all the investigations!

I will take over this bug.
Cc: iannu...@google.com
Cc: -iannucci@chromium.org

Sign in to add a comment