
Issue 734154

Starred by 1 user

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug




Build failure_type is incorrect

Project Member Reported by katthomas@chromium.org, Jun 16 2017

Issue description

Build: https://luci-milo.appspot.com/buildbot/tryserver.chromium.linux/linux_chromium_rel_ng/437371

In this build, a step fails due to lack of swarming capacity. This step is recorded with result INFRA_FAILURE and the step is purple. Then, the corresponding step without the patch is recorded with result INVALID_TEST_RESULTS, as expected since the "with patch" step failed. This step is red. However, the overall build failure_type that is reported is INVALID_TEST_RESULTS and is purple.

Lots of mixed messages here. The build failed because we couldn't even run the step with the patch. The build should be purple with failure_type INFRA_FAILURE.

 
Cc: phajdan.jr@chromium.org
Do you know whether this is a problem in the recipe (we returned red instead of purple), a monitoring failure, or both?
I'm not sure. Another possible culprit is whatever collects swarming results. 
I don't think it's whatever collected swarming results. Whether the failure is red or purple is almost certainly a recipe issue; mostly I'm wondering if fixing that will also fix whatever other issue we may have, if there is one.
I am working on fixing how CQ reports failure_type in issue 724916. However, if the failure_type build property itself is reported incorrectly, as in the example build above, we would need to fix the recipes so that they detect the failure type correctly.
Owner: phajdan.jr@chromium.org
Status: Assigned (was: Untriaged)
Paweł, can you take a look and see if there's a bug in the recipe itself that we need to fix?
Owner: ----
Status: Untriaged (was: Assigned)
Removing myself from bugs because of team transfer, back to re-triage.

See https://goto.google.com/phajdan-goodbye-chrome (Google-internal) and issue 783662.

In case of any questions, feel free to ask - use phajdan@google.com for a faster response.
Labels: -Pri-2 Pri-1
Owner: bpastene@chromium.org
Status: Assigned (was: Untriaged)
This looks to still be a problem:
https://ci.chromium.org/buildbot/tryserver.chromium.android/android_n5x_swarming_rel/322834

I wonder if an easy fix would be to set failure_type to INVALID_TEST_RESULTS only if failure_type is null/none (https://codesearch.chromium.org/chromium/build/scripts/slave/recipe_modules/test_utils/api.py?rcl=d812d78fb390f73b59a439d43aa8e49359a47563&l=161). Any other type of failure should (IMO) take precedence over failure to merge/upload test results.

Given that this is misrepresenting CQ failures and there are active efforts to start improving how we monitor and react to CQ data, I'm bumping this to P1 and will start looking into it.
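
For illustration, the precedence rule suggested above might look roughly like the sketch below. This is not the actual code in test_utils/api.py: the function name `mark_invalid_test_results` and the use of plain strings for failure types are assumptions made for the example. Only the two failure type names come from this issue.

```python
from typing import Optional

# Failure type names as they appear in this issue. Representing them as plain
# strings is an assumption for this sketch.
INFRA_FAILURE = "INFRA_FAILURE"
INVALID_TEST_RESULTS = "INVALID_TEST_RESULTS"


def mark_invalid_test_results(current_failure_type: Optional[str]) -> str:
    """Return the failure type to record after results failed to merge/upload.

    Hypothetical helper: INVALID_TEST_RESULTS is used only when no other
    failure type has been recorded yet, so an earlier failure such as
    INFRA_FAILURE takes precedence.
    """
    if current_failure_type is None:
        return INVALID_TEST_RESULTS
    return current_failure_type
```

Under this rule, the build from the original report would keep INFRA_FAILURE from the "with patch" step rather than being overwritten by the later "without patch" step.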
It looks like we're muxing "having no results" with "having invalid results":
https://codesearch.chromium.org/chromium/build/scripts/slave/recipe_modules/chromium_tests/steps.py?rcl=ff3a8cb62cfff59520aacca6e9f7f009f7d96ded&l=511

That makes things a bit more difficult.
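
One possible way to separate the two cases is sketched below, purely as an illustration. The `ResultsState` enum, the `classify_results` function, and the `"valid"` key are all hypothetical names and do not reflect how steps.py actually represents test results.

```python
from enum import Enum
from typing import Mapping, Optional


class ResultsState(Enum):
    """Hypothetical three-way split instead of a single valid/invalid flag."""
    MISSING = "missing"   # the step never produced results (e.g. it never ran)
    INVALID = "invalid"   # results exist but could not be trusted or parsed
    VALID = "valid"


def classify_results(results: Optional[Mapping[str, object]]) -> ResultsState:
    """Classify a step's results without conflating 'none' and 'invalid'.

    Assumes, for this sketch only, that `results` is None when nothing was
    collected and otherwise carries a boolean "valid" entry.
    """
    if results is None:
        return ResultsState.MISSING
    if not results.get("valid", False):
        return ResultsState.INVALID
    return ResultsState.VALID
```

With a split like this, a caller could map MISSING to an infra-style failure and reserve INVALID_TEST_RESULTS for the INVALID case, though whether that mapping is right for the recipes is a separate question.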
