Issue 710710

Starred by 4 users

Issue metadata

Status: Fixed
Owner:
Closed: May 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug

Blocking:
issue 711010
issue 711030




If the first collected swarming task is expired, no remaining swarming tasks are collected

Project Member Reported by bpastene@chromium.org, Apr 12 2017

Issue description

See:
https://build.chromium.org/p/chromium.android.fyi/builders/x64%20Device%20Tester/builds/1212
vs
https://build.chromium.org/p/chromium.android.fyi/builders/x64%20Device%20Tester/builds/1213

In the second build, all tasks expired, but only the first task is displayed as such. r06e6b3accc66f6c31053055c8e0efcd978f18b03 landed in between those two and is likely related, so assigning to author.
 
Status: Started (was: Assigned)
This appears to be the result of running _handle_summary_json for all task types, including gtests. Previously, we were running it in all cases *other* than gtests.
Er, rather: this was likely already happening for non-gtest tasks, but it is only showing up on that bot now because that bot only runs gtest tasks.
Blocking: 711010
Labels: -Pri-2 Pri-1
Bumping up the priority since this is making us lose a lot of perf data.
Cc: eakuefner@chromium.org nedngu...@google.com
Issue 711010 has been merged into this issue.
It seems our tests also had a similar problem:
https://uberchromegw.corp.google.com/i/internal.mediarouter/builders/Windows%20Build/builds/2504

It only happens on the Windows build, though.
Cc: serg...@chromium.org phajdan.jr@chromium.org
Thanks John for the quick fix in https://chromium-review.googlesource.com/c/476031/.

I still think the root cause is that the loop in https://cs.chromium.org/chromium/build/scripts/slave/recipe_modules/chromium_tests/api.py?rcl=424fd63dc75bf3fec55f76b98c932621c52577cf&l=286 is too harsh:

      for t in tests:
        try:
          t.run(self._api_for_tests, suffix)
        except self.m.step.InfraFailure:  # pragma: no cover
          raise
        except self.m.step.StepFailure:  # pragma: no cover
          failed_tests.append(t)
          if t.abort_on_failure:
            raise

This means an InfraFailure in any test's run will abort the whole build. I don't think we can guarantee a zero percent InfraFailure rate, especially given that any SwarmingFailure will be an InfraFailure (see https://cs.chromium.org/chromium/build/scripts/slave/recipe_modules/swarming/api.py?rcl=424fd63dc75bf3fec55f76b98c932621c52577cf&l=668).

I think the loop here should be adjusted so that the build keeps going upon InfraFailure. 

What do other folks think?
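
To make the proposal concrete, here is a rough sketch of what "keep going upon InfraFailure" could look like. It is not the actual recipe code: `StepFailure`, `InfraFailure`, `FakeTest`, and `run_all` are local stand-ins invented for this illustration, and the sketch assumes `InfraFailure` is a subclass of `StepFailure`, as the ordering of the `except` clauses above suggests.

```python
class StepFailure(Exception):
    """Local stand-in for the recipe engine's step failure (assumption)."""


class InfraFailure(StepFailure):
    """Local stand-in; assumed here to be a subclass of StepFailure."""


class FakeTest(object):
    """Hypothetical test object that optionally raises when run."""

    def __init__(self, name, error=None, abort_on_failure=False):
        self.name = name
        self.error = error
        self.abort_on_failure = abort_on_failure
        self.ran = False

    def run(self):
        self.ran = True
        if self.error is not None:
            raise self.error


def run_all(tests):
    """Runs every test, recording infra failures instead of re-raising them."""
    failed_tests = []
    infra_failed_tests = []
    for t in tests:
        try:
            t.run()
        except InfraFailure:
            # Proposed change: remember the failure and move on to the next test.
            infra_failed_tests.append(t)
        except StepFailure:
            failed_tests.append(t)
            if t.abort_on_failure:
                raise
    return failed_tests, infra_failed_tests


if __name__ == "__main__":
    tests = [
        FakeTest("expired_swarming_task", error=InfraFailure("task expired")),
        FakeTest("failing_test", error=StepFailure("assertion failed")),
        FakeTest("passing_test"),
    ]
    failed, infra_failed = run_all(tests)
    print("failed:", [t.name for t in failed])
    print("infra failed:", [t.name for t in infra_failed])
    print("all ran:", all(t.ran for t in tests))
```

The caller would still need to decide how the build's final status reflects the entries in `infra_failed_tests`; the sketch leaves that open.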
Project Member

Comment 9 by bugdroid1@chromium.org, Apr 12 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/tools/build/+/b2c7e2705067c2f48e6ef702402467972f74de09

commit b2c7e2705067c2f48e6ef702402467972f74de09
Author: John Budorick <jbudorick@chromium.org>
Date: Wed Apr 12 21:08:49 2017

Catch all exceptions from swarming summary json processing.

Bug:710710

Change-Id: I628e6c695165221f59103f58b3bb6f266f7dc5a2
Reviewed-on: https://chromium-review.googlesource.com/476031
Reviewed-by: Stephen Martinis <martiniss@chromium.org>
Commit-Queue: John Budorick <jbudorick@chromium.org>

[modify] https://crrev.com/b2c7e2705067c2f48e6ef702402467972f74de09/scripts/slave/recipe_modules/swarming/example.expected/swarming_expired_new.json
[modify] https://crrev.com/b2c7e2705067c2f48e6ef702402467972f74de09/scripts/slave/recipe_modules/ios/example.expected/expired.json
[modify] https://crrev.com/b2c7e2705067c2f48e6ef702402467972f74de09/scripts/slave/recipes/chromium.expected/dynamic_swarmed_sharded_isolated_chartjson_test_harness_failure.json
[modify] https://crrev.com/b2c7e2705067c2f48e6ef702402467972f74de09/scripts/slave/recipe_modules/swarming/api.py
[modify] https://crrev.com/b2c7e2705067c2f48e6ef702402467972f74de09/scripts/slave/recipes/chromium.expected/dynamic_swarmed_sharded_invalid_json_isolated_script_test.json
[modify] https://crrev.com/b2c7e2705067c2f48e6ef702402467972f74de09/scripts/slave/recipe_modules/swarming/example.expected/swarming_expired_old.json
[modify] https://crrev.com/b2c7e2705067c2f48e6ef702402467972f74de09/scripts/slave/recipes/chromium.expected/dynamic_swarmed_passed_isolated_script_test_with_swarming_failure.json
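
For readers who do not want to open the diff, the following is a simplified illustration of the failure mode in the bug title and of the kind of guard the commit message describes. It is not the code from the CL: `handle_summary`, `collect_unguarded`, `collect_guarded`, and the task dictionaries are all hypothetical names and shapes made up for this sketch.

```python
def handle_summary(task):
    """Hypothetical per-task summary processing that fails on expired tasks."""
    if task["state"] == "EXPIRED":
        raise ValueError("no summary available for expired task")
    return "%s: %s" % (task["name"], task["state"])


def collect_unguarded(tasks, report):
    """An exception from one task ends the loop, so later tasks are skipped."""
    for task in tasks:
        report.append(handle_summary(task))


def collect_guarded(tasks, report):
    """Catches exceptions per task so every task still gets a report line."""
    for task in tasks:
        try:
            report.append(handle_summary(task))
        except Exception as e:  # deliberately broad, mirroring "catch all"
            report.append("%s: summary processing failed (%s)" % (task["name"], e))


if __name__ == "__main__":
    tasks = [
        {"name": "shard0", "state": "EXPIRED"},
        {"name": "shard1", "state": "COMPLETED"},
        {"name": "shard2", "state": "EXPIRED"},
    ]

    unguarded_report = []
    try:
        collect_unguarded(tasks, unguarded_report)
    except ValueError:
        pass
    print("unguarded lines:", len(unguarded_report))

    guarded_report = []
    collect_guarded(tasks, guarded_report)
    print("guarded lines:", len(guarded_report))
```

The trade-off of a broad `except Exception` is that genuine programming errors in summary processing are also reported as per-task failures rather than stopping the run.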

#7: that's likely a different issue triggered by the same CL.

#8: I think that'll be out of the scope of this bug.
Cc: -eakuefner@chromium.org
#10: Sorry, I filed issue 711030 for that.

Comment 13 by kbr@chromium.org, Apr 13 2017

Blocking: 711030
Components: Infra>Client>Chrome
Status: Fixed (was: Started)
The proximate issue is fixed; continuing on infra failure in the general case is tracked in https://bugs.chromium.org/p/chromium/issues/detail?id=711030
