Single shard failures should not cause full retries
Issue description
For example, see https://bugs.chromium.org/p/chromium/issues/detail?id=894637. When a single shard times out or produces invalid results, all tests it contains should be assumed to have "failed". However, successful tests from other shards do not need to be rerun.
Oct 12
I suspect that this will be difficult because the recipe has no way of knowing what tests the shard *would/should* have run. This will likely require changing the recipe <-> test runner interface, which is non-trivial. I suspect that this will help a lot with Android builds especially, since it's common for a single shard to fail due to ADB issues.
Oct 12
martiniss also pointed to bug 394826 earlier, but that seems to be covering a different issue.

Another option may be to improve the Android test runner's timeout handling. When a shard times out, swarming sends it a SIGTERM, waits for a grace period, then sends a SIGKILL if the test is still running. It looks like the test runner catches the SIGTERM and exits without writing the results JSON file (i.e. there's no isolated output for https://chromium-swarm.appspot.com/task?id=407d3fa49b092e10). That may be a bug in the test runner. I believe fixing that would have lessened the impact of bug 894637, since the recipe would have gotten the full test results for every shard.
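One possible shape for the timeout handling described here is a SIGTERM handler that writes whatever results exist before exiting. This is only a sketch, assuming the runner keeps results in memory and can finish the write within swarming's grace period. `ResultsWriter`, `make_sigterm_handler`, `install`, and the JSON layout are hypothetical and do not reflect the actual Android test runner.

```python
import json
import signal
import sys


class ResultsWriter:
    """Hypothetical helper; the real test runner's structure may differ."""

    def __init__(self, path):
        self.path = path
        self.results = {}  # test name -> "PASS" or "FAIL"

    def record(self, test, status):
        self.results[test] = status

    def flush(self, interrupted):
        # Write the results gathered so far, flagged as partial if interrupted.
        with open(self.path, "w") as f:
            json.dump({"interrupted": interrupted, "tests": self.results}, f)


def make_sigterm_handler(writer):
    """Build a handler that saves partial results, then exits non-zero."""
    def on_sigterm(signum, frame):
        writer.flush(interrupted=True)
        sys.exit(1)
    return on_sigterm


def install(writer):
    # signal.signal must be called from the main thread.
    signal.signal(signal.SIGTERM, make_sigterm_handler(writer))
```

Calling `install(writer)` once at startup would be enough in this sketch. A consumer of the file could then treat tests missing from an interrupted shard's output as failed while still trusting the tests that did report.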
Oct 12
Filed bug 895027 for the android test runner's results on timeout.
Dec 4
Comment 1 by erikc...@chromium.org, Oct 12