swarming trigger scripts can't emulate sharding behavior of "swarming.py trigger"
Issue description
"swarming.py trigger" receives the number of shards it should trigger as a command-line argument. When it sees that it should trigger more than one shard, it sets up per-shard environment variables that tell the test harness which shard is being run:
https://cs.chromium.org/chromium/infra/luci/client/swarming.py?l=241

Swarming trigger scripts manually call "swarming.py trigger" once per shard, but don't (and currently can't) tell "swarming.py trigger" that each invocation is intended as "shard I of N". As a result, these environment variables aren't set, and every shard runs all of the tests.

The collect step fails when a swarming trigger script is used to spawn a sharded test. I think this is because the output JSON files collide during the merge.

I suggest adding a command-line argument to "swarming.py trigger", intended only for trigger scripts, such as --trigger-only-shard=[NUM]. Trigger scripts could then pass the --shards argument through to "swarming.py trigger" along with this new argument. If specified, the new argument would cause trigger_task_shards in swarming.py to trigger only that one shard, modified in the same way all shards are currently modified. It would be a small change:

    if trigger_only_shard is not None:
      requests = [convert(trigger_only_shard)]
    else:
      requests = [convert(index) for index in xrange(shards)]

See this CL, which attempts to add the first trigger script to the Chromium repository:
https://chromium-review.googlesource.com/833505
and this failing try job from Patch Set 10:
https://ci.chromium.org/buildbot/tryserver.chromium.win/win7_chromium_rel_ng/79372
In that job, webgl_conformance_tests, which has two shards, fails. Compare it to this successful run on the same tryserver:
https://ci.chromium.org/buildbot/tryserver.chromium.win/win7_chromium_rel_ng/79610
Note that both shards in the failing run are running all of the tests.
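Below is a minimal Python 3 sketch of the proposed --trigger-only-shard behavior. It assumes each shard request can be modeled as a plain dict of environment variables. build_shard_env and build_requests are hypothetical stand-ins for the convert helper and the request-building step in trigger_task_shards; they are not swarming.py's actual code.

```python
from typing import Dict, List, Optional


def build_shard_env(base_env: Dict[str, str], index: int, shards: int) -> Dict[str, str]:
    """Hypothetical stand-in for swarming.py's per-shard `convert` step."""
    env = dict(base_env)  # copy so shards never share one dict
    if shards > 1:
        # Tell the test harness which slice of the tests this shard owns.
        env["GTEST_SHARD_INDEX"] = str(index)
        env["GTEST_TOTAL_SHARDS"] = str(shards)
    return env


def build_requests(base_env: Dict[str, str], shards: int,
                   trigger_only_shard: Optional[int] = None) -> List[Dict[str, str]]:
    """Hypothetical request-building step with the proposed flag."""
    if trigger_only_shard is not None:
        if not 0 <= trigger_only_shard < shards:
            raise ValueError("trigger_only_shard must be in [0, shards)")
        # Trigger one shard, configured exactly as it would be in a full run.
        return [build_shard_env(base_env, trigger_only_shard, shards)]
    return [build_shard_env(base_env, index, shards) for index in range(shards)]
```

For example, build_requests({}, 2, trigger_only_shard=1) returns [{"GTEST_SHARD_INDEX": "1", "GTEST_TOTAL_SHARDS": "2"}]. That is the same environment shard 1 would get in a normal two-shard trigger. The range check is an extra guard, not part of the original proposal.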
I'm not 100% sure whether that's the only reason that invalid JSON is being generated during the collect step, but the GTEST_SHARD_INDEX and GTEST_TOTAL_SHARDS environment variables are definitely missing from the shards triggered by the swarming trigger script.
Jan 12 2018
https://chromium-review.googlesource.com/864284 is up for review and implements this. Unfortunately, I can't test it with the trigger script until it has landed and been rolled into Chromium. Please feel free to take it over from me.
Jan 12 2018
Unless I'm mistaken, what you want is to pass the following arguments to swarming.py trigger: --env GTEST_SHARD_INDEX 1 --env GTEST_TOTAL_SHARDS 10, with the relevant numbers for each invocation.
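Here is a minimal Python sketch of that suggestion, building one command line per shard. Only the `--env NAME VALUE` form comes from the comment above. SWARMING_TRIGGER, per_shard_trigger_commands, and the placeholder task arguments are hypothetical, and the trigger script would still need to run each command (for example with subprocess.run). Shard indices start at 0, as googletest expects.

```python
from typing import List, Sequence

# Hypothetical command prefix; adjust to however the trigger script invokes swarming.py.
SWARMING_TRIGGER = ("swarming.py", "trigger")


def per_shard_trigger_commands(task_args: Sequence[str], total_shards: int) -> List[List[str]]:
    """Build one "swarming.py trigger" command line per shard."""
    commands = []
    for shard_index in range(total_shards):
        cmd = list(SWARMING_TRIGGER) + list(task_args)
        # Pass the sharding variables explicitly, since each call triggers a single task.
        cmd += ["--env", "GTEST_SHARD_INDEX", str(shard_index),
                "--env", "GTEST_TOTAL_SHARDS", str(total_shards)]
        commands.append(cmd)
    return commands


for cmd in per_shard_trigger_commands(["<other trigger arguments>"], 2):
    print(" ".join(cmd))
```

This keeps swarming.py unchanged: all the sharding knowledge lives in the trigger script, which is the trade-off chosen in the next comment.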
Jan 12 2018
Thanks M-A for the idea. Yes, it turns out this was the only other change needed on top of the sharding handling the trigger script was already doing. I've folded it into the trigger script, and we'll document the need for it. To reduce the dependencies between swarming.py and trigger scripts, let me close this as WontFix and abandon the CL.
Comment 1 by kbr@chromium.org, Jan 12 2018