I have no particular knowledge of telemetry, and I don't know why I should be the one to diagnose a silent internal failure in it. Could you try reproducing it locally to see what is happening?
Summary: Makes sure isolated_script_test includes shard's exit code in the combined json. (was: Unclear test failure reason)
Sorry for wasting everyone's time, looks like shard 2 returned exit code 1:
https://chromium-swarm.appspot.com/user/task/30bc21b33b7aa810
Maruel is correct that swarming is working fine; this is a feature request for isolated_script_test to also include each shard's exit code in the combined JSON.
I'm not sure that adding a per-shard exit code in the combined JSON is what's really desired. That JSON isn't easily visible on the waterfall. It sounds to me like what is desired is better visibility into which shard failed?
The code which produces the combined JSON and the links on the waterfall is _isolated_script_collect_step in scripts/slave/recipe_modules/swarming/api.py in the tools/build workspace:
https://cs.chromium.org/chromium/build/scripts/slave/recipe_modules/swarming/api.py?q=_isolated_script_collect_step&sq=package:chromium&l=757&dr=CSs
I don't know how to find the exit code for each shard. Maybe maruel@ or vadimsh@ can help with that. The "collect" step seems to be accumulating all of the shards' exit codes and returning 0 if they all passed, or non-zero if any failed.
If you can find the per-shard exit codes, then if there are any non-zero ones, something like:
step_result.presentation.logs['failed_shards'] = 'The following shards failed: ' + ...
could be added.
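As a rough sketch of what that log text could look like, assuming the per-shard exit codes are already available as a plain list indexed by shard number (the function name and the input shape here are hypothetical, not part of the existing recipe module):

```python
def failed_shard_lines(exit_codes):
    """Return log lines naming every shard with a non-zero exit code.

    exit_codes is assumed to be a list where index i holds shard i's exit
    code, or None if that shard never reported one (hypothetical shape).
    """
    # A missing exit code (None) is treated as a failure, since the shard
    # did not demonstrably succeed.
    failed = [(i, code) for i, code in enumerate(exit_codes) if code != 0]
    if not failed:
        return []
    lines = ['The following shards failed:']
    for index, code in failed:
        lines.append('  shard %d: exit code %s' % (index, code))
    return lines


if __name__ == '__main__':
    print('\n'.join(failed_shard_lines([0, 0, 1])))
```

The returned lines could then be attached to the step presentation under a name such as 'failed_shards', and skipped entirely when the list is empty so that passing runs don't grow an extra link.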
Note that the wrapper script which generates the per-shard JSON is (I think):
src/testing/scripts/run_telemetry_as_googletest.py
and it can't mark the per-shard result as "valid: False" because the script thinks that it got a 0 exit code from its sub-process.
If you grep for "Exit: " you can find the exit code in the stdout, but that's not intuitive.
I agree with what Ken said. swarming.py knows each shard's exit code; it is in the JSON file. The JSON results metadata (task ids, exit codes) is accessible to the recipe as the variable 'outdir_json' defined at line 781, and the information in there could be presented in a better way. Right now it's just ignoring each shard's exit code, which is incorrect.
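For illustration only, here is a minimal sketch of pulling the per-shard exit codes out of that metadata. It assumes the summary is a dict with a 'shards' list whose entries carry an 'exit_code' key, and that a shard which never reported appears as null; the exact layout should be checked against what swarming.py actually writes, and both function names are made up for this example.

```python
import json


def shard_exit_codes(summary):
    """Return one exit code per shard, or None where a shard is missing.

    Assumed layout (not verified against swarming.py):
    {"shards": [{"exit_code": 0, ...}, null, ...]}
    """
    codes = []
    for shard in summary.get('shards') or []:
        if shard is None:
            # The shard never produced a result.
            codes.append(None)
        else:
            codes.append(shard.get('exit_code'))
    return codes


def combined_exit_code(codes):
    """Return 0 only if at least one shard ran and every shard exited 0."""
    return 0 if codes and all(code == 0 for code in codes) else 1


if __name__ == '__main__':
    example = json.loads(
        '{"shards": [{"exit_code": 0}, {"exit_code": 0}, {"exit_code": 1}]}')
    codes = shard_exit_codes(example)
    print(codes, combined_exit_code(codes))
```

Keeping the per-shard list around, rather than only the combined value, is what would let the recipe say which shard failed instead of just that something failed.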
Comment 1 by mar...@chromium.org, Aug 19 2016
Owner: nednguyen@chromium.org