Guarantee story ordering in telemetry
Issue description

From https://logs.chromium.org/v/?s=chrome%2Fbb%2Fchromium.perf.fyi%2FAndroid_Go%2F1571%2F%2B%2Frecipes%2Fsteps%2Fperformance_test_suite_on_Android_device_gobo%2F0%2Fstdout

Traceback (most recent call last):
  File "/b/c/b/Android_Go/src/tools/perf/process_perf_results.py", line 441, in <module>
    sys.exit(main())
  File "/b/c/b/Android_Go/src/tools/perf/process_perf_results.py", line 437, in main
    args.smoke_test_mode)
  File "/b/c/b/Android_Go/src/tools/perf/process_perf_results.py", line 268, in _process_perf_results
    _merge_json_output(output_json, test_results_list, extra_links)
  File "/b/c/b/Android_Go/src/tools/perf/process_perf_results.py", line 107, in _merge_json_output
    merged_results = results_merger.merge_test_results(jsons_to_merge)
  File "/b/c/b/Android_Go/src/tools/perf/core/results_merger.py", line 72, in merge_test_results
    return _merge_json_test_result_format(shard_results_list)
  File "/b/c/b/Android_Go/src/tools/perf/core/results_merger.py", line 145, in _merge_json_test_result_format
    merge('tests', merge_tries)
  File "/b/c/b/Android_Go/src/tools/perf/core/results_merger.py", line 141, in <lambda>
    result_json, merged_results, key, merge_func)
  File "/b/c/b/Android_Go/src/tools/perf/core/results_merger.py", line 264, in merge_value
    dest[key] = merge_func(source[key], dest[key])
  File "/b/c/b/Android_Go/src/tools/perf/core/results_merger.py", line 214, in merge_tries
    prefix, k, v, curr_node, dest_node))
core.results_merger.MergeException: MergeFailure for tests :blink_perf.layout:word-wrap-break-word.html:actual: u'PASS' not mergable,
  curr_node: {u'actual': u'PASS', u'artifacts': {u'logs': [u'https://console.developers.google.com/m/cloudstorage/b/chrome-telemetry-output/o/002acb22-685e-11e8-bb3f-0242ac110006']}, u'is_unexpected': False, u'times': [26.151926040649414], u'time': 26.151926040649414, u'expected': u'PASS'}
  dest_node: {u'actual': u'PASS', u'artifacts': {u'logs': [u'https://console.developers.google.com/m/cloudstorage/b/chrome-telemetry-output/o/9855075e-685f-11e8-ad78-0242ac110007']}, u'is_unexpected': False, u'times': [26.06502604484558], u'time': 26.06502604484558, u'expected': u'PASS'}

WARNING:root:merge_cmd had non-zero return code: 1
step returned non-zero exit code: 2
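The failure mode in the traceback above can be sketched in a few lines. This is a simplified illustration, not the actual results_merger.py code: each shard is expected to contribute a disjoint subtree of test results, and a leaf that appears in two shards is treated as a merge conflict.

```python
def merge_test_trees(source, dest, prefix=''):
    """Recursively merge test-result dicts; raise on duplicate leaves."""
    for key, value in source.items():
        path = prefix + ':' + key
        if key not in dest:
            dest[key] = value
        elif isinstance(value, dict) and isinstance(dest[key], dict):
            merge_test_trees(value, dest[key], path)
        else:
            # Both shards reported a result for the same story, so the
            # shard index ranges were not disjoint.
            raise ValueError('MergeFailure for tests %s: %r not mergable'
                             % (path, value))
    return dest

shard0 = {'word-wrap-break-word.html': {'actual': 'PASS'}}
shard1 = {'word-wrap-break-word.html': {'actual': 'PASS'}}
try:
    merge_test_trees(shard0, shard1)
except ValueError as e:
    print(e)  # the same story ran on two shards
```

Disjoint shards merge cleanly; the exception fires only when two shards ran the same story, which is exactly what happened here.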
,
Jun 5 2018
What is more interesting is that two different configurations running the same shard map are running different tests for indices 0:16. If you look at shard 1 of Pixel 2: https://isolateserver.appspot.com/browse?namespace=default-gzip&digest=fb9bde16eb2131488757ca572c935f158f9e10c0&as=test_results.json vs Android Go: https://isolateserver.appspot.com/browse?namespace=default-gzip&digest=391e98eb25606250e708475e12e29a38ce965a28&as=test_results.json they are completely different subsets. Now, this could be completely unrelated to the fact that two shards are running the same test, which is the error in this bug. But for sharding purposes we assume the story set is deterministic, and this is proving that the indices differ based on configuration.
,
Jun 5 2018
So it turns out that we cannot guarantee story ordering the way the begin and end indices implemented in telemetry assume we can. blink_perf greps a set of directories for story names, and this can vary based on configuration and, in our case, even the bot. Therefore we need to implement something that guarantees alphabetical order in story sets so that they are always run in the same order. We propose doing this at the StorySet level so it is consistent across all users of telemetry.
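A minimal sketch of what "guarantee alphabetical order at the StorySet level" could look like. The class and method names below are simplified stand-ins for the real telemetry API, not its actual implementation:

```python
class Story(object):
    def __init__(self, name):
        self.name = name

class StorySet(object):
    def __init__(self):
        self._stories = []

    def AddStory(self, story):
        self._stories.append(story)

    def __iter__(self):
        # Sort at iteration time, so the order in which benchmark authors
        # added stories no longer matters.
        return iter(sorted(self._stories, key=lambda s: s.name))

s = StorySet()
for name in ('c.html', 'a.html', 'b.html'):
    s.AddStory(Story(name))
print([story.name for story in s])  # ['a.html', 'b.html', 'c.html']
```

Sorting at iteration time (rather than at insertion) keeps AddStory cheap and makes the guarantee hold no matter how the set was built.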
,
Jun 5 2018
,
Jun 5 2018
+Juan: see #3
,
Jun 5 2018
We do realize that for sharding this means that when new stories are added, device affinity will break for at least two stories, since the index shifts by one.
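A toy illustration of the index shift described here (not telemetry code; the shard layout and file names are made up). When shards own fixed index ranges over the sorted story list, inserting one new story pushes every later story to a different index, so stories near a shard boundary move to a different device:

```python
def assign(stories, shard_ranges):
    """Map each shard to the slice of the sorted story list it owns."""
    stories = sorted(stories)
    return {shard: [stories[i] for i in range(lo, hi) if i < len(stories)]
            for shard, (lo, hi) in shard_ranges.items()}

ranges = {'shard0': (0, 2), 'shard1': (2, 5)}
before = assign(['a.html', 'c.html', 'd.html', 'e.html'], ranges)
after = assign(['a.html', 'b.html', 'c.html', 'd.html', 'e.html'], ranges)
print(before)  # {'shard0': ['a.html', 'c.html'], 'shard1': ['d.html', 'e.html']}
print(after)   # {'shard0': ['a.html', 'b.html'], 'shard1': ['c.html', 'd.html', 'e.html']}
# Adding b.html moved c.html from shard0 to shard1, losing its device affinity.
```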
,
Jun 5 2018
#6: I believe it's "at most two stories"
,
Jun 6 2018
Is this something specific to blink_perf, or Telemetry in general? I think in Telemetry (as implemented) StorySets are actually Story "lists"? https://cs.chromium.org/chromium/src/third_party/catapult/telemetry/telemetry/story/story_set.py?rcl=ba9d017c9e77b3e6bcd2bfa7dceeedc9f1de2414&l=34
,
Jun 6 2018
blink_perf is the only place we have noticed so far. Even though it is a list, we can't guarantee that users add stories to it in a deterministic order.
,
Jun 6 2018
Hmm, yes, I see what the problem is. I think the root cause is that Telemetry allows creating these page sets completely dynamically, based on essentially any arbitrary property of the environment in which they are running (like which files are found in some folder on the host). But the sharding mechanism wants to see the page sets as "static" things, which can be deterministically generated (maybe even on a platform different from the intended target platform where the tests are meant to run) and stably indexed as a list.

In the short term the blink_perf issue can probably be addressed by changing the page set generation to make sure pages *are* added in a specific pre-determined order. But the more general problem won't go away even if we, say, patch the StorySet object to always iterate over pages alphabetically (*), because the user may still decide to add more or fewer pages depending on random environment conditions, making indices point to different pages on different setups.

What I would like to see in the longer term is moving in the direction where page sets can be statically generated, as I suggested above.

(*) Note: if we are not careful, this will also break the (few) benchmarks where stories are not independent and the order in which they run is important.
,
Jun 6 2018
Ok, it sounds like there might be a couple of different things wrong with our alphabetical assumption:
1) For some benchmarks the order in which stories run is important, so alphabetizing them might actually break the benchmark.
2) Users could still choose to add different stories based on different variables, so alphabetizing them won't guarantee stable indices at that point.
I think these invalid assumptions mean we need to have a larger conversation around --experimental-story-shard-*-index and whether it is still valid. I *think* for sharding purposes, if we fix the blink_perf case and re-generate the maps based on that order, we might be ok. I have not verified that this doesn't happen for other benchmarks; this is just the first one we noticed.
,
Jun 6 2018
To the point of (2): a benchmark run on the perf bots is always guaranteed to have the same set of variables. To the point of (1): which benchmarks are those? I recall we killed almost all the benchmarks that have dependencies between stories; per-story bisect cannot work well if that is the case. Having said that, I am fine with fixing blink_perf to unblock the sharding work, since it seems there are lots of nuances in addressing this issue.
,
Jun 6 2018
Yes I agree. My thoughts are that this should not affect too many benchmarks, so a one-off fix for blink_perf should be fine for now. Most other benchmarks really just add a hard-coded list of stories to the set. I'm happy to take this bug after the blink_perf fix to start the larger conversation on what the more general solution should look like.
,
Jun 6 2018
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/eb4300c3b8e0b2214168c8a9fd9e34b7b5448089

commit eb4300c3b8e0b2214168c8a9fd9e34b7b5448089
Author: Emily Hanley <eyaich@google.com>
Date: Wed Jun 06 20:07:23 2018

Sorting blink_perf story sets to be in a determinisitic order

Bug: 849541
Change-Id: Idbd3c3a0339c933c70a5a2335a1faca5f507eb72
Reviewed-on: https://chromium-review.googlesource.com/1089118
Commit-Queue: Emily Hanley <eyaich@chromium.org>
Commit-Queue: Ned Nguyen <nednguyen@google.com>
Reviewed-by: Ned Nguyen <nednguyen@google.com>
Cr-Commit-Position: refs/heads/master@{#565014}

[modify] https://crrev.com/eb4300c3b8e0b2214168c8a9fd9e34b7b5448089/tools/perf/benchmarks/blink_perf.py
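A sketch of the kind of fix this commit applies (illustrative, not the actual blink_perf.py diff; the function name is made up): story files discovered on disk must be sorted before being added, because os.listdir() makes no ordering guarantee and its order varies by platform and filesystem.

```python
import os
import tempfile

def discover_stories(test_dir):
    # os.listdir() order is arbitrary; sorted() makes the discovered
    # story list -- and therefore the sharding indices -- deterministic.
    return sorted(f for f in os.listdir(test_dir)
                  if f.endswith('.html'))

tmp = tempfile.mkdtemp()
for name in ('word-wrap.html', 'append-child.html', 'notes.txt'):
    open(os.path.join(tmp, name), 'w').close()
print(discover_stories(tmp))  # ['append-child.html', 'word-wrap.html']
```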
,
Jun 7 2018
Ah, I didn't see the comment in #12 before my reply in #13.
On point (2): because of the current StorySet.AddPage API, users are free to dynamically generate story sets based on *whatever* conditions they want (not just bot configs). An extreme example:

if today is Monday:
  self.AddPage('foo')
else:
  self.AddPage('bar')

This is of course silly, but it points out (a) the sort of pitfalls we need to avoid [the blink_perf example, where the pages that get added depend on files that may or may not exist on the host, is not that different], and (b) that the issue is not only about the *order* of the pages in the set, but also about which pages get added to it.
My (half-baked) thought at the moment on how to solve this is that some properties of stories (like their names) should be accessible on the *class*, even before instantiating the object (this is already true, e.g., of system health stories); the story set can then literally be a hard-coded list of story classes.
Of course you could still do crazy dynamic things (in the end, classes are also just objects), but it would be far harder to make mistakes in the common case (because it's then hard to accidentally depend on values from the instantiated objects), and if some dynamism is needed it will be easier to make explicit which particular bot-config details affect the generated story sets.
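The class-level idea above could be sketched like this (all names here are hypothetical, chosen only to illustrate the shape of the proposal): story names live on the class, so a story set becomes static data that a shard-map generator can inspect and order on any platform, without instantiating anything on the target device.

```python
class Story(object):
    NAME = None  # subclasses override at class level

    def Run(self, action_runner):
        raise NotImplementedError

class WordWrapStory(Story):
    NAME = 'blink_perf.layout/word-wrap-break-word.html'

class AppendChildStory(Story):
    NAME = 'blink_perf.layout/append-child.html'

# The story set is a hard-coded, deterministically ordered list of
# classes; names are readable without running any story code.
STORY_CLASSES = sorted([WordWrapStory, AppendChildStory],
                       key=lambda cls: cls.NAME)
print([cls.NAME for cls in STORY_CLASSES])
```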
On point (1): both memory.top_10_mobile (requires foreground/background pairs of stories to run in that order) and memory.dual_browser_test (requires chrome/webview pairs of stories to run in that order). Note I think we should *also* fix this and remove the oddities in these benchmarks, but that won't solve the problems of (2) above.
Comment 1 by eyaich@chromium.org, Jun 5 2018
Owner: eyaich@chromium.org
Status: Started (was: Untriaged)