Add support for alerts only for specific tags
Issue description

The alerts for telemetry benchmarks are currently defined for a metric in a benchmark, so if the metric regresses (or improves) for any page in the entire pageset for that benchmark, an alert is triggered. This is causing issues with the rendering benchmark, which used to be split across two benchmarks, smoothness and thread_times, whose pagesets were in turn split into a number of smaller pagesets [1]. When the measurements and the pagesets were merged into the rendering.desktop and rendering.mobile benchmarks, the list of alerts also had to be merged. This now causes many alerts to trigger that would not have triggered under the old benchmarks/pagesets. For example, thread_raster_cpu_time_per_frame used to trigger only for the pages in the 'tough_scrolling_cases' pageset, whereas it now triggers for all other pages in the rendering.desktop/rendering.mobile pagesets too.

Analysing ~10K alerts for the rendering.desktop benchmark from the last few weeks:

- thread_raster_cpu_time_per_frame: 508 alerts for tough_scrolling_cases, vs. 3013 for the other pages.
- tasks_per_frame_total_all: 178 for tough_compositor_cases, vs. 2133 for the other pages.
- percentage_smooth: 264 for top-sites, vs. 1644 for other pages.
- thread_total_all_cpu_time_per_frame: 168 for tough_compositor and tough_scrolling cases, vs. 624 for other pages.
- mean_frame_time_renderer_compositor: 42 for tough_compositor cases, vs. 516 for other pages.

To sum up, instead of triggering 1160 alerts for these metrics, a total of 9090 alerts were triggered. (More context and details: http://g/chrome-gpu-metrics/0TZOgX9PWKA/SoTZAou9AwAJ)

So, to help rein in the number of alerts, it would be useful to go back to the set of alerts we had before the big merge [2]. When the pagesets for the smoothness and thread_times benchmarks were merged, we added tags to the stories that represent the old pagesets [3]. If it were possible to include the story tag as a filter for the alerts, that would help with this situation.

[1] The list of pagesets that have been merged: image_decoding_cases, key_desktop_move_cases, key_hit_test_cases, key_idle_power_cases, key_noop_cases, key_silk_cases, maps, pathological_mobile_sites, polymer, simple_mobile_sites, top_25_smooth, tough_animation_cases, tough_canvas_cases, tough_compositor_cases, tough_filters_cases, tough_image_decode_cases, tough_path_rendering_cases, tough_pinch_zoom_cases, tough_scheduling_cases, tough_scrolling_cases, tough_texture_upload_cases, tough_webgl_cases
[2] List of old alerts: https://docs.google.com/document/d/1o0oBPMbfw8or2iKjrD23qAFsvMDhlogFFM9DcMPqxmo/edit#heading=h.x6ufcll8nkvt
[3] https://cs.chromium.org/chromium/src/tools/perf/page_sets/rendering/story_tags.py?type=cs&sq=package:chromium&g=0
Nov 29
I think the attached file has the list of alerts we would need in order to get back to the old list. Is there any way to test how many alerts would have triggered with this new list over the last X weeks, and compare that with the alerts that actually triggered in that time?
Nov 29
Back to Ben :)
Dec 1
I tried running through those patterns in the appengine dev_console, calling list_tests.GetTestsMatchingPatterns(), then Anomaly.QueryAsync(test=test). It timed out before it could finish the first GetTestsMatchingPatterns. I could write a bunch of python, deploy it to appengine, and run it in the taskqueue, but I'd rather find a generalizable, teachable way to fish, and I think the new /api/describe will help here.
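For reference, a minimal sketch of that dev_console approach (assuming the dashboard modules are importable there; the exact return shape of QueryAsync is an assumption, so treat this as pseudocode rather than something known to run to completion):

from dashboard import list_tests
from dashboard.models import anomaly

def count_alerts_for_patterns(patterns):
  counts = {}
  for pattern in patterns:
    # Expand a glob pattern into concrete test paths (this is the call that
    # timed out above).
    for test_path in list_tests.GetTestsMatchingPatterns([pattern]):
      # Assumed: QueryAsync returns a future whose result is (or contains)
      # the list of matching Anomaly entities.
      result = anomaly.Anomaly.QueryAsync(test=test_path).get_result()
      anomalies = result[0] if isinstance(result, tuple) else result
      counts[test_path] = len(anomalies)
  return counts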
I tried using the chrome devtools console in v2spa.
cp.ReadTestSuites() showed 3 matching suites: rendering.desktop, rendering.mobile, rendering.oopd.desktop
I hit /api/describe for those 3, computed some set unions, and found 10 matching bots: Android Nexus5 Perf, Android Nexus5X WebView Perf, Android Nexus6 WebView Perf, android-nexus5x-perf, Win 7 Nvidia GPU Perf, Win 7 Perf, linux-perf, mac-10_12_laptop_low_end-perf, mac-10_13_laptop_high_end-perf, win-10-perf
and 10 cpu_time_per_frame measurements (thread_GPU_cpu_time_per_frame, thread_IO_cpu_time_per_frame, thread_browser_cpu_time_per_frame, thread_display_compositor_cpu_time_per_frame, thread_other_cpu_time_per_frame, thread_raster_cpu_time_per_frame, thread_renderer_compositor_cpu_time_per_frame, thread_renderer_main_cpu_time_per_frame, thread_total_all_cpu_time_per_frame, thread_total_fast_path_cpu_time_per_frame).
A few loops later, I had a list of 21180 test paths.
I tried hitting /api/alerts?test=${path} in parallel and hoped the browser would queue them up, but it threw INSUFFICIENT_RESOURCES after about 5000. Several of those 5000 did find a few alerts, though, so the approach works. I'd try fetching them serially next. The tab didn't OOM, but it might if it issued many more requests; I'm not sure.
I'm not sure I'd recommend doing this kind of analysis in devtools. Colab would probably work better.
I need to get home now. Do you want to try using the API in colab? Here's the documentation.
https://github.com/catapult-project/catapult/tree/master/dashboard/dashboard/api
Here's a colab notebook to demonstrate using OAuth:
https://goto.google.com/byauz
Soundwave might also be helpful:
https://github.com/catapult-project/catapult/tree/master/experimental/soundwave
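For what it's worth, here is a rough sketch of fetching those alerts serially from colab/Python rather than devtools; the auth header and the 'anomalies' response key are assumptions, so check the API docs above:

import requests

DASHBOARD = 'https://chromeperf.appspot.com'
ACCESS_TOKEN = '...'  # obtain via OAuth, e.g. as in the colab notebook above

def fetch_alerts_serially(test_paths):
  headers = {'Authorization': 'Bearer ' + ACCESS_TOKEN}
  alerts = []
  for path in test_paths:
    # One /api/alerts?test=<path> request at a time, to avoid the
    # INSUFFICIENT_RESOURCES problem hit when firing them all in parallel.
    response = requests.get(DASHBOARD + '/api/alerts',
                            params={'test': path}, headers=headers)
    response.raise_for_status()
    alerts.extend(response.json().get('anomalies', []))  # assumed key
  return alerts

With ~21K test paths this would be slow but steady; batching or a small thread pool could speed it up once the serial version is known to work.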
Dec 3
I did some local experiments based on the ~10K alerts I got for the initial analysis. It looks like, with the new config, only 393 alerts would have triggered. I have to run right now ... I will share the scripts I used later tonight so we can double-check.
Dec 3
The alerts history file: http://springfield.wat.corp.google.com/stuff/alerts-history/alerts.json
The new config: http://springfield.wat.corp.google.com/stuff/alerts-history/new-config
The script to process them: http://springfield.wat.corp.google.com/stuff/alerts-history/process-alerts.js
(I would like to learn Colab one day ... but it seems a bit tricky to get started at the moment.)
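For anyone double-checking, a rough Python equivalent of the kind of matching process-alerts.js presumably does; the 'test_path' field name and the file formats are assumptions about alerts.json and new-config:

import fnmatch
import json

def count_alerts_matching_config(alerts_path, config_path):
  with open(alerts_path) as f:
    alerts = json.load(f)  # assumed: a list of alert dicts with a test path
  with open(config_path) as f:
    patterns = [line.strip() for line in f if line.strip()]
  # An alert would still trigger if its test path matches any new pattern.
  # Note: fnmatch's '*' also matches '/', unlike the dashboard's
  # per-component matching, so this is only an approximation.
  matching = [a for a in alerts
              if any(fnmatch.fnmatch(a['test_path'], p) for p in patterns)]
  return len(matching)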
Dec 3
Can you condense that long list of patterns? I'd like to be able to just copy it into the datastore to replace the 13 patterns in #1.
Dec 4
What's your suggestion for shrinking the list? (I can't think of a way to shrink it much in its current state.)
Dec 4
Hm, in fact, the list needs to be even bigger than what I had before. I realized that I had included only the list for rendering.desktop; it doesn't cover some of the rendering.mobile metrics (e.g. avg_surface_fps). Attached is the new list.
Dec 6
The way that these patterns are used is currently not very efficient: https://github.com/catapult-project/catapult/blob/master/dashboard/dashboard/edit_config_handler.py#L274

When updating the sheriff config, each pattern executes a query that can take up to a few minutes to run. The task queue handles 10 patterns per task, and each task times out after 10 minutes, so it's possible that some tasks will fail and the datastore will be left in an inconsistent state: some timeseries that should be monitored would not be, and vice versa. Even if all of the tasks succeed, it might take a day or so for the job to finish, and this process would be repeated every time the list of patterns is changed. I described an alternative algorithm that would be much more efficient, but it would take a while to implement using Descriptors.

Please see #1 for how to condense the list of patterns. For example, I see 40 test cases that would match *_pixels_per_second, so you can reduce the number of patterns significantly by replacing them all with a single pattern like this:

ChromiumPerf/*/rendering.*/input_event_latency/*_pixels_per_second

I see a few other sets of stories that could be replaced by patterns such as card_*, *balls_*, microsoft_*, *canvas_*.

Alternatively, the stories could be renamed in telemetry to include tags as prefixes separated by colons, like "smoothness:balls_svg_animation"; the test case patterns could then be "smoothness:*". Of course, renaming stories might require migrating data from the old test paths to the new test paths, which would also be expensive, so I wouldn't recommend it. I only include this alternative as an example of how patterns can be used.

Please feel free to schedule a VC.
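One cheap sanity check for a condensed pattern list is to confirm it selects the same concrete test paths as the long explicit list. A minimal sketch, assuming the full set of test paths has already been dumped somewhere (e.g. via /api/describe or list_tests); the helper names here are illustrative:

import fnmatch

def select(test_paths, patterns):
  # Approximation only: fnmatch's '*' also matches '/', unlike the
  # dashboard's per-component pattern matching.
  return {t for t in test_paths
          if any(fnmatch.fnmatch(t, p) for p in patterns)}

def compare_pattern_lists(test_paths, long_patterns, condensed_patterns):
  long_set = select(test_paths, long_patterns)
  short_set = select(test_paths, condensed_patterns)
  # First set: timeseries the condensed list would stop monitoring.
  # Second set: timeseries the condensed list would newly monitor.
  return long_set - short_set, short_set - long_set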
Dec 9
The following revision refers to this bug: https://chromium.googlesource.com/catapult/+/e09a3df387b3ffd12e1ef8e9ad529b8e4577c118

commit e09a3df387b3ffd12e1ef8e9ad529b8e4577c118
Author: Sadrul Habib Chowdhury <sadrul@chromium.org>
Date: Sun Dec 09 05:20:14 2018

telemetry: Add an option to print the names of stories.

This adds a '--print-only' (or -p) option to run_benchmark to print the list of stories and/or tags for a benchmark. If '-p stories' is used, it prints the list of stories; if '-p tags' is used, it prints the list of tags; and if '-p both' is used, it prints the list of stories and the tags for each story.

Examples:

$ ./tools/perf/run_benchmark --browser=android-system rendering.mobile -p both
accu_weather_2018 top_real_world_desktop,gpu_rasterization
accu_weather_desktop_gpu_raster_2018 top_real_world_desktop,gpu_rasterization
amazon_2018 top_real_world_desktop,gpu_rasterization
amazon_desktop_gpu_raster_2018 top_real_world_desktop,gpu_rasterization
amazon_mobile_2018 top_real_world_mobile
analog_clock_svg tough_filters
androidpolice_mobile_2018 top_real_world_mobile
idle_power_animated_gif key_idle_power
aquarium_20k tough_webgl,required_webgl
aquarium tough_webgl,required_webgl
background_color_animation tough_texture_upload
background_color_animation_with_gradient tough_texture_upload
baidu_mobile_2018 top_real_world_mobile
balls_css_key_frame_animations_composited_transform tough_animation
balls_css_key_frame_animations tough_animation
balls_css_transition_2_properties tough_animation
balls_css_transition_40_properties tough_animation
balls_css_transition_all_properties tough_animation
balls_javascript_canvas tough_animation
balls_javascript_css tough_animation
balls_svg_animations tough_animation
bing_mobile_2018 top_real_world_mobile
idle_power_blank key_idle_power
...

Another:

$ ./tools/perf/run_benchmark --browser=android-system rendering.mobile -p tags
List of tags:
polymer gpu_rasterization tough_canvas key_noop key_idle_power tough_path_rendering tough_filters simple_mobile_sites tough_scheduling tough_texture_upload maps use_fake_camera_device key_silk top_real_world_desktop tough_image_decode image_decoding required_webgl top_real_world_mobile fastpath motionmark pathological_mobile_sites tough_scrolling tough_compositor tough_webgl tough_animation key_hit_test
[ PASSED ] 0 tests.

Bug: chromium:906509
Change-Id: I076e2007cbf2b76b8c2bd72d1b0de08dc721f862
Reviewed-on: https://chromium-review.googlesource.com/c/1355960
Commit-Queue: Sadrul Chowdhury <sadrul@chromium.org>
Reviewed-by: Ned Nguyen <nednguyen@google.com>
[modify] https://crrev.com/e09a3df387b3ffd12e1ef8e9ad529b8e4577c118/telemetry/telemetry/internal/story_runner.py
Dec 9
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/aac77a144fcea93173ab5e614dc2d1d511897c8b

commit aac77a144fcea93173ab5e614dc2d1d511897c8b
Author: chromium-autoroll <chromium-autoroll@skia-public.iam.gserviceaccount.com>
Date: Sun Dec 09 07:24:39 2018

Roll src/third_party/catapult c017b42db491..e09a3df387b3 (1 commits)

https://chromium.googlesource.com/catapult.git/+log/c017b42db491..e09a3df387b3

git log c017b42db491..e09a3df387b3 --date=short --no-merges --format='%ad %ae %s'
2018-12-09 sadrul@chromium.org telemetry: Add an option to print the names of stories.

Created with:
gclient setdep -r src/third_party/catapult@e09a3df387b3

The AutoRoll server is located here: https://autoroll.skia.org/r/catapult-autoroll
Documentation for the AutoRoller is here: https://skia.googlesource.com/buildbot/+/master/autoroll/README.md

If the roll is causing failures, please contact the current sheriff, who should be CC'd on the roll, and stop the roller if necessary.

CQ_INCLUDE_TRYBOTS=luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel
BUG=chromium:906509
TBR=sullivan@chromium.org
Change-Id: Ia2198497a72e9277cdc1cd4af945b8857e5acc9f
Reviewed-on: https://chromium-review.googlesource.com/c/1369253
Reviewed-by: chromium-autoroll <chromium-autoroll@skia-public.iam.gserviceaccount.com>
Commit-Queue: chromium-autoroll <chromium-autoroll@skia-public.iam.gserviceaccount.com>
Cr-Commit-Position: refs/heads/master@{#614994}
[modify] https://crrev.com/aac77a144fcea93173ab5e614dc2d1d511897c8b/DEPS
Comment 1 by benjhayden@google.com, Nov 26
Owner: sadrul@chromium.org
Sheriff configs are based on test path patterns. We don't currently have plans to migrate those patterns to a richer data structure like Descriptors, which would be flexible enough to add a parameter for story tags. If we decide to do that, it probably won't happen for another quarter or two, since it would require changing several other things. It sounds like a short-term solution is needed.

Here are the test path patterns for the rendering benchmarks in Chromium Perf Sheriff:

ChromiumPerf/*/rendering.*/first_gesture_scroll_update_latency/*
ChromiumPerf/*/rendering.*/frame_times/*
ChromiumPerf/*/rendering.*/mean_frame_time_renderer_compositor/*
ChromiumPerf/*/rendering.*/mean_pixels_approximated/*
ChromiumPerf/*/rendering.*/percentage_smooth/*
ChromiumPerf/*/rendering.*/queueing_durations/*
ChromiumPerf/*/rendering.*/tasks_per_frame_total_all/*
ChromiumPerf/*/rendering.*/thread_raster_cpu_time_per_frame/*
ChromiumPerf/*/rendering.*/thread_total_all_cpu_time_per_frame/*
ChromiumPerf/*/rendering.mobile/*cpu_time_per_frame/*
ChromiumPerf/*/rendering.mobile/avg_surface_fps/*
ChromiumPerf/*/rendering.mobile/tasks_per_second_total_all/*
ChromiumPerf/*/rendering.mobile/thread_total_all_cpu_time_per_second/*

All 13 of these patterns match all test cases. It sounds like the problem is that the number of test cases recently increased dramatically. I see 515 test cases under ChromiumPerf/linux-perf/rendering.desktop/percentage_smooth, and I'll assume there are a similar number for the other bots and measurements.

Which test cases do you actually want to monitor? If there are few enough of them, then I would recommend simply changing the sheriffing config to list them explicitly. (If you aren't a chromeperf Admin, I'd copy-paste the patterns from this bug into /edit_sheriffs for you.) You might not need to list all of them if you can construct a pattern that matches only the test cases you want. For example, there are 40 ToughFastScrollingPage subclasses in tough_scrolling_cases.py. If you listed them explicitly, you'd need 13 measurements * 40 test cases = 520 patterns; I'd write some code to generate that cross product for you, and hope that that many patterns don't slow down chromeperf too much. However, they all look like *_pixels_per_second, so you'd only need 13 measurements * 1 test case pattern = 13 test path patterns.

sadrul: Can you enumerate which test cases you actually want to monitor and try to condense that list into a few glob patterns? If you can generate the cross product of those 13 measurements with those few case patterns (see the sketch below), then I can copy-paste it into /edit_sheriffs for you, or I can generate the cross product and confirm it with you before pasting it in.

Of course, if and when sheriff configs migrate to Descriptors, they could avoid computing cross products entirely by storing objects containing arrays like {suites, bots, measurements, cases, tags}, and use a new algorithm other than glob to match timeseries. Again, there are no plans to do that yet.
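If it helps, a small sketch of generating that cross product; the measurement and case-pattern lists below are illustrative placeholders, not the final sheriff config:

MEASUREMENTS = [
    'frame_times',
    'percentage_smooth',
    'thread_raster_cpu_time_per_frame',
    # ... the rest of the 13 measurements listed above
]
CASE_PATTERNS = [
    '*_pixels_per_second',  # e.g. the tough_scrolling_cases stories
    'card_*',               # hypothetical additional story pattern
]

def cross_product(measurements, case_patterns):
  return ['ChromiumPerf/*/rendering.*/%s/%s' % (m, c)
          for m in measurements
          for c in case_patterns]

for pattern in cross_product(MEASUREMENTS, CASE_PATTERNS):
  print(pattern)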