
Issue 906509

Starred by 2 users

Issue metadata

Status: Started
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux, Android, Windows, Chrome, Mac
Pri: 2
Type: Bug




Add support for alerts only for specific tags

Project Member Reported by sadrul@chromium.org, Nov 19

Issue description

The alerts for telemetry benchmarks are currently defined for a metric in a benchmark. So if the metric regresses (or improves) for any page in the entire pageset for that benchmark, then that would trigger an alert.

This is causing issues for the rendering benchmark, which used to be split across two measurements, smoothness and thread_times, and across a few smaller pagesets [1]. When the measurements and pagesets were merged into the rendering.desktop and rendering.mobile benchmarks, the lists of alerts also had to be merged. As a result, many alerts now trigger that would not have triggered under the old benchmarks/pagesets. For example, thread_raster_cpu_time_per_frame used to trigger only for pages in the 'tough_scrolling_cases' pageset, whereas it now triggers for all the other pages in the rendering.desktop/rendering.mobile pagesets too.

Analysing ~10K alerts for rendering.desktop benchmark from the last few weeks:
  . thread_raster_cpu_time_per_frame: 508 alerts for tough_scrolling_cases, vs. 3013 for the other pages.
  . tasks_per_frame_total_all: 178 for tough_compositor_cases, vs. 2133 for the other pages.
  . percentage_smooth: 264 for top-sites, vs. 1644 for other pages.
  . thread_total_all_cpu_time_per_frame: 168 for tough_compositor and tough_scrolling cases, vs. 624 for other pages.
  . mean_frame_time_renderer_compositor: 42 for tough_compositor cases, vs. 516 for other pages.

To sum up, instead of triggering 1160 alerts for these metrics, a total of 9090 alerts were triggered. (more context and details: http://g/chrome-gpu-metrics/0TZOgX9PWKA/SoTZAou9AwAJ)
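The tally above can be reproduced with a short script. This is only a sketch: the shape of the alert records (dicts with 'metric' and 'story' keys) and the helper name are assumptions, not the dashboard's actual export format.

```python
from collections import Counter

def tally_alerts_by_tag(alerts, metrics, tagged_stories):
    """Count alerts per (metric, scoped-vs-other) bucket.

    alerts: list of dicts with 'metric' and 'story' keys (assumed shape).
    metrics: the set of metrics to analyse.
    tagged_stories: dict mapping metric -> set of stories in the pageset
    that the metric's alert was originally scoped to.
    """
    counts = Counter()
    for alert in alerts:
        metric = alert['metric']
        if metric not in metrics:
            continue
        scoped = alert['story'] in tagged_stories.get(metric, set())
        counts[(metric, 'scoped' if scoped else 'other')] += 1
    return counts
```

Run over the ~10K-alert dump, this yields the scoped-vs-other split quoted above for each metric.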

So, to help rein in the number of alerts, it would be useful to go back to the set of alerts we had before the big-merge [2]. When the pagesets for smoothness and thread_times benchmarks were merged, we added tags to the stories that represent the old pagesets [3]. So if it were possible to include the story-tag as a filter for the alerts, then that would help with this situation.

[1] The list of pagesets that have been merged:
    image_decoding_cases
    key_desktop_move_cases
    key_hit_test_cases
    key_idle_power_cases
    key_noop_cases
    key_silk_cases
    maps
    pathological_mobile_sites
    polymer
    simple_mobile_sites
    top_25_smooth
    tough_animation_cases
    tough_canvas_cases
    tough_compositor_cases
    tough_filters_cases
    tough_image_decode_cases
    tough_path_rendering_cases
    tough_pinch_zoom_cases
    tough_scheduling_cases
    tough_scrolling_cases
    tough_texture_upload_cases
    tough_webgl_cases
[2] List of old alerts: https://docs.google.com/document/d/1o0oBPMbfw8or2iKjrD23qAFsvMDhlogFFM9DcMPqxmo/edit#heading=h.x6ufcll8nkvt
[3] https://cs.chromium.org/chromium/src/tools/perf/page_sets/rendering/story_tags.py?type=cs&sq=package:chromium&g=0
 
Cc: benjhayden@chromium.org
Owner: sadrul@chromium.org
Sheriff configs are based on test path patterns. We don't currently have plans to migrate those patterns towards a richer data structure like Descriptors, which would be flexible enough to add a parameter for story tags. If we decide to do that, it probably won't be done for another quarter or two, since it would require changing several other things. It sounds like a short-term solution is needed.

Here are the test path patterns for the rendering benchmarks in Chromium Perf Sheriff:

ChromiumPerf/*/rendering.*/first_gesture_scroll_update_latency/*
ChromiumPerf/*/rendering.*/frame_times/*
ChromiumPerf/*/rendering.*/mean_frame_time_renderer_compositor/*
ChromiumPerf/*/rendering.*/mean_pixels_approximated/*
ChromiumPerf/*/rendering.*/percentage_smooth/*
ChromiumPerf/*/rendering.*/queueing_durations/*
ChromiumPerf/*/rendering.*/tasks_per_frame_total_all/*
ChromiumPerf/*/rendering.*/thread_raster_cpu_time_per_frame/*
ChromiumPerf/*/rendering.*/thread_total_all_cpu_time_per_frame/*
ChromiumPerf/*/rendering.mobile/*cpu_time_per_frame/* 
ChromiumPerf/*/rendering.mobile/avg_surface_fps/*
ChromiumPerf/*/rendering.mobile/tasks_per_second_total_all/* 
ChromiumPerf/*/rendering.mobile/thread_total_all_cpu_time_per_second/* 

All 13 of these patterns match all test cases.
It sounds like the problem is that the number of test cases recently increased dramatically.
I see 515 test cases under ChromiumPerf/linux-perf/rendering.desktop/percentage_smooth. I'll assume that there are a similar number of test cases for the other bots and measurements.

Which test cases do you actually want to monitor?
If there are few enough of them, then I would recommend simply changing the sheriffing config to list them explicitly. (If you aren't a chromeperf Admin, I'd copy-paste the patterns from this bug into /edit_sheriffs for you.)
You might not need to list all of them if you can construct a pattern to match only the test cases that you want.
For example, there are 40 ToughFastScrollingPage subclasses in tough_scrolling_cases.py. If you listed them explicitly, you'd need to list 13 measurements * 40 test cases = 520 patterns. I'd write some code to generate that cross-product for you, and hope that that many patterns don't slow down chromeperf too much.
However, they all look like *_pixels_per_second, so you'd only need to list 13 measurements * 1 test case pattern = 13 test path patterns.

sadrul: Can you enumerate which test cases you actually want to monitor and try to condense that list to a few glob patterns?
If you can generate the cross product of those 13 measurements with those few case patterns, then I can copy-paste it into /edit_sheriffs for you, or I can generate the cross product and confirm it with you before pasting it in.
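Generating that cross product is a few lines of code. A minimal sketch (the prefix and the sample measurement/case patterns are taken from this thread; the helper name is mine):

```python
from itertools import product

def cross_product_patterns(measurements, case_patterns,
                           prefix='ChromiumPerf/*/rendering.*'):
    """Expand measurements x test-case patterns into full test path patterns."""
    return ['%s/%s/%s' % (prefix, m, c)
            for m, c in product(measurements, case_patterns)]

patterns = cross_product_patterns(
    ['percentage_smooth', 'frame_times'],
    ['*_pixels_per_second', 'card_*'])
# 2 measurements x 2 case patterns -> 4 test path patterns
```

With all 13 measurements and a handful of case patterns, the full list stays small enough to paste into /edit_sheriffs.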

Of course, if and when sheriff configs migrate to Descriptors, they could avoid computing cross products entirely by storing objects containing arrays like {suites, bots, measurements, cases, tags}, and use a new algorithm other than glob to match timeseries. Again, no plans to do that yet.
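To illustrate the idea, a Descriptor-style config could match a timeseries axis by axis, with no cross-product expansion at all. This is only a sketch of the concept, not the dashboard's actual data model:

```python
from fnmatch import fnmatch

class Descriptor:
    """Sketch of a richer sheriff config: arrays of allowed patterns per axis.

    An empty tuple means "match anything" on that axis. Tags would come from
    timeseries metadata rather than from the test path itself.
    """
    def __init__(self, suites=(), bots=(), measurements=(), cases=(), tags=()):
        self.suites, self.bots = suites, bots
        self.measurements, self.cases, self.tags = measurements, cases, tags

    @staticmethod
    def _match_axis(allowed, value):
        return not allowed or any(fnmatch(value, pat) for pat in allowed)

    def matches(self, suite, bot, measurement, case, case_tags):
        return (self._match_axis(self.suites, suite) and
                self._match_axis(self.bots, bot) and
                self._match_axis(self.measurements, measurement) and
                self._match_axis(self.cases, case) and
                (not self.tags or bool(set(self.tags) & set(case_tags))))
```

One such object replaces an entire cross product of glob patterns, and adding a tags axis is just one more field.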

Status: Started (was: Assigned)
I think the attached file has the list of alerts that would get us back to the old list. Is there any way to test how many alerts would have triggered with this new list over the last X weeks, and compare that with the alerts that actually triggered in that time?
Attachment: alerts (28.5 KB)
Owner: benjhayden@chromium.org
Back to Ben :)
I tried running through those patterns in the appengine dev_console, calling list_tests.GetTestsMatchingPatterns(), then Anomaly.QueryAsync(test=test). It timed out before it could finish the first GetTestsMatchingPatterns. I could write a bunch of python, deploy it to appengine, and run it in the taskqueue, but I'd rather find a generalizable, teachable way to fish, and I think the new /api/describe will help here.

I tried using the chrome devtools console in v2spa.
cp.ReadTestSuites() showed 3 matching suites: rendering.desktop, rendering.mobile, rendering.oopd.desktop
I hit /api/describe for those 3, computed some set unions, and found 10 matching bots: Android Nexus5 Perf, Android Nexus5X WebView Perf, Android Nexus6 WebView Perf, android-nexus5x-perf, Win 7 Nvidia GPU Perf, Win 7 Perf, linux-perf, mac-10_12_laptop_low_end-perf, mac-10_13_laptop_high_end-perf, win-10-perf
and 10 cpu_time_per_frame measurements (thread_GPU_cpu_time_per_frame, thread_IO_cpu_time_per_frame, thread_browser_cpu_time_per_frame, thread_display_compositor_cpu_time_per_frame, thread_other_cpu_time_per_frame, thread_raster_cpu_time_per_frame, thread_renderer_compositor_cpu_time_per_frame, thread_renderer_main_cpu_time_per_frame, thread_total_all_cpu_time_per_frame, thread_total_fast_path_cpu_time_per_frame)

A few loops later, I had a list of 21180 test paths.
I tried hitting /api/alerts?test=${path} in parallel and hoped the browser would queue them up, but it threw INSUFFICIENT_RESOURCES after about 5000. Several of those 5000 did find a few alerts, though, so the approach works. I'd try fetching them serially next. The tab didn't OOM, but it might if it fetched many more requests, I'm not sure.
I'm not sure I'd recommend doing this kind of analysis in devtools. Colab would probably work better.
I need to get home now. Do you want to try using the API in colab? Here's the documentation.
https://github.com/catapult-project/catapult/tree/master/dashboard/dashboard/api
Here's a colab notebook to demonstrate using OAuth:
https://goto.google.com/byauz
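Fetching the paths serially from colab might look like this. The /api/alerts?test= endpoint is the one used above; the host and the bearer-token auth scheme are assumptions, and getting a token (e.g. via the OAuth notebook above) is left out:

```python
import json
import urllib.parse
import urllib.request

# Assumed host; the query parameter matches the /api/alerts?test=... usage above.
API = 'https://chromeperf.appspot.com/api/alerts'

def alerts_request(test_path, token):
    """Build an authenticated request for one test path."""
    url = '%s?%s' % (API, urllib.parse.urlencode({'test': test_path}))
    return urllib.request.Request(url, headers={'Authorization': 'Bearer ' + token})

def fetch_all(test_paths, token):
    """Fetch serially, one request at a time, to avoid the
    INSUFFICIENT_RESOURCES failures seen with thousands of parallel fetches."""
    results = {}
    for path in test_paths:
        with urllib.request.urlopen(alerts_request(path, token)) as resp:
            results[path] = json.load(resp)
    return results
```

At 21180 paths this will be slow, but it won't exhaust the browser's (or runtime's) connection pool.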

Soundwave might also be helpful:
https://github.com/catapult-project/catapult/tree/master/experimental/soundwave
I did some local experiments based on the ~10K alerts I got for the initial analysis. It looks like, with the new config, only 393 alerts would have triggered.

I have to run right now ... I will share the scripts I used later tonight so we can double-check.
The alerts history file: http://springfield.wat.corp.google.com/stuff/alerts-history/alerts.json

The new-config: http://springfield.wat.corp.google.com/stuff/alerts-history/new-config

The script to process: http://springfield.wat.corp.google.com/stuff/alerts-history/process-alerts.js

(I would like to learn colab one day ... but it seems a bit tricky to get started at the moment)
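The process-alerts.js script above lives on a corp server; a rough Python equivalent of the replay would be the following. The file shapes are assumptions: alerts.json as a list of objects with a test_path field, and new-config as one glob pattern per line. Note fnmatch's '*' also crosses '/', unlike strict per-component matching:

```python
import fnmatch
import json

def count_matching_alerts(alert_paths, patterns):
    """Count how many historical alerts would still fire under the new patterns."""
    return sum(1 for path in alert_paths
               if any(fnmatch.fnmatch(path, pat) for pat in patterns))

def replay(alerts_file, config_file):
    # Assumed formats: a JSON list of {'test_path': ...} records, and a
    # plain-text config with one test path pattern per line.
    with open(alerts_file) as f:
        alert_paths = [a['test_path'] for a in json.load(f)]
    with open(config_file) as f:
        patterns = [line.strip() for line in f if line.strip()]
    return count_matching_alerts(alert_paths, patterns)
```

Running something like this over the ~10K-alert history against the new config is what produced the 393 figure above.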
Owner: sadrul@chromium.org
Can you condense that long list of patterns? I'd like to be able to just copy it into the datastore to replace the 13 patterns in #1.
What's your suggestion for shrinking the list? (I can't think of a way to shrink the list much in its current state.)
Hm, in fact, the list needs to be even bigger than I had it before. I realized that I included the list only for rendering.desktop. It doesn't include some of the metrics for rendering.mobile (e.g. avg_surface_fps etc.). Attached is the new list.
Attachment: complete-alerts (43.5 KB)
The way that these patterns are used is currently not very efficient.
https://github.com/catapult-project/catapult/blob/master/dashboard/dashboard/edit_config_handler.py#L274
When updating the sheriff config, each pattern executes a query that can take up to a few minutes to run. The task queue handles 10 patterns per task, and each task times out after 10 minutes, so it's possible that some tasks will fail, and the datastore would be left in an inconsistent state: some timeseries that should be monitored would not be, and vice versa. Even if they all succeed, it might take a day or so for the job to finish. This process would be repeated every time the list of patterns is changed.
I described an alternative algorithm that would be much more efficient, but would take a while to implement using Descriptors.

Please see #1 for how to condense the list of patterns. For example, I see 40 test cases that would match *_pixels_per_second. You can reduce the number of patterns significantly by replacing them all with that pattern like this:
ChromiumPerf/*/rendering.*/input_event_latency/*_pixels_per_second
I see a few other sets of stories that can be replaced by patterns such as card_*, *balls_*, microsoft_*, *canvas_*.

Alternatively, the stories could be renamed in telemetry to include tags as prefixes separated by colons like "smoothness:balls_svg_animation". Then the test case patterns could be "smoothness:*". Of course, renaming stories might require migrating data from the old test path to the new test path, which would also be expensive so I wouldn't recommend it. I only include this alternative as an example of how patterns can be used.

Please feel free to schedule a VC.
Project Member

Comment 12 by bugdroid1@chromium.org, Dec 9

The following revision refers to this bug:
  https://chromium.googlesource.com/catapult/+/e09a3df387b3ffd12e1ef8e9ad529b8e4577c118

commit e09a3df387b3ffd12e1ef8e9ad529b8e4577c118
Author: Sadrul Habib Chowdhury <sadrul@chromium.org>
Date: Sun Dec 09 05:20:14 2018

telemetry: Add an option to print the names of stories.

This adds a '--print-only' (or -p) option to run_benchmark to print the
list of stories and/or tags for a benchmark. If '-p stories', it prints
the list of stories, if '-p tags' is used, it prints the list of tags,
and if '-p both' is used, it prints the list of stories, and the name
of tags for each story.

Examples:

  $ ./tools/perf/run_benchmark --browser=android-system rendering.mobile -p both
    accu_weather_2018                                    top_real_world_desktop,gpu_rasterization
    accu_weather_desktop_gpu_raster_2018                 top_real_world_desktop,gpu_rasterization
    amazon_2018                                          top_real_world_desktop,gpu_rasterization
    amazon_desktop_gpu_raster_2018                       top_real_world_desktop,gpu_rasterization
    amazon_mobile_2018                                   top_real_world_mobile
    analog_clock_svg                                     tough_filters
    androidpolice_mobile_2018                            top_real_world_mobile
    idle_power_animated_gif                              key_idle_power
    aquarium_20k                                         tough_webgl,required_webgl
    aquarium                                             tough_webgl,required_webgl
    background_color_animation                           tough_texture_upload
    background_color_animation_with_gradient             tough_texture_upload
    baidu_mobile_2018                                    top_real_world_mobile
    balls_css_key_frame_animations_composited_transform  tough_animation
    balls_css_key_frame_animations                       tough_animation
    balls_css_transition_2_properties                    tough_animation
    balls_css_transition_40_properties                   tough_animation
    balls_css_transition_all_properties                  tough_animation
    balls_javascript_canvas                              tough_animation
    balls_javascript_css                                 tough_animation
    balls_svg_animations                                 tough_animation
    bing_mobile_2018                                     top_real_world_mobile
    idle_power_blank                                     key_idle_power
    ...

Another:

  $ ./tools/perf/run_benchmark --browser=android-system rendering.mobile -p tags
  List of tags:
  polymer
  gpu_rasterization
  tough_canvas
  key_noop
  key_idle_power
  tough_path_rendering
  tough_filters
  simple_mobile_sites
  tough_scheduling
  tough_texture_upload
  maps
  use_fake_camera_device
  key_silk
  top_real_world_desktop
  tough_image_decode
  image_decoding
  required_webgl
  top_real_world_mobile
  fastpath
  motionmark
  pathological_mobile_sites
  tough_scrolling
  tough_compositor
  tough_webgl
  tough_animation
  key_hit_test
  [  PASSED  ] 0 tests.


Bug: chromium:906509
Change-Id: I076e2007cbf2b76b8c2bd72d1b0de08dc721f862
Reviewed-on: https://chromium-review.googlesource.com/c/1355960
Commit-Queue: Sadrul Chowdhury <sadrul@chromium.org>
Reviewed-by: Ned Nguyen <nednguyen@google.com>

[modify] https://crrev.com/e09a3df387b3ffd12e1ef8e9ad529b8e4577c118/telemetry/telemetry/internal/story_runner.py

Project Member

Comment 13 by bugdroid1@chromium.org, Dec 9

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/aac77a144fcea93173ab5e614dc2d1d511897c8b

commit aac77a144fcea93173ab5e614dc2d1d511897c8b
Author: chromium-autoroll <chromium-autoroll@skia-public.iam.gserviceaccount.com>
Date: Sun Dec 09 07:24:39 2018

Roll src/third_party/catapult c017b42db491..e09a3df387b3 (1 commits)

https://chromium.googlesource.com/catapult.git/+log/c017b42db491..e09a3df387b3


git log c017b42db491..e09a3df387b3 --date=short --no-merges --format='%ad %ae %s'
2018-12-09 sadrul@chromium.org telemetry: Add an option to print the names of stories.


Created with:
  gclient setdep -r src/third_party/catapult@e09a3df387b3

The AutoRoll server is located here: https://autoroll.skia.org/r/catapult-autoroll

Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+/master/autoroll/README.md

If the roll is causing failures, please contact the current sheriff, who should
be CC'd on the roll, and stop the roller if necessary.

CQ_INCLUDE_TRYBOTS=luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel

BUG=chromium:906509
TBR=sullivan@chromium.org

Change-Id: Ia2198497a72e9277cdc1cd4af945b8857e5acc9f
Reviewed-on: https://chromium-review.googlesource.com/c/1369253
Reviewed-by: chromium-autoroll <chromium-autoroll@skia-public.iam.gserviceaccount.com>
Commit-Queue: chromium-autoroll <chromium-autoroll@skia-public.iam.gserviceaccount.com>
Cr-Commit-Position: refs/heads/master@{#614994}
[modify] https://crrev.com/aac77a144fcea93173ab5e614dc2d1d511897c8b/DEPS
