New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 721894 link

Starred by 3 users

Issue metadata

Status: Fixed
Owner:
Closed: Oct 4
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Feature

Blocking:
issue 767952



Sign in to add a comment

remove summary metrics from tough_video_cases, only use per page metrics

Project Member Reported by jrumm...@chromium.org, May 12 2017

Issue description

See the link to graphs below.
 
Project Member

Comment 3 by 42576172...@developer.gserviceaccount.com, May 13 2017

Cc: crouleau@google.com
Owner: crouleau@google.com

=== Auto-CCing suspected CL author crouleau@google.com ===

Hi crouleau@google.com, the bisect results pointed to your CL, please take a look at the
results.


=== BISECT JOB RESULTS ===
Perf regression found with culprit

Suspected Commit
  Author : Caleb Rouleau
  Commit : c52544194f2bf97e72cf9215f2c3d40928929036
  Date   : Tue May 09 17:51:57 2017
  Subject: Merge tough_video_cases_extra into tough_video_cases.

Bisect Details
  Configuration: mac_10_12_perf_bisect
  Benchmark    : media.tough_video_cases_tbmv2
  Metric       : cpu_time_percentage_avg/cpu_time_percentage_avg
  Change       : 32.52% | 0.229632941481 -> 0.304301320532

Revision             Result                      N
chromium@470312      0.229633 +- 0.00282995      6      good
chromium@470356      0.228509 +- 0.00334922      6      good
chromium@470367      0.229666 +- 0.00595407      6      good
chromium@470373      0.225816 +- 0.0101137       6      good
chromium@470374      0.229312 +- 0.00667581      6      good
chromium@470375      0.304201 +- 0.00670207      6      bad       <--
chromium@470376      0.305796 +- 0.0075482       6      bad
chromium@470378      0.302836 +- 0.00570879      6      bad
chromium@470400      0.304301 +- 0.00503317      6      bad

To Run This Test
  src/tools/perf/run_benchmark -v --browser=release --output-format=chartjson --upload-results --pageset-repeat=1 --also-run-disabled-tests media.tough_video_cases_tbmv2

Debug Info
  https://chromeperf.appspot.com/buildbucket_job_status/8979768684108547456

Is this bisect wrong?
  https://chromeperf.appspot.com/bad_bisect?try_job_id=6379667335938048


| O O | Visit http://www.chromium.org/developers/speed-infra/perf-bug-faq
|  X  | for more information addressing perf regression bugs. For feedback,
| / \ | file a bug with component Speed>Bisection.  Thank you!
Cc: johnchen@chromium.org crouleau@chromium.org
Components: Speed>Telemetry
Owner: ----
Status: WontFix (was: Untriaged)
+John: it seems like having this summary cpu_time_percentage_avg means that every time we add new pages in the benchmark we will get alerts like this. I wonder if it would be better to only have metrics per page but no summary.

This is expected behavior, so the status is WontFix.
We generally recommend per-page metrics: you don't get these kind of alerts, and the bisect will filter for just the page that regressed, so it runs faster and isn't dampened by pages with different performance characteristics.
Labels: -Type-Bug-Regression Type-Feature
Owner: johnchen@chromium.org
Status: Assigned (was: WontFix)
Summary: remove summary metrics from tough_video_cases_tbmv2, only use per page metrics (was: 4%-31.3% regression in media.tough_video_cases_tbmv2 at 470293:470429)
remove summary metrics from tough_video_cases_tbmv2, only use per page metrics

John, can you handle fixing this?
Owner: eakuefner@chromium.org
Ethan just submits a CL: https://codereview.chromium.org/2917423003/
Cc: benjhayden@chromium.org eakuefner@chromium.org
Owner: ----
Status: Available (was: Assigned)
Err, sorry: seems like his CL doesn't address summary metrics in general. I think the best way to avoid this kind of summary alert should be done at dashboard level.

+Ben who is working on updating how summarized data is computed.
Ah, should I simply make the regex from https://chromeperf.appspot.com/edit_sheriffs for my rotation not match the summary metrics?
Cc: -eakuefner@chromium.org
Caleb, that's exactly right; modifying the regex list for alerting is the right place to make sure that you don't get alerts about summary metrics. For example (and I think the rotation you're talking about is Chromium Perf AV Sheriff), there are currently:
ChromiumPerf/*/media.tough_video_cases_tbmv2/*
ChromiumPerf/*/media.tough_video_cases_tbmv2/*/*

By deleting the first line, you will not receive alerts about summary metrics any more.

My CL is orthogonal as Ned pointed out; I'm removing myself from this bug. Feel free to re-add if there are further questions.
Cc: -crouleau@google.com -crouleau@chromium.org
Owner: crouleau@chromium.org
Status: Assigned (was: Available)
Thanks Ethan!

I'll take this.
Cc: charliea@chromium.org
Right now the monitoring is set to:

ChromiumPerf/*/media.tough_video_cases_tbmv2/*
ChromiumPerf/*/media.tough_video_cases_tbmv2/*/*

That includes hundreds of metrics. It'd be better to monitor some specific high-level metrics. Charlie, can you help with guidance on power/cpu metrics?

For memory metrics, we currently monitor these on the system health benchmark:
memory:chrome:all_processes:reported_by_chrome:cc:effective_size_avg
memory:chrome:all_processes:reported_by_chrome:gpu:effective_size_avg
memory:chrome:all_processes:reported_by_chrome:malloc:effective_size_avg
memory:chrome:all_processes:reported_by_chrome:skia:effective_size_avg
memory:chrome:all_processes:reported_by_chrome:v8:effective_size_avg
memory:chrome:all_processes:reported_by_os:system_memory:java_heap:proportional_resident_size_avg
memory:chrome:all_processes:reported_by_os:system_memory:native_heap:proportional_resident_size_avg
memory:chrome:all_processes:reported_by_os:system_memory:proportional_resident_size_avg

As for power measurements, we've been monitoring:

ChromiumPerf/*/<benchmark_name>/*energy_sum/*
ChromiumPerf/*/<benchmark_name>/*power_avg/*
ChromiumPerf/*/<benchmark_name>/cpu_time_percentage_avg/*

This is the current minimal set of metrics that we're taking responsibility for right now and saying "if you get an alert on these metrics, and it's not a real alert, that's a problem for us and please let us know".
Okay,

I am going to change it from 

ChromiumPerf/*/media.tough_video_cases_tbmv2/*
ChromiumPerf/*/media.tough_video_cases_tbmv2/*/*

to 

ChromiumPerf/*/media.tough_video_cases_tbmv2/*energy_sum/*
ChromiumPerf/*/media.tough_video_cases_tbmv2/*power_avg/*
ChromiumPerf/*/media.tough_video_cases_tbmv2/cpu_time_percentage_avg/*
ChromiumPerf/*/media.tough_video_cases_tbmv2/memory:chrome:all_processes:reported_by_chrome:cc:effective_size_avg/*
ChromiumPerf/*/media.tough_video_cases_tbmv2/memory:chrome:all_processes:reported_by_chrome:gpu:effective_size_avg/*
ChromiumPerf/*/media.tough_video_cases_tbmv2/memory:chrome:all_processes:reported_by_chrome:malloc:effective_size_avg/*
ChromiumPerf/*/media.tough_video_cases_tbmv2/memory:chrome:all_processes:reported_by_chrome:skia:effective_size_avg/*
ChromiumPerf/*/media.tough_video_cases_tbmv2/memory:chrome:all_processes:reported_by_chrome:v8:effective_size_avg/*
ChromiumPerf/*/media.tough_video_cases_tbmv2/memory:chrome:all_processes:reported_by_os:system_memory:java_heap:proportional_resident_size_avg/*
ChromiumPerf/*/media.tough_video_cases_tbmv2/memory:chrome:all_processes:reported_by_os:system_memory:native_heap:proportional_resident_size_avg/*
ChromiumPerf/*/media.tough_video_cases_tbmv2/memory:chrome:all_processes:reported_by_os:system_memory:proportional_resident_size_avg/*

Status: Fixed (was: Assigned)
Done.

I also removed ChromiumPerf/*/media.tough_video_cases/*. I'm working on trimming the metrics that we alert on here: https://docs.google.com/document/d/1TvUWPl6diK3DTJdSWMOcdz-6qnKC6KEHAZqoBBD8lIc/edit#heading=h.pbl9canvrafx
Cc: simonhatch@chromium.org crouleau@google.com hubbe@google.com
 Issue 745924  has been merged into this issue.
Status: Assigned (was: Fixed)
See  issue 745924 . It looks like alerts for summary metrics are still happening. The line in our alerting config that is triggering this is:

ChromiumPerf/*/media.tough_video_cases_tbmv2/memory:chrome:all_processes:reported_by_chrome:cc:effective_size_avg/*

In comment #10, eakuefner@ suggested that the summary metrics would not match one I added the last "/*", but this seems to not be true anymore. 
simonhatch, any idea what could be going on here? The alerting page linked from the bug in #17 is https://chromeperf.appspot.com/group_report?sid=9686f126f87fe418f49ce0dec850d9fefacb24b3320a6a915067835c1805b8b6. It has an alert for ChromiumPerf/chromium-rel-win7-gpu-nvidia/media.tough_video_cases_tbmv2/memory:chrome:all_processes:reported_by_chrome:cc:effective_size_avg.

When I type the following into the interactive dashboard console:

from dashboard.common import utils
t = utils.TestKey('ChromiumPerf/chromium-rel-win7-gpu-nvidia/media.tough_video_cases_tbmv2/memory:chrome:all_processes:reported_by_chrome:cc:effective_size_avg').get()
print t.sheriff

It prints "None" as expected. Looking at the dates on the bug, Caleb changed the alerting pattern on June 7, and the timestamp on the alert is "2017-07-14T06:57:04.119750". So it seems like this alert shouldn't have fired. Any ideas?
Annie, what about

from dashboard.common import utils
t = utils.TestKey('ChromiumPerf/chromium-rel-win7-gpu-nvidia/media.tough_video_cases_tbmv2/memory:chrome:all_processes:reported_by_chrome:cc:effective_size_avg/memory:chrome:all_processes:reported_by_chrome:cc:effective_size_avg').get()
print t.sheriff

The reason I ask is that  issue 745924  lists the metric as
"memory:chrome:all_processes:reported_by_chrome:cc:effective_size_avg/memory:chrome:all_processes:reported_by_chrome:cc:effective_size_avg"
I think I'm missing something here; where do you see the metric name repeated with a slash like that? I attached a screenshot of what I'm seeing, which unless I'm reading wrong is what I typed in #18.

When I copy/paste #19, t is None.
metric_name.png
188 KB View Download
Comment 3 of  issue 745924  says: 

Bisect Details
  Configuration: winx64nvidia_perf_bisect
  Benchmark    : media.tough_video_cases_tbmv2
  Metric       : memory:chrome:all_processes:reported_by_chrome:cc:effective_size_avg/memory:chrome:all_processes:reported_by_chrome:cc:effective_size_avg
 Issue 745938  has been merged into this issue.
Cc: -crouleau@google.com crouleau@chromium.org
Owner: simonhatch@chromium.org
It's unclear what I can do to stop these alerts from coming in. Any suggestions, Simon?
Sorry, didn't notice my name on here, let me look into this.
re: #c19

That's just some artifacts of how bisect expects parameters. The metric memory:chrome:all_processes:reported_by_chrome:cc:effective_size_avg/memory:chrome:all_processes:reported_by_chrome:cc:effective_size_avg doesn't exist in the datastore.
Summary: remove summary metrics from tough_video_cases, only use per page metrics (was: remove summary metrics from tough_video_cases_tbmv2, only use per page metrics)
Blocking: 767952
Simon filed https://github.com/catapult-project/catapult/issues/3903 to track work on this.
 Issue 767952  has been merged into this issue.
Status: Fixed (was: Assigned)
Think we found the root cause in the github issue a while back by sweeping the datastore and fixing the TestMetadata's.
Components: Test>Telemetry
Components: -Speed>Telemetry

Sign in to add a comment