
Issue 838871

Starred by 1 user

Issue metadata

Status: Fixed
Closed: Jul 25
Pri: 1
Type: Bug




Make sure start_with_url.{cold,warm}.startup_pages is alerted on new bots

Project Member Reported by pasko@chromium.org, May 2 2018

Issue description

With clank internal commit 0d21e9e8035b3a625c12b0665fac1e48da08c42d the runs on perf-clankium-l-{phone,tablet} moved to health-plan-clankium-{low-end,}-phone.

Yay!

I checked whether results on the non-low-end phone roughly match, and they do:

https://chromeperf.appspot.com/report?sid=2168cf9630dba06b8d4ace08d36ecc228fbc006a7542ef637c98420d646b00c3

Great!

=== Request 1:

For extra assurance I wanted to get a confirmation that the new graphs are alerted. Sorry if it is trivial, I never remember where to look for the alert configuration, and whether the bot names are part of the alert configuration. Here are the alerts we need:

Bots: health-plan-clankium-low-end-phone, health-plan-clankium-phone

Metrics on these bots: messageloop_start_time, foreground_tab_request_start, foreground_tab_load_complete.

Please confirm :)

=== Request 2:

If not too difficult, please backfill the graphs for health-plan-clankium-phone with the data that previously came from perf-clankium-l-phone. This would reduce surprises when looking at the graphs and make it easier to communicate our historical ability to triage problems and ship improvements.

Thank you :)

Assigning to sullivan@ for triage, out of ignorance mainly :/
 
[Request 3, owner me, also fix https://chrome-health.googleplex.com/health-plan/android-chrome/startup/nexus5/ which is still expecting data from the old bot, but should be switched to the new one.]
Owner: simonhatch@chromium.org
Simon, can you help out?
ClankInternal/*/start_with_url.cold.startup_pages/*
ClankInternal/*/start_with_url.warm.startup_pages/*

This is the existing config for start_with_url.{cold,warm}.startup_pages. It doesn't care about the bot specifically, so your new ones are already covered. Confirmed by checking dev_console and pulling up a test path from health-plan-clankium-low-end-phone. These are alerting on the summary metric though; I can switch them to alert on the individual pages like so:

ClankInternal/*/start_with_url.*.startup_pages/messageloop_start_time/*
ClankInternal/*/start_with_url.*.startup_pages/foreground_tab_request_start/*
ClankInternal/*/start_with_url.*.startup_pages/foreground_tab_load_complete/*

Let me know if you want to do that.
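
To illustrate why the bot name does not matter and how a summary pattern differs from a per-page one, here is a minimal Python sketch. It assumes a `*` matches within a single `/`-separated part of the test path and that a pattern must have the same number of parts as the path; the `matches` helper and the `example_page` name are hypothetical, and this is not the dashboard's actual matching code.

```python
from fnmatch import fnmatchcase


def matches(pattern, test_path):
    """Return True if test_path matches pattern, comparing part by part.

    Assumption: a '*' matches within one '/'-separated part and never
    spans a '/', so patterns and paths must have the same depth.
    """
    pattern_parts = pattern.split("/")
    path_parts = test_path.split("/")
    if len(pattern_parts) != len(path_parts):
        return False
    return all(fnmatchcase(part, pat)
               for pat, part in zip(pattern_parts, path_parts))


# Existing config line and one of the proposed per-page lines.
existing = "ClankInternal/*/start_with_url.cold.startup_pages/*"
per_page = ("ClankInternal/*/start_with_url.*.startup_pages/"
            "foreground_tab_load_complete/*")

summary_path = ("ClankInternal/health-plan-clankium-low-end-phone/"
                "start_with_url.cold.startup_pages/foreground_tab_load_complete")
page_path = summary_path + "/example_page"  # hypothetical page name

print(matches(existing, summary_path))  # summary metric on a new bot
print(matches(existing, page_path))     # per-page series, one level deeper
print(matches(per_page, page_path))
print(matches(per_page, summary_path))
```

Under these assumptions the existing pattern covers the summary series on any bot, while the longer pattern covers only the per-page series.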
+1 to alert on individual pages rather than aggregates.

Comment 5 by pasko@chromium.org, May 2 2018

Very helpful, thanks!

Summary metrics for messageloop_start_time and foreground_tab_request_start are WAI, but the foreground_tab_load_complete would be better off per-page. The effect of it would probably be tiny, so if it is more than 15 minutes of your time, let's not bother.
Ok switched the alerts over for these start_with_url tests:

ClankInternal/*/start_with_url.*.startup_pages/messageloop_start_time
ClankInternal/*/start_with_url.*.startup_pages/foreground_tab_request_start
ClankInternal/*/start_with_url.*.startup_pages/foreground_tab_load_complete/*

So they'll alert on messageloop_start_time, foreground_tab_request_start, and per-page on foreground_tab_load_complete.

You also wanted the data migrated?

Comment 7 by pasko@chromium.org, May 3 2018

> they'll alert on messageloop_start_time, foreground_tab_request_start, and per-page on foreground_tab_load_complete.

thank you!

> You also wanted the data migrated?

yes please :)
Ok data migration is underway, will probably be done in 20-30 mins.
Status: Fixed (was: Assigned)
Cc: jparent@chromium.org yfried...@chromium.org sullivan@chromium.org
Status: Assigned (was: Fixed)
I would like to reopen this bug as P1 just to keep the context close. Let me know if creating a new bug is better and I'll do that instead.

I am seeing a clear regression on May 17 on this graph:
https://chromeperf.appspot.com/group_report?bug_id=800750

There seems to be no alert, while the metric seems to match the pattern. Simon, can you please take a look?
perezju: oh, sorry, pasted a wrong link, and thank you for a correct link
Cc: dtu@chromium.org
So poking at this a bit, pulling up the TestMetadata in dev_console shows that it does have a sheriff. Went through the backlog of changes around that time and couldn't find any related sheriff test path changes.

I brought it up in /debug_alert and the default settings for alerting actually don't seem to alert. Not until I changed the min_steppiness to 0.45 did an alert show up:

https://chromeperf.appspot.com/debug_alert?test_path=ClankInternal%2Fhealth-plan-clankium-phone%2Fstart_with_url.warm.startup_pages%2Fforeground_tab_request_start&rev=1526517330&num_before=300&num_after=300&config=%7B%0D%0A++%22min_steppiness%22%3A+0.40000000000000002%0D%0A%7D

So maybe minimally we should assign an anomaly config here with updated params. Additionally, I'm a little concerned about this, and wondering if we should set aside some time to look at the regression detection algorithm and its default parameters, and see if we can do better.
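
For anyone who wants to try other parameter values, a /debug_alert link like the one above can be assembled with a few lines of Python. This is a sketch that only reuses the query parameters visible in that link (`test_path`, `rev`, `num_before`, `num_after`, `config`); the `debug_alert_url` helper is a hypothetical name, and whether the page accepts other parameters is not covered here.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit

DEBUG_ALERT = "https://chromeperf.appspot.com/debug_alert"


def debug_alert_url(test_path, rev, config, num_before=300, num_after=300):
    """Build a /debug_alert link with the anomaly config passed as JSON."""
    query = urlencode({
        "test_path": test_path,
        "rev": rev,
        "num_before": num_before,
        "num_after": num_after,
        "config": json.dumps(config, indent=2),
    })
    return f"{DEBUG_ALERT}?{query}"


url = debug_alert_url(
    "ClankInternal/health-plan-clankium-phone/"
    "start_with_url.warm.startup_pages/foreground_tab_request_start",
    1526517330,
    {"min_steppiness": 0.4},
)
print(url)

# Round-trip check: the config survives URL encoding.
decoded = json.loads(parse_qs(urlsplit(url).query)["config"][0])
print(decoded)
```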
Cc: simonhatch@chromium.org
Owner: dtu@chromium.org
thank you for the link to the anomaly detection debugger!

Playing around with a bigger range. So min_steppiness=0.4 is pretty good. It sometimes flags things below 50ms (we won't be able to bisect those), but not too much. The value at 0.3 matches my visual comprehension better, but we may spend too much time bisecting.

Here is my take at 0.4 (slow to load):
https://chromeperf.appspot.com/debug_alert?test_path=ClankInternal%2Fhealth-plan-clankium-phone%2Fstart_with_url.warm.startup_pages%2Fforeground_tab_request_start&rev=1526517330&num_before=3000&num_after=600&config=%7B%0D%0A++"min_steppiness"%3A+0.3%0D%0A%7D

dtu: should we switch to this value, or are there other parameters to tweak? Is there a way to discover the current values used for alerting?
Owner: simonhatch@chromium.org
`min_steppiness` is the major sensitivity parameter. In some cases, `multiple_of_std_dev` might be relevant, but it looks like steppiness is the limiting one here.

`min_steppiness` is 0.5 by default. You can find the default parameter values here:
https://chromium.googlesource.com/catapult.git/+/HEAD/dashboard/dashboard/find_change_points.py#38

-> simonhatch@ to update the alerting config.
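
As a rough illustration of what lowering the threshold changes, here is a toy Python sketch of a per-test override layered on top of the default. Only the default of 0.5 comes from this thread; the `effective_config` and `passes_steppiness` helpers, the `>=` comparison, and the candidate score of 0.45 are assumptions for illustration and are not taken from find_change_points.py.

```python
# Default stated in this thread; other parameters are omitted on purpose.
DEFAULT_CONFIG = {"min_steppiness": 0.5}


def effective_config(override=None):
    """Layer a per-test anomaly config on top of the defaults."""
    return {**DEFAULT_CONFIG, **(override or {})}


def passes_steppiness(steppiness, config):
    """Assumed filter: keep a candidate whose score reaches the threshold."""
    return steppiness >= config["min_steppiness"]


candidate_steppiness = 0.45  # hypothetical score for a change point

default_cfg = effective_config()
tuned_cfg = effective_config({"min_steppiness": 0.4})

print(passes_steppiness(candidate_steppiness, default_cfg))
print(passes_steppiness(candidate_steppiness, tuned_cfg))
```

With these made-up numbers the candidate is dropped under the default and kept once the override is applied, which is the behaviour described for the May 17 regression.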
Status: Fixed (was: Assigned)
Sorry, this got pushed down my list. Done.
Components: Speed>Dashboard
