Broke immediately after push-to-prod. The following CLs removed metrics that were already in use:
https://chromium-review.googlesource.com/c/chromiumos/third_party/autotest/+/687956
https://chromium-review.googlesource.com/c/chromiumos/third_party/autotest/+/727539
Marking P1 because infra is blind with respect to provision status. For example, I can't tell whether this push-to-prod had any impact on provision rates.
Broken dashboard: https://viceroy.corp.google.com/chromeos/provision?duration=6h
In addition, the omens page no longer gets the provision inputs: https://cs.corp.google.com/piper///depot/google3/configs/monitoring/chrome_infra/chromeos/omens.py
Similarly, the "provision rate too high" alert is broken: https://cs.corp.google.com/piper///depot/google3/configs/monitoring/chrome_infra/chromeos/autotest_alerts.py?l=59
Workaround: the alert metric I just started using (crbug.com/773806, cl/173144546) appears unaffected and can be viewed in Pcon: https://pcon.corp.google.com/p#chromeos-infra-alert-owners/queryplayground?query=CBCCAaUDCvMCCBOaAe0CCghmYWlsdXJlcwoFdG90YWwSsAEIEpIBqgEKowEIEYoBnQEKgwEID3p_Cn0qGAoObWV0cmljOnN1Y2Nlc3MSBAgBMAAYADJhChltb25hcmNoLmFjcXVpc2l0aW9ucy5UYXNrEkQKQi9jaHJvbWUvaW5mcmEvY2hyb21lb3MvYXV0b3Rlc3QvcHJvdmlzaW9uL2Nyb3NfdXBkYXRlX2J5X2RldnNlcnZlcioECAAgATCAkJ3pGjj-__________8BEgIIARKVAQgSkgGPAQqIAQgRigGCAQppCA96ZQpjMmEKGW1vbmFyY2guYWNxdWlzaXRpb25zLlRhc2sSRApCL2Nocm9tZS9pbmZyYS9jaHJvbWVvcy9hdXRvdGVzdC9wcm92aXNpb24vY3Jvc191cGRhdGVfYnlfZGV2c2VydmVyKgQIACABMICQnekaOP7__________wESAggBGg0KCwgBGQAAAAAAAAAAGgASLQgFUhsIBFIJCCASBQgCIMgBUgwIIRIECAIgAFICCB9SDAghEgQIAiACUgIIHw&duration=1d&endtime=1508794807&constants=threshold=16&names=Query%201
We need to update the dashboard because the CLs changed the path of the provision rate metric. New metrics: https://pcon.corp.google.com/p#chrome-infra/queryplayground?duration=21600&heatmapColorScale=viceroy&legendtable=true&names=Overall%20provision%20rate%20(passed%20and%20failed)&oldHeatmap=false&outputPoints=900&showEditor=true&title=Overall%20provision%20rate%20(passed%20and%20failed)&yAxisLabel=Jobs/minute&yAxisMin=0&query=CBKSAZUCCv4BCBGKAfgBCuEBCBCCAdsBCsYBCA96wQEKvgEyuwEauAEKDGNocm9tZS1pbmZyYRIcY2hyb21lLWluZnJhLXByZWNvbXB1dGF0aW9ucxoXdGFza2xlc3MgcmF0ZSAoYXV0b2dlbilCcQoLCAoSB2NvdW50ZXJSYgoZbW9uYXJjaC5hY3F1aXNpdGlvbnMuVGFzaxJFCkMvY2hyb21lL2luZnJhL2Nocm9tZW9zL2F1dG90ZXN0L3Byb3Zpc2lvbi9jcm9zX3VwZGF0ZV9wZXJfZGV2c2VydmVyEhAIBFICCB9SCAggEgQIAiB4EgIIAyCAjs4cOP7__________wESAggBGg5tZXRyaWM6c3VjY2Vzcw
The dashboard is back: https://viceroy.corp.google.com/chromeos/provision
Marking this as Fixed.
Looks like this was due to a rename of the cros_update_by_devserver metric to cros_update_per_devserver.
Yeah, it had to be changed because we couldn't add a new field to an existing metric (per @akeshet), so we duplicated the original one with the extra fields needed.
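A minimal Python sketch can illustrate the constraint described above. `MetricRegistry` and `FieldMismatchError` are hypothetical and are not the ts_mon or chromite metrics API, and the field names (`devserver`, `success`, `board`) are assumptions, since the actual extra fields aren't named here. The sketch only models the rule that a metric's field set is fixed once registered: adding a field means registering a duplicate under a new name, and anything still reading the old name (dashboards, alerts) silently stops seeing new data.

```python
# Hypothetical sketch: MetricRegistry is NOT the ts_mon/chromite API.
# It models one rule: a metric's field names are fixed once registered.


class FieldMismatchError(Exception):
    """Raised when a metric is used with a different set of field names."""


class MetricRegistry:
    def __init__(self):
        self._schemas = {}  # metric name -> frozenset of field names
        self._counts = {}   # (metric name, sorted field items) -> count

    def register(self, name, field_names):
        fields = frozenset(field_names)
        existing = self._schemas.get(name)
        if existing is not None and existing != fields:
            # Changing an existing metric's fields is refused.
            raise FieldMismatchError(
                f"{name} has fields {sorted(existing)}, "
                f"cannot change to {sorted(fields)}")
        self._schemas[name] = fields

    def increment(self, name, fields):
        # Raises KeyError if the metric was never registered.
        if frozenset(fields) != self._schemas[name]:
            raise FieldMismatchError(
                f"{name} expects {sorted(self._schemas[name])}")
        key = (name, tuple(sorted(fields.items())))
        self._counts[key] = self._counts.get(key, 0) + 1

    def count(self, name, fields):
        # Readers of a name nobody writes to just see 0, not an error.
        return self._counts.get((name, tuple(sorted(fields.items()))), 0)


registry = MetricRegistry()
registry.register("cros_update_by_devserver", ["devserver", "success"])

# Adding a field to the existing metric is rejected...
try:
    registry.register("cros_update_by_devserver",
                      ["devserver", "success", "board"])
except FieldMismatchError as err:
    print("rejected:", err)

# ...so the metric is duplicated under a new name with the extra field.
registry.register("cros_update_per_devserver",
                  ["devserver", "success", "board"])
registry.increment("cros_update_per_devserver",
                   {"devserver": "ds1", "success": True, "board": "eve"})

# A dashboard still reading the old name sees no new data.
print(registry.count("cros_update_by_devserver",
                     {"devserver": "ds1", "success": True}))
```

The last line is the failure mode from this bug: the old metric name still resolves, it just stops receiving increments.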
Comment 1 by pprabhu@chromium.org, Oct 23 2017