Make label_cleaner work again, and collect metrics
Issue description
label_cleaner is dying in prod with:
ImportError: No module named MySQLdb
Traceback (most recent call last):
  File "/usr/local/autotest/site_utils/label_cleaner.py", line 26, in <module>
    import MySQLdb
- Make it work again.
- Add monarch metrics so it doesn't remain silent.
,
Aug 7 2017
+ add a graph to viceroy dashboard under DB inconsistencies.
,
Aug 7 2017
,
Aug 7 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/autotest/+/194b870d7330134acf0acac164907699a0204b43

commit 194b870d7330134acf0acac164907699a0204b43
Author: Prathmesh Prabhu <pprabhu@chromium.org>
Date: Mon Aug 07 23:42:45 2017

[autotest] Fix import order in label_cleaner

MySQLdb is installed via build_externals on prod machines. This means
that it is only available after 'import common'

BUG= chromium:753134
TEST=Run label_cleaner locally.

Change-Id: Id91b8cbdc718c517720b5e2bc5ddef7f47d7f334
Reviewed-on: https://chromium-review.googlesource.com/604723
Reviewed-by: Prathmesh Prabhu <pprabhu@chromium.org>
Tested-by: Prathmesh Prabhu <pprabhu@chromium.org>

[modify] https://crrev.com/194b870d7330134acf0acac164907699a0204b43/site_utils/label_cleaner.py
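The ordering constraint in this commit can be illustrated with a minimal sketch. It assumes, as the commit message implies, that 'import common' makes the build_externals packages importable by extending sys.path. The temporary directory and the stand-in module name fake_mysqldb are hypothetical and only simulate that effect; this is not the autotest code.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical stand-in for the directory that build_externals installs into.
externals_dir = tempfile.mkdtemp()
with open(os.path.join(externals_dir, "fake_mysqldb.py"), "w") as f:
    f.write("NAME = 'fake_mysqldb'\n")


def try_import(name):
    """Return the imported module, or None if it cannot be found."""
    importlib.invalidate_caches()
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


# Before the path setup (the role 'import common' is assumed to play),
# the module cannot be imported.
before = try_import("fake_mysqldb")

# Simulate what the path-setup import is assumed to do.
sys.path.insert(0, externals_dir)

# After the path setup, the same import succeeds.
after = try_import("fake_mysqldb")
```

This mirrors the failure mode in the issue description: the import of the externally installed module only works once the path setup has run, so the setup import has to come first.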
,
Aug 8 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/autotest/+/826844901b8e261729d5727cf12331e85bcbd6e0

commit 826844901b8e261729d5727cf12331e85bcbd6e0
Author: Prathmesh Prabhu <pprabhu@chromium.org>
Date: Tue Aug 08 17:09:10 2017

[autotest] Emit metrics from label_cleaner

+ cleanup some weird python idioms

BUG= chromium:753134
TEST=Cleaned labels from local autotest setup; metrics testing by setting debug_file in ts_mon.

Change-Id: I6a19d41b868394b9459a514acefe8df700fff4d3
Reviewed-on: https://chromium-review.googlesource.com/604758
Commit-Ready: Prathmesh Prabhu <pprabhu@chromium.org>
Tested-by: Prathmesh Prabhu <pprabhu@chromium.org>
Reviewed-by: Paul Hobbs <phobbs@google.com>

[modify] https://crrev.com/826844901b8e261729d5727cf12331e85bcbd6e0/site_utils/label_cleaner.py
,
Aug 11 2017
label cleaner has been working for a few days and spewing metrics. Here are some indicative graphs (to be added to viceroy).
Everything is filtered to show only the master DB (shards are also cleaning their own labels):
- Tick (# of iterations / 6 hours): http://shortn/_Z58Uy7bDT4
  I thought it was running once a day, but I was clearly wrong.
- Total/Used labels (averaged over a day): http://shortn/_qI1CVmUgu6
  This is also wacky: # used labels > # total labels.
- Total labels deleted in the past day: http://shortn/_I8CBooM5yr
  This suggests that we moved the label cleaner from chromeos-server2 to chromeos-server18. I didn't do it, so who did?
,
Jan 10 2018
Well, label_cleaner has been working alright according to the graphs above, but it's cleaning <100 labels per day out of the 100K unused labels it finds. It'll take a while to catch up!
Logs:
2017-08-10 06:36:14 INFO | Attempting refresh to obtain initial access_token
2017-08-10 06:36:14 INFO | Refreshing access_token
2017-08-10 06:36:16 INFO | Starting new HTTP connection (1): metadata.google.internal
2017-08-10 06:36:16 NOTIC| ts_mon was set up.
2017-08-10 06:36:16 INFO | Label cleaner starts. Will delete all labels whose prefix is "fwrw-version".
2017-08-10 06:36:16 INFO | Target database: 172.24.26.45.
2017-08-10 06:36:16 DEBUG| Running: '\nSELECT id FROM afe_labels WHERE name LIKE "fwrw-version%"\n'
2017-08-10 06:36:16 INFO | Found total 310 labels
2017-08-10 06:36:16 DEBUG| Running: '\nSELECT DISTINCT(label_id) FROM afe_autotests_dependency_labels UNION\nSELECT DISTINCT(label_id) FROM afe_hosts_labels UNION\nSELECT DISTINCT(label_id) FROM afe_jobs_dependency_labels UNION\nSELECT DISTINCT(label_id) FROM afe_shards_labels UNION\nSELECT DISTINCT(label_id) FROM afe_parameterized_jobs UNION\nSELECT DISTINCT(meta_host) FROM afe_host_queue_entries\n'
2017-08-10 06:36:17 INFO | Found 93883 labels are used
2017-08-10 06:36:17 INFO | Deleting 2 unused labels
2017-08-10 06:36:17 DEBUG| Running: '\nDELETE FROM afe_labels WHERE id in (693680,693478)\n'
2017-08-10 06:36:17 INFO | Attempting refresh to obtain initial access_token
2017-08-10 06:36:17 INFO | Refreshing access_token
2017-08-10 06:36:19 INFO | Starting new HTTP connection (1): metadata.google.internal
2017-08-10 06:36:19 NOTIC| ts_mon was set up.
2017-08-10 06:36:19 INFO | Label cleaner starts. Will delete all labels whose prefix is "fwro-version".
2017-08-10 06:36:19 INFO | Target database: 172.24.26.45.
2017-08-10 06:36:19 DEBUG| Running: '\nSELECT id FROM afe_labels WHERE name LIKE "fwro-version%"\n'
2017-08-10 06:36:19 INFO | Found total 750 labels
2017-08-10 06:36:19 DEBUG| Running: '\nSELECT DISTINCT(label_id) FROM afe_autotests_dependency_labels UNION\nSELECT DISTINCT(label_id) FROM afe_hosts_labels UNION\nSELECT DISTINCT(label_id) FROM afe_jobs_dependency_labels UNION\nSELECT DISTINCT(label_id) FROM afe_shards_labels UNION\nSELECT DISTINCT(label_id) FROM afe_parameterized_jobs UNION\nSELECT DISTINCT(meta_host) FROM afe_host_queue_entries\n'
2017-08-10 06:36:20 INFO | Found 93883 labels are used
2017-08-10 06:36:20 INFO | Deleting 0 unused labels
2017-08-10 06:36:20 INFO | Attempting refresh to obtain initial access_token
2017-08-10 06:36:20 INFO | Refreshing access_token
2017-08-10 06:36:22 INFO | Starting new HTTP connection (1): metadata.google.internal
2017-08-10 06:36:22 NOTIC| ts_mon was set up.
2017-08-10 06:36:22 INFO | Label cleaner starts. Will delete all labels whose prefix is "pool".
2017-08-10 06:36:22 INFO | Target database: 172.24.26.45.
2017-08-10 06:36:22 DEBUG| Running: '\nSELECT id FROM afe_labels WHERE name LIKE "pool%"\n'
2017-08-10 06:36:22 INFO | Found total 159 labels
2017-08-10 06:36:22 DEBUG| Running: '\nSELECT DISTINCT(label_id) FROM afe_autotests_dependency_labels UNION\nSELECT DISTINCT(label_id) FROM afe_hosts_labels UNION\nSELECT DISTINCT(label_id) FROM afe_jobs_dependency_labels UNION\nSELECT DISTINCT(label_id) FROM afe_shards_labels UNION\nSELECT DISTINCT(label_id) FROM afe_parameterized_jobs UNION\nSELECT DISTINCT(meta_host) FROM afe_host_queue_entries\n'
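The flow visible in these logs (select labels by prefix, collect the label ids still referenced, delete the difference) can be sketched as follows. This is a simplified illustration, not the label_cleaner source: it uses an in-memory SQLite database instead of MySQL, consults only one referencing table instead of six, and the schema, sample rows and the function names find_unused and delete_labels are all assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE afe_labels (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE afe_hosts_labels (host_id INTEGER, label_id INTEGER);
    INSERT INTO afe_labels VALUES
        (1, 'fwrw-version:a'), (2, 'fwrw-version:b'), (3, 'pool:bvt');
    INSERT INTO afe_hosts_labels VALUES (10, 1), (11, 3);
""")


def find_unused(conn, prefix):
    """Return ids of labels with the given prefix that nothing references."""
    matched = {row[0] for row in conn.execute(
        "SELECT id FROM afe_labels WHERE name LIKE ?", (prefix + "%",))}
    used = {row[0] for row in conn.execute(
        "SELECT DISTINCT label_id FROM afe_hosts_labels")}
    return sorted(matched - used)


def delete_labels(conn, label_ids):
    """Delete the given label ids; do nothing for an empty list."""
    if not label_ids:
        return
    placeholders = ",".join("?" for _ in label_ids)
    conn.execute(
        "DELETE FROM afe_labels WHERE id IN (%s)" % placeholders, label_ids)


unused = find_unused(conn, "fwrw-version")
delete_labels(conn, unused)
remaining = [row[0] for row in
             conn.execute("SELECT id FROM afe_labels ORDER BY id")]
```

Only the prefix-matched label that no row references is removed; labels with other prefixes are never candidates, which is why each prefix gets its own run in the logs.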
,
Jan 11 2018
The metrics reported and label_cleaner itself have several problems.
- The metric reported for all labels only includes labels filtered for the given prefix.
- But the metric for used labels includes all labels. Also, this double counts labels used in different tables (the select distinct(...) is per table, not global).
Also, pretty much all the labels are referenced by afe_jobs_dependency_labels:
mysql> select count(*) from (select distinct(label_id) from afe_autotests_dependency_labels) t;
+----------+
| count(*) |
+----------+
|      129 |
+----------+
1 row in set (0.00 sec)
mysql> select count(*) from (select distinct(label_id) from afe_hosts_labels) t;
+----------+
| count(*) |
+----------+
|     1826 |
+----------+
1 row in set (0.00 sec)
mysql> select count(*) from (select distinct(label_id) from afe_jobs_dependency_labels) t;
+----------+
| count(*) |
+----------+
|   166934 |
+----------+
1 row in set (1 min 31.07 sec)
mysql> select count(*) from (select distinct(label_id) from afe_shards_labels) t;
+----------+
| count(*) |
+----------+
|       95 |
+----------+
1 row in set (0.00 sec)
mysql> select count(*) from (select distinct(label_id) from afe_parameterized_jobs) t;
+----------+
| count(*) |
+----------+
|        1 |
+----------+
1 row in set (0.01 sec)
mysql> select count(*) from (select distinct(meta_host) from afe_host_queue_entries) t;
+----------+
| count(*) |
+----------+
|      136 |
+----------+
1 row in set (0.13 sec)
mysql> select count(*) from afe_labels where name like "cros-version%";
+----------+
| count(*) |
+----------+
|   152093 |
+----------+
1 row in set (0.62 sec)
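The double-counting concern can be checked on a toy dataset. In standard SQL a plain UNION removes duplicate rows across all of its branches, while UNION ALL keeps them, so whether a label referenced from several tables is counted twice depends on how the per-table results are combined. The sketch below uses an in-memory SQLite database with two simplified, hypothetical versions of the AFE tables; it is not the production query, and the count helper is an illustrative name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE afe_hosts_labels (label_id INTEGER);
    CREATE TABLE afe_jobs_dependency_labels (label_id INTEGER);
    -- Label 1 is referenced from both tables; label 2 twice in one table.
    INSERT INTO afe_hosts_labels VALUES (1), (2), (2);
    INSERT INTO afe_jobs_dependency_labels VALUES (1), (3);
""")


def count(sql):
    """Count the rows returned by the given SELECT statement."""
    return conn.execute("SELECT COUNT(*) FROM (%s)" % sql).fetchone()[0]


# Plain UNION: duplicates are removed across both branches.
union_count = count(
    "SELECT DISTINCT label_id FROM afe_hosts_labels UNION "
    "SELECT DISTINCT label_id FROM afe_jobs_dependency_labels")

# UNION ALL: each branch is distinct on its own, but label 1 appears twice.
union_all_count = count(
    "SELECT DISTINCT label_id FROM afe_hosts_labels UNION ALL "
    "SELECT DISTINCT label_id FROM afe_jobs_dependency_labels")
```

With this data the plain UNION yields three distinct labels, while UNION ALL (or summing per-table counts) yields four, which is the kind of inflation described above.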
,
Jan 11 2018
The oldest job referred to by afe_jobs_dependency_labels is:
mysql> select id, owner, name, created_on, timeout, max_runtime_hrs from afe_jobs where id=125053379 \G
*************************** 1. row ***************************
             id: 125053379
          owner: chromeos-test
           name: ninja-release/R61-9680.0.0/faft_bios_au_1/firmware_TPMExtend
     created_on: 2017-06-25 00:00:10
        timeout: 24
max_runtime_hrs: 72
1 row in set (0.00 sec)
which is 6 months old, in line with our test result horizon policy.
So the number of existing labels (~185529) is roughly the steady-state number of labels. label_cleaner has caught up and is doing its job correctly.
The only thing that remains to do here is to perhaps clarify the metrics somewhat.
,
Jan 13 2018
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/autotest/+/2f02ab8283db2221e3c57f943bfe89f6be8d718e

commit 2f02ab8283db2221e3c57f943bfe89f6be8d718e
Author: Prathmesh Prabhu <pprabhu@chromium.org>
Date: Sat Jan 13 06:37:53 2018

Improve metrics from label_cleaner.

label_cleaner's metrics were slightly wrong -- we were never reporting
all the existing labels, and misreporting the prefix-matched labels
under "all".

While there,
- add a dry-run option to test stuff
- add options to pass in database user/password etc from commandline.

BUG= chromium:753134
TEST=Run in dry-run mode.

Change-Id: Ieeca75af725b27e46277589a7a62afe35d63765b
Reviewed-on: https://chromium-review.googlesource.com/862196
Commit-Ready: ChromeOS CL Exonerator Bot <chromiumos-cl-exonerator@appspot.gserviceaccount.com>
Tested-by: Prathmesh Prabhu <pprabhu@chromium.org>
Reviewed-by: Prathmesh Prabhu <pprabhu@chromium.org>

[modify] https://crrev.com/2f02ab8283db2221e3c57f943bfe89f6be8d718e/site_utils/label_cleaner.py
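A dry-run option plus command-line database credentials, as described in this commit, might look roughly like the sketch below. The flag names (--prefix, --db-server, --db-user, --db-password, --dry-run), their defaults and the clean helper are assumptions for illustration; the actual options in label_cleaner.py may differ.

```python
import argparse


def parse_args(argv):
    """Parse hypothetical label cleaner options from a list of arguments."""
    parser = argparse.ArgumentParser(
        description="Sketch of a label cleaner command line.")
    parser.add_argument("--prefix", required=True,
                        help="Only labels starting with this prefix are cleaned.")
    parser.add_argument("--db-server", default="localhost")
    parser.add_argument("--db-user", default="")
    parser.add_argument("--db-password", default="")
    parser.add_argument("--dry-run", action="store_true",
                        help="Report what would be deleted without deleting.")
    return parser.parse_args(argv)


def clean(options, unused_ids, delete_fn):
    """Delete unused_ids via delete_fn unless dry-run; return number deleted."""
    if options.dry_run:
        print("Dry run: would delete %d labels" % len(unused_ids))
        return 0
    delete_fn(unused_ids)
    return len(unused_ids)
```

Routing the deletion through a single function that checks the flag keeps the dry-run path identical to the real one up to the final write, which is what makes "TEST=Run in dry-run mode" a meaningful check.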
,
Jan 19 2018
Updated total/used label count is now correct: http://shortn/_qIdE1p3Pxz
Comment 1 by pprabhu@chromium.org
, Aug 7 2017