Issue 753134

Issue metadata

Status: Fixed
Owner:
Closed: Jan 2018
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug

Blocking:
issue 751802



Make label_cleaner work again, and collect metrics

Project Member Reported by pprabhu@chromium.org, Aug 7 2017

Issue description

label_cleaner is dying in prod with:

ImportError: No module named MySQLdb
Traceback (most recent call last):
  File "/usr/local/autotest/site_utils/label_cleaner.py", line 26, in <module>
    import MySQLdb


- Make it work again.
- Add monarch metrics so it doesn't remain silent.
 
Blocking: 751802
+ add a graph to viceroy dashboard under DB inconsistencies.
Labels: -Pri-3 Pri-2
Project Member

Comment 4 by bugdroid1@chromium.org, Aug 7 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/194b870d7330134acf0acac164907699a0204b43

commit 194b870d7330134acf0acac164907699a0204b43
Author: Prathmesh Prabhu <pprabhu@chromium.org>
Date: Mon Aug 07 23:42:45 2017

[autotest] Fix import order in label_cleaner

MySQLdb is installed via build_externals on prod machines. This means
that it is only available after 'import common'

BUG= chromium:753134 
TEST=Run label_cleaner locally.

Change-Id: Id91b8cbdc718c517720b5e2bc5ddef7f47d7f334
Reviewed-on: https://chromium-review.googlesource.com/604723
Reviewed-by: Prathmesh Prabhu <pprabhu@chromium.org>
Tested-by: Prathmesh Prabhu <pprabhu@chromium.org>

[modify] https://crrev.com/194b870d7330134acf0acac164907699a0204b43/site_utils/label_cleaner.py
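The import-order problem described in this commit can be reproduced in isolation. The sketch below is only an illustration under assumptions: `fake_mysqldb` is a hypothetical stand-in for MySQLdb, and the `sys.path.insert` call stands in for whatever path setup `import common` performs on prod machines (the exact mechanism is not shown in this thread).

```python
import importlib
import os
import sys
import tempfile


def try_import(name):
    """Return True if `name` can be imported with the current sys.path."""
    importlib.invalidate_caches()
    try:
        importlib.import_module(name)
        return True
    except ImportError:
        return False


# Hypothetical stand-in for a package installed outside the default path,
# the way MySQLdb is when it comes from build_externals.
site_packages = tempfile.mkdtemp()
with open(os.path.join(site_packages, "fake_mysqldb.py"), "w") as f:
    f.write("version = 'stand-in'\n")

# Importing too early fails, like `import MySQLdb` before `import common`.
before = try_import("fake_mysqldb")

# Assumed side effect of `import common`: the extra directory joins sys.path.
sys.path.insert(0, site_packages)

# The same import now succeeds.
after = try_import("fake_mysqldb")
print(before, after)
```

The fix in the commit amounts to making sure the path setup runs before the dependent import, which is why only the order of the import lines had to change.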

Project Member

Comment 5 by bugdroid1@chromium.org, Aug 8 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/826844901b8e261729d5727cf12331e85bcbd6e0

commit 826844901b8e261729d5727cf12331e85bcbd6e0
Author: Prathmesh Prabhu <pprabhu@chromium.org>
Date: Tue Aug 08 17:09:10 2017

[autotest] Emit metrics from label_cleaner

+ cleanup some weird python idioms

BUG= chromium:753134 
TEST=Cleaned labels from local autotest setup;
     metrics testing by setting debug_file in ts_mon.

Change-Id: I6a19d41b868394b9459a514acefe8df700fff4d3
Reviewed-on: https://chromium-review.googlesource.com/604758
Commit-Ready: Prathmesh Prabhu <pprabhu@chromium.org>
Tested-by: Prathmesh Prabhu <pprabhu@chromium.org>
Reviewed-by: Paul Hobbs <phobbs@google.com>

[modify] https://crrev.com/826844901b8e261729d5727cf12331e85bcbd6e0/site_utils/label_cleaner.py

label_cleaner has been running for a few days and is emitting metrics.
Here are some indicative graphs (to be added to viceroy):

Everything is filtered to show only the master DB (shards also clean their own labels):

Tick (# of iterations / 6 hours): http://shortn/_Z58Uy7bDT4
I thought it ran once a day, but I was clearly wrong.

Total/Used labels (averaged over a day):  http://shortn/_qI1CVmUgu6
This is also wacky: # used labels > # total labels.

Total labels deleted in the past day: http://shortn/_I8CBooM5yr
This suggests that the label cleaner was moved from chromeos-server2 to chromeos-server18. I didn't do it, so who did?
Owner: pprabhu@chromium.org
Well, label_cleaner has been working all right according to the graphs above, but it's cleaning <100 labels per day out of the 100K unused labels it finds.

At that rate, it'll take a while to catch up!

Logs:

2017-08-10 06:36:14 INFO | Attempting refresh to obtain initial access_token
2017-08-10 06:36:14 INFO | Refreshing access_token
2017-08-10 06:36:16 INFO | Starting new HTTP connection (1): metadata.google.internal
2017-08-10 06:36:16 NOTIC| ts_mon was set up.
2017-08-10 06:36:16 INFO | Label cleaner starts. Will delete all labels whose prefix is "fwrw-version".
2017-08-10 06:36:16 INFO | Target database: 172.24.26.45.
2017-08-10 06:36:16 DEBUG| Running: '\nSELECT id FROM afe_labels WHERE name LIKE "fwrw-version%"\n'
2017-08-10 06:36:16 INFO | Found total 310 labels
2017-08-10 06:36:16 DEBUG| Running: '\nSELECT DISTINCT(label_id) FROM afe_autotests_dependency_labels UNION\nSELECT DISTINCT(label_id) FROM afe_hosts_labels UNION\nSELECT DISTINCT(label_id) FROM afe_jobs_dependency_labels UNION\nSELECT DISTINCT(label_id) FROM afe_shards_labels UNION\nSELECT DISTINCT(label_id) FROM afe_parameterized_jobs UNION\nSELECT DISTINCT(meta_host) FROM afe_host_queue_entries\n'
2017-08-10 06:36:17 INFO | Found 93883 labels are used
2017-08-10 06:36:17 INFO | Deleting 2 unused labels
2017-08-10 06:36:17 DEBUG| Running: '\nDELETE FROM afe_labels WHERE id in (693680,693478)\n'
2017-08-10 06:36:17 INFO | Attempting refresh to obtain initial access_token
2017-08-10 06:36:17 INFO | Refreshing access_token
2017-08-10 06:36:19 INFO | Starting new HTTP connection (1): metadata.google.internal
2017-08-10 06:36:19 NOTIC| ts_mon was set up.
2017-08-10 06:36:19 INFO | Label cleaner starts. Will delete all labels whose prefix is "fwro-version".
2017-08-10 06:36:19 INFO | Target database: 172.24.26.45.
2017-08-10 06:36:19 DEBUG| Running: '\nSELECT id FROM afe_labels WHERE name LIKE "fwro-version%"\n'
2017-08-10 06:36:19 INFO | Found total 750 labels
2017-08-10 06:36:19 DEBUG| Running: '\nSELECT DISTINCT(label_id) FROM afe_autotests_dependency_labels UNION\nSELECT DISTINCT(label_id) FROM afe_hosts_labels UNION\nSELECT DISTINCT(label_id) FROM afe_jobs_dependency_labels UNION\nSELECT DISTINCT(label_id) FROM afe_shards_labels UNION\nSELECT DISTINCT(label_id) FROM afe_parameterized_jobs UNION\nSELECT DISTINCT(meta_host) FROM afe_host_queue_entries\n'
2017-08-10 06:36:20 INFO | Found 93883 labels are used
2017-08-10 06:36:20 INFO | Deleting 0 unused labels
2017-08-10 06:36:20 INFO | Attempting refresh to obtain initial access_token
2017-08-10 06:36:20 INFO | Refreshing access_token
2017-08-10 06:36:22 INFO | Starting new HTTP connection (1): metadata.google.internal
2017-08-10 06:36:22 NOTIC| ts_mon was set up.
2017-08-10 06:36:22 INFO | Label cleaner starts. Will delete all labels whose prefix is "pool".
2017-08-10 06:36:22 INFO | Target database: 172.24.26.45.
2017-08-10 06:36:22 DEBUG| Running: '\nSELECT id FROM afe_labels WHERE name LIKE "pool%"\n'
2017-08-10 06:36:22 INFO | Found total 159 labels
2017-08-10 06:36:22 DEBUG| Running: '\nSELECT DISTINCT(label_id) FROM afe_autotests_dependency_labels UNION\nSELECT DISTINCT(label_id) FROM afe_hosts_labels UNION\nSELECT DISTINCT(label_id) FROM afe_jobs_dependency_labels UNION\nSELECT DISTINCT(label_id) FROM afe_shards_labels UNION\nSELECT DISTINCT(label_id) FROM afe_parameterized_jobs UNION\nSELECT DISTINCT(meta_host) FROM afe_host_queue_entries\n'
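Each pass in the log above follows the same three steps: select the labels matching a prefix, select every label id referenced from the usage tables, and delete the matched labels that are not referenced. A minimal sketch of that flow, assuming an in-memory SQLite database in place of the production MySQL server and only two of the six usage tables for brevity (`clean_labels` is a hypothetical helper, not the function in label_cleaner.py):

```python
import sqlite3


def clean_labels(conn, prefix):
    """Delete labels starting with `prefix` that no usage table references.

    Returns the sorted list of deleted label ids.
    """
    matched = {row[0] for row in conn.execute(
        "SELECT id FROM afe_labels WHERE name LIKE ?", (prefix + "%",))}
    used = {row[0] for row in conn.execute(
        "SELECT label_id FROM afe_hosts_labels UNION "
        "SELECT label_id FROM afe_jobs_dependency_labels")}
    unused = sorted(matched - used)
    if unused:
        placeholders = ",".join("?" * len(unused))
        conn.execute(
            "DELETE FROM afe_labels WHERE id IN (%s)" % placeholders, unused)
    return unused


# Toy data: labels 2 and 4 match the prefix and are referenced nowhere.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE afe_labels (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE afe_hosts_labels (label_id INTEGER);
CREATE TABLE afe_jobs_dependency_labels (label_id INTEGER);
INSERT INTO afe_labels VALUES
    (1, 'fwrw-version:a'), (2, 'fwrw-version:b'),
    (3, 'pool:bvt'), (4, 'fwrw-version:c');
INSERT INTO afe_hosts_labels VALUES (1), (3);
INSERT INTO afe_jobs_dependency_labels VALUES (1);
""")

deleted = clean_labels(conn, "fwrw-version")
remaining = [row[0] for row in conn.execute(
    "SELECT id FROM afe_labels ORDER BY id")]
print(deleted, remaining)
```

Note that the set of used ids is independent of the prefix, which is why the log reports the same "labels are used" count for every prefix.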

The reported metrics and the label cleaning have several problems.

- The metric reported for all labels only includes labels filtered for the given prefix.
- But the metric for used labels includes all labels, regardless of prefix. Also, this double counts labels used in different tables (the select distinct(...) is per table, not global).
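One way to keep the two metrics comparable is to compute total and used counts both globally and per prefix, and to deduplicate used ids across tables rather than per table. The sketch below is an illustration only: it uses an in-memory SQLite database, two of the usage tables, and made-up rows. It relies on the standard SQL behavior that a plain UNION removes duplicate rows across its branches, whereas summing per-table DISTINCT counts does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE afe_labels (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE afe_hosts_labels (label_id INTEGER);
CREATE TABLE afe_jobs_dependency_labels (label_id INTEGER);
INSERT INTO afe_labels VALUES
    (1, 'cros-version:a'), (2, 'cros-version:b'),
    (3, 'pool:bvt'), (4, 'pool:cq');
INSERT INTO afe_hosts_labels VALUES (1), (3), (3);
INSERT INTO afe_jobs_dependency_labels VALUES (1), (3);
""")


def scalar(sql, params=()):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql, params).fetchone()[0]


# Globally distinct used label ids: UNION deduplicates across both tables.
USED = ("SELECT label_id FROM afe_hosts_labels UNION "
        "SELECT label_id FROM afe_jobs_dependency_labels")
PREFIX = "cros-version%"

total_all = scalar("SELECT COUNT(*) FROM afe_labels")
total_prefix = scalar(
    "SELECT COUNT(*) FROM afe_labels WHERE name LIKE ?", (PREFIX,))
used_all = scalar("SELECT COUNT(*) FROM (%s)" % USED)
used_prefix = scalar(
    "SELECT COUNT(*) FROM afe_labels WHERE name LIKE ? AND id IN (%s)" % USED,
    (PREFIX,))

# Summing per-table distinct counts counts labels 1 and 3 twice.
per_table_sum = (
    scalar("SELECT COUNT(DISTINCT label_id) FROM afe_hosts_labels") +
    scalar("SELECT COUNT(DISTINCT label_id) FROM afe_jobs_dependency_labels"))

print(total_all, total_prefix, used_all, used_prefix, per_table_sum)
```

With counts defined this way, `used_prefix` can never exceed `total_prefix`, and `used_all` can never exceed `total_all` as long as every referenced id still exists in afe_labels.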

Also, pretty much all the labels are referenced by afe_jobs_dependency_labels:

mysql> select count(*) from (select distinct(label_id) from afe_autotests_dependency_labels) t;
+----------+
| count(*) |
+----------+
|      129 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from (select distinct(label_id) from afe_hosts_labels) t;                                                                                                               
+----------+
| count(*) |
+----------+
|     1826 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from (select distinct(label_id) from afe_jobs_dependency_labels) t;                                                                                                     
+----------+
| count(*) |
+----------+
|   166934 |
+----------+
1 row in set (1 min 31.07 sec)

mysql> select count(*) from (select distinct(label_id) from afe_shards_labels) t;                                                                                                              
+----------+
| count(*) |
+----------+
|       95 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from (select distinct(label_id) from afe_parameterized_jobs) t;                                                                                                         
+----------+
| count(*) |
+----------+
|        1 |
+----------+
1 row in set (0.01 sec)

mysql> select count(*) from (select distinct(meta_host) from afe_host_queue_entries) t;                                                                                                        
+----------+
| count(*) |
+----------+
|      136 |
+----------+
1 row in set (0.13 sec)


mysql> select count(*) from afe_labels where name like "cros-version%";                                                                                                                        
+----------+
| count(*) |
+----------+
|   152093 |
+----------+
1 row in set (0.62 sec)

The oldest job referred to by afe_jobs_dependency_labels is:

mysql> select id, owner, name, created_on, timeout, max_runtime_hrs from afe_jobs where id=125053379 \G
*************************** 1. row ***************************
             id: 125053379
          owner: chromeos-test
           name: ninja-release/R61-9680.0.0/faft_bios_au_1/firmware_TPMExtend
     created_on: 2017-06-25 00:00:10
        timeout: 24
max_runtime_hrs: 72
1 row in set (0.00 sec)


which is about 6 months old, in line with our test result horizon policy.

So, the number of existing labels (~185529) is roughly the steady-state number of labels. label_cleaner has caught up and is doing its job correctly.

The only thing left to do here is perhaps to clarify the metrics somewhat.
Project Member

Comment 10 by bugdroid1@chromium.org, Jan 13 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/2f02ab8283db2221e3c57f943bfe89f6be8d718e

commit 2f02ab8283db2221e3c57f943bfe89f6be8d718e
Author: Prathmesh Prabhu <pprabhu@chromium.org>
Date: Sat Jan 13 06:37:53 2018

Improve metrics from label_cleaner.

label_cleaner's metrics were slightly wrong -- we were never reporting
all the existing labels, and misreporting the prefix-matched labels
under "all".

While there,
- add a dry-run option to test stuff
- add options to pass in database user/password etc from commandline.

BUG= chromium:753134 
TEST=Run in dry-run mode.

Change-Id: Ieeca75af725b27e46277589a7a62afe35d63765b
Reviewed-on: https://chromium-review.googlesource.com/862196
Commit-Ready: ChromeOS CL Exonerator Bot <chromiumos-cl-exonerator@appspot.gserviceaccount.com>
Tested-by: Prathmesh Prabhu <pprabhu@chromium.org>
Reviewed-by: Prathmesh Prabhu <pprabhu@chromium.org>

[modify] https://crrev.com/2f02ab8283db2221e3c57f943bfe89f6be8d718e/site_utils/label_cleaner.py
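A dry-run option like the one this commit mentions is commonly wired up along the following lines. This is a sketch under assumptions: the flag names (`--prefix`, `--db-user`, `--db-password`, `--dry-run`) and the `delete_labels` helper are hypothetical and may not match what label_cleaner.py actually accepts.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line options (flag names here are hypothetical)."""
    parser = argparse.ArgumentParser(description="Clean unused labels.")
    parser.add_argument("--prefix", required=True,
                        help="Only consider labels starting with this prefix.")
    parser.add_argument("--db-user", default=None,
                        help="Database user, overriding any configured value.")
    parser.add_argument("--db-password", default=None,
                        help="Database password, overriding any configured value.")
    parser.add_argument("--dry-run", action="store_true",
                        help="Log the DELETE statement instead of running it.")
    return parser.parse_args(argv)


def delete_labels(execute, label_ids, dry_run):
    """Delete `label_ids` via `execute`, or only print the SQL when dry_run.

    Returns the number of labels actually deleted.
    """
    if not label_ids:
        return 0
    ids = ",".join(str(int(i)) for i in label_ids)
    sql = "DELETE FROM afe_labels WHERE id IN (%s)" % ids
    if dry_run:
        print("[dry-run] would run: " + sql)
        return 0
    execute(sql)
    return len(label_ids)


# `executed.append` stands in for a real database cursor's execute method.
executed = []
args = parse_args(["--prefix", "pool", "--dry-run"])
deleted = delete_labels(executed.append, [693680, 693478], args.dry_run)
print(deleted, executed)
```

Keeping the dry-run check next to the single statement that mutates the database means the SELECT queries and the metrics still run normally, so a dry run exercises everything except the deletion.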

Status: Fixed (was: Started)
Updated total/used label count is now correct: http://shortn/_qIdE1p3Pxz
