AFE: Large number of master-shard host_label desyncs since Aug 18th
Issue description: sentinel is recovering a large number of afe_label inconsistencies (see https://viceroy.corp.google.com/chromeos/sentinel?duration=8d#_VG_PFGzLahW). No visible impact (yet).
Aug 22 2017
Deleted labels:
mysql> select * from afe_labels where id in ('1500', '164763', '152520', '400055', '187672', '9952', '10974');
+--------+-----------------------+---------------+----------+---------+----------------+-----------------+
| id | name | kernel_config | platform | invalid | only_if_needed | atomic_group_id |
+--------+-----------------------+---------------+----------+---------+----------------+-----------------+
| 1500 | pool:suites | | 0 | 0 | 0 | NULL |
| 9952 | bluetooth | | 0 | 0 | 0 | NULL |
| 10974 | webcam | | 0 | 0 | 0 | NULL |
| 152520 | audio_loopback_dongle | | 0 | 0 | 0 | NULL |
| 164763 | pool:performance | | 0 | 0 | 0 | NULL |
| 187672 | hw_video_acc_vp9 | | 0 | 0 | 0 | NULL |
| 400055 | pool:crosperf | | 0 | 0 | 0 | NULL |
+--------+-----------------------+---------------+----------+---------+----------------+-----------------+
7 rows in set (0.00 sec)
Added labels:
mysql> select * from afe_labels where id in ('1500', '164763', '400055', '187672', '152520');
+--------+-----------------------+---------------+----------+---------+----------------+-----------------+
| id | name | kernel_config | platform | invalid | only_if_needed | atomic_group_id |
+--------+-----------------------+---------------+----------+---------+----------------+-----------------+
| 1500 | pool:suites | | 0 | 0 | 0 | NULL |
| 152520 | audio_loopback_dongle | | 0 | 0 | 0 | NULL |
| 164763 | pool:performance | | 0 | 0 | 0 | NULL |
| 187672 | hw_video_acc_vp9 | | 0 | 0 | 0 | NULL |
| 400055 | pool:crosperf | | 0 | 0 | 0 | NULL |
+--------+-----------------------+---------------+----------+---------+----------------+-----------------+
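Comparing the two queries shows which label IDs appear only in the deleted list. A minimal Python sketch, with the IDs and names copied from the two result sets above (the helper name `net_removed` is hypothetical):

```python
# Label IDs and names from the "Deleted labels" query above.
deleted = {
    1500: "pool:suites",
    9952: "bluetooth",
    10974: "webcam",
    152520: "audio_loopback_dongle",
    164763: "pool:performance",
    187672: "hw_video_acc_vp9",
    400055: "pool:crosperf",
}
# Label IDs from the "Added labels" query above.
added = {1500, 152520, 164763, 187672, 400055}


def net_removed(deleted_labels, added_ids):
    """Return {id: name} for labels present in the deleted list but not the added list."""
    return {i: name for i, name in deleted_labels.items() if i not in added_ids}


print(sorted(net_removed(deleted, added).values()))  # ['bluetooth', 'webcam']
```

This is only a set difference over the two listings; it says nothing about how many times each label was added or deleted.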
---------------
The label desync has returned to its baseline level. Perhaps this is fallout from the shard migration on Friday?
Aug 22 2017
This looks very fishy. Why do so many hosts have incorrect pool labels? For example, one of the DUTs from which pool:suites was removed no longer has _any_ pool label:

mysql> select * from afe_hosts where id = 6220;
+------+-----------------------------+--------+----------+--------+---------+------------+--------------+-----------+-------+--------+----------+-------------+
| id   | hostname                    | locked | synch_id | status | invalid | protection | locked_by_id | lock_time | dirty | leased | shard_id | lock_reason |
+------+-----------------------------+--------+----------+--------+---------+------------+--------------+-----------+-------+--------+----------+-------------+
| 6220 | chromeos6-row2-rack5-host16 |      0 |     NULL | Ready  |       0 |          0 |         NULL |      NULL |     1 |      0 |       85 |             |
+------+-----------------------------+--------+----------+--------+---------+------------+--------------+-----------+-------+--------+----------+-------------+
1 row in set (0.00 sec)

mysql> ^C
Ctrl-C -- exit!
Aborted

pprabhu@pprabhu:~$ atest host list chromeos6-row2-rack5-host16
Host                         Status  Shard                                  Locked  Lock Reason  Locked by  Platform  Labels
chromeos6-row2-rack5-host16  Ready   chromeos-server50.hot.corp.google.com  False                None       stout     bluetooth, storage:ssd, os:cros, hw_jpeg_acc_dec, power:battery, board:stout, hw_video_acc_h264, cts_abi_x86, cts_abi_arm, webcam, internal_display, audio_loopback_dongle, variant:stout, sku:stout_intel_celeron_1007U_4Gb, touchpad, cros-version:stout-release/R62-9856.0.0
pprabhu@pprabhu:~$ atest host list chromeos6-row2-rack5-host16 | grep pool
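The `grep pool` check above can also be done on the Labels column directly. A minimal sketch, assuming the column is a comma-separated list as shown in the atest output; the helper name `pool_labels` is hypothetical:

```python
def pool_labels(labels_field):
    """Return the labels starting with 'pool:' from a comma-separated atest Labels column."""
    labels = [label.strip() for label in labels_field.split(",")]
    return [label for label in labels if label.startswith("pool:")]


# Labels column for chromeos6-row2-rack5-host16, copied from the atest output above.
host16_labels = (
    "bluetooth, storage:ssd, os:cros, hw_jpeg_acc_dec, power:battery, "
    "board:stout, hw_video_acc_h264, cts_abi_x86, cts_abi_arm, webcam, "
    "internal_display, audio_loopback_dongle, variant:stout, "
    "sku:stout_intel_celeron_1007U_4Gb, touchpad, "
    "cros-version:stout-release/R62-9856.0.0"
)

if not pool_labels(host16_labels):
    print("chromeos6-row2-rack5-host16 has no pool: label")
```

Matching the `pool:` prefix is stricter than `grep pool`, which would also match any other label that merely contains the substring "pool".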
Aug 22 2017
The number of desynced hosts has stabilized at its baseline (which is way too high, IMO).
Comment 1 by pprabhu@chromium.org, Aug 21 2017

chromeos-test@chromeos-server18:/var/log/autotest_sentinel$ grep -i 'delete label' sentinel.log.3 | awk '{print $6}' | sort | uniq -c | sort -n
      1 10974
      2 9952
      3 187672
      4 400055
      5 152520
     14 1500
     24 164763
chromeos-test@chromeos-server18:/var/log/autotest_sentinel$ grep -i 'add label' sentinel.log.3 | awk '{print $6}' | sort | uniq -c | sort -n
      1 9952
      2 152520
      2 187672
      6 400055
     19 164763
     26 1500
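The shell pipeline above tallies, per label ID, how many sentinel log lines mention deleting or adding that label. A rough Python equivalent is sketched below. It assumes, as the `awk '{print $6}'` implies, that the label ID is the sixth whitespace-separated field of each matching line; the actual sentinel log format is not shown here, and the function names are hypothetical.

```python
import collections


def count_label_ids(lines, phrase):
    """Count field 6 of lines containing `phrase` (case-insensitive).

    Roughly mirrors: grep -i PHRASE | awk '{print $6}' | sort | uniq -c | sort -n
    ASSUMPTION: the label ID is the 6th whitespace-separated field.
    Lines with fewer than six fields are skipped (awk would emit an empty field).
    """
    needle = phrase.lower()
    counts = collections.Counter()
    for line in lines:
        if needle in line.lower():
            fields = line.split()
            if len(fields) >= 6:
                counts[fields[5]] += 1
    # Ascending by count like `sort -n`; ties broken by the ID string.
    return sorted(counts.items(), key=lambda item: (item[1], item[0]))


def report(path):
    """Print delete/add tallies for one sentinel log file, e.g. sentinel.log.3."""
    with open(path) as log:
        lines = log.readlines()
    for phrase in ("delete label", "add label"):
        print(phrase)
        for label_id, n in count_label_ids(lines, phrase):
            print(f"{n:7d} {label_id}")
```

For example, `report("/var/log/autotest_sentinel/sentinel.log.3")` would print both tallies in one pass over the file instead of reading it twice.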