samus, grunt, fizz, scarlet db inconsistencies lead to incorrect scheduling | sentinel service unable to connect to and fix cros-full-0039 and 0040
Issue description

As seen on chromeos-test@chromeos-server156:/var/log/autotest_sentinel/sentinel.log

2018-10-04 14:20:19,920 ERRO| Query against cros-full-0039.mtv.corp.google.com failed: SELECT id, hostname FROM afe_shards
2018-10-04 14:20:19,962 ERRO| (1130, "Host 'chromeos-server156.cbf.corp.google.com' is not allowed to connect to this MySQL server")
Traceback (most recent call last):
  File "/usr/local/google/home/chromeos-test/chromiumos/chromeos-admin/venv/sentinel/service.py", line 697, in _sync_once
    shard_db.sync_to_master(master_db)
  File "/usr/local/google/home/chromeos-test/chromiumos/chromeos-admin/venv/sentinel/service.py", line 563, in sync_to_master
    shard_infos = self.get_shards()
  File "/usr/local/google/home/chromeos-test/chromiumos/chromeos-admin/venv/sentinel/service.py", line 534, in get_shards
    rows = self.dbmanager.fetch('id, hostname', 'afe_shards', condition)
  File "/usr/local/google/home/chromeos-test/chromiumos/chromeos-admin/venv/sentinel/db_manager.py", line 71, in fetch
    return self.execute_and_fetch(sql)
  File "/usr/local/google/home/chromeos-test/chromiumos/chromeos-admin/venv/sentinel/db_manager.py", line 152, in execute_and_fetch
    self._execute(sql)
  File "/usr/local/google/home/chromeos-test/chromiumos/chromeos-admin/venv/sentinel/db_manager.py", line 140, in _execute
    self.connect()
  File "/usr/local/google/home/chromeos-test/chromiumos/chromeos-admin/venv/sentinel/db_manager.py", line 45, in connect
    self._conn = MySQLdb.connect(**kwargs)
  File "/usr/local/google/home/chromeos-test/.cache/cros_venv/venv-2.7.6-95de6b4f9b30bb6fc148ee4eccd758dc/local/lib/python2.7/site-packages/MySQLdb/__init__.py", line 81, in Connect
    return Connection(*args, **kwargs)
  File "/usr/local/google/home/chromeos-test/.cache/cros_venv/venv-2.7.6-95de6b4f9b30bb6fc148ee4eccd758dc/local/lib/python2.7/site-packages/MySQLdb/connections.py", line 193, in __init__
    super(Connection, self).__init__(*args, **kwargs2)
OperationalError: (1130, "Host 'chromeos-server156.cbf.corp.google.com' is not allowed to connect to this MySQL server")
2018-10-04 14:20:19,962 INFO| cros-full-0039.mtv.corp.google.com: Done.

(and many repeated occurrences)

This is probably contributing to Issue 891758 because it means our inconsistency-fixer isn't able to act. Chase-Pending because this should be easy enough to alert on.
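For reference, the alert could amount to a periodic probe that attempts the same query the sentinel runs against each shard and reports 1130-style failures. A minimal sketch, with the shard list and credentials as placeholders (a real check would read these from the sentinel config):

import MySQLdb

# Placeholder shard list and credentials; not the real sentinel configuration.
SHARDS = [
    'cros-full-0039.mtv.corp.google.com',
    'cros-full-0040.mtv.corp.google.com',
]

def probe(host):
    """Run the same query the sentinel uses; return an error string or None."""
    try:
        conn = MySQLdb.connect(host=host, user='chromeosqa-admin',
                               passwd='<redacted>', db='chromeos_autotest_db')
        cur = conn.cursor()
        cur.execute('SELECT id, hostname FROM afe_shards')
        cur.fetchall()
        conn.close()
        return None
    except MySQLdb.OperationalError as e:
        return '%s: %s' % (host, e)

for failure in filter(None, (probe(h) for h in SHARDS)):
    print(failure)  # wire this up to whatever alerting channel we use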
Oct 8
The shard appears to have misconfigured mysql users.

On cros-full-0039 (inaccessible from sentinel):

mysql> SELECT USER, HOST from mysql.user;
+------------------+------------------------------------+
| USER             | HOST                               |
+------------------+------------------------------------+
| root             | 127.0.0.1                          |
| root             | ::1                                |
| root             | cros-full-0039.mtv.corp.google.com |
| chromeosqa-admin | localhost                          |
| debian-sys-maint | localhost                          |
| root             | localhost                          |
+------------------+------------------------------------+
6 rows in set (0.00 sec)

On -0024 (accessible):

mysql> SELECT USER, HOST from mysql.user;
+------------------+------------------------------------+
| USER             | HOST                               |
+------------------+------------------------------------+
| chromeosqa-admin | %                                  |
| root             | 127.0.0.1                          |
| root             | ::1                                |
| root             | cros-full-0024.mtv.corp.google.com |
| chromeosqa-admin | localhost                          |
| debian-sys-maint | localhost                          |
| root             | localhost                          |
+------------------+------------------------------------+
7 rows in set (0.00 sec)

I'm not sure what is supposed to create this account, but it apparently failed to create the wildcard-host variant of chromeosqa-admin.
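So the tell is simply whether a 'chromeosqa-admin'@'%' row exists in mysql.user. A quick local check on a shard could look roughly like this (sketch only; assumes a local root login works the way it did for the sessions above):

import MySQLdb

# Connect locally as root (as in the mysql> sessions above) and look for
# the wildcard-host chromeosqa-admin account.
conn = MySQLdb.connect(host='localhost', user='root', passwd='<redacted>')
cur = conn.cursor()
cur.execute("SELECT User, Host FROM mysql.user "
            "WHERE User = 'chromeosqa-admin' AND Host = '%'")
if not cur.fetchall():
    print('missing chromeosqa-admin@% account')
conn.close()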
Oct 8
I don't know how many other shards suffer from this. It's easy enough to fix a single instance to start with.
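One way to answer that would be to pull the shard list from the master's afe_shards table and attempt a remote chromeosqa-admin login against each shard; any host that rejects the connection (error 1130) presumably has the same missing wildcard account. A rough sketch, with the master hostname and credentials as placeholders:

import MySQLdb

MASTER = '<master-afe-db-host>'  # placeholder

def list_shards():
    # Same table the sentinel queries when it hits this failure.
    conn = MySQLdb.connect(host=MASTER, user='chromeosqa-admin',
                           passwd='<redacted>', db='chromeos_autotest_db')
    cur = conn.cursor()
    cur.execute('SELECT id, hostname FROM afe_shards')
    rows = cur.fetchall()
    conn.close()
    return [hostname for _id, hostname in rows]

def can_connect(host):
    try:
        MySQLdb.connect(host=host, user='chromeosqa-admin',
                        passwd='<redacted>',
                        db='chromeos_autotest_db').close()
        return True
    except MySQLdb.OperationalError:
        return False

for shard in list_shards():
    if not can_connect(shard):
        print('%s: remote chromeosqa-admin login rejected' % shard)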
Oct 8
Ran this on 0039:

mysql> CREATE USER 'chromeosqa-admin'@'%' IDENTIFIED BY PASSWORD <redacted>;
Query OK, 0 rows affected (0.00 sec)

mysql> GRANT ALL PRIVILEGES ON `chromeos_autotest_db`.* TO 'chromeosqa-admin'@'%';
Query OK, 0 rows affected (0.00 sec)

mysql> GRANT ALL PRIVILEGES ON `chromeos_lab_servers`.* TO 'chromeosqa-admin'@'%';
Query OK, 0 rows affected (0.00 sec)
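As a sanity check from the sentinel host (chromeos-server156), the previously failing query should now succeed; roughly (password placeholder):

import MySQLdb

# The exact query that was failing in the sentinel log above.
conn = MySQLdb.connect(host='cros-full-0039.mtv.corp.google.com',
                       user='chromeosqa-admin', passwd='<redacted>',
                       db='chromeos_autotest_db')
cur = conn.cursor()
cur.execute('SELECT id, hostname FROM afe_shards')
print(cur.fetchall())
conn.close()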
Oct 8
Will babysit sentinel to see if that allows it to fix 0039; not sure what the cycle rate of sentinel is, so I may have to babysit for a while.
Oct 8
I've successfully forced 1 run of the sentinel service against that shard.
Oct 8
Grepping through the sentinel log, I see this issue also affects shard 40, but no others. Will fix that one manually too.
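For the record, the grep boils down to pairing each "Query against <host> failed" line with the 1130 error that follows it; a rough version of that scan (the line-pairing assumption is mine):

import re

LOG = '/var/log/autotest_sentinel/sentinel.log'
QUERY_FAILED = re.compile(r'Query against (\S+) failed')

bad_shards = set()
last_failed_host = None
with open(LOG) as log:
    for line in log:
        m = QUERY_FAILED.search(line)
        if m:
            last_failed_host = m.group(1)
        elif ('is not allowed to connect to this MySQL server' in line
              and last_failed_host):
            bad_shards.add(last_failed_host)
            last_failed_host = None

print(sorted(bad_shards))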
Oct 9
Examining grunt, the locked-state issue is fixed, but the new problem is that pool: labels are also inconsistent :/ This might be related to Issue 893355; maybe we bail out after a foreign key constraint error and fail to sync the other labels? Investigating...
Comment 1 by akes...@chromium.org, Oct 8
Labels: -Pri-1 Pri-0