
Issue 893331


Issue metadata

Status: Fixed
Owner:
Closed: Oct 15
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 0
Type: Bug

Blocking:
issue 891758
issue 893252




samus, grunt, fizz, scarlet db inconsistencies lead to incorrect scheduling | sentinel service unable to connect to and fix cros-full-0039 and 0040

Project Member Reported by akes...@chromium.org, Oct 8

Issue description

As seen on chromeos-test@chromeos-server156:/var/log/autotest_sentinel/sentinel.log

2018-10-04 14:20:19,920 ERRO| Query against cros-full-0039.mtv.corp.google.com failed: SELECT id, hostname FROM afe_shards 
2018-10-04 14:20:19,962 ERRO| (1130, "Host 'chromeos-server156.cbf.corp.google.com' is not allowed to connect to this MySQL server")
Traceback (most recent call last):
  File "/usr/local/google/home/chromeos-test/chromiumos/chromeos-admin/venv/sentinel/service.py", line 697, in _sync_once
    shard_db.sync_to_master(master_db)
  File "/usr/local/google/home/chromeos-test/chromiumos/chromeos-admin/venv/sentinel/service.py", line 563, in sync_to_master
    shard_infos = self.get_shards()
  File "/usr/local/google/home/chromeos-test/chromiumos/chromeos-admin/venv/sentinel/service.py", line 534, in get_shards
    rows = self.dbmanager.fetch('id, hostname', 'afe_shards', condition)
  File "/usr/local/google/home/chromeos-test/chromiumos/chromeos-admin/venv/sentinel/db_manager.py", line 71, in fetch
    return self.execute_and_fetch(sql)
  File "/usr/local/google/home/chromeos-test/chromiumos/chromeos-admin/venv/sentinel/db_manager.py", line 152, in execute_and_fetch
    self._execute(sql)
  File "/usr/local/google/home/chromeos-test/chromiumos/chromeos-admin/venv/sentinel/db_manager.py", line 140, in _execute
    self.connect()
  File "/usr/local/google/home/chromeos-test/chromiumos/chromeos-admin/venv/sentinel/db_manager.py", line 45, in connect
    self._conn = MySQLdb.connect(**kwargs)
  File "/usr/local/google/home/chromeos-test/.cache/cros_venv/venv-2.7.6-95de6b4f9b30bb6fc148ee4eccd758dc/local/lib/python2.7/site-packages/MySQLdb/__init__.py", line 81, in Connect
    return Connection(*args, **kwargs)
  File "/usr/local/google/home/chromeos-test/.cache/cros_venv/venv-2.7.6-95de6b4f9b30bb6fc148ee4eccd758dc/local/lib/python2.7/site-packages/MySQLdb/connections.py", line 193, in __init__
    super(Connection, self).__init__(*args, **kwargs2)
OperationalError: (1130, "Host 'chromeos-server156.cbf.corp.google.com' is not allowed to connect to this MySQL server")
2018-10-04 14:20:19,962 INFO| cros-full-0039.mtv.corp.google.com: Done.

(and many repeated occurrences)

This is probably contributing to Issue 891758, because it means our inconsistency-fixer (the sentinel service) isn't able to act on this shard.

Chase-Pending because this should be easy enough to alert on.


 
Blocking: 891758
Labels: -Pri-1 Pri-0
P0 to resolve the likely permission issue.
The shard appears to have misconfigured MySQL users.

On cros-full-0039 (inaccessible from sentinel):

mysql> SELECT USER, HOST from mysql.user;
+------------------+------------------------------------+
| USER             | HOST                               |
+------------------+------------------------------------+
| root             | 127.0.0.1                          |
| root             | ::1                                |
| root             | cros-full-0039.mtv.corp.google.com |
| chromeosqa-admin | localhost                          |
| debian-sys-maint | localhost                          |
| root             | localhost                          |
+------------------+------------------------------------+
6 rows in set (0.00 sec)


On cros-full-0024 (accessible from sentinel):
mysql> SELECT USER, HOST from mysql.user;
+------------------+------------------------------------+
| USER             | HOST                               |
+------------------+------------------------------------+
| chromeosqa-admin | %                                  |
| root             | 127.0.0.1                          |
| root             | ::1                                |
| root             | cros-full-0024.mtv.corp.google.com |
| chromeosqa-admin | localhost                          |
| debian-sys-maint | localhost                          |
| root             | localhost                          |
+------------------+------------------------------------+
7 rows in set (0.00 sec)


I'm not sure what is supposed to create this account, but whatever it is apparently failed to create the wildcard-host variant of chromeosqa-admin on 0039.
I don't know how many other shards suffer from this, but it's easy enough to fix a single instance to start with.
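
The comparison between the two account listings above can be sketched in a few lines of Python. This is only an illustration, not part of sentinel or any existing tool: the function name `missing_accounts` and the `<self>` placeholder are made up here, and the rows are copied by hand from the two `SELECT USER, HOST` outputs.

```python
def missing_accounts(reference_rows, target_rows, reference_fqdn, target_fqdn):
    """Return (user, host) pairs present on the reference shard but not on the target.

    Hypothetical helper for illustration only. Each shard has a root entry
    whose host is the shard's own FQDN, so that host is replaced by a
    placeholder before comparing to avoid a spurious difference.
    """
    def normalize(rows, own_fqdn):
        return {(user, "<self>" if host == own_fqdn else host)
                for user, host in rows}

    return sorted(normalize(reference_rows, reference_fqdn)
                  - normalize(target_rows, target_fqdn))


# Rows copied from the mysql.user listings in this comment.
shard_0024 = [
    ("chromeosqa-admin", "%"),
    ("root", "127.0.0.1"),
    ("root", "::1"),
    ("root", "cros-full-0024.mtv.corp.google.com"),
    ("chromeosqa-admin", "localhost"),
    ("debian-sys-maint", "localhost"),
    ("root", "localhost"),
]
shard_0039 = [
    ("root", "127.0.0.1"),
    ("root", "::1"),
    ("root", "cros-full-0039.mtv.corp.google.com"),
    ("chromeosqa-admin", "localhost"),
    ("debian-sys-maint", "localhost"),
    ("root", "localhost"),
]

print(missing_accounts(shard_0024, shard_0039,
                       "cros-full-0024.mtv.corp.google.com",
                       "cros-full-0039.mtv.corp.google.com"))
# prints [('chromeosqa-admin', '%')]
```

The only difference is the wildcard-host `chromeosqa-admin` account, which matches the error 1130 that sentinel gets when connecting from a remote host.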
Ran this on 0039:

mysql> CREATE USER 'chromeosqa-admin'@'%' IDENTIFIED BY PASSWORD <redacted>;
Query OK, 0 rows affected (0.00 sec)

mysql> GRANT ALL PRIVILEGES ON `chromeos_autotest_db`.* TO 'chromeosqa-admin'@'%';
Query OK, 0 rows affected (0.00 sec)

mysql> GRANT ALL PRIVILEGES ON `chromeos_lab_servers`.* TO 'chromeosqa-admin'@'%';
Query OK, 0 rows affected (0.00 sec)
Will babysit sentinel to see if that allows it to fix 0039; not sure what the cycle rate of sentinel is, so I may have to babysit for a while.
I've successfully forced 1 run of the sentinel service against that shard.


Grepping through the sentinel log, I see this issue also affects shard 40, but no others that I can see. Will fix that one manually too.
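
For reference, the grep over the sentinel log can be approximated with a short script like the one below. It is a rough sketch that assumes the log always looks like the excerpt in the issue description: a "Query against <host> failed" line immediately followed by an "ERRO| (1130, ..." line. The function name `hosts_denied` and the `SAMPLE` text are illustrative only, not part of sentinel.

```python
import re
from collections import Counter

# Patterns based on the log excerpt in the issue description.
QUERY_FAILED = re.compile(r"Query against (\S+) failed")
ACCESS_DENIED = re.compile(r"ERRO\| \(1130, ")


def hosts_denied(lines):
    """Count MySQL error 1130 (host not allowed to connect) per shard host.

    A failure is attributed to a host only when the 1130 error line comes
    directly after that host's "Query against ... failed" line.
    """
    counts = Counter()
    pending = None  # host named on the previous line, if any
    for line in lines:
        if pending and ACCESS_DENIED.search(line):
            counts[pending] += 1
        match = QUERY_FAILED.search(line)
        pending = match.group(1) if match else None
    return counts


SAMPLE = """\
2018-10-04 14:20:19,920 ERRO| Query against cros-full-0039.mtv.corp.google.com failed: SELECT id, hostname FROM afe_shards
2018-10-04 14:20:19,962 ERRO| (1130, "Host 'chromeos-server156.cbf.corp.google.com' is not allowed to connect to this MySQL server")
2018-10-04 14:20:19,962 INFO| cros-full-0039.mtv.corp.google.com: Done.
"""

for host, count in hosts_denied(SAMPLE.splitlines()).items():
    print(host, count)
# prints: cros-full-0039.mtv.corp.google.com 1
```

Against the real log, the lines would come from `open('/var/log/autotest_sentinel/sentinel.log')` instead of `SAMPLE`. The same per-host count could feed the alert suggested in the issue description.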
Summary: samus, grunt, fizz, scarlet db inconsistencies lead to incorrect scheduling | sentinel service unable to connect to and fix cros-full-0039 and 0040 (was: sentinel service unable to connect to and fix cros-full-0039)
Examining grunt, the locked state issue is fixed, but the new problem is that pool: labels are also inconsistent :/

This might be related to Issue 893355; maybe we bail out after a foreign key constraint and fail to sync the other labels?

Investigating...
Blocking: 893252
Status: Fixed (was: Assigned)
