AfeAddToProdTask failing from some new servers
Reported by
jrbarnette@chromium.org,
Feb 23 2018
Issue description
This bug is forked from bug 810588, starting at comment #20.
I ran this command:
$ bin/run_server_task AfeAddToProdTask --prod_master cautotest --host_server cros-full-0035.mtv.corp.google.com
The command failed when calling a routine named `_update_cloudsql_server_whitelist()`.
Looking at the general function of the server, RPC calls to
`get_jobs_summary()` via that AFE fail when trying to read
from the TKO. The same failure is also observed from
cros-full-0037.mtv.
I suspect the "add to prod" failure is causing the "get_jobs_summary"
failures. So, let's fix whatever is wrong with Cloud SQL.
Here's the traceback from the AfeAddToProdTask failure:
2018-02-23 06:41:39,708 ERRO| Traceback (most recent call last):
File "/usr/local/google/home/jrbarnette/repos/cros.base/chromeos-admin/venv/server_management_lib/tasks/task.py", line 66, in run
self._run()
File "/usr/local/google/home/jrbarnette/repos/cros.base/chromeos-admin/venv/server_management_lib/tasks/atomic_afe.py", line 42, in _run
self._update_cloudsql_server_whitelist()
File "/usr/local/google/home/jrbarnette/repos/cros.base/chromeos-admin/venv/server_management_lib/tasks/atomic_common.py", line 113, in decorated_func
return func(self)
File "/usr/local/google/home/jrbarnette/repos/cros.base/chromeos-admin/venv/server_management_lib/tasks/atomic_common.py", line 441, in _update_cloudsql_server_whitelist
api.local(command, capture=True)
File "/usr/local/google/home/jrbarnette/.cache/cros_venv/venv-2.7.13-95de6b4f9b30bb6fc148ee4eccd758dc/local/lib/python2.7/site-packages/fabric/operations.py", line 1237, in local
error(message=msg, stdout=out, stderr=err)
File "/usr/local/google/home/jrbarnette/.cache/cros_venv/venv-2.7.13-95de6b4f9b30bb6fc148ee4eccd758dc/local/lib/python2.7/site-packages/fabric/utils.py", line 358, in error
return func(message)
File "/usr/local/google/home/jrbarnette/.cache/cros_venv/venv-2.7.13-95de6b4f9b30bb6fc148ee4eccd758dc/local/lib/python2.7/site-packages/fabric/utils.py", line 54, in abort
raise env.abort_exception(msg)
FabricException: local() encountered an error (return code 1) while executing '/usr/local/autotest//site_utils/sync_cloudsql_access.py --project google.com:chromeos-lab --instance tko --afe cautotest --extra_servers 172.25.66.97'
2018-02-23 06:41:39,708 ERRO| local() encountered an error (return code 1) while executing '/usr/local/autotest//site_utils/sync_cloudsql_access.py --project google.com:chromeos-lab --instance tko --afe cautotest --extra_servers 172.25.66.97'
,
Feb 23 2018
That step whitelists the IP address of the new server in the TKO Cloud SQL instance. Did you run `gcloud auth login` before you ran this task?
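A rough preflight check along these lines could be run on the machine that launches the task, to confirm that `gcloud` is on the PATH and has an active account before the whitelist step starts. This is only a sketch: the function name `gcloud_preflight` and its injectable `which`/`run` parameters are hypothetical and are not part of chromeos-admin or autotest. The only external command it relies on is `gcloud auth list` with its standard `--filter` and `--format` flags.

```python
import shutil
import subprocess


def gcloud_preflight(which=shutil.which, run=subprocess.run):
    """Return a list of problems found; an empty list means gcloud looks usable.

    `which` and `run` are injectable so the check can be exercised without
    gcloud installed (hypothetical helper, not from the real tooling).
    """
    if which("gcloud") is None:
        return ["gcloud is not on PATH"]

    # Ask gcloud for the currently active account, one per line.
    result = run(
        ["gcloud", "auth", "list",
         "--filter=status:ACTIVE", "--format=value(account)"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return ["gcloud auth list failed: " + result.stderr.strip()]
    if not result.stdout.strip():
        return ["no active gcloud account; run 'gcloud auth login'"]
    return []


if __name__ == "__main__":
    problems = gcloud_preflight()
    print("\n".join(problems) if problems else "gcloud looks ready")
```

Returning a list of problems rather than raising keeps the check easy to report from a wrapper script.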
,
Feb 23 2018
It is explicitly mentioned in the instructions.
,
Feb 23 2018
The instructions I followed are here:
https://sites.google.com/a/google.com/chromeos/for-team-members/infrastructure/chromeos-admin/basic-servertype-management
That doesn't mention `gcloud auth login`.
,
Feb 23 2018
Sorry, this is required for adding any server to prod. I will update the instructions I followed.
,
Feb 23 2018
"you followed" I mean
,
Feb 23 2018
I've run `gcloud auth login`, and re-run the AfeAddToProdTask command. It continues to fail as before.
,
Feb 23 2018
chromeos-test@cros-full-0034:/usr/local/autotest$ /usr/local/autotest//site_utils/sync_cloudsql_access.py --project google.com:chromeos-lab --instance tko --afe cautotest --extra_servers 172.25.66.97
Adding servers ['chromeos-server151.cbf.corp.google.com', 'chromeos-server158.cbf.corp.google.com', 'chromeos-server156.cbf.corp.google.com', 'chromeos-server125.hot.corp.google.com', 'chromeos-server134.hot.corp.google.com', 'chromeos-server126.hot.corp.google.com', 'chromeos-server155.cbf.corp.google.com', 'chromeos-server129.hot.corp.google.com', 'chromeos-skunk-2.mtv.corp.google.com', 'chromeos-skunk-3.mtv.corp.google.com', 'chromeos-skunk-4.mtv.corp.google.com', 'chromeos-skunk-5.mtv.corp.google.com', 'chromeos-skunk-1.mtv.corp.google.com', 'chromeos-server-tester1.mtv.corp.google.com', 'chromeos-server-tester2.mtv.corp.google.com', 'cros-full-0001.mtv.corp.google.com', 'cros-full-0002.mtv.corp.google.com', 'cros-full-0003.mtv.corp.google.com', 'cros-full-0004.mtv.corp.google.com', 'cros-full-0005.mtv.corp.google.com', 'cros-full-0006.mtv.corp.google.com', 'cros-full-0007.mtv.corp.google.com', 'cros-full-0008.mtv.corp.google.com', 'cros-full-0009.mtv.corp.google.com', 'cros-full-0010.mtv.corp.google.com', 'cros-full-0011.mtv.corp.google.com', 'cros-full-0012.mtv.corp.google.com', 'cros-full-0013.mtv.corp.google.com', 'cros-full-0014.mtv.corp.google.com', 'cros-full-0015.mtv.corp.google.com', 'cros-full-0016.mtv.corp.google.com', 'cros-full-0017.mtv.corp.google.com', 'cros-full-0018.mtv.corp.google.com', 'cros-full-0019.mtv.corp.google.com', 'cros-full-0020.mtv.corp.google.com', 'cros-full-0021.mtv.corp.google.com', 'cros-full-0022.mtv.corp.google.com', 'cros-full-0023.mtv.corp.google.com', 'cros-full-0024.mtv.corp.google.com', 'cros-full-0025.mtv.corp.google.com', 'cros-full-0030.mtv.corp.google.com', 'cros-full-0027.mtv.corp.google.com', 'cros-full-0029.mtv.corp.google.com', 'cros-full-0028.mtv.corp.google.com', 'cros-full-0026.mtv.corp.google.com', 'cros-full-0033.mtv.corp.google.com', 'cros-full-0032.mtv.corp.google.com', 'cros-full-0031.mtv.corp.google.com', 'cros-full-0036.mtv.corp.google.com', 'cros-full-0034.mtv.corp.google.com', 'cros-bighd-0001.mtv.corp.google.com', 'cros-full-0035.mtv.corp.google.com', 'cros-full-0037.mtv.corp.google.com', '172.25.66.97'] to access list for projects tko
Fetching their IP addresses...
...Done: ['100.108.133.186', '100.108.133.206', '100.108.133.205', '100.109.178.143', '100.109.175.140', '100.109.178.145', '100.108.133.203', '100.109.178.146', '100.116.60.160', '100.116.60.161', '100.116.60.162', '100.116.60.163', '100.116.60.159', '100.109.25.87', '100.109.25.88', '100.109.25.130', '100.109.25.142', '100.109.25.145', '100.109.25.143', '100.109.25.132', '100.109.25.147', '100.109.25.144', '100.109.25.134', '100.109.25.139', '100.109.25.133', '100.109.25.135', '100.109.25.140', '100.109.25.138', '100.109.25.131', '100.109.25.141', '100.109.25.149', '100.109.25.137', '100.109.25.146', '100.109.25.148', '100.109.25.150', '100.108.189.2', '100.108.189.4', '100.108.189.3', '100.108.189.5', '100.108.189.6', '100.108.189.49', '100.108.189.41', '100.108.189.48', '100.108.189.47', '100.108.189.40', '100.108.189.54', '100.108.189.51', '100.108.189.50', '100.108.189.42', '100.108.189.52', '100.108.189.33', '100.108.189.53', '100.108.189.43', '172.25.66.97']
DEBUG:root:Running 'gcloud config set project google.com:chromeos-lab -q'
DEBUG:root:Running 'gcloud auth login'
/bin/bash: gcloud: command not found
Traceback (most recent call last):
File "/usr/local/autotest//site_utils/sync_cloudsql_access.py", line 135, in <module>
main()
File "/usr/local/autotest//site_utils/sync_cloudsql_access.py", line 131, in main
options.extra_servers)
File "/usr/local/autotest//site_utils/sync_cloudsql_access.py", line 110, in update_allowed_networks
gcloud_login(project)
File "/usr/local/autotest//site_utils/sync_cloudsql_access.py", line 57, in gcloud_login
stderr_tee=sys.stderr, stdin=sys.stdin)
File "/usr/local/autotest/client/common_lib/utils.py", line 748, in run
"Command returned non-zero exit status")
autotest_lib.client.common_lib.error.CmdError: Command <gcloud auth login> failed, rc=127, Command returned non-zero exit status
* Command: gcloud auth login
Exit status: 127
Duration: 0.000725984573364
stderr:
/bin/bash: gcloud: command not found

gcloud is not installed on cautotest
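For reading results like `rc=127` above: under the usual POSIX shell conventions, 127 means the shell could not find the command at all, which is different from the command running and failing. The helper below is a small illustrative sketch; `describe_exit_status` is a hypothetical name and not something autotest provides.

```python
def describe_exit_status(rc):
    """Map a shell exit status to a short explanation (POSIX shell conventions)."""
    if rc == 0:
        return "success"
    if rc == 126:
        return "command found but not executable"
    if rc == 127:
        return "command not found on PATH"
    if rc > 128:
        # Shells report death-by-signal as 128 + signal number.
        return "terminated by signal %d" % (rc - 128)
    return "command ran and reported failure"


if __name__ == "__main__":
    for status in (0, 1, 127):
        print(status, "->", describe_exit_status(status))
```

By contrast, the `return code 1` in the original AfeAddToProdTask traceback only says that the sync script itself exited with a failure, so the script's own stderr is needed to see the underlying cause.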
,
Feb 23 2018
> gcloud is not installed on cautotest
That might explain this problem overall. However, I'm not fully convinced: cautotest isn't suffering from this symptom. So far, I see only cros-full-0035 and cros-full-0037 being affected. So, if the problem is "gcloud isn't installed", we still need to explain why some servers are affected and others aren't.
ATM, my leading theory is that I don't have some necessary permission, since the work to add cautotest (cros-full-0034) was done by akeshet@.
,
Feb 23 2018
> gcloud is not installed on cautotest
One other potential point of interest: the failures I'm observing are from running commands on my workstation, not on cautotest.
,
Feb 23 2018
Sorry, that line was run locally. I just ran this command without any issue:
shuqianz@charlenez ~/c/chromeos-admin> ~/chromiumos/src/third_party/autotest/files/site_utils/sync_cloudsql_access.py --project google.com:chromeos-lab --instance tko --afe cautotest --extra_servers 172.25.66.97
Adding servers ['chromeos-server151.cbf.corp.google.com', 'chromeos-server158.cbf.corp.google.com', 'chromeos-server156.cbf.corp.google.com', 'chromeos-server125.hot.corp.google.com', 'chromeos-server134.hot.corp.google.com', 'chromeos-server126.hot.corp.google.com', 'chromeos-server155.cbf.corp.google.com', 'chromeos-server129.hot.corp.google.com', 'chromeos-skunk-2.mtv.corp.google.com', 'chromeos-skunk-3.mtv.corp.google.com', 'chromeos-skunk-4.mtv.corp.google.com', 'chromeos-skunk-5.mtv.corp.google.com', 'chromeos-skunk-1.mtv.corp.google.com', 'chromeos-server-tester1.mtv.corp.google.com', 'chromeos-server-tester2.mtv.corp.google.com', 'cros-full-0001.mtv.corp.google.com', 'cros-full-0002.mtv.corp.google.com', 'cros-full-0003.mtv.corp.google.com', 'cros-full-0004.mtv.corp.google.com', 'cros-full-0005.mtv.corp.google.com', 'cros-full-0006.mtv.corp.google.com', 'cros-full-0007.mtv.corp.google.com', 'cros-full-0008.mtv.corp.google.com', 'cros-full-0009.mtv.corp.google.com', 'cros-full-0010.mtv.corp.google.com', 'cros-full-0011.mtv.corp.google.com', 'cros-full-0012.mtv.corp.google.com', 'cros-full-0013.mtv.corp.google.com', 'cros-full-0014.mtv.corp.google.com', 'cros-full-0015.mtv.corp.google.com', 'cros-full-0016.mtv.corp.google.com', 'cros-full-0017.mtv.corp.google.com', 'cros-full-0018.mtv.corp.google.com', 'cros-full-0019.mtv.corp.google.com', 'cros-full-0020.mtv.corp.google.com', 'cros-full-0021.mtv.corp.google.com', 'cros-full-0022.mtv.corp.google.com', 'cros-full-0023.mtv.corp.google.com', 'cros-full-0024.mtv.corp.google.com', 'cros-full-0025.mtv.corp.google.com', 'cros-full-0030.mtv.corp.google.com', 'cros-full-0027.mtv.corp.google.com', 'cros-full-0029.mtv.corp.google.com', 'cros-full-0028.mtv.corp.google.com', 'cros-full-0026.mtv.corp.google.com', 'cros-full-0033.mtv.corp.google.com', 'cros-full-0032.mtv.corp.google.com', 'cros-full-0031.mtv.corp.google.com', 'cros-full-0036.mtv.corp.google.com', 'cros-full-0034.mtv.corp.google.com', 'cros-bighd-0001.mtv.corp.google.com', 'cros-full-0035.mtv.corp.google.com', 'cros-full-0037.mtv.corp.google.com', '172.25.66.97'] to access list for projects tko
Fetching their IP addresses...
...Done: ['100.108.133.186', '100.108.133.206', '100.108.133.205', '100.109.178.143', '100.109.175.140', '100.109.178.145', '100.108.133.203', '100.109.178.146', '100.116.60.160', '100.116.60.161', '100.116.60.162', '100.116.60.163', '100.116.60.159', '100.109.25.87', '100.109.25.88', '100.109.25.130', '100.109.25.142', '100.109.25.145', '100.109.25.143', '100.109.25.132', '100.109.25.147', '100.109.25.144', '100.109.25.134', '100.109.25.139', '100.109.25.133', '100.109.25.135', '100.109.25.140', '100.109.25.138', '100.109.25.131', '100.109.25.141', '100.109.25.149', '100.109.25.137', '100.109.25.146', '100.109.25.148', '100.109.25.150', '100.108.189.2', '100.108.189.4', '100.108.189.3', '100.108.189.5', '100.108.189.6', '100.108.189.49', '100.108.189.41', '100.108.189.48', '100.108.189.47', '100.108.189.40', '100.108.189.54', '100.108.189.51', '100.108.189.50', '100.108.189.42', '100.108.189.52', '100.108.189.33', '100.108.189.53', '100.108.189.43', '172.25.66.97']
DEBUG:root:Running 'gcloud config set project google.com:chromeos-lab -q'
Running command to update whitelists: "gcloud sql instances patch tko --authorized-networks
100.108.133.186/32,100.108.133.206/32,100.108.133.205/32,100.109.178.143/32,100.109.175.140/32,100.109.178.145/32,100.108.133.203/32,100.109.178.146/32,100.116.60.160/32,100.116.60.161/32,100.116.60.162/32,100.116.60.163/32,100.116.60.159/32,100.109.25.87/32,100.109.25.88/32,100.109.25.130/32,100.109.25.142/32,100.109.25.145/32,100.109.25.143/32,100.109.25.132/32,100.109.25.147/32,100.109.25.144/32,100.109.25.134/32,100.109.25.139/32,100.109.25.133/32,100.109.25.135/32,100.109.25.140/32,100.109.25.138/32,100.109.25.131/32,100.109.25.141/32,100.109.25.149/32,100.109.25.137/32,100.109.25.146/32,100.109.25.148/32,100.109.25.150/32,100.108.189.2/32,100.108.189.4/32,100.108.189.3/32,100.108.189.5/32,100.108.189.6/32,100.108.189.49/32,100.108.189.41/32,100.108.189.48/32,100.108.189.47/32,100.108.189.40/32,100.108.189.54/32,100.108.189.51/32,100.108.189.50/32,100.108.189.42/32,100.108.189.52/32,100.108.189.33/32,100.108.189.53/32,100.108.189.43/32,172.25.66.97/32 -q" DEBUG:root:Running 'gcloud sql instances patch tko --authorized-networks 
100.108.133.186/32,100.108.133.206/32,100.108.133.205/32,100.109.178.143/32,100.109.175.140/32,100.109.178.145/32,100.108.133.203/32,100.109.178.146/32,100.116.60.160/32,100.116.60.161/32,100.116.60.162/32,100.116.60.163/32,100.116.60.159/32,100.109.25.87/32,100.109.25.88/32,100.109.25.130/32,100.109.25.142/32,100.109.25.145/32,100.109.25.143/32,100.109.25.132/32,100.109.25.147/32,100.109.25.144/32,100.109.25.134/32,100.109.25.139/32,100.109.25.133/32,100.109.25.135/32,100.109.25.140/32,100.109.25.138/32,100.109.25.131/32,100.109.25.141/32,100.109.25.149/32,100.109.25.137/32,100.109.25.146/32,100.109.25.148/32,100.109.25.150/32,100.108.189.2/32,100.108.189.4/32,100.108.189.3/32,100.108.189.5/32,100.108.189.6/32,100.108.189.49/32,100.108.189.41/32,100.108.189.48/32,100.108.189.47/32,100.108.189.40/32,100.108.189.54/32,100.108.189.51/32,100.108.189.50/32,100.108.189.42/32,100.108.189.52/32,100.108.189.33/32,100.108.189.53/32,100.108.189.43/32,172.25.66.97/32 -q' The following message will be used for the patch API method. 
{"project": "google.com:chromeos-lab", "name": "tko", "settings": {"ipConfiguration": {"authorizedNetworks": [{"value": "100.108.133.186/32"}, {"value": "100.108.133.206/32"}, {"value": "100.108.133.205/32"}, {"value": "100.109.178.143/32"}, {"value": "100.109.175.140/32"}, {"value": "100.109.178.145/32"}, {"value": "100.108.133.203/32"}, {"value": "100.109.178.146/32"}, {"value": "100.116.60.160/32"}, {"value": "100.116.60.161/32"}, {"value": "100.116.60.162/32"}, {"value": "100.116.60.163/32"}, {"value": "100.116.60.159/32"}, {"value": "100.109.25.87/32"}, {"value": "100.109.25.88/32"}, {"value": "100.109.25.130/32"}, {"value": "100.109.25.142/32"}, {"value": "100.109.25.145/32"}, {"value": "100.109.25.143/32"}, {"value": "100.109.25.132/32"}, {"value": "100.109.25.147/32"}, {"value": "100.109.25.144/32"}, {"value": "100.109.25.134/32"}, {"value": "100.109.25.139/32"}, {"value": "100.109.25.133/32"}, {"value": "100.109.25.135/32"}, {"value": "100.109.25.140/32"}, {"value": "100.109.25.138/32"}, {"value": "100.109.25.131/32"}, {"value": "100.109.25.141/32"}, {"value": "100.109.25.149/32"}, {"value": "100.109.25.137/32"}, {"value": "100.109.25.146/32"}, {"value": "100.109.25.148/32"}, {"value": "100.109.25.150/32"}, {"value": "100.108.189.2/32"}, {"value": "100.108.189.4/32"}, {"value": "100.108.189.3/32"}, {"value": "100.108.189.5/32"}, {"value": "100.108.189.6/32"}, {"value": "100.108.189.49/32"}, {"value": "100.108.189.41/32"}, {"value": "100.108.189.48/32"}, {"value": "100.108.189.47/32"}, {"value": "100.108.189.40/32"}, {"value": "100.108.189.54/32"}, {"value": "100.108.189.51/32"}, {"value": "100.108.189.50/32"}, {"value": "100.108.189.42/32"}, {"value": "100.108.189.52/32"}, {"value": "100.108.189.33/32"}, {"value": "100.108.189.53/32"}, {"value": "100.108.189.43/32"}, {"value": "172.25.66.97/32"}]}, "databaseFlags": [{"name": "innodb_file_per_table", "value": "on"}, {"name": "slow_query_log", "value": "on"}, {"name": "long_query_time", "value": "90"}, 
{"name": "wait_timeout", "value": "300"}]}}
Patching Cloud SQL instance... .done.
Updated [https://www.googleapis.com/sql/v1beta4/projects/google.com%3Achromeos-lab/instances/tko].
No backup False
,
Feb 23 2018
So, the problem is your setup.
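For reference, the successful log above suggests the whitelist step resolves each server to an IPv4 address, appends `/32`, and passes the joined list to `gcloud sql instances patch tko --authorized-networks ... -q`. The sketch below reconstructs that shape from the log output only; it is not the code in `sync_cloudsql_access.py`, and `build_patch_command` and its `resolve` parameter are hypothetical names.

```python
import socket


def build_patch_command(instance, servers, resolve=socket.gethostbyname):
    """Build the gcloud argv that authorizes each server as a /32 network.

    `servers` may mix hostnames and literal IPv4 addresses;
    socket.gethostbyname returns a literal IPv4 address unchanged.
    """
    networks = ",".join(resolve(server) + "/32" for server in servers)
    return [
        "gcloud", "sql", "instances", "patch", instance,
        "--authorized-networks", networks,
        "-q",
    ]


if __name__ == "__main__":
    # Literal IPs need no DNS lookup, so this runs anywhere.
    print(" ".join(build_patch_command("tko", ["172.25.66.97"])))
```

The patch command sends the complete list of networks each time, which is presumably why the script lists every existing server alongside the new `--extra_servers` address instead of sending only the new one.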
Comment 1 by jrbarnette@chromium.org
, Feb 23 2018