shard-client in the lab went down
Issue description
11:58:29 ERROR| Heartbeat failed. JSONRPCException: DatabaseError: (1146, "Table 'chromeos_autotest_db.afe_jobs' doesn't exist")
Traceback (most recent call last):
  File "/usr/local/autotest/frontend/afe/json_rpc/serviceHandler.py", line 109, in dispatchRequest
    results['result'] = self.invokeServiceEndpoint(meth, args)
  File "/usr/local/autotest/frontend/afe/json_rpc/serviceHandler.py", line 147, in invokeServiceEndpoint
    return meth(*args)
  File "/usr/local/autotest/frontend/afe/rpc_handler.py", line 270, in new_fn
    return f(*args, **keyword_args)
  File "/usr/local/autotest/frontend/afe/rpc_interface.py", line 2196, in shard_heartbeat
    rpc_utils.persist_records_sent_from_shard(shard_obj, jobs, hqes)
  File "/usr/local/autotest/frontend/afe/rpc_utils.py", line 1029, in persist_records_sent_from_shard
    shard, jobs, models.Job)
  File "/usr/local/autotest/frontend/afe/rpc_utils.py", line 989, in _persist_records_with_type_sent_from_shard
    current_record = record_type.objects.get(pk=pk)
  File "/usr/local/autotest/site-packages/django/db/models/manager.py", line 143, in get
    return self.get_query_set().get(*args, **kwargs)
  File "/usr/local/autotest/site-packages/django/db/models/query.py", line 382, in get
    num = len(clone)
  File "/usr/local/autotest/site-packages/django/db/models/query.py", line 90, in __len__
    self._result_cache = list(self.iterator())
  File "/usr/local/autotest/site-packages/django/db/models/query.py", line 301, in iterator
    for row in compiler.results_iter():
  File "/usr/local/autotest/site-packages/django/db/models/sql/compiler.py", line 775, in results_iter
    for rows in self.execute_sql(MULTI):
  File "/usr/local/autotest/site-packages/django/db/models/sql/compiler.py", line 840, in execute_sql
    cursor.execute(sql, params)
  File "/usr/local/autotest/site-packages/django/db/backends/mysql/base.py", line 130, in execute
    six.reraise(utils.DatabaseError, utils.DatabaseError(*tuple(e.args)), sys.exc_info()[2])
  File "/usr/local/autotest/site-packages/django/db/backends/mysql/base.py", line 120, in execute
    return self.cursor.execute(query, args)
  File "/usr/local/autotest/site-packages/MySQLdb/cursors.py", line 174, in execute
    self.errorhandler(self, exc, value)
  File "/usr/local/autotest/site-packages/MySQLdb/connections.py", line 36, in defaulterrorhandler
    raise errorclass, errorvalue
DatabaseError: (1146, "Table 'chromeos_autotest_db.afe_jobs' doesn't exist")
,
Mar 2 2018
,
Mar 2 2018
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/autotest/+/2e315cf2389fd1d9642a17694054f76e30f0240a
commit 2e315cf2389fd1d9642a17694054f76e30f0240a
Author: Prathmesh Prabhu <pprabhu@chromium.org>
Date: Fri Mar 02 20:33:06 2018
Revert "rpc: Fetch from readonly db during shard heartbeat"
This reverts commit 3c75d9252ec132f724e72d025939a82fa64c7d14.
Reason for revert: Shard hearbeat failure in prod.
Original change's description:
> rpc: Fetch from readonly db during shard heartbeat
>
> Fall back to master if readonly isn't available (mostly during tests)
>
> BUG=chromium:810965
> TEST=unit tests
>
> Change-Id: I5442d7b31a79908e12d09a60bed3f42645422ebc
> Reviewed-on: https://chromium-review.googlesource.com/938384
> Commit-Ready: Jacob Kopczynski <jkop@chromium.org>
> Tested-by: Jacob Kopczynski <jkop@chromium.org>
> Reviewed-by: Xixuan Wu <xixuan@chromium.org>
BUG=chromium:810965
BUG= chromium:818271
Change-Id: I2c428cf91a6a57c2a96c98c89938f683120ed77b
Reviewed-on: https://chromium-review.googlesource.com/946750
Reviewed-by: Prathmesh Prabhu <pprabhu@chromium.org>
Tested-by: Prathmesh Prabhu <pprabhu@chromium.org>
[modify] https://crrev.com/2e315cf2389fd1d9642a17694054f76e30f0240a/frontend/afe/models.py
,
Mar 2 2018
Emergency push for #3 is done. Shard clients are returning to life: https://viceroy.corp.google.com/chromeos/deputy-view?duration=6h#_VG_lnuPnWCa
,
Mar 2 2018
The problem was that #3 made the heartbeat RPC use the cautotest::readonly_host for some of its DB queries, but that setting actually points to the CloudSQL TKO database. I had a CL to try to fix that: https://chrome-internal-review.googlesource.com/c/chromeos/chromeos-admin/+/581567 but I'm not sure what is using that setting to actually refer to the TKO instance. The correct fix would involve separating the reference to the TKO DB from the reference to the readonly AFE DB.
,
Mar 2 2018
I already have a CL in progress to make it fall back to the master DB, along with a test to check that it doesn't leave it in a bad state. I don't think it will guarantee this doesn't recur, but it will fail more gracefully.
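A minimal sketch of the fallback idea described in this comment, not the actual CL. Every name here (`fetch_jobs`, `TableMissingError`, and the two query callables) is a hypothetical stand-in; the real code goes through Django and MySQLdb, which this sketch deliberately avoids so that it runs on its own.

```python
class TableMissingError(Exception):
    """Hypothetical stand-in for the DatabaseError (1146) in the traceback."""


def fetch_jobs(job_ids, readonly_query, master_query):
    """Try the readonly database first; fall back to master on failure.

    Returns the rows plus a label saying which database answered.
    """
    try:
        return readonly_query(job_ids), "readonly"
    except TableMissingError:
        # The readonly host may point at a database without the AFE
        # tables (the TKO instance, in this bug), so retry on master
        # instead of failing the whole heartbeat.
        return master_query(job_ids), "master"


def broken_readonly(job_ids):
    # Simulates a readonly host that lacks the afe_jobs table.
    raise TableMissingError("Table 'chromeos_autotest_db.afe_jobs' doesn't exist")


def master(job_ids):
    # Simulates a successful lookup on the master database.
    return [{"id": job_id} for job_id in job_ids]


rows, source = fetch_jobs([1, 2], broken_readonly, master)
```

As the comment notes, a fallback like this does not stop the misconfiguration from recurring; it only turns a failed heartbeat into a slower, master-backed one.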
,
Mar 9 2018
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/autotest/+/d6a5e919d346e17834f3730a899ba221f4d4ce17
commit d6a5e919d346e17834f3730a899ba221f4d4ce17
Author: Jacob Kopczynski <jkop@google.com>
Date: Fri Mar 09 03:29:04 2018
rpc: Guarantee reset to good state after readonly
The kludge needed to query readonly backup DB could leave the connection to master broken in case of an error. Remedy this.
Behind a feature flag, fetch_readonly_jobs, defaults to False.
BUG=chromium:810965
BUG= chromium:818271
TEST=Old and new unit tests, feature flag for canarying rollout
Change-Id: Idc5e3793f5dc5a2bd1022e468456b88b2f347ed3
Reviewed-on: https://chromium-review.googlesource.com/944041
Commit-Ready: Jacob Kopczynski <jkop@chromium.org>
Tested-by: Jacob Kopczynski <jkop@chromium.org>
Reviewed-by: Xixuan Wu <xixuan@chromium.org>
[modify] https://crrev.com/d6a5e919d346e17834f3730a899ba221f4d4ce17/frontend/afe/models.py
[modify] https://crrev.com/d6a5e919d346e17834f3730a899ba221f4d4ce17/frontend/afe/rpc_interface_unittest.py
[modify] https://crrev.com/d6a5e919d346e17834f3730a899ba221f4d4ce17/global_config.ini
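One way to get the "guaranteed reset" this commit message describes is a context manager whose `finally` clause restores the original connection settings even when the query raises. The sketch below is an assumption about the general technique, not the code in `models.py`. `FakeConnection`, `readonly_connection`, and the host names are hypothetical. Only the flag name `fetch_readonly_jobs` and its default of False come from the commit message.

```python
import contextlib


class FakeConnection:
    """Hypothetical stand-in for a database connection object."""

    def __init__(self, host):
        self.host = host


@contextlib.contextmanager
def readonly_connection(conn, readonly_host):
    """Temporarily point conn at the readonly host, then always restore it."""
    original_host = conn.host
    conn.host = readonly_host
    try:
        yield conn
    finally:
        # Runs on success and on error, so master is never left broken.
        conn.host = original_host


def fetch_jobs(conn, query, fetch_readonly_jobs=False,
               readonly_host="readonly-db"):
    """Run query against readonly only when the feature flag is enabled."""
    if not fetch_readonly_jobs:
        return query(conn)
    with readonly_connection(conn, readonly_host):
        return query(conn)
```

With the flag off, the query runs against whatever host the connection already uses. With it on, an exception inside the query still propagates to the caller, but the connection points back at its original host afterwards.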
,
Mar 9 2018
The following revision refers to this bug: https://chrome-internal.googlesource.com/chromeos/chromeos-admin/+/c47ff078b5cd97050c345c5206d07e561f233175 commit c47ff078b5cd97050c345c5206d07e561f233175 Author: Jacob Kopczynski <jkop@google.com> Date: Fri Mar 09 19:31:44 2018
,
Mar 12 2018
The following revision refers to this bug: https://chrome-internal.googlesource.com/chromeos/chromeos-admin/+/01489ded899f10e420085db62e3ea5a3b71ede1d commit 01489ded899f10e420085db62e3ea5a3b71ede1d Author: Jacob Kopczynski <jkop@google.com> Date: Mon Mar 12 23:11:57 2018
,
Mar 13 2018
The following revision refers to this bug: https://chrome-internal.googlesource.com/chromeos/chromeos-admin/+/b9766dea3c8cfdc07359edc729a8c47ba688b846 commit b9766dea3c8cfdc07359edc729a8c47ba688b846 Author: Jacob Kopczynski <jkop@google.com> Date: Tue Mar 13 18:55:58 2018
,
Mar 13 2018
The following revision refers to this bug: https://chrome-internal.googlesource.com/chromeos/chromeos-admin/+/c21ff807077ffc27a0d39468ff609020e1ebc08a commit c21ff807077ffc27a0d39468ff609020e1ebc08a Author: Jacob Kopczynski <jkop@google.com> Date: Tue Mar 13 19:00:14 2018
,
Mar 24 2018
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/autotest/+/348f0c77d5d57c63f7a1f88e24d0535cdbf19316
commit 348f0c77d5d57c63f7a1f88e24d0535cdbf19316
Author: Jacob Kopczynski <jkop@google.com>
Date: Sat Mar 24 00:29:49 2018
autotest: rpc: rollout shard heartbeat whitelist
BUG=chromium:810965
BUG= chromium:818271
TEST=unit tests
Change-Id: I818b1ec237dc09caba68ca79fac05705f9b94b17
Reviewed-on: https://chromium-review.googlesource.com/969994
Commit-Ready: ChromeOS CL Exonerator Bot <chromiumos-cl-exonerator@appspot.gserviceaccount.com>
Tested-by: Jacob Kopczynski <jkop@chromium.org>
Reviewed-by: Allen Li <ayatane@chromium.org>
[modify] https://crrev.com/348f0c77d5d57c63f7a1f88e24d0535cdbf19316/frontend/afe/models.py
[modify] https://crrev.com/348f0c77d5d57c63f7a1f88e24d0535cdbf19316/global_config.ini
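The commit message does not say how the whitelist is stored or checked, so the following is only a guess at the general shape of a per-shard rollout gate. It assumes a comma-separated list of shard hostnames in a config value. The function names, the config format, and the hostnames are all hypothetical.

```python
def parse_whitelist(raw_value):
    """Parse a comma-separated config value into a set of shard hostnames."""
    return {name.strip() for name in raw_value.split(",") if name.strip()}


def use_readonly_heartbeat(shard_hostname, whitelist):
    """Only whitelisted shards take the readonly path during the rollout."""
    return shard_hostname in whitelist


# Hypothetical config value; real shard names are not given in this thread.
whitelist = parse_whitelist("shard1.example, shard2.example")
```

A gate like this lets the readonly heartbeat be enabled on a few shards first, with every other shard staying on the master-only path until the partial rollout looks healthy.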
,
Mar 30 2018
The following revision refers to this bug: https://chrome-internal.googlesource.com/chromeos/chromeos-admin/+/77ecb5da72f5bfc68f4d0ef0b4d27e8a6e4e4429 commit 77ecb5da72f5bfc68f4d0ef0b4d27e8a6e4e4429 Author: Jacob Kopczynski <jkop@google.com> Date: Fri Mar 30 18:53:32 2018
,
Mar 30 2018
The following revision refers to this bug: https://chrome-internal.googlesource.com/chromeos/chromeos-admin/+/d94fced6c17b8266b604d95c740453e7716cc8f4 commit d94fced6c17b8266b604d95c740453e7716cc8f4 Author: Jacob Kopczynski <jkop@google.com> Date: Fri Mar 30 22:50:39 2018
,
Apr 18 2018
The following revision refers to this bug: https://chrome-internal.googlesource.com/chromeos/chromeos-admin/+/02b064bfd46c9125849dafcbb7c584a9c316e8d7 commit 02b064bfd46c9125849dafcbb7c584a9c316e8d7 Author: Jacob Kopczynski <jkop@google.com> Date: Wed Apr 18 17:22:28 2018
,
Apr 19 2018
The following revision refers to this bug: https://chrome-internal.googlesource.com/chromeos/chromeos-admin/+/a3cd7591c19a7dd66bb4486cfd8abe9b96d9fc5b commit a3cd7591c19a7dd66bb4486cfd8abe9b96d9fc5b Author: Jacob Kopczynski <jkop@google.com> Date: Thu Apr 19 03:07:42 2018
,
Apr 26 2018
The following revision refers to this bug: https://chrome-internal.googlesource.com/chromeos/chromeos-admin/+/62489c8a040b4e9e7ba1230f030cd97c5d55b42c commit 62489c8a040b4e9e7ba1230f030cd97c5d55b42c Author: Jacob Kopczynski <jkop@google.com> Date: Thu Apr 26 19:30:41 2018
,
Apr 27 2018
The following revision refers to this bug: https://chrome-internal.googlesource.com/chromeos/chromeos-admin/+/36ff72fa159ee5e6589b5ad443c2e2b1db314f49 commit 36ff72fa159ee5e6589b5ad443c2e2b1db314f49 Author: Jacob Kopczynski <jkop@google.com> Date: Fri Apr 27 23:31:49 2018
,
Apr 28 2018
The following revision refers to this bug: https://chrome-internal.googlesource.com/chromeos/chromeos-admin/+/aa312346fec2b56405cc466c3b42d8260d13cddf commit aa312346fec2b56405cc466c3b42d8260d13cddf Author: Jacob Kopczynski <jkop@google.com> Date: Sat Apr 28 03:18:51 2018
,
May 2 2018
The following revision refers to this bug: https://chrome-internal.googlesource.com/chromeos/chromeos-admin/+/a2f738d3ac6c289cbe66bacb96d12a72787a1f71 commit a2f738d3ac6c289cbe66bacb96d12a72787a1f71 Author: Jacob Kopczynski <jkop@google.com> Date: Wed May 02 20:21:34 2018
,
May 2 2018
The following revision refers to this bug: https://chrome-internal.googlesource.com/chromeos/chromeos-admin/+/f244072bbcbb45f4879f844ae173ee29de75e2fa commit f244072bbcbb45f4879f844ae173ee29de75e2fa Author: Jacob Kopczynski <jkop@google.com> Date: Wed May 02 22:34:06 2018
,
May 2 2018
The following revision refers to this bug: https://chrome-internal.googlesource.com/chromeos/chromeos-admin/+/4a904e6961fe9c6f889464fea3cf40b9ce766a88 commit 4a904e6961fe9c6f889464fea3cf40b9ce766a88 Author: Jacob Kopczynski <jkop@google.com> Date: Wed May 02 23:23:43 2018
,
May 7 2018
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/autotest/+/4525d2453cefdfe0ebb36ad20a25a4faa2510c6c
commit 4525d2453cefdfe0ebb36ad20a25a4faa2510c6c
Author: Jacob Kopczynski <jkop@google.com>
Date: Mon May 07 18:04:33 2018
autotest: Readonly heartbeat on all shards.
BUG=chromium:810965
BUG= chromium:818271
TEST=No problems in the partial rollout.
Change-Id: Ib778a5d2492f24c88878e773359ede965d8e39df
Reviewed-on: https://chromium-review.googlesource.com/1038619
Commit-Ready: Jacob Kopczynski <jkop@chromium.org>
Tested-by: Jacob Kopczynski <jkop@chromium.org>
Reviewed-by: Prathmesh Prabhu <pprabhu@chromium.org>
[modify] https://crrev.com/4525d2453cefdfe0ebb36ad20a25a4faa2510c6c/frontend/afe/models.py
Comment 1 by nxia@chromium.org, Mar 2 2018
Labels: -Pri-3 Pri-1