in shard heartbeat rpc, failure to query slave should failover to master |
||||
Issue descriptionForked from Issue 845172 Relevant code: job_ids = set([]) <snip> if cls.FETCH_READONLY_JOBS: #TODO(jkop): Get rid of this kludge when we update Django to >=1.7 #correct usage would be .raw(..., using='readonly') old_db = Job.objects._db try: Job.objects._db = 'readonly' job_ids |= set([j.id for j in Job.objects.raw(raw_sql)]) finally: Job.objects._db = old_db if not job_ids: #If the replica is down or we're in a test, fetch from master. job_ids |= set([j.id for j in Job.objects.raw(raw_sql)]) The master is consulted if the first query returns no results. However, it is not consulted if the first queries times out or fails and throws an exception, which is what I believe was happening here. I don't think this is the intended before for two reasons: 1) It doesn't failover on the common case of the slave being unavailable or very slow. 2) It might failover incorrectly in the case where the jobs_ids set is truly empty.
,
May 29 2018
CL in flight
,
May 30 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/autotest/+/592cbd67bb20cfbb0e9a9d247b0db3c01cd1dc67 commit 592cbd67bb20cfbb0e9a9d247b0db3c01cd1dc67 Author: Jacob Kopczynski <jkop@google.com> Date: Wed May 30 08:02:39 2018 autotest: Switch heartbeat query retry to on error Retrying on no new jobs is overeager, so hide it behind a config flag. Retrying when there is an error handles the cases we care about (timeouts and slave DB outages) much more reasonably. BUG= chromium:845537 TEST=unit tests pass Change-Id: Ia60541be22d6c5389d3697770c93a10691783a44 Reviewed-on: https://chromium-review.googlesource.com/1069587 Commit-Ready: ChromeOS CL Exonerator Bot <chromiumos-cl-exonerator@appspot.gserviceaccount.com> Tested-by: Jacob Kopczynski <jkop@chromium.org> Reviewed-by: Aviv Keshet <akeshet@chromium.org> [modify] https://crrev.com/592cbd67bb20cfbb0e9a9d247b0db3c01cd1dc67/frontend/afe/models.py [modify] https://crrev.com/592cbd67bb20cfbb0e9a9d247b0db3c01cd1dc67/frontend/afe/rpc_interface_unittest.py [modify] https://crrev.com/592cbd67bb20cfbb0e9a9d247b0db3c01cd1dc67/global_config.ini
,
May 30 2018
,
Jun 4 2018
|
||||
►
Sign in to add a comment |
||||
Comment 1 by akes...@chromium.org
, May 22 2018