moblab-scheduler-init stops running sometimes
Issue description
MobMonitor also reports that moblab-scheduler-init is not running. I can take the default action and the issue is resolved, but when I check again a day or two later I need to take action through MobMonitor again. Currently on beta channel (10575.45.0), but I saw this on the previous version as well.
Jun 8 2018
I believe the root cause is a line I added to the upstart job: https://cs.corp.google.com/chromeos_public/src/overlays/project-moblab/chromeos-base/chromeos-bsp-moblab/files/init/moblab-scheduler-init.conf?rcl=40a55735d331dc588cf30cea9d6dec0935e9922b&l=13
When the scheduler loses its connection to the db it exits normally, so the job is not respawned: https://cs.corp.google.com/chromeos_public/src/third_party/autotest/files/scheduler/monitor_db.py?rcl=17e1f5694c20616f729178bbb6f5651ac83e0f8f&l=204
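To illustrate the failure mode, here is a minimal sketch (not the actual monitor_db.py code) of how a top-level handler that logs an exception and then returns makes the process exit with status 0, and how returning a non-zero status instead would let an init system tell a crash from a clean shutdown. The names `run_scheduler` and `lost_db_connection` are hypothetical.

```python
import logging
import sys


def run_scheduler(tick):
    """Run the scheduler body and return a process exit code.

    `tick` is a hypothetical stand-in for the real dispatcher loop.
    """
    try:
        tick()
    except Exception:
        logging.exception("Uncaught exception, terminating scheduler.")
        # Non-zero status: the init system can treat this as a failure.
        # Returning 0 here instead would look like a clean shutdown.
        return 1
    return 0  # clean shutdown


def lost_db_connection():
    # Simulated failure, standing in for a dropped database connection.
    raise RuntimeError("MySQL server has gone away")


if __name__ == "__main__":
    sys.exit(run_scheduler(lost_db_connection))
```

With an exit status like this, a job configured to treat exit 0 as normal would still be respawned after a lost connection.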
Jun 8 2018
This is the traceback from my machine
06/07 02:47:46.661 ERROR| monitor_db:0205| Uncaught exception, terminating monitor_db.
Traceback (most recent call last):
  File "/usr/local/autotest/scheduler/monitor_db.py", line 194, in main_without_exception_handling
    dispatcher.tick()
  File "/usr/lib64/python2.7/site-packages/chromite/lib/metrics.py", line 490, in wrapper
    return fn(*args, **kwargs)
  File "/usr/local/autotest/scheduler/monitor_db.py", line 407, in tick
    self._run_cleanup()
  File "/usr/local/autotest/scheduler/monitor_db.py", line 307, in wrapper
    return func(self, *args, **kwargs)
  File "/usr/local/autotest/scheduler/monitor_db.py", line 434, in _run_cleanup
    self._periodic_cleanup.run_cleanup_maybe()
  File "/usr/local/autotest/scheduler/monitor_db_cleanup.py", line 48, in run_cleanup_maybe
    self._cleanup()
  File "/usr/lib64/python2.7/site-packages/chromite/lib/metrics.py", line 490, in wrapper
    return fn(*args, **kwargs)
  File "/usr/local/autotest/scheduler/monitor_db_cleanup.py", line 75, in _cleanup
    self._django_session_cleanup()
  File "/usr/local/autotest/scheduler/monitor_db_cleanup.py", line 263, in _django_session_cleanup
    self._db.execute(sql)
  File "/usr/local/autotest/database/database_connection.py", line 304, in execute
    results = self._backend.execute(query, parameters)
  File "/usr/local/autotest/database/database_connection.py", line 134, in execute
    self._django_transaction.commit_unless_managed()
  File "/usr/lib64/python2.7/site-packages/django/db/transaction.py", line 134, in commit_unless_managed
    connection.commit_unless_managed()
  File "/usr/lib64/python2.7/site-packages/django/db/backends/__init__.py", line 221, in commit_unless_managed
    self._commit()
  File "/usr/lib64/python2.7/site-packages/django/db/backends/__init__.py", line 55, in _commit
    return self.connection.commit()
OperationalError: (2006, 'MySQL server has gone away')
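One possible way to handle a dropped connection like the one in this traceback is to reconnect and retry the statement instead of letting the exception end the process. The sketch below is only an illustration under that assumption, not the autotest implementation: `ConnectionLost`, `execute_with_reconnect`, and the `execute`/`reconnect` callables are all hypothetical stand-ins.

```python
import time


class ConnectionLost(Exception):
    """Hypothetical stand-in for the database driver's 'server has gone away' error."""


def execute_with_reconnect(execute, reconnect, retries=3, delay=0.0):
    """Call execute(); on ConnectionLost, reconnect and try again.

    `execute` and `reconnect` are caller-supplied callables (assumptions
    for this sketch). After `retries` failed retries the error propagates.
    """
    for attempt in range(retries + 1):
        try:
            return execute()
        except ConnectionLost:
            if attempt == retries:
                raise  # out of retries: let the caller decide what to do
            time.sleep(delay)
            reconnect()
```

If every retry fails, the exception still propagates, so the process would need to exit with a non-zero status for the init system to notice.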
Jun 21 2018
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/overlays/board-overlays/+/bbd3ccd1533907dff427ae513142c70d1c297370

commit bbd3ccd1533907dff427ae513142c70d1c297370
Author: Keith Haddow <haddowk@chromium.org>
Date: Thu Jun 21 18:41:43 2018

[autotest] Remove normal exit so scheduler always respawns

the scheduler will exit with a code 0 even if there was an error ( like the db connection being lost ) the root cause should be solved but for now go back to always respawn the scheduler.

BUG= chromium:846773
TEST=build and tested on moblab

Change-Id: Ia218a76297c0d336d8197567e248310e6907f3b5
Reviewed-on: https://chromium-review.googlesource.com/1110299
Reviewed-by: Keith Haddow <haddowk@chromium.org>
Commit-Queue: Keith Haddow <haddowk@chromium.org>
Tested-by: Keith Haddow <haddowk@chromium.org>

[modify] https://crrev.com/bbd3ccd1533907dff427ae513142c70d1c297370/project-moblab/chromeos-base/chromeos-bsp-moblab/files/init/moblab-scheduler-init.conf
Jun 21 2018
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/overlays/board-overlays/+/eb9a30e1e497ff3376bc6239d972142b30c96921

commit eb9a30e1e497ff3376bc6239d972142b30c96921
Author: Keith Haddow <haddowk@chromium.org>
Date: Thu Jun 21 18:41:45 2018

[autotest] Remove normal exit so scheduler always respawns

the scheduler will exit with a code 0 even if there was an error ( like the db connection being lost ) the root cause should be solved but for now go back to always respawn the scheduler.

BUG= chromium:846773
TEST=build and tested on moblab

Change-Id: Ia218a76297c0d336d8197567e248310e6907f3b5
Reviewed-on: https://chromium-review.googlesource.com/1110300
Reviewed-by: Keith Haddow <haddowk@chromium.org>
Commit-Queue: Keith Haddow <haddowk@chromium.org>
Tested-by: Keith Haddow <haddowk@chromium.org>

[modify] https://crrev.com/eb9a30e1e497ff3376bc6239d972142b30c96921/project-moblab/chromeos-base/chromeos-bsp-moblab/files/init/moblab-scheduler-init.conf
Jun 22 2018
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/overlays/board-overlays/+/5fff905c1d95f66368732363228a63bd5619986a

commit 5fff905c1d95f66368732363228a63bd5619986a
Author: Keith Haddow <haddowk@chromium.org>
Date: Fri Jun 22 02:56:32 2018

[autotest] Remove normal exit so scheduler always respawns

the scheduler will exit with a code 0 even if there was an error ( like the db connection being lost ) the root cause should be solved but for now go back to always respawn the scheduler.

BUG= chromium:846773
TEST=build and tested on moblab

Change-Id: Ia218a76297c0d336d8197567e248310e6907f3b5
Reviewed-on: https://chromium-review.googlesource.com/1110477
Commit-Ready: Keith Haddow <haddowk@chromium.org>
Tested-by: Keith Haddow <haddowk@chromium.org>
Reviewed-by: Keith Haddow <haddowk@chromium.org>
Reviewed-by: Matt Mallett <mattmallett@chromium.org>

[modify] https://crrev.com/5fff905c1d95f66368732363228a63bd5619986a/project-moblab/chromeos-base/chromeos-bsp-moblab/files/init/moblab-scheduler-init.conf
Jun 28 2018
Comment 1 by haddowk@chromium.org, Jun 8 2018
Owner: haddowk@chromium.org
Status: Started (was: Untriaged)