From master scheduler logs:
12/04 08:01:49.847 DEBUG| monitor_db:1238| Starting _run_cleanup
12/04 08:01:49.847 INFO |monitor_db_cleanup:0069| Running periodic cleanup
12/04 08:01:49.847 INFO |monitor_db_cleanup:0080| Aborting all jobs that have timed out and are not complete
12/04 08:02:30.381 WARNI|monitor_db_cleanup:0412| #### START: timed out jobs (total: 1) ####
12/04 08:02:30.381 WARNI|monitor_db_cleanup:0089| Aborting job 262562930 due to job timeout
12/04 08:02:30.386 WARNI|monitor_db_cleanup:0416| #### END: timed out jobs (total: 1) ####
12/04 08:02:30.387 INFO |monitor_db_cleanup:0098| Aborting all jobs that have passed maximum runtime
...
12/04 08:07:26.417 WARNI| metrics:0091| Flushing process has been closed (exit code -15), skipped sending metric 'FloatMetric'
12/04 08:07:26.417 WARNI| metrics:0091| Flushing process has been closed (exit code -15), skipped sending metric 'PercentageDistribution'
...
Traceback (most recent call last):
  File "/usr/local/autotest/scheduler/monitor_db.py", line 193, in main_without_exception_handling
    dispatcher.tick()
  File "/usr/local/autotest/site-packages/chromite/lib/metrics.py", line 492, in wrapper
    return fn(*args, **kwargs)
  File "/usr/local/autotest/scheduler/monitor_db.py", line 399, in tick
    self._run_cleanup()
  File "/usr/local/autotest/scheduler/monitor_db.py", line 306, in wrapper
    return func(self, *args, **kwargs)
  File "/usr/local/autotest/scheduler/monitor_db.py", line 426, in _run_cleanup
    self._periodic_cleanup.run_cleanup_maybe()
  File "/usr/local/autotest/scheduler/monitor_db_cleanup.py", line 48, in run_cleanup_maybe
    self._cleanup()
  File "/usr/local/autotest/site-packages/chromite/lib/metrics.py", line 492, in wrapper
    return fn(*args, **kwargs)
  File "/usr/local/autotest/scheduler/monitor_db_cleanup.py", line 71, in _cleanup
    self._abort_jobs_past_max_runtime()
  File "/usr/local/autotest/scheduler/monitor_db_cleanup.py", line 104, in _abort_jobs_past_max_runtime
    """)
  File "/usr/local/autotest/database/database_connection.py", line 304, in execute
    results = self._backend.execute(query, parameters)
  File "/usr/local/autotest/database/database_connection.py", line 134, in execute
    self._django_transaction.commit_unless_managed()
  File "/usr/local/autotest/site-packages/django/db/transaction.py", line 134, in commit_unless_managed
    connection.commit_unless_managed()
  File "/usr/local/autotest/site-packages/django/db/backends/__init__.py", line 221, in commit_unless_managed
    self._commit()
  File "/usr/local/autotest/site-packages/django/db/backends/__init__.py", line 55, in _commit
    return self.connection.commit()
ProgrammingError: (2014, "Commands out of sync; you can't run this command now")
------------------------
Also note that the metrics flushing process has consistently been dying (exit code -15, per the warnings in the log) whenever the scheduler shuts down or crashes, so we're likely dropping crash metrics on the floor.
I'm filing this bug as a low-priority FYI. I doubt we want to invest in fixing such issues in monitor_db at this point.
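For anyone reading the "exit code -15" warnings: Python's multiprocessing and subprocess modules report a child killed by signal N as exit code -N, so -15 corresponds to SIGTERM on Linux. A minimal sketch of that mapping, assuming the flushing process follows this convention (describe_exit_code is a hypothetical helper for illustration, not part of chromite or autotest):

```python
import signal


def describe_exit_code(exit_code):
    """Describe a multiprocessing/subprocess-style exit code in plain words."""
    if exit_code is None:
        # multiprocessing reports None while the child is still alive.
        return "still running"
    if exit_code < 0:
        # A negative value -N means the child was terminated by signal N.
        return "killed by " + signal.Signals(-exit_code).name
    return "exited with status %d" % exit_code


print(describe_exit_code(-15))
```

If that assumption holds, the flushing process is being terminated from outside rather than failing on its own, which would fit metrics being skipped during shutdown.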
Comment 1 by ayatane@chromium.org, Jan 9
Status: Available (was: Untriaged)