
Issue 911730


Issue metadata

Status: Available
Owner: ----
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 3
Type: Bug




Master scheduler periodically crashes due to out-of-sync MySQL commands

Project Member Reported by pprabhu@chromium.org, Dec 4

Issue description

From master scheduler logs:


12/04 08:01:49.847 DEBUG|        monitor_db:1238| Starting _run_cleanup
12/04 08:01:49.847 INFO |monitor_db_cleanup:0069| Running periodic cleanup
12/04 08:01:49.847 INFO |monitor_db_cleanup:0080| Aborting all jobs that have timed out and are not complete
12/04 08:02:30.381 WARNI|monitor_db_cleanup:0412| #### START: timed out jobs (total: 1) ####
12/04 08:02:30.381 WARNI|monitor_db_cleanup:0089| Aborting job 262562930 due to job timeout
12/04 08:02:30.386 WARNI|monitor_db_cleanup:0416| #### END: timed out jobs (total: 1) ####
12/04 08:02:30.387 INFO |monitor_db_cleanup:0098| Aborting all jobs that have passed maximum runtime

...


12/04 08:07:26.417 WARNI|           metrics:0091| Flushing process has been closed (exit code -15), skipped sending metric 'FloatMetric'
12/04 08:07:26.417 WARNI|           metrics:0091| Flushing process has been closed (exit code -15), skipped sending metric 'PercentageDistribution'


...


Traceback (most recent call last):
  File "/usr/local/autotest/scheduler/monitor_db.py", line 193, in main_without_exception_handling
    dispatcher.tick()
  File "/usr/local/autotest/site-packages/chromite/lib/metrics.py", line 492, in wrapper
    return fn(*args, **kwargs)
  File "/usr/local/autotest/scheduler/monitor_db.py", line 399, in tick
    self._run_cleanup()
  File "/usr/local/autotest/scheduler/monitor_db.py", line 306, in wrapper
    return func(self, *args, **kwargs)
  File "/usr/local/autotest/scheduler/monitor_db.py", line 426, in _run_cleanup
    self._periodic_cleanup.run_cleanup_maybe()
  File "/usr/local/autotest/scheduler/monitor_db_cleanup.py", line 48, in run_cleanup_maybe
    self._cleanup()
  File "/usr/local/autotest/site-packages/chromite/lib/metrics.py", line 492, in wrapper
    return fn(*args, **kwargs)
  File "/usr/local/autotest/scheduler/monitor_db_cleanup.py", line 71, in _cleanup
    self._abort_jobs_past_max_runtime()
  File "/usr/local/autotest/scheduler/monitor_db_cleanup.py", line 104, in _abort_jobs_past_max_runtime
    """)
  File "/usr/local/autotest/database/database_connection.py", line 304, in execute
    results = self._backend.execute(query, parameters)
  File "/usr/local/autotest/database/database_connection.py", line 134, in execute
    self._django_transaction.commit_unless_managed()
  File "/usr/local/autotest/site-packages/django/db/transaction.py", line 134, in commit_unless_managed
    connection.commit_unless_managed()
  File "/usr/local/autotest/site-packages/django/db/backends/__init__.py", line 221, in commit_unless_managed
    self._commit()
  File "/usr/local/autotest/site-packages/django/db/backends/__init__.py", line 55, in _commit
    return self.connection.commit()
ProgrammingError: (2014, "Commands out of sync; you can't run this command now")
------------------------
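
The traceback above ends in MySQL client error 2014 ("Commands out of sync") raised as a ProgrammingError from commit(). As a rough sketch only, and not something monitor_db or database_connection.py is known to do, a caller could recognize that error code and retry once on a fresh connection. The names is_out_of_sync, execute_with_reconnect, run_query and reconnect are hypothetical, the ProgrammingError class is a local stand-in for the driver's exception so the snippet runs by itself, and the sketch assumes that re-running the statement is safe.

```python
# MySQL client error code seen in the traceback ("Commands out of sync").
CR_COMMANDS_OUT_OF_SYNC = 2014


class ProgrammingError(Exception):
    """Local stand-in for the DB driver's ProgrammingError (assumption)."""


def is_out_of_sync(exc):
    """Return True if exc carries the 'Commands out of sync' error code."""
    return bool(exc.args) and exc.args[0] == CR_COMMANDS_OUT_OF_SYNC


def execute_with_reconnect(run_query, reconnect, max_attempts=2):
    """Call run_query(); on an out-of-sync error, reconnect and try again.

    run_query and reconnect are hypothetical callables supplied by the
    caller. Any other error, or a failure on the last attempt, is re-raised.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return run_query()
        except ProgrammingError as exc:
            if not is_out_of_sync(exc) or attempt == max_attempts:
                raise
            # Drop the confused connection before the next attempt.
            reconnect()
```

Retrying like this would only hide the symptom. The connection usually ends up out of sync because an earlier result set was not fully consumed or because the connection is shared, and that would still need tracking down.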


Also note that the metrics flushing process has consistently been dying (exit code -15 in the logs above) whenever the scheduler shuts down or crashes, so we're likely dropping crash metrics on the floor.


I'm filing this bug as a low-priority FYI. I doubt we want to invest in fixing such issues in monitor_db at this point.

 
Labels: Hotlist-Deputy
Status: Available (was: Untriaged)
