Moblab schedule: duplicate entry for host_queue_entries_job_id_and_host_id key
Issue description
Dan's moblab's scheduler is stuck in a bad state.
IP is 100.96.48.101
Not sure how he got into this state.
08/03 11:33:48.996 DEBUG| rdb:0416| Host acquisition stats: distinct requests: 3, leased hosts: 1, unsatisfied requests: 79
08/03 11:33:48.996 INFO | scheduler_models:0530| Assigning host 192.168.231.100 to entry HQE: 1283, for job: 1276 and host: no host has status:Queued
08/03 11:33:49.005 ERROR| email_manager:0082| Uncaught exception; terminating monitor_db
Traceback (most recent call last):
File "/usr/local/autotest/scheduler/monitor_db.py", line 179, in main_without_exception_handling
dispatcher.tick()
File "/usr/local/autotest/scheduler/site_monitor_db.py", line 106, in tick
super(SiteDispatcher, self).tick()
File "/usr/local/autotest/scheduler/monitor_db.py", line 354, in tick
self._schedule_new_jobs()
File "/usr/local/autotest/scheduler/site_monitor_db.py", line 158, in _schedule_new_jobs
super(SiteDispatcher, self)._schedule_new_jobs()
File "/usr/local/autotest/scheduler/monitor_db.py", line 842, in _schedule_new_jobs
self._schedule_host_job(host_assignment.host, host_assignment.job)
File "/usr/local/autotest/scheduler/monitor_db.py", line 800, in _schedule_host_job
self._host_scheduler.schedule_host_job(host, queue_entry)
File "/usr/local/autotest/scheduler/host_scheduler.py", line 233, in schedule_host_job
queue_entry.set_host(host)
File "/usr/local/autotest/scheduler/scheduler_models.py", line 531, in set_host
self.update_field('host_id', host.id)
File "/usr/local/autotest/scheduler/scheduler_models.py", line 308, in update_field
_db.execute(query, (value, self.id))
File "/usr/local/autotest/database/database_connection.py", line 312, in execute
results = self._backend.execute(query, parameters)
File "/usr/local/autotest/database/database_connection.py", line 132, in execute
parameters=parameters)
File "/usr/local/autotest/database/database_connection.py", line 54, in execute
self._cursor.execute(query, parameters)
File "/usr/lib64/python2.7/site-packages/django/db/backends/mysql/base.py", line 122, in execute
six.reraise(utils.IntegrityError, utils.IntegrityError(*tuple(e.args)), sys.exc_info()[2])
File "/usr/lib64/python2.7/site-packages/django/db/backends/mysql/base.py", line 120, in execute
return self.cursor.execute(query, args)
File "/usr/lib64/python2.7/site-packages/MySQLdb/cursors.py", line 205, in execute
self.errorhandler(self, exc, value)
File "/usr/lib64/python2.7/site-packages/MySQLdb/connections.py", line 36, in defaulterrorhandler
raise errorclass, errorvalue
IntegrityError: (1062, "Duplicate entry '1276-1' for key 'host_queue_entries_job_id_and_host_id'")
08/03 11:33:49.006 ERROR| email_manager:0054| monitor_db exception
EXCEPTION: Uncaught exception; terminating monitor_db
Traceback (most recent call last):
File "/usr/local/autotest/scheduler/monitor_db.py", line 179, in main_without_exception_handling
dispatcher.tick()
File "/usr/local/autotest/scheduler/site_monitor_db.py", line 106, in tick
super(SiteDispatcher, self).tick()
File "/usr/local/autotest/scheduler/monitor_db.py", line 354, in tick
self._schedule_new_jobs()
File "/usr/local/autotest/scheduler/site_monitor_db.py", line 158, in _schedule_new_jobs
super(SiteDispatcher, self)._schedule_new_jobs()
File "/usr/local/autotest/scheduler/monitor_db.py", line 842, in _schedule_new_jobs
self._schedule_host_job(host_assignment.host, host_assignment.job)
File "/usr/local/autotest/scheduler/monitor_db.py", line 800, in _schedule_host_job
self._host_scheduler.schedule_host_job(host, queue_entry)
File "/usr/local/autotest/scheduler/host_scheduler.py", line 233, in schedule_host_job
queue_entry.set_host(host)
File "/usr/local/autotest/scheduler/scheduler_models.py", line 531, in set_host
self.update_field('host_id', host.id)
File "/usr/local/autotest/scheduler/scheduler_models.py", line 308, in update_field
_db.execute(query, (value, self.id))
File "/usr/local/autotest/database/database_connection.py", line 312, in execute
results = self._backend.execute(query, parameters)
File "/usr/local/autotest/database/database_connection.py", line 132, in execute
parameters=parameters)
File "/usr/local/autotest/database/database_connection.py", line 54, in execute
self._cursor.execute(query, parameters)
File "/usr/lib64/python2.7/site-packages/django/db/backends/mysql/base.py", line 122, in execute
six.reraise(utils.IntegrityError, utils.IntegrityError(*tuple(e.args)), sys.exc_info()[2])
File "/usr/lib64/python2.7/site-packages/django/db/backends/mysql/base.py", line 120, in execute
return self.cursor.execute(query, args)
File "/usr/lib64/python2.7/site-packages/MySQLdb/cursors.py", line 205, in execute
self.errorhandler(self, exc, value)
File "/usr/lib64/python2.7/site-packages/MySQLdb/connections.py", line 36, in defaulterrorhandler
raise errorclass, errorvalue
IntegrityError: (1062, "Duplicate entry '1276-1' for key 'host_queue_entries_job_id_and_host_id'")
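The IntegrityError above is consistent with a unique key over (job_id, host_id): set_host() tries to point HQE 1283 of job 1276 at host id 1, and the message suggests another entry of the same job already holds that pair. Below is a minimal sketch of that class of failure. It uses an in-memory SQLite table as a stand-in for the MySQL one; the table name `host_queue_entries`, its simplified columns, and the pre-existing entry id 1282 are assumptions made for illustration, not values read from this moblab's database.

```python
import sqlite3


def reproduce_duplicate_key():
    """Return the error text raised when two entries of one job get the same host."""
    conn = sqlite3.connect(":memory:")
    # Simplified, assumed schema: only the columns needed for the unique key.
    conn.execute(
        "CREATE TABLE host_queue_entries ("
        " id INTEGER PRIMARY KEY,"
        " job_id INTEGER NOT NULL,"
        " host_id INTEGER,"
        " UNIQUE (job_id, host_id))"
    )
    # Hypothetical existing entry: job 1276 is already bound to host id 1.
    conn.execute("INSERT INTO host_queue_entries VALUES (1282, 1276, 1)")
    # Entry 1283 from the log: same job, no host yet (NULLs do not collide).
    conn.execute("INSERT INTO host_queue_entries VALUES (1283, 1276, NULL)")
    try:
        # Same shape as the update in the traceback: parameters are (value, id).
        conn.execute(
            "UPDATE host_queue_entries SET host_id = ? WHERE id = ?", (1, 1283)
        )
    except sqlite3.IntegrityError as e:
        return str(e)
    finally:
        conn.close()
    return None


if __name__ == "__main__":
    print(reproduce_duplicate_key())
```

SQLite words the error differently from MySQL's 1062, but the cause is the same: the update would create a second row with the same (job_id, host_id) pair.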
Aug 3 2016
Here are the jobs that caused this:
FAFT BIOS: http://100.96.48.101/afe/#tab_id=view_job&object_id=1278
FAFT EC: http://100.96.48.101/afe/#tab_id=view_job&object_id=1279
Aug 3 2016
Did you do something similar to #4 in crbug.com/611064? That's an unsupported flow that will cause issues. To fix your moblab, you can try deleting the offending row from MySQL.
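A rough sketch of the "find the row, then delete it" approach suggested above, again against an in-memory SQLite stand-in rather than the real MySQL database. The table name `host_queue_entries`, the simplified columns, the demo entry id 1282, and the helper names are all hypothetical; on an actual moblab the schema and table name should be checked first, and which row is the one "in problem" is a judgment call to make after inspecting the query result.

```python
import sqlite3


def make_demo_db():
    """Build a tiny stand-in table with one entry holding (job 1276, host 1)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE host_queue_entries ("
        " id INTEGER PRIMARY KEY,"
        " job_id INTEGER NOT NULL,"
        " host_id INTEGER,"
        " UNIQUE (job_id, host_id))"
    )
    conn.execute("INSERT INTO host_queue_entries VALUES (1282, 1276, 1)")
    conn.execute("INSERT INTO host_queue_entries VALUES (1283, 1276, NULL)")
    conn.commit()
    return conn


def find_conflicts(conn, job_id, host_id):
    """List entries that already occupy the (job_id, host_id) pair."""
    return conn.execute(
        "SELECT id, job_id, host_id FROM host_queue_entries"
        " WHERE job_id = ? AND host_id = ? ORDER BY id",
        (job_id, host_id),
    ).fetchall()


def delete_entry(conn, entry_id):
    """Delete one entry by primary key and return the number of rows removed."""
    cur = conn.execute("DELETE FROM host_queue_entries WHERE id = ?", (entry_id,))
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    db = make_demo_db()
    print(find_conflicts(db, 1276, 1))  # inspect before deleting anything
    print(delete_entry(db, 1282))
```

Looking up the pair from the error message ('1276-1') before deleting keeps the change limited to a single, inspected row.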
Aug 4 2016
- Go to http://100.96.48.101/afe/#tab_id=view_job&object_id=1196
- Click the Clone button and select similar hosts (default).
- Click Submit Job. It returned an error about 0 hosts being selected (I can't remember the exact wording), so I then changed "Use [1] host for execution" and resubmitted.
Aug 4 2016
I terminated and resubmitted the job; it looks like the page no longer complains about 0 hosts. I resubmitted the jobs:
http://100.96.48.101/afe/#tab_id=view_job&object_id=1438 (BIOS)
http://100.96.48.101/afe/#tab_id=view_job&object_id=1412 (EC)
Aug 4 2016
Comment 1 by sbasi@chromium.org, Aug 3 2016