Follow-up to crbug.com/714571
EXCEPTION: Uncaught exception; terminating monitor_db
Traceback (most recent call last):
  File "/usr/local/autotest/scheduler/monitor_db.py", line 179, in main_without_exception_handling
    dispatcher.initialize(recover_hosts=options.recover_hosts)
  File "/usr/local/autotest/scheduler/monitor_db.py", line 336, in initialize
    self._recover_processes()
  File "/usr/local/autotest/scheduler/monitor_db.py", line 491, in _recover_processes
    agent_tasks = self._create_recovery_agent_tasks()
  File "/usr/local/autotest/scheduler/monitor_db.py", line 506, in _create_recovery_agent_tasks
    + self._get_special_task_agent_tasks(is_active=True))
  File "/usr/local/autotest/scheduler/monitor_db.py", line 560, in _get_special_task_agent_tasks
    for task in special_tasks]
  File "/usr/local/autotest/scheduler/monitor_db.py", line 637, in _get_agent_task_for_special_task
    return agent_task_class(task=special_task)
  File "/usr/local/autotest/scheduler/prejob_task.py", line 368, in __init__
    self._set_ids(host=self.host, queue_entries=[self.queue_entry])
  File "/usr/local/autotest/scheduler/agent_task.py", line 166, in _set_ids
    self.host_ids = [entry.host.id for entry in queue_entries]
AttributeError: 'NoneType' object has no attribute 'id'
When faced with invalid entries such as this one, we should probably raise a purpose-specific exception and log it rather than crash-looping.
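As a rough sketch of what that could look like: SchedulerRecoveryError and the guard inside _set_ids below are illustrative only, not existing autotest code; the only line taken from the real agent_task.py is the host_ids list comprehension visible in the traceback above.

class SchedulerRecoveryError(Exception):
    """Raised when recovery encounters an inconsistent DB entry."""


def _set_ids(self, host=None, queue_entries=None):
    if queue_entries and queue_entries != [None]:
        for entry in queue_entries:
            # Guard against entries whose host is missing (the case in the
            # traceback above) instead of letting AttributeError escape.
            if entry is None or entry.host is None:
                raise SchedulerRecoveryError(
                        'Queue entry %r has no associated host.' % entry)
        self.host_ids = [entry.host.id for entry in queue_entries]
    else:
        assert host, 'Either host or queue_entries must be provided.'
        self.host_ids = [host.id]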
Possible approaches proposed:
1. Catch and log the exception in the tick.
2. Crash when seeing this error, but modify the scheduler start-up db_cleanup phase to fix these problems.
3. Get sentinel service to fix these problems.
My suggestion is #1: it is probably the easiest, and it would also have the lowest impact if/when we encounter this DB inconsistency again in the future (see the sketch below).
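A rough illustration of approach #1, using the hypothetical SchedulerRecoveryError from the sketch above; the query and loop are simplified for illustration and are not the actual _get_special_task_agent_tasks body:

import logging

def _get_special_task_agent_tasks(self, is_active=False):
    # Query simplified for illustration.
    special_tasks = models.SpecialTask.objects.filter(
            is_active=is_active, is_complete=False)
    agent_tasks = []
    for task in special_tasks:
        try:
            agent_tasks.append(self._get_agent_task_for_special_task(task))
        except SchedulerRecoveryError:
            # Log and skip the inconsistent entry instead of crash-looping
            # the whole scheduler during recovery.
            logging.exception('Skipping special task %s due to an '
                              'inconsistent DB entry.', task)
    return agent_tasks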
Comment 1 by dshi@chromium.org, Apr 24 2017