
Issue 682729


Issue metadata

Status: Duplicate
Merged: issue 673639
Owner: ----
Closed: Apr 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug




Scheduler took a long time to restart

Project Member Reported by dshi@chromium.org, Jan 19 2017

Issue description

The following is the tail of the scheduler log after the shutdown request was received:

01/19 09:35:07.478 INFO |        monitor_db:0204| Shutdown request received.
01/19 09:35:07.479 WARNI|        base_utils:0862| (4, 'Interrupted system call')
01/19 09:35:07.480 CRITI|            drones:0097| Invalid response:
---

---
01/19 09:35:07.487 ERROR|     email_manager:0082| Uncaught exception; terminating monitor_db
Traceback (most recent call last):
  File "/usr/local/autotest/scheduler/monitor_db.py", line 180, in main_without_exception_handling
    dispatcher.tick()
  File "/usr/local/autotest/site-packages/chromite/lib/metrics.py", line 274, in wrapper
    return fn(*args, **kwargs)
  File "/usr/local/autotest/scheduler/monitor_db.py", line 398, in tick
    metrics.Counter('chromeos/autotest/scheduler/tick').increment()
  File "/usr/local/autotest/site-packages/chromite/lib/metrics.py", line 334, in __exit__
    outer_timer.add(self._total_time_s, fields=self._fields)
  File "/usr/local/autotest/site-packages/chromite/lib/metrics.py", line 68, in enqueue
    reset_after=self.reset_after))
  File "<string>", line 2, in put
  File "/usr/lib/python2.7/multiprocessing/managers.py", line 759, in _callmethod
    kind, result = conn.recv()
IOError: [Errno 104] Connection reset by peer
01/19 09:35:07.488 ERROR|     email_manager:0054| monitor_db exception
EXCEPTION: Uncaught exception; terminating monitor_db
Traceback (most recent call last):
  File "/usr/local/autotest/scheduler/monitor_db.py", line 180, in main_without_exception_handling
    dispatcher.tick()
  File "/usr/local/autotest/site-packages/chromite/lib/metrics.py", line 274, in wrapper
    return fn(*args, **kwargs)
  File "/usr/local/autotest/scheduler/monitor_db.py", line 398, in tick
    metrics.Counter('chromeos/autotest/scheduler/tick').increment()
  File "/usr/local/autotest/site-packages/chromite/lib/metrics.py", line 334, in __exit__
    outer_timer.add(self._total_time_s, fields=self._fields)
  File "/usr/local/autotest/site-packages/chromite/lib/metrics.py", line 68, in enqueue
    reset_after=self.reset_after))
  File "<string>", line 2, in put
  File "/usr/lib/python2.7/multiprocessing/managers.py", line 759, in _callmethod
    kind, result = conn.recv()
IOError: [Errno 104] Connection reset by peer

01/19 09:35:07.489 ERROR|        monitor_db:0100| Exception escaping in monitor_db
Traceback (most recent call last):
  File "/usr/local/autotest/scheduler/monitor_db.py", line 96, in main
    main_without_exception_handling()
  File "/usr/local/autotest/scheduler/monitor_db.py", line 192, in main_without_exception_handling
    "Uncaught exception; terminating monitor_db")
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/usr/local/autotest/site-packages/chromite/lib/ts_mon_config.py", line 149, in _CreateTsMonFlushingProcess
    message_q.put(None)
  File "<string>", line 2, in put
  File "/usr/lib/python2.7/multiprocessing/managers.py", line 758, in _callmethod
    conn.send((self._id, methodname, args, kwds))
IOError: [Errno 32] Broken pipe

It took almost 4 minutes before the scheduler could be restarted.
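
The last traceback fails in `message_q.put(None)` with a broken pipe, and the earlier one fails in `put` with a connection reset, which suggests the process serving the queue was already gone when the scheduler tried to talk to it. A minimal sketch of how such a `put` could tolerate a dead peer during shutdown is below. The helper name `put_ignoring_dead_peer` is hypothetical and is not taken from chromite or autotest; it only assumes an object with a `put` method that raises `IOError` or `EOFError` when the other end has exited.

```python
import errno
import logging

# errno values that mean "the process on the other end is gone".
_DEAD_PEER_ERRNOS = (errno.EPIPE, errno.ECONNRESET)


def put_ignoring_dead_peer(message_q, item):
    """Hypothetical helper: put item on message_q, tolerating a dead peer.

    Returns True if the put succeeded, False if the peer process had
    already exited. Any other error is re-raised unchanged.
    """
    try:
        message_q.put(item)
        return True
    except EOFError:
        # The connection was closed cleanly by the other end.
        logging.warning('Queue peer closed the connection; dropping %r', item)
        return False
    except IOError as e:
        if e.errno in _DEAD_PEER_ERRNOS:
            # Broken pipe or connection reset: nothing is left to notify.
            logging.warning('Queue peer is gone (%s); dropping %r', e, item)
            return False
        raise
```

With a helper like this, a shutdown path could call `put_ignoring_dead_peer(message_q, None)` and continue cleaning up instead of letting the exception escape. Whether swallowing the error is appropriate depends on what the caller still needs from the flushing process.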
 
Cc: ayatane@chromium.org

Comment 2 by dshi@chromium.org, Jan 23 2017

Cc: -ayatane@chromium.org
Labels: -current-issue
Owner: ayatane@chromium.org
ayatane@, can you take a look at this?

Owner: ----
Not working on this for fixit right at the moment.

Mergedinto: 673639
Status: Duplicate (was: Untriaged)
