
Issue 878188


Issue metadata

Status: Available
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug




chromeos15-row1-rack5-host3 got stuck repairing

Reported by jrbarnette@chromium.org, Aug 28

Issue description

Host chromeos15-row1-rack5-host3 is stuck in state "Repairing",
and unable to make forward progress.

 
The board is nyan_big, and the shard is chromeos-skunk-1.

Logging in there, you find there's still an active repair job:
553222-repair/20182408092243

The autoserv.DEBUG log for that job ends like this:
08/24 09:26:43.806 ERROR|           process:0274| Process Process-6:
08/24 09:26:43.806 ERROR|         traceback:0013| Traceback (most recent call last):
08/24 09:26:43.807 ERROR|         traceback:0013|   File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
08/24 09:26:43.807 ERROR|         traceback:0013|     self.run()
08/24 09:26:43.807 ERROR|         traceback:0013|   File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
08/24 09:26:43.808 ERROR|         traceback:0013|     self._target(*self._args, **self._kwargs)
08/24 09:26:43.808 ERROR|         traceback:0013|   File "/usr/local/autotest/client/common_lib/cros/dev_server.py", line 506, in get_devserver_load_wrapper
08/24 09:26:43.808 ERROR|         traceback:0013|     output.put(load)
08/24 09:26:43.808 ERROR|         traceback:0013|   File "/usr/lib/python2.7/multiprocessing/queues.py", line 107, in put
08/24 09:26:43.809 ERROR|         traceback:0013|     self._start_thread()
08/24 09:26:43.809 ERROR|         traceback:0013|   File "/usr/lib/python2.7/multiprocessing/queues.py", line 191, in _start_thread
08/24 09:26:43.809 ERROR|         traceback:0013|     self._thread.start()
08/24 09:26:43.809 ERROR|           process:0274| Process Process-9:
08/24 09:26:43.810 ERROR|         traceback:0013|   File "/usr/lib/python2.7/threading.py", line 745, in start
08/24 09:26:43.810 ERROR|         traceback:0013| Traceback (most recent call last):
08/24 09:26:43.810 ERROR|         traceback:0013|   File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
08/24 09:26:43.810 ERROR|         traceback:0013|     _start_new_thread(self.__bootstrap, ())
08/24 09:26:43.810 ERROR|         traceback:0013|     self.run()
08/24 09:26:43.811 ERROR|         traceback:0013| error: can't start new thread
08/24 09:26:43.811 ERROR|         traceback:0013|   File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
08/24 09:26:43.811 ERROR|         traceback:0013|     self._target(*self._args, **self._kwargs)
08/24 09:26:43.811 ERROR|         traceback:0013|   File "/usr/local/autotest/client/common_lib/cros/dev_server.py", line 506, in get_devserver_load_wrapper
08/24 09:26:43.811 ERROR|         traceback:0013|     output.put(load)
08/24 09:26:43.812 ERROR|         traceback:0013|   File "/usr/lib/python2.7/multiprocessing/queues.py", line 107, in put
08/24 09:26:43.812 ERROR|         traceback:0013|     self._start_thread()
08/24 09:26:43.812 ERROR|         traceback:0013|   File "/usr/lib/python2.7/multiprocessing/queues.py", line 191, in _start_thread
08/24 09:26:43.813 ERROR|         traceback:0013|     self._thread.start()
08/24 09:26:43.813 ERROR|         traceback:0013|   File "/usr/lib/python2.7/threading.py", line 745, in start
08/24 09:26:43.813 ERROR|         traceback:0013|     _start_new_thread(self.__bootstrap, ())
08/24 09:26:43.814 ERROR|         traceback:0013| error: can't start new thread

Status: Available (was: Untriaged)
Summary: chromeos15-row1-rack5-host3 got stuck repairing (was: chromeos15-row1-rack5-host3 is stuck repairing)
I ran this command:
    $ ps -ef | awk '/chromeos15-row1-rack5-host3/ {print $2}' | xargs kill

That seems to have set things in motion:

    $ atest host list chromeos15-row1-rack5-host3
    Host                         Status        [ ... ]
    chromeos15-row1-rack5-host3  Provisioning  [ ... ]

So, back in action.
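
For anyone repeating this, note that the awk pattern is matched against the whole `ps -ef` line, so it also selects the awk process itself, whose own command line contains the host name. Below is a minimal Python 3 sketch of the same PID selection that lets you review the list before signalling anything. The function name `pids_matching`, the `exclude` parameter, and the sample `ps` output (including the `example-worker` command) are made up for illustration; they are not taken from the shard.

```python
HOST = "chromeos15-row1-rack5-host3"

# Hypothetical `ps -ef` output, invented for this example.
SAMPLE_PS = """\
UID        PID  PPID  C STIME TTY          TIME CMD
user      4321     1  0 09:22 ?        00:00:03 example-worker chromeos15-row1-rack5-host3
user      4400  4321  0 09:26 ?        00:00:00 example-worker chromeos15-row1-rack5-host3
user      5000  4999  0 10:00 pts/0    00:00:00 awk /chromeos15-row1-rack5-host3/ {print $2}
user      5100     1  0 10:00 ?        00:00:00 sshd
"""


def pids_matching(ps_output, needle, exclude=()):
    """Return PIDs (second column) of `ps -ef` lines that contain `needle`.

    This is a plain substring match. It agrees with the awk regex here
    only because the host name has no regex metacharacters.
    """
    pids = []
    for line in ps_output.splitlines():
        if needle not in line:
            continue
        fields = line.split()
        # Skip anything whose second column is not a numeric PID.
        if len(fields) < 2 or not fields[1].isdigit():
            continue
        pid = int(fields[1])
        if pid not in exclude:
            pids.append(pid)
    return pids


# Review the list first; a caller could then pass each PID to os.kill().
everything = pids_matching(SAMPLE_PS, HOST)
without_awk = pids_matching(SAMPLE_PS, HOST, exclude={5000})
print(everything, without_awk)
```

To run this against a live machine, the text would come from something like `subprocess.run(["ps", "-ef"], capture_output=True, text=True).stdout` instead of the sample string.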

I'm not quite sure why the 'autoserv' process got stuck like this...
We might want to investigate.
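
One hypothesis worth checking, not confirmed by anything in the log: the traceback shows each worker dying inside `output.put(load)` because the queue's feeder thread could not be started, so no result ever reaches the queue. If the parent collects results with a blocking `get()` and no timeout, it would then wait indefinitely. The sketch below shows a bounded wait in Python 3 (the production code runs on Python 2.7). The helper name `drain` and its parameters are hypothetical and are not part of dev_server.py. It is exercised with `queue.Queue` for determinism; `multiprocessing.Queue.get` accepts the same `timeout` argument and raises the same `queue.Empty`.

```python
import queue


def drain(results, expected, timeout_s):
    """Collect up to `expected` items from `results`.

    Each wait is bounded by `timeout_s`. The first timeout ends collection,
    so workers that died before their put() cannot block the caller forever.
    """
    collected = []
    for _ in range(expected):
        try:
            collected.append(results.get(timeout=timeout_s))
        except queue.Empty:
            break
    return collected


# Three workers were expected, but only one managed to report a load value.
q = queue.Queue()
q.put(0.4)
loads = drain(q, expected=3, timeout_s=0.05)
print(loads)
```

Whether autoserv actually waits this way would need to be confirmed by reading the caller of get_devserver_load_wrapper.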

Cc: cros-conn-test-team@google.com
