Issue metadata
Sign in to add a comment
|
servo_repair failure: Failed: servod service is taking calls |
||||||||||||||||||||||
Issue description
This is another failure mode for servo initialization. This impacts servo_repair ability. It also slows down verify and provision because we try to servo_host.verify which wastes 1 minute whenever this failure mode happens:
Symptom:
06/20 11:55:47.230 INFO | server_job:0199| GOOD ---- verify.job timestamp=1497984947 localtime=Jun 20 11:55:47
06/20 11:55:47.231 INFO | repair:0327| Verifying this condition: servod service is taking calls
06/20 11:55:47.232 DEBUG| abstract_ssh:0895| Full tunnel command: /usr/bin/ssh -a -x -n -N -q -L 51631:localhost:9999 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=180 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -o Protocol=2 -l root -p 22 chromeos4-row13-rack1-host8-servo
06/20 11:55:47.326 DEBUG| abstract_ssh:0903| Started ssh tunnel, local = 51631 remote = 9999, pid = 9295
06/20 11:55:47.328 INFO |rpc_server_tracker:0196| Waiting 60 seconds for XMLRPC server to start.
06/20 11:55:47.331 WARNI| retry:0228| <class 'socket.error'>([Errno 111] Connection refused)
06/20 11:55:47.332 WARNI| retry:0183| Retrying in 0.653886 seconds...
06/20 11:55:47.991 WARNI| retry:0228| <class 'socket.error'>([Errno 111] Connection refused)
06/20 11:55:47.991 WARNI| retry:0183| Retrying in 1.143166 seconds...
06/20 11:55:49.141 WARNI| retry:0228| <class 'socket.error'>([Errno 111] Connection refused)
06/20 11:55:49.142 WARNI| retry:0183| Retrying in 1.339090 seconds...
06/20 11:55:50.617 WARNI| retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:55:50.617 WARNI| retry:0183| Retrying in 1.471007 seconds...
06/20 11:55:52.224 WARNI| retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:55:52.224 WARNI| retry:0183| Retrying in 0.692521 seconds...
06/20 11:55:53.049 WARNI| retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:55:53.050 WARNI| retry:0183| Retrying in 1.096108 seconds...
06/20 11:55:54.279 WARNI| retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:55:54.279 WARNI| retry:0183| Retrying in 1.374995 seconds...
06/20 11:55:55.789 WARNI| retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:55:55.790 WARNI| retry:0183| Retrying in 1.499037 seconds...
06/20 11:55:57.427 WARNI| retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:55:57.428 WARNI| retry:0183| Retrying in 0.930749 seconds...
06/20 11:55:58.493 WARNI| retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:55:58.493 WARNI| retry:0183| Retrying in 1.027220 seconds...
06/20 11:55:59.654 WARNI| retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:55:59.654 WARNI| retry:0183| Retrying in 0.803959 seconds...
06/20 11:56:00.591 WARNI| retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:56:00.591 WARNI| retry:0183| Retrying in 1.330823 seconds...
06/20 11:56:02.057 WARNI| retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:56:02.057 WARNI| retry:0183| Retrying in 1.006909 seconds...
06/20 11:56:03.198 WARNI| retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:56:03.199 WARNI| retry:0183| Retrying in 1.416251 seconds...
06/20 11:56:04.749 WARNI| retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:56:04.750 WARNI| retry:0183| Retrying in 1.479887 seconds...
06/20 11:56:06.366 WARNI| retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:56:06.367 WARNI| retry:0183| Retrying in 1.354130 seconds...
06/20 11:56:07.857 WARNI| retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:56:07.858 WARNI| retry:0183| Retrying in 1.297230 seconds...
06/20 11:56:09.291 WARNI| retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:56:09.291 WARNI| retry:0183| Retrying in 0.659260 seconds...
06/20 11:56:10.082 WARNI| retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:56:10.083 WARNI| retry:0183| Retrying in 1.289981 seconds...
06/20 11:56:11.508 WARNI| retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:56:11.509 WARNI| retry:0183| Retrying in 0.930766 seconds...
06/20 11:56:12.574 WARNI| retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:56:12.575 WARNI| retry:0183| Retrying in 1.493085 seconds...
06/20 11:56:14.204 WARNI| retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:56:14.205 WARNI| retry:0183| Retrying in 0.809355 seconds...
06/20 11:56:15.149 WARNI| retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:56:15.150 WARNI| retry:0183| Retrying in 1.082790 seconds...
06/20 11:56:16.367 WARNI| retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:56:16.368 WARNI| retry:0183| Retrying in 1.196846 seconds...
06/20 11:56:17.700 WARNI| retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:56:17.701 WARNI| retry:0183| Retrying in 1.341930 seconds...
06/20 11:56:19.179 WARNI| retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:56:19.180 WARNI| retry:0183| Retrying in 0.764220 seconds...
06/20 11:56:20.076 WARNI| retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:56:20.077 WARNI| retry:0183| Retrying in 1.251292 seconds...
06/20 11:56:21.462 WARNI| retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:56:21.463 WARNI| retry:0183| Retrying in 1.282019 seconds...
06/20 11:56:22.880 WARNI| retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:56:22.880 WARNI| retry:0183| Retrying in 1.218999 seconds...
06/20 11:56:24.234 WARNI| retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:56:24.234 WARNI| retry:0183| Retrying in 1.220819 seconds...
06/20 11:56:25.590 WARNI| retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:56:25.590 WARNI| retry:0183| Retrying in 1.087725 seconds...
06/20 11:56:26.812 WARNI| retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:56:26.813 WARNI| retry:0183| Retrying in 0.546023 seconds...
06/20 11:56:27.491 WARNI| retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:56:27.492 WARNI| retry:0183| Retrying in 1.143919 seconds...
06/20 11:56:28.770 WARNI| retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:56:28.771 WARNI| retry:0183| Retrying in 1.285926 seconds...
06/20 11:56:30.192 WARNI| retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:56:30.193 WARNI| retry:0183| Retrying in 0.609070 seconds...
06/20 11:56:30.934 WARNI| retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:56:30.934 WARNI| retry:0183| Retrying in 0.720766 seconds...
06/20 11:56:31.788 WARNI| retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:56:31.788 WARNI| retry:0183| Retrying in 0.645937 seconds...
06/20 11:56:32.566 WARNI| retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:56:32.567 WARNI| retry:0183| Retrying in 0.589833 seconds...
06/20 11:56:33.288 WARNI| retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:56:33.289 WARNI| retry:0183| Retrying in 1.368165 seconds...
06/20 11:56:34.793 WARNI| retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:56:34.793 WARNI| retry:0183| Retrying in 1.324737 seconds...
06/20 11:56:36.251 WARNI| retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:56:36.251 WARNI| retry:0183| Retrying in 0.540796 seconds...
06/20 11:56:36.924 WARNI| retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:56:36.924 WARNI| retry:0183| Retrying in 1.145426 seconds...
06/20 11:56:38.204 WARNI| retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:56:38.205 WARNI| retry:0183| Retrying in 0.855913 seconds...
06/20 11:56:39.194 WARNI| retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:56:39.194 WARNI| retry:0183| Retrying in 0.516261 seconds...
06/20 11:56:39.842 WARNI| retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:56:39.843 WARNI| retry:0183| Retrying in 0.674996 seconds...
06/20 11:56:40.650 WARNI| retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:56:40.650 WARNI| retry:0183| Retrying in 0.572320 seconds...
06/20 11:56:41.355 WARNI| retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:56:41.355 WARNI| retry:0183| Retrying in 0.924346 seconds...
06/20 11:56:42.412 WARNI| retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:56:42.413 WARNI| retry:0183| Retrying in 1.102302 seconds...
06/20 11:56:43.650 WARNI| retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:56:43.650 WARNI| retry:0183| Retrying in 1.435181 seconds...
06/20 11:56:45.221 WARNI| retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:56:45.222 WARNI| retry:0183| Retrying in 1.034024 seconds...
06/20 11:56:46.390 WARNI| retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:56:46.391 ERROR|rpc_server_tracker:0201| Failed to start XMLRPC server.
06/20 11:56:46.392 DEBUG| abstract_ssh:0918| Terminated tunnel, pid 9295
06/20 11:56:46.392 ERROR| repair:0332| Failed: servod service is taking calls
Traceback (most recent call last):
File "/usr/local/autotest/client/common_lib/hosts/repair.py", line 329, in _verify_host
self.verify(host)
File "/usr/local/autotest/server/hosts/servo_repair.py", line 211, in verify
host.connect_servo()
File "/usr/local/autotest/server/hosts/servo_host.py", line 132, in connect_servo
servo_obj = servo.Servo(servo_host=self, servo_serial=self.servo_serial)
File "/usr/local/autotest/server/cros/servo/servo.py", line 176, in __init__
self._server = servo_host.get_servod_server_proxy()
File "/usr/local/autotest/server/hosts/servo_host.py", line 185, in get_servod_server_proxy
timeout_seconds=60)
File "/usr/local/autotest/server/hosts/rpc_server_tracker.py", line 197, in xmlrpc_connect
ready_test()
File "/usr/local/autotest/client/common_lib/cros/retry.py", line 218, in func_retry
remaining_time)
File "/usr/local/autotest/client/common_lib/cros/retry.py", line 123, in timeout
default_result = func(*args, **kwargs)
File "/usr/local/autotest/server/hosts/rpc_server_tracker.py", line 192, in ready_test
getattr(proxy, ready_test_name)()
File "/usr/lib/python2.7/xmlrpclib.py", line 1233, in __call__
return self.__send(self.__name, args)
File "/usr/lib/python2.7/xmlrpclib.py", line 1587, in __request
verbose=self.__verbose
File "/usr/lib/python2.7/xmlrpclib.py", line 1273, in request
return self.single_request(host, handler, request_body, verbose)
File "/usr/lib/python2.7/xmlrpclib.py", line 1303, in single_request
response = h.getresponse(buffering=True)
File "/usr/lib/python2.7/httplib.py", line 1089, in getresponse
response.begin()
File "/usr/lib/python2.7/httplib.py", line 444, in begin
version, status, reason = self._read_status()
File "/usr/lib/python2.7/httplib.py", line 408, in _read_status
raise BadStatusLine(line)
BadStatusLine: ''
06/20 11:56:46.397 INFO | server_job:0199| FAIL ---- verify.servod timestamp=1497985006 localtime=Jun 20 11:56:46 ''
Optional: It then tries to restart servod, then it fails this same verification again. (wasting another 1+ minute)
Other bugs where this happens added as blockers below.
,
Jun 20 2017
,
Jun 20 2017
,
Jun 20 2017
This is not the same as issue 735156 This bug is about handling the particular servo_host failure better (and also refers to other bugs where this handling affects provision, verify etc) issue 735156 is about one particular reason this is happening for nyan_kitty.
,
Jun 20 2017
Also, I want this bug open so that I can collect most/all servo_repair failure modes to track for issue 735121 . Please talk to me IRL if anything is unclear here.
,
Jun 22 2017
> This bug is about handling the particular servo_host
> failure better (and also refers to other bugs where this
> handling affects provision, verify etc)
What's the proposed better handling?
Note:
* Servo verify will accurately say what can be determined,
which is a) whether the servod job is running and b) if
running, whether the service answers RPC calls.
* Servo repair will try and restart servod in the case that
either of those checks fails.
,
Mar 31 2018
Bulk closing old unconfirmed issues. |
|||||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||||
Comment 1 by pprabhu@chromium.org
, Jun 20 2017