New issue
Advanced search Search tips

Issue 735144 link

Starred by 1 user

Issue metadata

Status: Archived
Merged: issue 735156
Owner: ----
Closed: Mar 2018
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug

Blocked on:
issue 691156
issue 696694

Blocking:
issue 735121



Sign in to add a comment

servo_repair failure: Failed: servod service is taking calls

Project Member Reported by pprabhu@chromium.org, Jun 20 2017

Issue description

This is another failure mode for servo initialization. This impacts servo_repair ability. It also slows down verify and provision because we try to servo_host.verify which wastes 1 minute whenever this failure mode happens:

Symptom:
06/20 11:55:47.230 INFO |        server_job:0199| 	GOOD	----	verify.job	timestamp=1497984947	localtime=Jun 20 11:55:47	
06/20 11:55:47.231 INFO |            repair:0327| Verifying this condition: servod service is taking calls
06/20 11:55:47.232 DEBUG|      abstract_ssh:0895| Full tunnel command: /usr/bin/ssh -a -x -n -N -q -L 51631:localhost:9999 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=180 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -o Protocol=2 -l root -p 22 chromeos4-row13-rack1-host8-servo
06/20 11:55:47.326 DEBUG|      abstract_ssh:0903| Started ssh tunnel, local = 51631 remote = 9999, pid = 9295
06/20 11:55:47.328 INFO |rpc_server_tracker:0196| Waiting 60 seconds for XMLRPC server to start.
06/20 11:55:47.331 WARNI|             retry:0228| <class 'socket.error'>([Errno 111] Connection refused)
06/20 11:55:47.332 WARNI|             retry:0183| Retrying in 0.653886 seconds...
06/20 11:55:47.991 WARNI|             retry:0228| <class 'socket.error'>([Errno 111] Connection refused)
06/20 11:55:47.991 WARNI|             retry:0183| Retrying in 1.143166 seconds...
06/20 11:55:49.141 WARNI|             retry:0228| <class 'socket.error'>([Errno 111] Connection refused)
06/20 11:55:49.142 WARNI|             retry:0183| Retrying in 1.339090 seconds...
06/20 11:55:50.617 WARNI|             retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:55:50.617 WARNI|             retry:0183| Retrying in 1.471007 seconds...
06/20 11:55:52.224 WARNI|             retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:55:52.224 WARNI|             retry:0183| Retrying in 0.692521 seconds...
06/20 11:55:53.049 WARNI|             retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:55:53.050 WARNI|             retry:0183| Retrying in 1.096108 seconds...
06/20 11:55:54.279 WARNI|             retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:55:54.279 WARNI|             retry:0183| Retrying in 1.374995 seconds...
06/20 11:55:55.789 WARNI|             retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:55:55.790 WARNI|             retry:0183| Retrying in 1.499037 seconds...
06/20 11:55:57.427 WARNI|             retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:55:57.428 WARNI|             retry:0183| Retrying in 0.930749 seconds...
06/20 11:55:58.493 WARNI|             retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:55:58.493 WARNI|             retry:0183| Retrying in 1.027220 seconds...
06/20 11:55:59.654 WARNI|             retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:55:59.654 WARNI|             retry:0183| Retrying in 0.803959 seconds...
06/20 11:56:00.591 WARNI|             retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:56:00.591 WARNI|             retry:0183| Retrying in 1.330823 seconds...
06/20 11:56:02.057 WARNI|             retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:56:02.057 WARNI|             retry:0183| Retrying in 1.006909 seconds...
06/20 11:56:03.198 WARNI|             retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:56:03.199 WARNI|             retry:0183| Retrying in 1.416251 seconds...
06/20 11:56:04.749 WARNI|             retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:56:04.750 WARNI|             retry:0183| Retrying in 1.479887 seconds...
06/20 11:56:06.366 WARNI|             retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:56:06.367 WARNI|             retry:0183| Retrying in 1.354130 seconds...
06/20 11:56:07.857 WARNI|             retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:56:07.858 WARNI|             retry:0183| Retrying in 1.297230 seconds...
06/20 11:56:09.291 WARNI|             retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:56:09.291 WARNI|             retry:0183| Retrying in 0.659260 seconds...
06/20 11:56:10.082 WARNI|             retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:56:10.083 WARNI|             retry:0183| Retrying in 1.289981 seconds...
06/20 11:56:11.508 WARNI|             retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:56:11.509 WARNI|             retry:0183| Retrying in 0.930766 seconds...
06/20 11:56:12.574 WARNI|             retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:56:12.575 WARNI|             retry:0183| Retrying in 1.493085 seconds...
06/20 11:56:14.204 WARNI|             retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:56:14.205 WARNI|             retry:0183| Retrying in 0.809355 seconds...
06/20 11:56:15.149 WARNI|             retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:56:15.150 WARNI|             retry:0183| Retrying in 1.082790 seconds...
06/20 11:56:16.367 WARNI|             retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:56:16.368 WARNI|             retry:0183| Retrying in 1.196846 seconds...
06/20 11:56:17.700 WARNI|             retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:56:17.701 WARNI|             retry:0183| Retrying in 1.341930 seconds...
06/20 11:56:19.179 WARNI|             retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:56:19.180 WARNI|             retry:0183| Retrying in 0.764220 seconds...
06/20 11:56:20.076 WARNI|             retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:56:20.077 WARNI|             retry:0183| Retrying in 1.251292 seconds...
06/20 11:56:21.462 WARNI|             retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:56:21.463 WARNI|             retry:0183| Retrying in 1.282019 seconds...
06/20 11:56:22.880 WARNI|             retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:56:22.880 WARNI|             retry:0183| Retrying in 1.218999 seconds...
06/20 11:56:24.234 WARNI|             retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:56:24.234 WARNI|             retry:0183| Retrying in 1.220819 seconds...
06/20 11:56:25.590 WARNI|             retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:56:25.590 WARNI|             retry:0183| Retrying in 1.087725 seconds...
06/20 11:56:26.812 WARNI|             retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:56:26.813 WARNI|             retry:0183| Retrying in 0.546023 seconds...
06/20 11:56:27.491 WARNI|             retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:56:27.492 WARNI|             retry:0183| Retrying in 1.143919 seconds...
06/20 11:56:28.770 WARNI|             retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:56:28.771 WARNI|             retry:0183| Retrying in 1.285926 seconds...
06/20 11:56:30.192 WARNI|             retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:56:30.193 WARNI|             retry:0183| Retrying in 0.609070 seconds...
06/20 11:56:30.934 WARNI|             retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:56:30.934 WARNI|             retry:0183| Retrying in 0.720766 seconds...
06/20 11:56:31.788 WARNI|             retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:56:31.788 WARNI|             retry:0183| Retrying in 0.645937 seconds...
06/20 11:56:32.566 WARNI|             retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:56:32.567 WARNI|             retry:0183| Retrying in 0.589833 seconds...
06/20 11:56:33.288 WARNI|             retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:56:33.289 WARNI|             retry:0183| Retrying in 1.368165 seconds...
06/20 11:56:34.793 WARNI|             retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:56:34.793 WARNI|             retry:0183| Retrying in 1.324737 seconds...
06/20 11:56:36.251 WARNI|             retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:56:36.251 WARNI|             retry:0183| Retrying in 0.540796 seconds...
06/20 11:56:36.924 WARNI|             retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:56:36.924 WARNI|             retry:0183| Retrying in 1.145426 seconds...
06/20 11:56:38.204 WARNI|             retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:56:38.205 WARNI|             retry:0183| Retrying in 0.855913 seconds...
06/20 11:56:39.194 WARNI|             retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:56:39.194 WARNI|             retry:0183| Retrying in 0.516261 seconds...
06/20 11:56:39.842 WARNI|             retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:56:39.843 WARNI|             retry:0183| Retrying in 0.674996 seconds...
06/20 11:56:40.650 WARNI|             retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:56:40.650 WARNI|             retry:0183| Retrying in 0.572320 seconds...
06/20 11:56:41.355 WARNI|             retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:56:41.355 WARNI|             retry:0183| Retrying in 0.924346 seconds...
06/20 11:56:42.412 WARNI|             retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:56:42.413 WARNI|             retry:0183| Retrying in 1.102302 seconds...
06/20 11:56:43.650 WARNI|             retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:56:43.650 WARNI|             retry:0183| Retrying in 1.435181 seconds...
06/20 11:56:45.221 WARNI|             retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:56:45.222 WARNI|             retry:0183| Retrying in 1.034024 seconds...
06/20 11:56:46.390 WARNI|             retry:0228| <class 'httplib.BadStatusLine'>('')
06/20 11:56:46.391 ERROR|rpc_server_tracker:0201| Failed to start XMLRPC server.
06/20 11:56:46.392 DEBUG|      abstract_ssh:0918| Terminated tunnel, pid 9295
06/20 11:56:46.392 ERROR|            repair:0332| Failed: servod service is taking calls
Traceback (most recent call last):
  File "/usr/local/autotest/client/common_lib/hosts/repair.py", line 329, in _verify_host
    self.verify(host)
  File "/usr/local/autotest/server/hosts/servo_repair.py", line 211, in verify
    host.connect_servo()
  File "/usr/local/autotest/server/hosts/servo_host.py", line 132, in connect_servo
    servo_obj = servo.Servo(servo_host=self, servo_serial=self.servo_serial)
  File "/usr/local/autotest/server/cros/servo/servo.py", line 176, in __init__
    self._server = servo_host.get_servod_server_proxy()
  File "/usr/local/autotest/server/hosts/servo_host.py", line 185, in get_servod_server_proxy
    timeout_seconds=60)
  File "/usr/local/autotest/server/hosts/rpc_server_tracker.py", line 197, in xmlrpc_connect
    ready_test()
  File "/usr/local/autotest/client/common_lib/cros/retry.py", line 218, in func_retry
    remaining_time)
  File "/usr/local/autotest/client/common_lib/cros/retry.py", line 123, in timeout
    default_result = func(*args, **kwargs)
  File "/usr/local/autotest/server/hosts/rpc_server_tracker.py", line 192, in ready_test
    getattr(proxy, ready_test_name)()
  File "/usr/lib/python2.7/xmlrpclib.py", line 1233, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib/python2.7/xmlrpclib.py", line 1587, in __request
    verbose=self.__verbose
  File "/usr/lib/python2.7/xmlrpclib.py", line 1273, in request
    return self.single_request(host, handler, request_body, verbose)
  File "/usr/lib/python2.7/xmlrpclib.py", line 1303, in single_request
    response = h.getresponse(buffering=True)
  File "/usr/lib/python2.7/httplib.py", line 1089, in getresponse
    response.begin()
  File "/usr/lib/python2.7/httplib.py", line 444, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python2.7/httplib.py", line 408, in _read_status
    raise BadStatusLine(line)
BadStatusLine: ''
06/20 11:56:46.397 INFO |        server_job:0199| 	FAIL	----	verify.servod	timestamp=1497985006	localtime=Jun 20 11:56:46	''


Optional: It then tries to restart servod, then it fails this same verification again. (wasting another 1+ minute)

Other bugs where this happens added as blockers below.
 
Blockedon: 691156 696694
Blocking: 735121
Mergedinto: 735156
Status: Duplicate (was: Untriaged)
Status: Unconfirmed (was: Duplicate)
This is not the same as  issue 735156 

This bug is about handling the particular servo_host failure better (and also refers to other bugs where this handling affects provision, verify etc)

 issue 735156  is about one particular reason this is happening for nyan_kitty.
Also, I want this bug open so that I can collect most/all servo_repair failure modes to track for  issue 735121 . Please talk to me IRL if anything is unclear here.
> This bug is about handling the particular servo_host
> failure better (and also refers to other bugs where this
> handling affects provision, verify etc)

What's the proposed better handling?

Note:
  * Servo verify will accurately say what can be determined,
    which is a) whether the servod job is running and b) if
    running, whether the service answers RPC calls.
  * Servo repair will try and restart servod in the case that
    either of those checks fails.

Status: Archived (was: Unconfirmed)
Bulk closing old unconfirmed issues.

Sign in to add a comment