Repair task hits AFE for RPM information
Issue description

Example task: https://chrome-swarming.appspot.com/task?id=3ea8f55168c9b711

Logs weren't offloaded due to issue 863192.
I've manually marked the results directory for offload, so the results _should_ become available at https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/swarming-3ea8f55168c9b711 in a bit.

From autoserv logs:

07/12 13:30:53.460 INFO | server_job:0216| START ---- repair.rpm timestamp=1531427453 localtime=Jul 12 13:30:53
07/12 13:30:53.773 ERROR| rpm_client:0044| <Fault 1: "<class 'rpm_infrastructure_exception.RPMInfrastructureException'>:Can not retrieve rpm information from AFE for chromeos4-row7-rack6-host19, no host found.">
Traceback (most recent call last):
  File "/usr/local/autotest/site_utils/rpm_control_system/rpm_client.py", line 42, in set_power
    default_result=False)
  File "/usr/local/autotest/client/common_lib/cros/retry.py", line 123, in timeout
    default_result = func(*args, **kwargs)
  File "/usr/lib/python2.7/xmlrpclib.py", line 1233, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib/python2.7/xmlrpclib.py", line 1587, in __request
    verbose=self.__verbose
  File "/usr/lib/python2.7/xmlrpclib.py", line 1273, in request
    return self.single_request(host, handler, request_body, verbose)
  File "/usr/lib/python2.7/xmlrpclib.py", line 1306, in single_request
    return self.parse_response(response)
  File "/usr/lib/python2.7/xmlrpclib.py", line 1482, in parse_response
    return u.close()
  File "/usr/lib/python2.7/xmlrpclib.py", line 794, in close
    raise Fault(**self._stack[0])
Fault: <Fault 1: "<class 'rpm_infrastructure_exception.RPMInfrastructureException'>:Can not retrieve rpm information from AFE for chromeos4-row7-rack6-host19, no host found.">
07/12 13:30:53.774 ERROR| repair:0507| Repair failed: Power cycle the host with RPM
Traceback (most recent call last):
  File "/usr/local/autotest/client/common_lib/hosts/repair.py", line 505, in _repair_host
    self.repair(host)
  File "/usr/local/autotest/server/hosts/repair.py", line 92, in repair
    host.power_cycle()
  File "/usr/local/autotest/server/hosts/cros_host.py", line 1731, in power_cycle
    rpm_client.set_power(self.hostname, 'CYCLE')
  File "/usr/local/autotest/site_utils/rpm_control_system/rpm_client.py", line 46, in set_power
    'Client call exception: ' + str(e))
RemotePowerException: Client call exception: <Fault 1: "<class 'rpm_infrastructure_exception.RPMInfrastructureException'>:Can not retrieve rpm information from AFE for chromeos4-row7-rack6-host19, no host found.">
07/12 13:30:53.775 INFO | server_job:0216| FAIL ---- repair.rpm timestamp=1531427453 localtime=Jul 12 13:30:53 Client call exception: <Fault 1: "<class 'rpm_infrastructure_exception.RPMInfrastructureException'>:Can not retrieve rpm information from AFE for chromeos4-row7-rack6-host19, no host found.">
07/12 13:30:53.775 INFO | server_job:0216| END FAIL ---- repair.rpm timestamp=1531427453 localtime=Jul 12 13:30:53
07/12 13:30:53.775 INFO | repair:0110| Attempting this repair action: Reset the DUT via keyboard sysrq-x
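
For readers unfamiliar with how the server-side exception ends up as a client-side RemotePowerException, here is a minimal sketch of the same pattern. It is not the autotest code: it uses Python 3's xmlrpc.client rather than the Python 2.7 xmlrpclib in the traceback, there is no network involved, and fake_server_response() and the afe_hosts set are hypothetical stand-ins for the rpmserver and its AFE lookup.

```python
import xmlrpc.client


class RemotePowerException(Exception):
    """Client-side error, named after the exception in the traceback."""


def fake_server_response(hostname, afe_hosts):
    """Build the XML-RPC response body a server might send (hypothetical)."""
    if hostname not in afe_hosts:
        # A server-side failure is marshalled as an XML-RPC fault packet.
        fault = xmlrpc.client.Fault(
            1,
            "Can not retrieve rpm information from AFE for %s, "
            "no host found." % hostname)
        return xmlrpc.client.dumps(fault)
    return xmlrpc.client.dumps((True,), methodresponse=True)


def set_power(hostname, afe_hosts):
    """Decode the response; re-raise any Fault as RemotePowerException."""
    body = fake_server_response(hostname, afe_hosts)
    try:
        # loads() raises xmlrpc.client.Fault when the body is a fault packet.
        (result,), _ = xmlrpc.client.loads(body)
    except xmlrpc.client.Fault as e:
        raise RemotePowerException("Client call exception: " + str(e))
    return result
```

With an empty afe_hosts set, set_power("chromeos4-row7-rack6-host19", set()) raises RemotePowerException carrying the "no host found" text, which mirrors the repair.rpm failure above.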
,
Jul 13
Step 1: Add new RPCs that allow DUTs to supply the required RPM information instead of having the rpm_server hit AFE behind our back: https://chromium-review.googlesource.com/c/chromiumos/third_party/autotest/+/1136026
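
A rough sketch of the idea, for illustration only: the existing entry point looks the host up in an AFE-like inventory, while a new entry point takes the powerunit details from the caller and never touches the inventory. The class FrontendSketch, the PowerUnitInfo fields, and the set_power_via_rpm name and signature are assumptions made for this example; the actual RPCs are in the linked CL.

```python
from collections import namedtuple

# Fields a power request needs; the field names are assumptions.
PowerUnitInfo = namedtuple(
    "PowerUnitInfo",
    ["device_hostname", "powerunit_hostname", "outlet", "hydra_hostname"],
)


class NoHostFound(Exception):
    """Stand-in for the RPMInfrastructureException seen in the logs."""


class FrontendSketch:
    """Toy model of the rpm frontend; not the real frontend_server.py."""

    def __init__(self, afe_inventory):
        self._afe_inventory = afe_inventory  # hostname -> PowerUnitInfo
        self.dispatched = []                 # requests handed onward

    def queue_request(self, device_hostname, new_state):
        """Legacy path: the server looks the host up in the AFE itself."""
        info = self._afe_inventory.get(device_hostname)
        if info is None:
            raise NoHostFound(
                "Can not retrieve rpm information from AFE for %s, "
                "no host found." % device_hostname)
        return self._queue_once(info, new_state)

    def set_power_via_rpm(self, device_hostname, rpm_hostname, rpm_outlet,
                          hydra_hostname, new_state):
        """New path (hypothetical signature): the caller supplies everything."""
        info = PowerUnitInfo(device_hostname, rpm_hostname, rpm_outlet,
                             hydra_hostname)
        return self._queue_once(info, new_state)

    def _queue_once(self, info, new_state):
        """Shared tail of both paths: record the request, report success."""
        self.dispatched.append((info, new_state))
        return True
```

A host that is missing from the inventory fails on queue_request() but succeeds on set_power_via_rpm(), because the second call carries its own RPM hostname and outlet (any values used to try this out are made up).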
,
Jul 13
> The RPM server itself hits the AFE to get powerunit information about the DUT. Why not change the RPM server to rely on the new source of truth?
,
Jul 17
Re #3: No servers in the lab should rely on Skylab services directly. We want to create a clear bifurcation between GCP services (which can call each other more freely) and baremetal / full stack deployments. The only flow of information from GCP services to the baremetal deployment will be via tasks running on skylab-drones. These tasks obtain all the information necessary to execute, and may then pass it around to stuff deployed within the lab.
,
Jul 17
Seems like this is needed in the current phase (mark skylab-based paladin important)
,
Jul 18
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/c32b49367c0ec053855f7ed0d6b9646ff9e4f9d9

commit c32b49367c0ec053855f7ed0d6b9646ff9e4f9d9
Author: Prathmesh Prabhu <pprabhu@chromium.org>
Date: Wed Jul 18 04:53:26 2018

rpm: Split _get_powerunit_info()

BUG=chromium:863217
TEST=Locally run rpmserver and ensure behaviour matches prod for both DUT and servo.

Change-Id: I1d4db04a7a3b17d9e4a94634e11b8402eb41bf1d
Reviewed-on: https://chromium-review.googlesource.com/1136024
Commit-Ready: Prathmesh Prabhu <pprabhu@chromium.org>
Tested-by: Prathmesh Prabhu <pprabhu@chromium.org>
Reviewed-by: Congbin Guo <guocb@chromium.org>

[modify] https://crrev.com/c32b49367c0ec053855f7ed0d6b9646ff9e4f9d9/site_utils/rpm_control_system/frontend_server.py
,
Jul 18
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/bc41d1c23137900ffef33ef0c08bc3a96736e652

commit bc41d1c23137900ffef33ef0c08bc3a96736e652
Author: Prathmesh Prabhu <pprabhu@chromium.org>
Date: Wed Jul 18 08:40:06 2018

rpm: Extract _queue_once() in preparation for replacement RPCs for queue_requests()

BUG=chromium:863217
TEST=Locally run rpmserver and ensure behaviour matches prod for both DUT and servo.

Change-Id: Ib1d37df1ef99c9889be10f2f40cbd3a1f09ac0f7
Reviewed-on: https://chromium-review.googlesource.com/1136025
Commit-Ready: Prathmesh Prabhu <pprabhu@chromium.org>
Tested-by: Prathmesh Prabhu <pprabhu@chromium.org>
Reviewed-by: Prathmesh Prabhu <pprabhu@chromium.org>

[modify] https://crrev.com/bc41d1c23137900ffef33ef0c08bc3a96736e652/site_utils/rpm_control_system/frontend_server.py
,
Jul 18
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/0a931caac805f598ade1fdf54a8f7aa189f82877

commit 0a931caac805f598ade1fdf54a8f7aa189f82877
Author: Prathmesh Prabhu <pprabhu@chromium.org>
Date: Wed Jul 18 11:08:11 2018

rpm: Add new RPCs to set power. These RPCs are intended to replace the old queue_request() RPC.

BUG=chromium:863217
TEST=Locally run rpmserver and ensure behaviour matches prod for both DUT and servo.

Change-Id: I661f5769a5799772006ba0513d46f7573711503d
Reviewed-on: https://chromium-review.googlesource.com/1136026
Commit-Ready: Prathmesh Prabhu <pprabhu@chromium.org>
Tested-by: Prathmesh Prabhu <pprabhu@chromium.org>
Reviewed-by: Prathmesh Prabhu <pprabhu@chromium.org>

[modify] https://crrev.com/0a931caac805f598ade1fdf54a8f7aa189f82877/site_utils/rpm_control_system/frontend_server.py
,
Jul 18
OK, now to push these changes to the rpmserver.
We do not update rpmserver as part of push-to-prod. In fact, current git# of the autotest checkout is at:

chromeos-test@chromeos-server160:/usr/local/autotest$ git log -1
commit 1e3b52e60ea3e764af2281e43b8ab7e2b567103a (HEAD, m/master, cros/prod-next, cros/prod, cros/master)
Author: Sida Liu <sidal@chromium.org>
Date: Fri Sep 22 10:34:18 2017 -0700

-------
Doing a manual update.
,
Jul 18
Manual server update steps:

[1] chromeos-test@chromeos-server160:~/chromiumos$ repo sync

[2] chromeos-test@chromeos-server160:/usr/local/autotest$ repo sync

[3] # Updates the chromite checkout in site-packages:
chromeos-test@chromeos-server160:/usr/local/autotest$ ./utils/build_externals.py

[4] # Doesn't matter what branch. Anyway this is not tested / automatically deployed.
chromeos-test@chromeos-server160:/usr/local/autotest$ git checkout cros/master

[5] # Restart relevant services
chromeos-test@chromeos-server160:/usr/local/autotest$ sudo service rpmserver_frontend_server stop
rpmserver_frontend_server stop/waiting
chromeos-test@chromeos-server160:/usr/local/autotest$ sudo service rpmserver_frontend_server start
rpmserver_frontend_server start/running, process 225267
chromeos-test@chromeos-server160:/usr/local/autotest$ sudo service rpmserver_dispatcher stop
rpmserver_dispatcher stop/waiting
chromeos-test@chromeos-server160:/usr/local/autotest$ sudo service rpmserver_dispatcher start
rpmserver_dispatcher start/running, process 230464

--------
chromeos-test@chromeos-server160:~$ tail /var/log/rpmserver/rpmserver_frontend_server.log
100.109.25.143 - - [18/Jul/2018 09:36:52] "POST /RPC2 HTTP/1.1" 200 -
100.109.25.143 - - [18/Jul/2018 09:36:53] "POST /RPC2 HTTP/1.1" 200 -
100.109.25.143 - - [18/Jul/2018 09:36:53] "POST /RPC2 HTTP/1.1" 200 -
100.109.25.143 - - [18/Jul/2018 09:36:54] "POST /RPC2 HTTP/1.1" 200 -
100.109.178.145 - - [18/Jul/2018 09:36:55] "POST /RPC2 HTTP/1.1" 200 -
100.109.178.145 - - [18/Jul/2018 09:36:55] "POST /RPC2 HTTP/1.1" 200 -
100.109.25.148 - - [18/Jul/2018 09:36:57] "POST /RPC2 HTTP/1.1" 200 -
100.109.25.143 - - [18/Jul/2018 09:36:58] "POST /RPC2 HTTP/1.1" 200 -
100.109.25.143 - - [18/Jul/2018 09:36:59] "POST /RPC2 HTTP/1.1" 200 -
100.108.189.50 - - [18/Jul/2018 09:37:01] "POST /RPC2 HTTP/1.1" 200 -

chromeos-test@chromeos-server160:~$ tail /var/log/rpmserver/rpmserver_dispatcher.log
100.108.133.208 - - [18/Jul/2018 09:36:44] "POST /RPC2 HTTP/1.1" 200 -
100.108.133.208 - - [18/Jul/2018 09:36:47] "POST /RPC2 HTTP/1.1" 200 -
100.108.133.208 - - [18/Jul/2018 09:36:52] "POST /RPC2 HTTP/1.1" 200 -
100.108.133.208 - - [18/Jul/2018 09:36:53] "POST /RPC2 HTTP/1.1" 200 -
100.108.133.208 - - [18/Jul/2018 09:36:53] "POST /RPC2 HTTP/1.1" 200 -
100.108.133.208 - - [18/Jul/2018 09:36:54] "POST /RPC2 HTTP/1.1" 200 -
100.108.133.208 - - [18/Jul/2018 09:36:57] "POST /RPC2 HTTP/1.1" 200 -
100.108.133.208 - - [18/Jul/2018 09:36:58] "POST /RPC2 HTTP/1.1" 200 -
100.108.133.208 - - [18/Jul/2018 09:36:59] "POST /RPC2 HTTP/1.1" 200 -
100.108.133.208 - - [18/Jul/2018 09:37:01] "POST /RPC2 HTTP/1.1" 200 -
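
The post-restart check above was done by eyeballing the tail output. If one wanted to script it, something like the following could work. It is only a sketch: the regular expression is inferred from the log lines shown in this comment, and summarize() and looks_healthy() are names invented for the example, not part of autotest.

```python
import re
from collections import Counter

# Matches lines such as:
#   100.109.25.143 - - [18/Jul/2018 09:36:52] "POST /RPC2 HTTP/1.1" 200 -
LOG_LINE = re.compile(
    r'^(?P<client>\S+) - - \[(?P<when>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) ')


def summarize(lines):
    """Count HTTP status codes over the lines that look like access logs."""
    statuses = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            statuses[match.group("status")] += 1
    return statuses


def looks_healthy(lines):
    """True if at least one request was logged and every one returned 200."""
    statuses = summarize(lines)
    return bool(statuses) and set(statuses) == {"200"}


if __name__ == "__main__":
    sample = [
        '100.109.25.143 - - [18/Jul/2018 09:36:52] "POST /RPC2 HTTP/1.1" 200 -',
        '100.108.189.50 - - [18/Jul/2018 09:37:01] "POST /RPC2 HTTP/1.1" 200 -',
    ]
    print(summarize(sample), looks_healthy(sample))
```

An empty log is deliberately treated as unhealthy, since a freshly restarted server that receives no requests has not shown that it is serving.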
,
Dec 20
,
Dec 26
Comment 1 by pprabhu@chromium.org
, Jul 12