site_sysinfo breaking backward compatibility causes autoserv crashes |
||||||||
Issue descriptionExample build: https://uberchromegw.corp.google.com/i/chromeos/builders/cyan-paladin/builds/1660 The failed HWTest stage for this build: http://cautotest.corp.google.com/afe/#tab_id=view_job&object_id=101129882 If you click any of the failed test in this suite and checked the debug log, you will find: Running (ssh) 'true' 02/13 17:13:47.959 DEBUG| abstract_ssh:0587| Host chromeos4-row6-rack9-host12 is now up 02/13 17:13:47.960 DEBUG| abstract_ssh:0346| get_file. source: /usr/local/autotest/results/default/, dest: /usr/local/autotest/results/101132185-chromeos-test/chromeos4-row6-rack9-host12, delete_dest: False,preserve_perm: True, preserve_symlinks:True 02/13 17:13:47.960 DEBUG| abstract_ssh:0357| Using Rsync. 02/13 17:13:47.960 DEBUG| base_utils:0185| Running 'rsync -l --timeout=1800 --rsh='/usr/bin/ssh -a -x -o ControlPath=/tmp/_autotmp_1eRKtissh-master/socket -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -o Protocol=2 -l root -p 22' -az --no-o --no-g root@chromeos4-row6-rack9-host12:"/usr/local/autotest/results/default/" "/usr/local/autotest/results/101132185-chromeos-test/chromeos4-row6-rack9-host12"' 02/13 17:13:48.401 DEBUG| server_job:1370| Client state file /usr/local/autotest/results/101132185-chromeos-test/chromeos4-row6-rack9-host12/control.autoserv.state not found 02/13 17:13:48.402 DEBUG| base_job:0392| Persistent state client.* deleted 02/13 17:13:48.403 ERROR| server_job:0808| Exception escaped control file, job aborting: Traceback (most recent call last): File "/usr/local/autotest/server/server_job.py", line 800, in run self._execute_code(server_control_file, namespace) File "/usr/local/autotest/server/server_job.py", line 1300, in _execute_code execfile(code_file, namespace, namespace) File "/usr/local/autotest/results/101132185-chromeos-test/chromeos4-row6-rack9-host12/control.srv", line 10, in <module> job.parallel_simple(run_client, machines) File "/usr/local/autotest/server/server_job.py", line 624, in parallel_simple return_results=return_results) File "/usr/local/autotest/server/subcommand.py", line 93, in parallel_simple function(arg) File "/usr/local/autotest/results/101132185-chromeos-test/chromeos4-row6-rack9-host12/control.srv", line 7, in run_client at.run(control, host=host, use_packaging=use_packaging) File "/usr/local/autotest/server/autotest.py", line 375, in run client_disconnect_timeout, use_packaging=use_packaging) File "/usr/local/autotest/server/autotest.py", line 458, in _do_run client_disconnect_timeout=client_disconnect_timeout) File "/usr/local/autotest/server/autotest.py", line 892, in execute_control logger, client_disconnect_timeout) File "/usr/local/autotest/server/autotest.py", line 837, in execute_section raise err AutotestRunError: client job was aborted 02/13 17:13:48.404 INFO | server_job:0183| INFO ---- ---- timestamp=1487034828 job_abort_reason=client job was aborted localtime=Feb 13 17:13:48 client job was aborted 02/13 17:13:48.407 ERROR| logging_manager:0626| {'builds': "{'cros-version': 'cyan-paladin/R58-9281.0.0-rc3'}", 'job_started': 1487034802, 'hostname': 'chromeos4-row6-rack9-host12', 'status_version': 1, 'label': 'cyan-paladin/R58-9281.0.0-rc3/arc-bvt-cq/cheets_ContainerSmokeTest', 'parent_job_id': 101129882, 'drone': 'chromeos-server51.cbf.corp.google.com', 'user': 'chromeos-test', 'suite': 'arc-bvt-cq', 'job_queued': 1487034391, 'retry_original_job_id': 101130036, 'experimental': 'False', 'build': 'cyan-paladin/R58-9281.0.0-rc3'} 02/13 17:13:48.408 ERROR| logging_manager:0626| MACHINE NAME: chromeos4-row6-rack9-host12 02/13 17:13:48.408 ERROR| logging_manager:0626| MACHINE GROUP: cyan 02/13 17:13:48.408 ERROR| logging_manager:0626| parsing partial test ---- SERVER_JOB 02/13 17:13:48.408 ERROR| logging_manager:0626| 02/13 17:13:48.408 ERROR| logging_manager:0626| STATUS: INFO ---- ---- kernel=3.18.0-13908-gcc004d0 localtime=Feb 13 17:13:30 timestamp=1487034810 02/13 17:13:48.409 ERROR| logging_manager:0626| 02/13 17:13:48.409 ERROR| logging_manager:0626| STATUS: END ABORT ---- ---- timestamp=1487034827 localtime=Feb 13 17:13:47 Autotest client terminated unexpectedly: DUT is pingable, could not determine if an un-expected reboot occured during the test. 02/13 17:13:48.409 ERROR| logging_manager:0626| unexpected extra indentation, ignoring 02/13 17:13:48.409 ERROR| logging_manager:0626| 02/13 17:13:48.409 ERROR| logging_manager:0626| STATUS: INFO ---- ---- timestamp=1487034828 job_abort_reason=client job was aborted localtime=Feb 13 17:13:48 client job was aborted Similar failures occur on other boards
,
Feb 14 2017
Also found from the log: ... 'job_queued': 1487032857, 'experimental': 'False', 'build': 'daisy_skate-paladin/R58-9281.0.0-rc3'} 02/13 17:00:31.525 ERROR| logging_manager:0626| MACHINE NAME: chromeos4-row9-rack7-host13 02/13 17:00:31.526 ERROR| logging_manager:0626| MACHINE GROUP: daisy_skate 02/13 17:00:31.526 ERROR| logging_manager:0626| parsing partial test ---- SERVER_JOB 02/13 17:00:31.526 ERROR| logging_manager:0626| 02/13 17:00:31.527 ERROR| logging_manager:0626| STATUS: INFO ---- ---- kernel=3.8.11 localtime=Feb 13 17:00:12 timestamp=1487034012 02/13 17:00:31.527 ERROR| logging_manager:0626| 02/13 17:00:31.528 ERROR| logging_manager:0626| STATUS: END ABORT ---- ---- timestamp=1487034030 localtime=Feb 13 17:00:30 Autotest client terminated unexpectedly: DUT is pingable, could not determine if an un-expected reboot occured during the test. 02/13 17:00:31.528 ERROR| logging_manager:0626| unexpected extra indentation, ignoring 02/13 17:00:31.528 ERROR| logging_manager:0626| 02/13 17:00:31.529 ERROR| logging_manager:0626| STATUS: INFO ---- ---- timestamp=1487034031 job_abort_reason=client job was aborted localtime=Feb 13 17:00:31 client job was aborted 02/13 17:00:31.529 ERROR| logging_manager:0626| parsing test ---- SERVER_JOB
,
Feb 14 2017
Probably caused by bad CL: https://chromium-review.googlesource.com/c/437654/
,
Feb 14 2017
,
Feb 14 2017
02/13 17:14:00.122 INFO | job:0200| Writing results to /usr/local/autotest/results/default
02/13 17:14:00.125 CRITI| job:1212| JOB ERROR (autotest bug?): No module named site_sysinfo
Traceback (most recent call last):
File "/usr/local/autotest/bin/job.py", line 1182, in runjob
myjob = job(control=control, drop_caches=drop_caches, options=options)
File "/usr/local/autotest/bin/job.py", line 1230, in __init__
base_client_job.__init__(self, *args, **kwargs)
File "/usr/local/autotest/bin/job.py", line 130, in __init__
self._pre_record_init(control, options)
File "/usr/local/autotest/bin/job.py", line 207, in _pre_record_init
self._load_state()
File "/usr/local/autotest/bin/job.py", line 869, in _load_state
self._state.set_backing_file(self._state_file)
File "/usr/local/autotest/common_lib/base_job.py", line 313, in set_backing_file
self._synchronize_backing_file()
File "/usr/local/autotest/common_lib/base_job.py", line 137, in wrapped_method
return method(self, *args, **dargs)
File "/usr/local/autotest/common_lib/base_job.py", line 158, in wrapped_method
self._read_from_backing_file()
File "/usr/local/autotest/common_lib/base_job.py", line 281, in _read_from_backing_file
self.read_from_file(self._backing_file, merge=merge_backing_file)
File "/usr/local/autotest/common_lib/base_job.py", line 228, in read_from_file
on_disk_state = pickle.load(open(file_path))
ImportError: No module named site_sysinfo
,
Feb 16 2017
tl;dr the pickle gets created by a version of Autotest that still has site_sysinfo, and then the client (DUT) tries to unpack it using the build of Autotest that doesn't have site_sysinfo any more.
,
Feb 16 2017
right, the client side of Autotest is "versioned". We need some backwards compatibility.
,
Feb 27 2017
I think this is still happening? http://cautotest.corp.google.com/afe/#tab_id=view_job&object_id=103257222
,
Feb 27 2017
No, nm, this above was different root cause I think.
,
Feb 28 2017
,
Mar 10 2017
,
Jun 20 2017
,
Jun 30 2017
,
Jun 30 2017
,
Jul 5 2017
,
Jan 22 2018
|
||||||||
►
Sign in to add a comment |
||||||||
Comment 1 by shuqianz@chromium.org
, Feb 14 2017