autoserv: crash collection crash when removing destination file |
|||
Issue descriptionhttps://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/136377018-chromeos-test/chromeos6-row1-rack6-host5/debug/ This hides real errors, and in this case I suspect left the logs with some weird indentation that parsing then failed to parse --> the job got aborted. The first log is an "informational" stacktrace, followed by the real one: 08/22 03:53:54.716 ERROR| runner:0121| Non-critical failure: Failed to create directory summary for /dev/pstore. Traceback (most recent call last): File "/usr/local/autotest/client/bin/result_tools/runner.py", line 114, in run_on_client timeout=_BUILD_DIR_SUMMARY_TIMEOUT) File "/usr/local/autotest/server/hosts/ssh_host.py", line 316, in run return self.run_very_slowly(*args, **kwargs) File "/usr/local/autotest/site-packages/chromite/lib/metrics.py", line 482, in wrapper return fn(*args, **kwargs) File "/usr/local/autotest/server/hosts/ssh_host.py", line 307, in run_very_slowly ssh_failure_retry_ok) File "/usr/local/autotest/server/hosts/ssh_host.py", line 258, in _run raise error.AutoservRunError("command execution error", result) AutoservRunError: command execution error * Command: /usr/bin/ssh -a -x -o ControlPath=/tmp/_autotmp_k_Osf4ssh- master/socket -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -o Protocol=2 -l root -p 22 chromeos6-row1-rack6-host5 "export LIBC_FATAL_STDERR_=1; if type \"logger\" > /dev/null 2>&1; then logger -tag \"autotest\" \"server[stack::run_on_client|run|wrapper] -> ssh_run(/usr/local/autotest/result_tools/utils.py -p /dev/pstore -m 20000)\";fi; /usr/local/autotest/result_tools/utils.py -p /dev/pstore -m 20000" Exit status: 1 Duration: 0.229590892792 stdout: 2017-08-22 03:53:53,673 Running result_tools/utils on path: /dev/pstore 2017-08-22 03:53:53,673 Throttle result size to : 19 MB stderr: Traceback (most recent call last): File "/usr/local/autotest/result_tools/utils.py", line 424, in <module> main() File "/usr/local/autotest/result_tools/utils.py", line 420, in main execute(options.path, options.max_size_KB) File "/usr/local/autotest/result_tools/utils.py", line 373, in execute (free_space, len(summary_json))) utils_lib.NotEnoughDiskError: Not enough disk space after saving the summary file. Available free disk: 0 bytes. Summary file size: 63 bytes. 08/22 03:53:54.717 DEBUG| abstract_ssh:0363| get_file. source: /dev/pstore, dest: /usr/local/autotest/results/136377018-chromeos-test/chromeos6-row1-rack6-host5/crashinfo.chromeos6-row1-rack6-host5/pstore, delete_dest: False,preserve_perm: False, preserve_symlinks:False 08/22 03:53:54.717 DEBUG| abstract_ssh:0375| Using Rsync. 08/22 03:53:54.717 DEBUG| utils:0212| Running 'rsync -L --timeout=1800 --rsh='/usr/bin/ssh -a -x -o ControlPath=/tmp/_autotmp_k_Osf4ssh-master/socket -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -o Protocol=2 -l root -p 22' -az --no-o --no-g root@chromeos6-row1-rack6-host5:"/dev/pstore" "/usr/local/autotest/results/136377018-chromeos-test/chromeos6-row1-rack6-host5/crashinfo.chromeos6-row1-rack6-host5/pstore"' 08/22 03:53:55.264 DEBUG| ssh_host:0296| Running (ssh) 'rm -rf /dev/pstore' from 'wrapper|get_crashinfo|collect_log_file|run|wrapper|run_very_slowly' 08/22 03:53:55.450 ERROR| utils:0280| [stderr] rm: cannot remove '/dev/pstore': Device or resource busy 08/22 03:53:55.453 DEBUG| ssh_multiplex:0118| Nuking ssh master_job 08/22 03:53:56.457 DEBUG| ssh_multiplex:0123| Cleaning ssh master_tempdir 08/22 03:53:56.458 ERROR| traceback:0013| Traceback (most recent call last): 08/22 03:53:56.458 ERROR| traceback:0013| File "/usr/local/autotest/server/autoserv", line 530, in run_autoserv 08/22 03:53:56.459 ERROR| traceback:0013| use_packaging=(not no_use_packaging)) 08/22 03:53:56.459 ERROR| traceback:0013| File "/usr/local/autotest/server/server_job.py", line 891, in run 08/22 03:53:56.459 ERROR| traceback:0013| self._collect_crashes(namespace, collect_crashinfo) 08/22 03:53:56.459 ERROR| traceback:0013| File "/usr/local/autotest/server/server_job.py", line 759, in _collect_crashes 08/22 03:53:56.460 ERROR| traceback:0013| self._execute_code(crash_control_file, namespace) 08/22 03:53:56.460 ERROR| traceback:0013| File "/usr/local/autotest/server/server_job.py", line 1357, in _execute_code 08/22 03:53:56.460 ERROR| traceback:0013| execfile(code_file, namespace, namespace) 08/22 03:53:56.461 ERROR| traceback:0013| File "/usr/local/autotest/server/control_segments/crashinfo", line 18, in <module> 08/22 03:53:56.461 ERROR| traceback:0013| job.parallel_simple(crashinfo, machines, log=False) 08/22 03:53:56.461 ERROR| traceback:0013| File "/usr/local/autotest/server/server_job.py", line 666, in parallel_simple 08/22 03:53:56.462 ERROR| traceback:0013| log=log, timeout=timeout, return_results=return_results) 08/22 03:53:56.462 ERROR| traceback:0013| File "/usr/local/autotest/server/subcommand.py", line 98, in parallel_simple 08/22 03:53:56.462 ERROR| traceback:0013| function(arg) 08/22 03:53:56.462 ERROR| traceback:0013| File "/usr/local/autotest/server/control_segments/crashinfo", line 12, in crashinfo 08/22 03:53:56.463 ERROR| traceback:0013| crashcollect.get_crashinfo(host, test_start_time) 08/22 03:53:56.463 ERROR| traceback:0013| File "/usr/local/autotest/site-packages/chromite/lib/metrics.py", line 482, in wrapper 08/22 03:53:56.463 ERROR| traceback:0013| return fn(*args, **kwargs) 08/22 03:53:56.464 ERROR| traceback:0013| File "/usr/local/autotest/server/crashcollect.py", line 220, in get_crashinfo 08/22 03:53:56.464 ERROR| traceback:0013| clean=True) 08/22 03:53:56.464 ERROR| traceback:0013| File "/usr/local/autotest/server/crashcollect.py", line 122, in collect_log_file 08/22 03:53:56.464 ERROR| traceback:0013| host.run('rm -rf %s' % (pipes.quote(log_path),)) 08/22 03:53:56.465 ERROR| traceback:0013| File "/usr/local/autotest/server/hosts/ssh_host.py", line 316, in run 08/22 03:53:56.465 ERROR| traceback:0013| return self.run_very_slowly(*args, **kwargs) 08/22 03:53:56.465 ERROR| traceback:0013| File "/usr/local/autotest/site-packages/chromite/lib/metrics.py", line 482, in wrapper 08/22 03:53:56.466 ERROR| traceback:0013| return fn(*args, **kwargs) 08/22 03:53:56.466 ERROR| traceback:0013| File "/usr/local/autotest/server/hosts/ssh_host.py", line 307, in run_very_slowly 08/22 03:53:56.466 ERROR| traceback:0013| ssh_failure_retry_ok) 08/22 03:53:56.467 ERROR| traceback:0013| File "/usr/local/autotest/server/hosts/ssh_host.py", line 258, in _run 08/22 03:53:56.467 ERROR| traceback:0013| raise error.AutoservRunError("command execution error", result) 08/22 03:53:56.468 ERROR| traceback:0013| AutoservRunError: command execution error 08/22 03:53:56.468 ERROR| traceback:0013| * Command: 08/22 03:53:56.468 ERROR| traceback:0013| /usr/bin/ssh -a -x -o ControlPath=/tmp/_autotmp_k_Osf4ssh- 08/22 03:53:56.468 ERROR| traceback:0013| master/socket -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null 08/22 03:53:56.469 ERROR| traceback:0013| -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=900 -o 08/22 03:53:56.469 ERROR| traceback:0013| ServerAliveCountMax=3 -o ConnectionAttempts=4 -o Protocol=2 -l root -p 22 08/22 03:53:56.469 ERROR| traceback:0013| chromeos6-row1-rack6-host5 "export LIBC_FATAL_STDERR_=1; if type 08/22 03:53:56.469 ERROR| traceback:0013| \"logger\" > /dev/null 2>&1; then logger -tag \"autotest\" 08/22 03:53:56.470 ERROR| traceback:0013| \"server[stack::collect_log_file|run|wrapper] -> ssh_run(rm -rf 08/22 03:53:56.470 ERROR| traceback:0013| /dev/pstore)\";fi; rm -rf /dev/pstore" 08/22 03:53:56.470 ERROR| traceback:0013| Exit status: 1 08/22 03:53:56.470 ERROR| traceback:0013| Duration: 0.136911869049 The quickest fix is to ignore AutotestHostErrrors here: https://cs.corp.google.com/chromeos_public/src/third_party/autotest/files/server/crashcollect.py?type=cs&q=LOG_PSTORE_DIRS&sq=package:chromeos+file:src/third_party/autotest/files&l=129
,
Aug 22 2017
https://chromium-review.googlesource.com/626949
,
Aug 23 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/autotest/+/0c432d6a895a8b8bd4dcd3a4887da229dcd0ea4b commit 0c432d6a895a8b8bd4dcd3a4887da229dcd0ea4b Author: Prathmesh Prabhu <pprabhu@chromium.org> Date: Wed Aug 23 02:05:10 2017 autotest: Ignore failures in cleanup during crash collection crash collection is best effort and errors in cleanup should not fail a test. We already ignore failures, but then try to cleanup some more without ignoring errors. FixIt. BUG= chromium:757861 TEST=None Change-Id: I922d6e6d761fbc7d577f7cc542e3a97e7ad1e2bf Reviewed-on: https://chromium-review.googlesource.com/626949 Commit-Ready: Prathmesh Prabhu <pprabhu@chromium.org> Tested-by: Prathmesh Prabhu <pprabhu@chromium.org> Reviewed-by: Allen Li <ayatane@chromium.org> [modify] https://crrev.com/0c432d6a895a8b8bd4dcd3a4887da229dcd0ea4b/server/crashcollect.py
,
Aug 23 2017
|
|||
►
Sign in to add a comment |
|||
Comment 1 by pprabhu@chromium.org
, Aug 22 2017