Banon device failing multiple tests with the "[Errno 28] No space left on device" error
Issue description

Logs @ https://stainless.corp.google.com/search?view=list&first_date=2018-06-16&last_date=2018-07-01&suite=wifi_matfunc%7Cwifi_release&board=banon&status=FAIL&status=ERROR&status=ABORT&exclude_cts=false&exclude_not_run=false&exclude_non_release=true&exclude_au=true&exclude_acts=true&exclude_retried=true&exclude_non_production=false

Sample failure:

Command: rsync -L --timeout=1800 --rsh='/usr/bin/ssh -a -x -o Protocol=2 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -l root -p 22' -az --no-o --no-g "/tmp/tmpYFoSRj" "root@chromeos15-row1-rack6-host4:"/tmp/sysinfo/autoserv-yot9hR/global_config.ini""
Exit status: 11
Duration: 0.367069005966
stderr:
rsync: write failed on "/tmp/sysinfo/autoserv-yot9hR/global_config.ini": No space left on device (28)
rsync error: error in file IO (code 11) at receiver.c(393) [receiver=3.1.2]
07/01 18:29:45.584 DEBUG| abstract_ssh:0549| Trying scp.
07/01 18:29:45.585 DEBUG| utils:0218| Running 'scp -rq -o StrictHostKeyChecking=no -o UserKnownHostsFile=/tmp/tmpExxBTl -P 22 "/tmp/tmpYFoSRj" 'root@chromeos15-row1-rack6-host4:"/tmp/sysinfo/autoserv-yot9hR/global_config.ini"''
07/01 18:29:45.993 DEBUG| ssh_host:0301| Running (ssh) 'nohup /tmp/sysinfo/autoserv-yot9hR/bin/autotestd /tmp/autoserv-LzqWfv -H autoserv --verbose --hostname=chromeos15-row1-rack6-host4 --user=chromeos-test /tmp/sysinfo/autoserv-yot9hR/control.autoserv >/dev/null 2>/dev/null &' from '_do_run|execute_control|execute_section|_execute_daemon|run|run_very_slowly'
07/01 18:29:46.446 DEBUG| ssh_host:0301| Running (ssh) '/tmp/sysinfo/autoserv-yot9hR/bin/autotestd_monitor /tmp/autoserv-LzqWfv 0 0' from '_do_run|execute_control|execute_section|_execute_daemon|run|run_very_slowly'
07/01 18:29:47.153 DEBUG| autotest:1281| Traceback (most recent call last):
07/01 18:29:47.191 INFO | autotest:1340| Traceback (most recent call last):
07/01 18:29:47.200 DEBUG| autotest:1281| File "/tmp/sysinfo/autoserv-yot9hR/bin/autotestd_monitor", line 12, in <module>
07/01 18:29:47.200 INFO | autotest:1340| File "/tmp/sysinfo/autoserv-yot9hR/bin/autotestd_monitor", line 12, in <module>
07/01 18:29:47.200 DEBUG| autotest:1281| print >> stderr, 'Entered autotestd_monitor.'
07/01 18:29:47.200 INFO | autotest:1340| print >> stderr, 'Entered autotestd_monitor.'
07/01 18:29:47.200 DEBUG| autotest:1281| IOError: [Errno 28] No space left on device
07/01 18:29:47.201 INFO | autotest:1340| IOError: [Errno 28] No space left on device
07/01 18:29:47.202 DEBUG| autotest:0956| Result exit status is 1.
07/01 18:29:47.203 DEBUG| utils:0218| Running 'ping chromeos15-row1-rack6-host4 -w1 -c1'
07/01 18:29:47.218 DEBUG| utils:0286| [stdout] PING chromeos15-row1-rack6-host4.cros.corp.google.com (100.115.124.153) 56(84) bytes of data.
07/01 18:29:47.219 DEBUG| utils:0286| [stdout] 64 bytes from 100.115.124.153: icmp_seq=1 ttl=55 time=3.47 ms
07/01 18:29:47.219 DEBUG| utils:0286| [stdout]
07/01 18:29:47.219 DEBUG| utils:0286| [stdout] --- chromeos15-row1-rack6-host4.cros.corp.google.com ping statistics ---
07/01 18:29:47.219 DEBUG| utils:0286| [stdout] 1 packets transmitted, 1 received, 0% packet loss, time 0ms
07/01 18:29:47.219 DEBUG| utils:0286| [stdout] rtt min/avg/max/mdev = 3.473/3.473/3.473/0.000 ms
07/01 18:29:47.219 INFO | server_job:0216| END ABORT ---- ---- timestamp=1530494987 localtime=Jul 01 18:29:47 Autotest client terminated unexpectedly: DUT is pingable, could not determine if an un-expected reboot occured during the test.
07/01 18:29:47.220 DEBUG| autotest:1108| Autotest job finishes running. Below is the post-processing operations.
07/01 18:29:47.229 DEBUG| ssh_host:0301| Running (ssh) 'true' from 'collect_client_job_results|wait_up|is_up|ssh_ping|run|run_very_slowly'
07/01 18:29:47.640 DEBUG| abstract_ssh:0670| Host chromeos15-row1-rack6-host4 is now up
07/01 18:29:47.641 DEBUG| runner:0089| result tools are already deployed to chromeos15-row1-rack6-host4.
07/01 18:29:47.641 DEBUG| runner:0100| Getting directory summary for /tmp/sysinfo/autoserv-yot9hR/results/default
07/01 18:29:47.649 DEBUG| ssh_host:0301| Running (ssh) '/usr/local/autotest/result_tools/utils.py -p /tmp/sysinfo/autoserv-yot9hR/results/default -m 20000' from '_do_run|execute_control|collect_client_job_results|run_on_client|run|run_very_slowly'
07/01 18:29:48.052 DEBUG| utils:0286| [stdout] 2018-07-01 18:29:47,986 Running result_tools/utils on path: /tmp/sysinfo/autoserv-yot9hR/results/default
07/01 18:29:48.053 DEBUG| utils:0286| [stdout] 2018-07-01 18:29:47,986 Throttle result size to : 19 MB
07/01 18:29:48.098 ERROR| utils:0286| [stderr] Traceback (most recent call last):
07/01 18:29:48.098 ERROR| utils:0286| [stderr] File "/usr/local/autotest/result_tools/utils.py", line 428, in <module>
07/01 18:29:48.098 ERROR| utils:0286| [stderr] main()
07/01 18:29:48.098 ERROR| utils:0286| [stderr] File "/usr/local/autotest/result_tools/utils.py", line 424, in main
07/01 18:29:48.098 ERROR| utils:0286| [stderr] execute(options.path, options.max_size_KB)
07/01 18:29:48.099 ERROR| utils:0286| [stderr] File "/usr/local/autotest/result_tools/utils.py", line 377, in execute
07/01 18:29:48.099 ERROR| utils:0286| [stderr] (free_space, len(summary_json)))
07/01 18:29:48.099 ERROR| utils:0286| [stderr] utils_lib.NotEnoughDiskError: Not enough disk space after saving the summary file. Available free disk: 0 bytes. Summary file size: 1201 bytes.
07/01 18:29:48.100 ERROR| runner:0121| Non-critical failure: Failed to create directory summary for /tmp/sysinfo/autoserv-yot9hR/results/default.
Traceback (most recent call last):
File "/usr/local/autotest/client/bin/result_tools/runner.py", line 114, in run_on_client
timeout=_BUILD_DIR_SUMMARY_TIMEOUT)
File "/usr/local/autotest/server/hosts/ssh_host.py", line 323, in run
return self.run_very_slowly(*args, **kwargs)
File "/usr/local/autotest/server/hosts/ssh_host.py", line 312, in run_very_slowly
ssh_failure_retry_ok)
File "/usr/local/autotest/server/hosts/ssh_host.py", line 262, in _run
raise error.AutoservRunError("command execution error", result)
AutoservRunError: command execution error
* Command:
/usr/bin/ssh -a -x -o Protocol=2 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -l root -p 22 chromeos15-row1-rack6-host4 "export LIBC_FATAL_STDERR_=1; if type \"logger\" > /dev/null 2>&1; then logger -tag \"autotest\" \"server[stack::collect_client_job_results|run_on_client|run] -> ssh_run(/usr/local/autotest/result_tools/utils.py -p /tmp/sysinfo/autoserv-yot9hR/results/default -m 20000)\";fi; /usr/local/autotest/result_tools/utils.py -p /tmp/sysinfo/autoserv-yot9hR/results/default -m 20000"
Exit status: 1
Duration: 0.437424898148
stdout:
2018-07-01 18:29:47,986 Running result_tools/utils on path: /tmp/sysinfo/autoserv-yot9hR/results/default
2018-07-01 18:29:47,986 Throttle result size to : 19 MB
stderr:
Traceback (most recent call last):
File "/usr/local/autotest/result_tools/utils.py", line 428, in <module>
main()
File "/usr/local/autotest/result_tools/utils.py", line 424, in main
execute(options.path, options.max_size_KB)
File "/usr/local/autotest/result_tools/utils.py", line 377, in execute
(free_space, len(summary_json)))
utils_lib.NotEnoughDiskError: Not enough disk space after saving the summary file. Available free disk: 0 bytes. Summary file size: 1201 bytes.
07/01 18:29:48.101 DEBUG| abstract_ssh:0413| get_file. source: /tmp/sysinfo/autoserv-yot9hR/results/default/, dest: /usr/local/autotest/results/213361755-chromeos-test, delete_dest: False,preserve_perm: True, preserve_symlinks:True
07/01 18:29:48.102 DEBUG| abstract_ssh:0425| Using Rsync.
07/01 18:29:48.102 DEBUG| utils:0218| Running 'rsync -l --timeout=1800 --rsh='/usr/bin/ssh -a -x -o Protocol=2 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -l root -p 22' -az --no-o --no-g root@chromeos15-row1-rack6-host4:"/tmp/sysinfo/autoserv-yot9hR/results/default/" "/usr/local/autotest/results/213361755-chromeos-test"'
07/01 18:29:48.494 DEBUG| server_job:1370| Client state file /usr/local/autotest/results/213361755-chromeos-test/control.autoserv.state not found
07/01 18:29:48.811 DEBUG| base_job:0399| Persistent state client.* deleted
07/01 18:29:48.845 DEBUG| autotest:1122| Autotest job finishes.
07/01 18:29:48.845 ERROR| log:0027| post-test iteration server sysinfo error:
07/01 18:29:48.846 ERROR| traceback:0013| Traceback (most recent call last):
07/01 18:29:48.846 ERROR| traceback:0013| File "/usr/local/autotest/client/common_lib/log.py", line 25, in decorated_func
07/01 18:29:48.846 ERROR| traceback:0013| fn(*args, **dargs)
07/01 18:29:48.847 ERROR| traceback:0013| File "/usr/local/autotest/server/test.py", line 76, in wrapper
07/01 18:29:48.847 ERROR| traceback:0013| func(self, mytest, host, at, outputdir)
07/01 18:29:48.847 ERROR| traceback:0013| File "/usr/local/autotest/server/test.py", line 214, in after_iteration_hook
07/01 18:29:48.848 ERROR| traceback:0013| results_dir=self.job.resultdir)
07/01 18:29:48.848 ERROR| traceback:0013| File "/usr/local/autotest/server/autotest.py", line 479, in run
07/01 18:29:48.848 ERROR| traceback:0013| client_disconnect_timeout, use_packaging=use_packaging)
07/01 18:29:48.848 ERROR| traceback:0013| File "/usr/local/autotest/server/autotest.py", line 562, in _do_run
07/01 18:29:48.849 ERROR| traceback:0013| client_disconnect_timeout=client_disconnect_timeout)
07/01 18:29:48.849 ERROR| traceback:0013| File "/usr/local/autotest/server/autotest.py", line 1054, in execute_control
07/01 18:29:48.849 ERROR| traceback:0013| logger, client_disconnect_timeout)
07/01 18:29:48.849 ERROR| traceback:0013| File "/usr/local/autotest/server/autotest.py", line 999, in execute_section
07/01 18:29:48.850 ERROR| traceback:0013| raise err
07/01 18:29:48.850 ERROR| traceback:0013| AutotestRunError: client job was aborted
07/01 18:29:48.851 DEBUG| test:0420| after_iteration_hooks completed
07/01 18:29:48.851 INFO | wifi_client:1318| ======= WiFi autotest complete. Cleaning up... =======
Jul 13
Jul 14
I experienced this yesterday. In the short term, running `rm -rf /tmp/sysinfo/autoserv*` on the DUT should get the tests running again. It seems like for server tests, autotest is being unconditionally installed on the test machine(s) (see autotest_lib.server.test._install), with each autoserv-* folder being roughly 30M in size. It makes sense to first compare timestamps between any existing autotest installations on the test machine and the timestamp of the autotest installation on the machine calling the test before proceeding with installation. At the very least, existing autotest installations should be removed before sending over another one. I would be willing to take this one if no one is opposed. What I'm wondering is why this functionality has only now started causing out of space errors. Anyone have some insight into that?
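For anyone who wants to confirm how much space the leftover installations are taking before deleting them, here is a minimal Python sketch. The function names and the default roots are illustrative assumptions (the roots are simply the paths that show up in the logs above); this is not part of autotest, and it has to be run on the DUT itself.

```python
import glob
import os


def dir_size_bytes(path):
    """Sum the sizes of all regular files under path (symlinks are skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def leftover_autoserv_dirs(roots=("/tmp", "/tmp/sysinfo")):
    """Return sorted (path, size_in_bytes) pairs for autoserv-* dirs under each root."""
    found = []
    for root in roots:
        # Only look one level deep; a missing root simply matches nothing.
        for path in glob.glob(os.path.join(root, "autoserv-*")):
            if os.path.isdir(path):
                found.append((path, dir_size_bytes(path)))
    return sorted(found)


if __name__ == "__main__":
    for path, size in leftover_autoserv_dirs():
        print("%10d  %s" % (size, path))
```

If several entries of roughly the same size show up, that would be consistent with installations piling up across runs rather than one test writing too much output.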
Jul 17
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/autotest/+/0c31f63aa8845799054bce592e2ea5219449942e

commit 0c31f63aa8845799054bce592e2ea5219449942e
Author: Alex Khouderchah <akhouderchah@chromium.org>
Date: Tue Jul 17 19:11:27 2018

autotest: Perform server test shutdown even when receiving SIGINT

While server tests, when run normally, will clean up their temporary installation and some large output directories, the same is not true when a user uses ctrl-c to force-close the test.

This change modifies server tests to catch signals like SIGTERM and SIGBREAK, and to run the cleanup as expected.

BUG= chromium:863601
TEST=Ran server tests to completion, ran a modified server test with a syntax error, and interrupted running server tests with SIGINT. The first two cases remain unchanged, as the test framework performed cleanup as expected. Ensured that the third test case was actually triggering a cleanup.

Change-Id: Ida0334ebdf964fa4ee0ae730a2598686e8909c96
Reviewed-on: https://chromium-review.googlesource.com/1138649
Commit-Ready: Alex Khouderchah <akhouderchah@chromium.org>
Tested-by: Alex Khouderchah <akhouderchah@chromium.org>
Reviewed-by: Xixuan Wu <xixuan@chromium.org>

[modify] https://crrev.com/0c31f63aa8845799054bce592e2ea5219449942e/server/test.py
[modify] https://crrev.com/0c31f63aa8845799054bce592e2ea5219449942e/client/common_lib/utils.py
Jul 17
As an update, the unconditional installation of autotest in server tests only proved to be an issue when the test was sent a signal like SIGINT, as the test cleans up its files when it ends normally or with an exception. This might explain why we weren't getting out-of-space errors before: they only occur after multiple test runs have been forced to stop without cleaning up. That being said, this change will not fix an existing out-of-space issue; it will only prevent new ones. Depending on how the DUT clears its /tmp directory, it still might be necessary to run `rm -rf /tmp/autoserv-* /tmp/sysinfo/autoserv-*` on the DUT.
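To illustrate the general idea of running cleanup when a test is interrupted by a signal, here is a minimal sketch in plain Python. It is a generic pattern, not the code from the autotest change; `run_with_cleanup`, `InterruptedBySignal`, `test_fn` and `cleanup_fn` are hypothetical names. The handler turns the signal into an exception so that the ordinary `finally` path runs.

```python
import signal


class InterruptedBySignal(Exception):
    """Raised in the main thread when one of the handled signals arrives."""


def run_with_cleanup(test_fn, cleanup_fn,
                     signals=(signal.SIGINT, signal.SIGTERM)):
    """Run test_fn, and always run cleanup_fn, even if a signal interrupts it."""

    def _raise(signum, _frame):
        # Convert the signal into an exception so try/finally can handle it.
        raise InterruptedBySignal(signum)

    # signal.signal() returns the previous handler, which we restore later.
    previous = {sig: signal.signal(sig, _raise) for sig in signals}
    try:
        return test_fn()
    finally:
        try:
            cleanup_fn()
        finally:
            for sig, handler in previous.items():
                signal.signal(sig, handler)
```

Note that Python only allows `signal.signal()` to be called from the main thread, and that SIGKILL cannot be caught at all, so a pattern like this cannot cover every way a test run might be stopped.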
Jul 23
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/autotest/+/b7f9680f44f32e59c8d42d4b5e93ad7f022722b2

commit b7f9680f44f32e59c8d42d4b5e93ad7f022722b2
Author: Allen Li <ayatane@chromium.org>
Date: Mon Jul 23 20:14:56 2018

Revert "autotest: Perform server test shutdown even when receiving SIGINT"

This reverts commit 0c31f63aa8845799054bce592e2ea5219449942e.

Reason for revert: Suspect for causing autoserv leak, crbug.com/866543

Original change's description:
> autotest: Perform server test shutdown even when receiving SIGINT
>
> While server tests, when run normally, will clean up their temporary
> installation and some large output directories, the same is not true
> when a user uses ctrl-c to force-close the test.
>
> This change modifies server tests to catch signals like SIGTERM and
> SIGBREAK, and to run the cleanup as expected.
>
> BUG= chromium:863601
> TEST=Ran server tests to completion, ran a modified server test with
> a syntax error, and interrupted running server tests with SIGINT.
> The first two cases remain unchanged, as the test framework
> performed cleanup as expected. Ensured that the third test case was
> actually triggering a cleanup.
>
> Change-Id: Ida0334ebdf964fa4ee0ae730a2598686e8909c96
> Reviewed-on: https://chromium-review.googlesource.com/1138649
> Commit-Ready: Alex Khouderchah <akhouderchah@chromium.org>
> Tested-by: Alex Khouderchah <akhouderchah@chromium.org>
> Reviewed-by: Xixuan Wu <xixuan@chromium.org>

Bug: chromium:863601
Change-Id: Ie9c80954de622294ecd1d311e6a75d8ff3f6d597
Reviewed-on: https://chromium-review.googlesource.com/1147321
Reviewed-by: Allen Li <ayatane@chromium.org>
Commit-Queue: Allen Li <ayatane@chromium.org>
Tested-by: Allen Li <ayatane@chromium.org>

[modify] https://crrev.com/b7f9680f44f32e59c8d42d4b5e93ad7f022722b2/server/test.py
[modify] https://crrev.com/b7f9680f44f32e59c8d42d4b5e93ad7f022722b2/client/common_lib/utils.py
Jul 23
Aug 10
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/autotest/+/c44e777d9c6ee299d66ab1a9f6d9e6b3a392bd30

commit c44e777d9c6ee299d66ab1a9f6d9e6b3a392bd30
Author: Alex Khouderchah <akhouderchah@chromium.org>
Date: Fri Aug 10 05:04:21 2018

autotest: Remove existing autoserv dirs on startup

While server tests, when run normally, will clean up their temporary installation and some large output directories, the same is not true when a user uses ctrl-c to force-close the test.

This change modifies server tests to remove existing /tmp/autoserv-* and /tmp/sysinfo/autoserv-* directories before creating new ones, such that existing left-over directories will not cause a DUT's /tmp filesystem to run out of space.

BUG= chromium:863601
TEST=Ran server tests to completion, ran a modified server test with a syntax error, and interrupted running server tests with SIGINT. The first two cases remain unchanged, as the test framework performed cleanup as expected. Ensured that the third test case was actually triggering a cleanup.
TEST=Used get_tmp_dir to create multiple temp dirs, including a set of nested temporary directories. Then alternated between printing self.tmp_dirs and calling delete_all_tmp_dirs with various parent directories to ensure the expected behavior was occuring both on the host and with regards to the contents of self.tmp_dirs.

Change-Id: I82a1619d4c8976547792f3cac84b6ed41148b484
Reviewed-on: https://chromium-review.googlesource.com/1147500
Commit-Ready: Alex Khouderchah <akhouderchah@chromium.org>
Tested-by: Alex Khouderchah <akhouderchah@chromium.org>
Reviewed-by: Richard Barnette <jrbarnette@google.com>

[modify] https://crrev.com/c44e777d9c6ee299d66ab1a9f6d9e6b3a392bd30/server/test.py
[modify] https://crrev.com/c44e777d9c6ee299d66ab1a9f6d9e6b3a392bd30/server/hosts/remote.py
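The "remove existing directories before creating a new one" approach described in this commit message can be sketched roughly as follows. This is a local-filesystem illustration only, under stated assumptions: the real change works against the DUT through the host object in server/hosts/remote.py, and `fresh_autoserv_dir` is a hypothetical name, not an autotest function. It also assumes nothing else is using the old directories at the time.

```python
import glob
import os
import shutil
import tempfile


def fresh_autoserv_dir(parent):
    """Delete leftover autoserv-* directories under parent, then create a new one."""
    for stale in glob.glob(os.path.join(parent, "autoserv-*")):
        # Only remove real directories; leave symlinks and plain files alone.
        if os.path.isdir(stale) and not os.path.islink(stale):
            shutil.rmtree(stale, ignore_errors=True)
    os.makedirs(parent, exist_ok=True)
    # mkdtemp appends a random suffix, similar to names like autoserv-yot9hR.
    return tempfile.mkdtemp(prefix="autoserv-", dir=parent)
```

The trade-off compared with cleaning up at exit is that a leftover directory survives until the next test starts on that machine, but the cleanup no longer depends on the previous run having shut down cleanly.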
Nov 16
Sounds like comment #8 means it's Fixed?
Comment 1 by harpreet@chromium.org, Jul 13