New issue
Advanced search Search tips

Issue 677851 link

Starred by 1 user

Issue metadata

Status: Archived
Owner: ----
Closed: Mar 2018
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 3
Type: Bug



Sign in to add a comment

failure to remove temporary autotest file results in later mkdir failure

Project Member Reported by semenzato@chromium.org, Jan 2 2017

Issue description

https://uberchromegw.corp.google.com/i/chromeos/builders/swanky-release/builds/725/steps/AUTest%20%5Bau%5D/logs/stdio

autoserv.DEBUG:

https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/93972045-chromeos-test/chromeos4-row8-rack11-host1/debug/

We see this:

01/02 05:54:43.988 DEBUG|          ssh_host:0177| Running (ssh) 'ls "/tmp/autoserv-pNJ1es" > /dev/null'
01/02 05:54:44.868 DEBUG|          ssh_host:0177| Running (ssh) 'find "/tmp/autoserv-pNJ1es" -mindepth 1 -maxdepth 1 -print0 | xargs -0 rm -rf'
01/02 06:09:45.712 DEBUG|        base_utils:0280| [stderr] Write failed: Broken pipe
01/02 06:09:45.713 DEBUG|      abstract_ssh:0443| send_file. source: /tmp/tmprp0ZEw, dest: /tmp/autoserv-pNJ1es/sysinfo.pickle, delete_dest: False,preserve_symlinks:False
01/02 06:09:45.713 DEBUG|      abstract_ssh:0460| Using Rsync.
01/02 06:09:45.714 DEBUG|        base_utils:0185| Running 'rsync -L  --timeout=1800 --rsh='/usr/bin/ssh -a -x    -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -o Protocol=2 -l root -p 22' -az --no-o --no-g "/tmp/tmprp0ZEw" "root@chromeos4-row8-rack11-host1:"/tmp/autoserv-pNJ1es/sysinfo.pickle""'
01/02 06:09:46.862 DEBUG|          ssh_host:0177| Running (ssh) 'true'
01/02 06:09:47.738 DEBUG|      abstract_ssh:0581| Host chromeos4-row8-rack11-host1 is now up

The "find | xargs rm" starts and 0554 and aborts (broken pipe) at 0609, likely without completing the removal.  (Probably without even starting it.)

Then a few seconds later:

01/02 06:09:57.892 DEBUG|          ssh_host:0177| Running (ssh) 'nohup /tmp/sysinfo/autoserv-KL6bCK/bin/autotestd /tmp/autoserv-QGV7VE -H autoserv --verbose --hostname=chromeos4-row8-rack11-host1 --user=chromeos-test /tmp/sysinfo/autoserv-KL6bCK/control.autoserv >/dev/null 2>/dev/null &'
01/02 06:09:58.767 DEBUG|          ssh_host:0177| Running (ssh) '/tmp/sysinfo/autoserv-KL6bCK/bin/autotestd_monitor /tmp/autoserv-QGV7VE 0 0'
01/02 06:09:59.738 DEBUG|     site_autotest:0194| AUTOTEST_STATUS::START	----	----	timestamp=1483366198	localtime=Jan 02 06:09:58	
01/02 06:09:59.739 INFO |        server_job:0153| 	START	----	----	timestamp=1483366198	localtime=Jan 02 06:09:58	
01/02 06:09:59.739 DEBUG|     site_autotest:0194| AUTOTEST_STATUS::END ABORT	----	----	timestamp=1483366198	localtime=Jan 02 06:09:58	Unhandled OSError: [Errno 17] File exists: '/tmp/autoserv-pNJ1es/results'
01/02 06:09:59.740 INFO |        server_job:0153| 	END ABORT	----	----	timestamp=1483366198	localtime=Jan 02 06:09:58	Unhandled OSError: [Errno 17] File exists: '/tmp/autoserv-pNJ1es/results'
01/02 06:09:59.740 DEBUG|     site_autotest:0194| AUTOTEST_STATUS::  Traceback (most recent call last):
01/02 06:09:59.740 DEBUG|     site_autotest:0194| AUTOTEST_STATUS::    File "/tmp/sysinfo/autoserv-KL6bCK/bin/job.py", line 1013, in step_engine
01/02 06:09:59.740 DEBUG|     site_autotest:0194| AUTOTEST_STATUS::      execfile(self.control, global_control_vars, global_control_vars)
01/02 06:09:59.740 DEBUG|     site_autotest:0194| AUTOTEST_STATUS::    File "/tmp/sysinfo/autoserv-KL6bCK/control.autoserv", line 5, in <module>
01/02 06:09:59.740 DEBUG|     site_autotest:0194| AUTOTEST_STATUS::      mytest = test.test(job, '', '/tmp/autoserv-pNJ1es')
01/02 06:09:59.740 DEBUG|     site_autotest:0194| AUTOTEST_STATUS::    File "/tmp/sysinfo/autoserv-KL6bCK/common_lib/test.py", line 39, in __init__
01/02 06:09:59.741 DEBUG|     site_autotest:0194| AUTOTEST_STATUS::      os.mkdir(self.resultsdir)
01/02 06:09:59.741 DEBUG|     site_autotest:0194| AUTOTEST_STATUS::  OSError: [Errno 17] File exists: '/tmp/autoserv-pNJ1es/results'
01/02 06:09:59.741 DEBUG|     site_autotest:0194| Entered autotestd_monitor.

Two problems:

1. likely there should be a much shorter timeout around the "find | xarg rm" command, and a retry
2. when that fails, the test should fail, instead of producing a secondary failure which is harder to debug.




 
Owner: nxia@chromium.org

Comment 2 by nxia@chromium.org, Jan 9 2017

Cc: -nxia@chromium.org
Owner: xixuan@chromium.org
xixuan@, does this belong to your area? feel free to pass it to its owner.

Comment 3 by xixuan@chromium.org, Jan 19 2017

Labels: Hotlist-Fixit

Comment 4 by xixuan@chromium.org, Jan 30 2017

Owner: ----
Status: Archived (was: Untriaged)

Sign in to add a comment