test_that hangs with a misleading log message |
|||
Issue description
1. Lock a host (chromeos2-row9-rack9-host13.cros in my case).
2. Run "test_that --board=lumpy --fast chromeos2-row9-rack9-host13.cros network_ProxyResolver" in my chroot and see a claim that the test passed.
3. Make a trivial change to the test in my autotest checkout to try to make it fail.
4. Run "test_that --board=lumpy --fast chromeos2-row9-rack9-host13.cros network_ProxyResolver" again.
5. See test_that hang indefinitely (?) after logging "Bundling /build/lumpy/usr/local/build/autotest/client/site_tests/network_ProxyResolver into test-network_ProxyResolver.tar.bz2".
My workstation is idle. It's not bundling anything.
The DUT also appears to be idle. Well, Chrome's eating ~20% of a CPU, but I assume it's blinking a cursor or something like that.
When I Ctrl-C the test_that command, it looks like it doesn't exit cleanly, and it leaves a bunch of SSH connections to the DUT lying around.
^C20:03:19 WARNI| Received SIGINT or SIGTERM. Cleaning up and exiting.
20:03:19 WARNI| Sending SIGINT to autoserv process. Waiting up to 5 seconds for cleanup.
20:03:24 WARNI| Timed out waiting for autoserv to handle SIGINT. Killed autoserv.
Here's what pstree showed before I killed test_that:
test_that,32868 /usr/bin/test_that --board=lumpy --fast chromeos2-row9-rack9-host13.cros network_ProxyResolver
`-python2.7,32869 /mnt/host/source/src/third_party/autotest/files/site_utils/test_that.py --board=lumpy --fast...
|-python2.7,32905 /build/lumpy/usr/local/build/autotest/site_utils/test_that.py --board=lumpy --fast...
| `-python2.7,32925 -u /build/lumpy/usr/local/build/autotest/server/autoserv -p -r...
| |-python2.7,32933 -u /build/lumpy/usr/local/build/autotest/server/autoserv -p -r...
| |-python2.7,32934 -u /build/lumpy/usr/local/build/autotest/server/autoserv -p -r...
| |-ssh,32943 -a -x -N -o ControlMaster=yes -o ControlPath=/tmp/_autotmp_xVy1Rgssh-master/socket -o ...
| |-ssh,32951 -a -x -N -o ControlMaster=yes -o ControlPath=/tmp/_autotmp_iXowGsssh-master/socket -o ...
| `-ssh,32983 -a -x -o ControlPath=/tmp/_autotmp_iXowGsssh-master/socket -o StrictHostKeyChecking=no -o ...
`-ssh-agent,32870 /mnt/host/source/src/third_party/autotest/files/site_utils/test_that.py --board=lumpy --fast...
Here's the output from the second invocation of the command:
----
$ test_that --board=lumpy --fast chromeos2-row9-rack9-host13.cros network_ProxyResolver
ERROR:root:Could not import tzlocal.
INFO:root:Identity added: /tmp/test_that_results_a8GAs3/testing_rsa (/tmp/test_that_results_a8GAs3/testing_rsa)
18:51:36 INFO | Running autotest_quickmerge step.
18:51:36 INFO | quickmerge| 18:51:36: INFO: RunCommand: sudo -- /usr/bin/python2 /mnt/host/source/chromite/bin/autotest_quickmerge '--board=lumpy'
18:51:37 INFO | quickmerge| 18:51:37: INFO: RunCommand: rsync -a --no-p '--chmod=ugo=rwX' -u -i '--exclude=**.pyc' '--exclude=**.pyo' '--exclude=** -> *' '--include-from=/mnt/host/source/chromite/scripts/autotest-quickmerge-includepatterns' '--exclude=*' /mnt/host/source/src/third_party/autotest/files/ /build/lumpy/usr/local/build/autotest/
18:51:37 INFO | quickmerge| 18:51:37: INFO: Updating portage database.
18:51:38 INFO | quickmerge| 18:51:38: INFO: RunCommand: rsync -a --no-p '--chmod=ugo=rwX' -u -i '--exclude=**.pyc' '--exclude=**.pyo' '--exclude=** -> *' '--include-from=/mnt/host/source/chromite/scripts/autotest-quickmerge-includepatterns' '--exclude=*' // /build/lumpy/usr/local/build/autotest/
18:51:38 INFO | quickmerge| 18:51:38: INFO: Updating portage database.
18:51:39 INFO | quickmerge| 18:51:39: INFO: RunCommand: touch /build/lumpy/usr/local/build/autotest/.quickmerge_sentinel
18:51:39 INFO | quickmerge| 18:51:39: INFO: Quickmerge complete. Created or modified 1 files.
18:51:39 INFO | Re-running test_that script in /build/lumpy/usr/local/build/autotest copy of autotest.
ERROR:root:Could not import tzlocal.
INFO:root:Identity added: /tmp/test_that_results__4Fnv3/testing_rsa (/tmp/test_that_results__4Fnv3/testing_rsa)
18:51:40 INFO | Began logging to /tmp/test_that_results__4Fnv3
19:51:40 WARNI| /build/lumpy/usr/local/build/autotest/client/common_lib/site_utils.py:18: UserWarning: autotest_lib.client.common_lib.site_utils module is deprecated; use the equivalent utils module instead
% __name__)
Adding labels [u'cros-version:ad_hoc_build', u'board:lumpy'] to host chromeos2-row9-rack9-host13.cros
19:51:40 WARNI| /build/lumpy/usr/local/build/autotest/server/cros/dynamic_suite/suite.py:932: UserWarning: Calling this method from Suite is deprecated
warnings.warn('Calling this method from Suite is deprecated')
19:51:40 INFO | Fetching suite for job named network_ProxyResolver...
19:51:43 INFO | Scheduling suite for job named network_ProxyResolver...
19:51:43 INFO | ... scheduled 1 job(s).
19:51:44 INFO | autoserv| ERROR:root:Could not import tzlocal.
19:51:44 INFO | autoserv| Results placed in /tmp/test_that_results__4Fnv3/results-1-network_ProxyResolver
19:51:44 INFO | autoserv| Logged pid 17547 to /tmp/test_that_results__4Fnv3/results-1-network_ProxyResolver/.autoserv_execute
19:51:44 INFO | autoserv| Starting new HTTP connection (1): metadata.google.internal
19:51:44 INFO | autoserv| Configuration file does not exist, ignoring: /etc/chrome-infra/ts-mon.json
19:51:44 INFO | autoserv| ts_mon monitoring is disabled because the endpoint provided is invalid or not supported:
19:51:44 INFO | autoserv| ts_mon was set up.
19:51:44 INFO | autoserv| I am PID 17547
19:51:44 INFO | autoserv| Starting master ssh connection '/usr/bin/ssh -a -x -N -o ControlMaster=yes -o ControlPath=/tmp/_autotmp_YTYEgIssh-master/socket -o StrictHostKeyChecking=no -o UserKnownHostsFile=/tmp/tmp4puxBL -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=300 -l root -p 22 chromeos2-row9-rack9-host13.cros'
19:51:47 INFO | autoserv| Starting master ssh connection '/usr/bin/ssh -a -x -N -o ControlMaster=yes -o ControlPath=/tmp/_autotmp_xVy1Rgssh-master/socket -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -o Protocol=2 -l root -p 22 chromeos2-row9-rack9-host13.cros'
19:51:48 INFO | autoserv| get_network_stats: at-start RXbytes 4328665094 TXbytes 177773091
19:51:48 INFO | autoserv| Not checking if job_repo_url contains autotest packages on ['chromeos2-row9-rack9-host13.cros']
19:51:48 INFO | autoserv| Processing control file
19:51:48 INFO | autoserv| Starting master ssh connection '/usr/bin/ssh -a -x -N -o ControlMaster=yes -o ControlPath=/tmp/_autotmp_ozmDOessh-master/socket -o StrictHostKeyChecking=no -o UserKnownHostsFile=/tmp/tmp1d6Ur7 -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=300 -l root -p 22 chromeos2-row9-rack9-host13.cros'
19:51:51 INFO | autoserv| Starting master ssh connection '/usr/bin/ssh -a -x -N -o ControlMaster=yes -o ControlPath=/tmp/_autotmp_iXowGsssh-master/socket -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -o Protocol=2 -l root -p 22 chromeos2-row9-rack9-host13.cros'
19:51:52 INFO | autoserv| INFO ---- ---- kernel=3.8.11 localtime=Apr 26 19:51:52 timestamp=1493254312
19:51:52 INFO | autoserv| Installing autotest on chromeos2-row9-rack9-host13.cros
19:51:53 INFO | autoserv| Using installation dir /usr/local/autotest
19:51:55 INFO | autoserv| Installation of autotest completed from /build/lumpy/usr/local/build/autotest/client/
19:51:55 INFO | autoserv| Installing updated global_config.ini.
19:51:58 INFO | autoserv| Executing /usr/local/autotest/bin/autotest /usr/local/autotest/control phase 0
19:52:00 INFO | autoserv| Entered autotestd_monitor.
19:52:00 INFO | autoserv| Finished launching tail subprocesses.
19:52:00 INFO | autoserv| Finished waiting on autotestd to start.
19:52:01 INFO | autoserv| START ---- ---- timestamp=1493254321 localtime=Apr 26 19:52:01
19:52:02 INFO | autoserv| START network_ProxyResolver network_ProxyResolver timestamp=1493254321 localtime=Apr 26 19:52:01
19:52:02 INFO | autoserv| Bundling /build/lumpy/usr/local/build/autotest/client/site_tests/network_ProxyResolver into test-network_ProxyResolver.tar.bz2
,
Apr 27 2017
,
Apr 27 2017
I don't think it's related to --fast. Something about "bundling" is hanging. Why didn't hang the first time you tried? My guess -- the bz2 file already existed that time. But on your second attempt, you modified a file, causing autotest_quickmerge to kick in, which deletes all those bz2s as a side effect because they are potentially stale. pmalani has touched some code around the packaging lately, he may have insight.
,
Apr 27 2017
Are you sure that it's related to bundling and that that doesn't just happen to be the last message that's logged before we run the test? Here's what I see in a successful run: ... 14:31:31 INFO | autoserv| Finished launching tail subprocesses. 14:31:31 INFO | autoserv| Finished waiting on autotestd to start. 14:31:32 INFO | autoserv| START ---- ---- timestamp=1493321491 localtime=Apr 27 14:31:31 14:31:32 INFO | autoserv| START network_ProxyResolver network_ProxyResolver timestamp=1493321491 localtime=Apr 27 14:31:31 14:31:32 INFO | autoserv| Bundling /build/lumpy/usr/local/build/autotest/client/site_tests/network_ProxyResolver into test-network_ProxyResolver.tar.bz2 14:31:42 INFO | autoserv| GOOD network_ProxyResolver network_ProxyResolver timestamp=1493321500 localtime=Apr 27 14:31:40 completed successfully 14:31:42 INFO | autoserv| END GOOD network_ProxyResolver network_ProxyResolver timestamp=1493321500 localtime=Apr 27 14:31:40 14:31:43 INFO | autoserv| END GOOD ---- ---- timestamp=1493321502 localtime=Apr 27 14:31:42 14:31:43 INFO | autoserv| Got lock of exit_code_file. 14:31:43 INFO | autoserv| Released lock of exit_code_file and closed it. ... Note that we appear to run the test immediately after logging the "Bundling" message.
,
Oct 24
I'm running in to this bug consistently on my Grunt and Careena DUTs. |
|||
►
Sign in to add a comment |
|||
Comment 1 by derat@chromium.org
, Apr 27 2017