Issue metadata
Server-side test failed with timeout from download_extract
Reported by jrbarnette@chromium.org, Apr 17 2017
Issue description
This CQ run failed:
https://uberchromegw.corp.google.com/i/chromeos/builders/master-paladin/builds/14280
Because of this slave paladin run:
https://luci-milo.appspot.com/buildbot/chromeos/peppy-paladin/14856
Because of this test failure:
http://cautotest/tko/retrieve_logs.cgi?job=/results/112455541-chromeos-test/
The error reporting is (shall we say) obtuse, but if you dig
you'll find what you want is at this path:
chromeos4-row9-rack11-host3/ssp_logs/debug/autoserv.DEBUG
The most relevant part seems to be this:
04/13 00:06:04.552 INFO | server_job:0184| FAIL ---- ---- timestamp=1492067164 localtime=Apr 13 00:06:04 Failed to setup container for test: retry exception (function="download_extract()"), timeout = 300s. Check logs in ssp_logs folder for more details.
04/13 00:06:04.553 DEBUG| base_utils:0185| Running 'sudo -n chown -R 99158 "/usr/local/autotest/results/112455541-chromeos-test/chromeos4-row9-rack11-host3"'
04/13 00:06:05.188 DEBUG| base_utils:0185| Running 'sudo -n chgrp -R 5000 "/usr/local/autotest/results/112455541-chromeos-test/chromeos4-row9-rack11-host3"'
04/13 00:06:05.833 ERROR| traceback:0013| Traceback (most recent call last):
04/13 00:06:05.834 ERROR| traceback:0013| File "/usr/local/autotest/server/autoserv", line 530, in run_autoserv
04/13 00:06:05.835 ERROR| traceback:0013| machines)
04/13 00:06:05.835 ERROR| traceback:0013| File "/usr/local/autotest/server/autoserv", line 163, in _run_with_ssp
04/13 00:06:05.836 ERROR| traceback:0013| dut_name=dut_name)
04/13 00:06:05.836 ERROR| traceback:0013| File "/usr/local/autotest/site-packages/chromite/lib/metrics.py", line 274, in wrapper
04/13 00:06:05.837 ERROR| traceback:0013| return fn(*args, **kwargs)
04/13 00:06:05.837 ERROR| traceback:0013| File "/usr/local/autotest/site_utils/lxc.py", line 235, in func_cleanup_if_fail
04/13 00:06:05.838 ERROR| traceback:0013| return func(*args, **kwargs)
04/13 00:06:05.839 ERROR| traceback:0013| File "/usr/local/autotest/site_utils/lxc.py", line 946, in setup_test
04/13 00:06:05.839 ERROR| traceback:0013| download_extract(server_package_url, autotest_pkg_path, usr_local_path)
04/13 00:06:05.840 ERROR| traceback:0013| File "/usr/local/autotest/client/common_lib/cros/retry.py", line 240, in func_retry
04/13 00:06:05.840 ERROR| traceback:0013| raise error.TimeoutException(exception_message)
04/13 00:06:05.841 ERROR| traceback:0013| TimeoutException: retry exception (function="download_extract()"), timeout = 300s
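For context on that TimeoutException: the traceback shows it coming from a retry decorator in client/common_lib/cros/retry.py, which keeps re-invoking download_extract() until an overall 300-second budget is exhausted. The snippet below is only a minimal sketch of that retry-with-deadline pattern, not the actual autotest code; the exception class, parameter names, and delay value are placeholders.

import functools
import time


class TimeoutException(Exception):
    """Placeholder standing in for autotest's error.TimeoutException."""


def retry(timeout_sec=300, delay_sec=10, exc_types=(Exception,)):
    """Retry the wrapped call until it succeeds or an overall deadline passes."""
    def decorator(func):
        @functools.wraps(func)
        def func_retry(*args, **kwargs):
            deadline = time.time() + timeout_sec
            while True:
                try:
                    return func(*args, **kwargs)
                except exc_types:
                    # Give up once another wait would blow the overall budget;
                    # the message mirrors the one in the log above.
                    if time.time() + delay_sec >= deadline:
                        raise TimeoutException(
                            'retry exception (function="%s()"), timeout = %ds'
                            % (func.__name__, timeout_sec))
                    time.sleep(delay_sec)
        return func_retry
    return decorator

With a decorator like this, a download_extract() that keeps failing against a slow or overloaded devserver surfaces exactly as the 300s TimeoutException in the traceback above.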
That apparently happened because of a failure around this event:
04/13 00:01:02.571 DEBUG| base_utils:0185| Running 'ssh 172.25.65.217 'curl "http://172.25.65.217:8082/static/peppy-paladin/R59-9457.0.0-rc2/autotest_server_package.tar.bz2"''
Host 172.25.65.217 is chromeos-gt-devserver1.cbf; it was serving an image
to the shard, not the DUT.
I haven't been able to figure out what really happened. I suspect
that the devserver was merely overloaded. This could be related
to bug 604473.
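For reference, the step that timed out boils down to downloading autotest_server_package.tar.bz2 from the devserver and unpacking it for the SSP container. Below is a rough sketch of that flow; it is not the real download_extract() in site_utils/lxc.py (which, per the log above, runs the fetch via ssh/curl), and the function signature and local paths in the example call are made up for illustration.

import subprocess


def download_extract(package_url, local_tarball, extract_dir):
    """Fetch the autotest server package tarball and unpack it."""
    # Download from the devserver; if the devserver is overloaded, this is
    # the call that stalls until the 300s retry budget runs out.
    subprocess.check_call(
        ['curl', '--fail', '--silent', '--show-error',
         '-o', local_tarball, package_url])
    # autotest_server_package.tar.bz2 is a bzip2 tarball, hence -xjf.
    subprocess.check_call(['tar', '-xjf', local_tarball, '-C', extract_dir])


# Roughly what the logged event amounts to (URL taken from the log above;
# the local paths are illustrative only):
# download_extract(
#     'http://172.25.65.217:8082/static/peppy-paladin/R59-9457.0.0-rc2/autotest_server_package.tar.bz2',
#     '/tmp/autotest_server_package.tar.bz2',
#     '/usr/local')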
Comment 1 by jrbarnette@chromium.org, Apr 17 2017

Another failed CQ master:
https://uberchromegw.corp.google.com/i/chromeos/builders/master-paladin/builds/14283
The corresponding slave:
https://luci-milo.appspot.com/buildbot/chromeos/veyron_speedy-paladin/4991
These tests:
http://cautotest.corp.google.com/afe/#tab_id=view_job&object_id=112503832
http://cautotest.corp.google.com/afe/#tab_id=view_job&object_id=112503830
This time, the devserver in the logs was 100.120.7.235 (chromeos11-infra-devserver2).
This problem happened on the same shard (chromeos-server14.mtv) as the
previous failure, so the problem could be shard load instead of devserver load.

Comment posted Apr 17 2017

Outgoing network traffic on the shard surged about a day before the first
failures. Also, the failures in c#1 were accompanied by other, less distinct
failures in the same suite. So this symptom is looking more like it's
specific to shard load.

Comment posted Apr 21 2017

This looks a lot like issue 712283? We just saw this on two consecutive runs
for cyan in the PFQ:
https://uberchromegw.corp.google.com/i/chromeos/builders/cyan-chrome-pfq/builds/1046
https://uberchromegw.corp.google.com/i/chromeos/builders/cyan-chrome-pfq/builds/1047

Comment posted Apr 21 2017

> This looks a lot like issue 712283?
Yeah. It seems the left hand doesn't know what the right hand is doing.