timeout for downloading autotest_server_package.tar.bz2 after 5 mins |
|||||||||
Issue descriptionhttps://luci-milo.appspot.com/buildbot/chromeos/veyron_speedy-paladin/5027 http://cautotest/tko/retrieve_logs.cgi?job=/results/113133368-chromeos-test/ 04/17 11:01:46.444 DEBUG| base_utils:0185| Running 'sudo -n chgrp -R 5000 "/usr/local/autotest/results/113133368-chromeos-test/chromeos4-row4-rack12-host18"' 04/17 11:01:47.048 ERROR| traceback:0013| Traceback (most recent call last): 04/17 11:01:47.049 ERROR| traceback:0013| File "/usr/local/autotest/server/autoserv", line 530, in run_autoserv 04/17 11:01:47.050 ERROR| traceback:0013| machines) 04/17 11:01:47.050 ERROR| traceback:0013| File "/usr/local/autotest/server/autoserv", line 163, in _run_with_ssp 04/17 11:01:47.051 ERROR| traceback:0013| dut_name=dut_name) 04/17 11:01:47.051 ERROR| traceback:0013| File "/usr/local/autotest/site-packages/chromite/lib/metrics.py", line 274, in wrapper 04/17 11:01:47.052 ERROR| traceback:0013| return fn(*args, **kwargs) 04/17 11:01:47.052 ERROR| traceback:0013| File "/usr/local/autotest/site_utils/lxc.py", line 235, in func_cleanup_if_fail 04/17 11:01:47.053 ERROR| traceback:0013| return func(*args, **kwargs) 04/17 11:01:47.054 ERROR| traceback:0013| File "/usr/local/autotest/site_utils/lxc.py", line 946, in setup_test 04/17 11:01:47.054 ERROR| traceback:0013| download_extract(server_package_url, autotest_pkg_path, usr_local_path) 04/17 11:01:47.055 ERROR| traceback:0013| File "/usr/local/autotest/client/common_lib/cros/retry.py", line 240, in func_retry 04/17 11:01:47.055 ERROR| traceback:0013| raise error.TimeoutException(exception_message) 04/17 11:01:47.056 ERROR| traceback:0013| TimeoutException: retry exception (function="download_extract()"), timeout = 300s 04/17 11:01:47.070 INFO | client:0570| Attempting refresh to obtain initial access_token 04/17 11:01:47.172 INFO | client:0872| Refreshing access_token 04/17 11:01:48.185 ERROR| autoserv:0765| 1 Traceback (most recent call last): File "/usr/local/autotest/server/autoserv", line 761, in main use_ssp) File "/usr/local/autotest/server/autoserv", line 574, in run_autoserv sys.exit(exit_code) SystemExit: 1
,
Apr 17 2017
,
Apr 17 2017
how do I follow the download timeout error?
,
Apr 17 2017
It fails at https://cs.corp.google.com/chromeos_public/src/third_party/autotest/files/site_utils/lxc.py?q=download_extract&sq=package:%5Echromeos_(internal%7Cpublic)$&l=281 04/17 10:56:44.153 DEBUG| base_utils:0185| Running 'ssh 100.120.7.235 'curl "http://100.120.7.235:8082/static/veyron_speedy-paladin/R60-9470.0.0-rc3/autotest_server_package.tar.bz2"'' --- 300s --- 04/17 11:01:44.158 DEBUG| base_utils:0185| Running 'sudo lxc-ls -P /usr/local/autotest/containers -f -F name,state' (following cleanup) .... But I tested it locally on server14.mtv just now by calling "dev_server.ImageServerBase.download_file('http://100.120.7.235:8082/static/veyron_speedy-paladin/R60-9470.0.0-rc3/autotest_server_package.tar.bz2', '/tmp/autotest.tar.bz2', timeout=300)" and it successfully finished. The package is downloaded at about 10:40 (has 8 hour difference between that devserver and our shard) chromeos-test@chromeos11-infra-devserver2:/var/log/devserver$ ls -ltr ~/images/veyron_speedy-paladin/R60-9470.0.0-rc3/ total 14840 -rw-r--r-- 1 chromeos-test chromeos-test 15192987 Apr 17 18:40 autotest_server_package.tar.bz2 Conclusion: 1. package is staged well. 2. the devserver seems work normally ( http://shortn/_bJ7UGdYmHI ) 3. the devserver (chromeos11-infra-devserver2) seems to be an android devserver. In the log, there's several exception/errors, but not related to this one. 4. the download_extract works well now, but not at that time. looks like a flakeness. 100.120.7.235 doesn't work at that time, and this command (ssh 100.120.7.235 'curl "http://100.120.7.235:8082/static/veyron_speedy-paladin/R60-9470.0.0-rc3/autotest_server_package.tar.bz2"') hang there, not actually being retried.
,
Apr 17 2017
Can we extend the retry.retry timeout to maybe 10 or 15 mins? As dev_server.ImageServerBase.download_file(url, tmp_file, timeout=300) takes up the 300s timeout, so this method isn't really being retried? http://shortn/_4w2JKKELJQ
,
Apr 18 2017
Another example on veyron_minnie-paladin which timed out at downloading. https://luci-milo.appspot.com/buildbot/chromeos/veyron_minnie-paladin/2288 http://cautotest.corp.google.com/afe/#tab_id=view_job&object_id=112903534 04/15 15:25:22.379 DEBUG| base_utils:0185| Running 'ssh 100.120.7.235 'curl "http://100.120.7.235:8082/static/veyron_minnie-paladin/R60-9465.0.0-rc2/autotest_server_package.tar.bz2"'' 04/15 15:30:22.407 DEBUG| base_utils:0185| Running 'sudo lxc-ls -P /usr/local/autotest/containers -f -F name,state' 04/15 15:30:26.184 DEBUG| base_utils:0185| Running 'sudo -n chgrp -R 5000 "/usr/local/autotest/results/112903534-chromeos-test/chromeos4-row9-rack10-host22"' 04/15 15:30:26.825 ERROR| traceback:0013| Traceback (most recent call last): 04/15 15:30:26.826 ERROR| traceback:0013| File "/usr/local/autotest/server/autoserv", line 530, in run_autoserv 04/15 15:30:26.828 ERROR| traceback:0013| machines) 04/15 15:30:26.828 ERROR| traceback:0013| File "/usr/local/autotest/server/autoserv", line 163, in _run_with_ssp 04/15 15:30:26.829 ERROR| traceback:0013| dut_name=dut_name) 04/15 15:30:26.829 ERROR| traceback:0013| File "/usr/local/autotest/site-packages/chromite/lib/metrics.py", line 274, in wrapper 04/15 15:30:26.830 ERROR| traceback:0013| return fn(*args, **kwargs) 04/15 15:30:26.831 ERROR| traceback:0013| File "/usr/local/autotest/site_utils/lxc.py", line 235, in func_cleanup_if_fail 04/15 15:30:26.832 ERROR| traceback:0013| return func(*args, **kwargs) 04/15 15:30:26.833 ERROR| traceback:0013| File "/usr/local/autotest/site_utils/lxc.py", line 946, in setup_test 04/15 15:30:26.834 ERROR| traceback:0013| download_extract(server_package_url, autotest_pkg_path, usr_local_path) 04/15 15:30:26.834 ERROR| traceback:0013| File "/usr/local/autotest/client/common_lib/cros/retry.py", line 240, in func_retry 04/15 15:30:26.836 ERROR| traceback:0013| raise error.TimeoutException(exception_message) 04/15 15:30:26.836 ERROR| traceback:0013| TimeoutException: retry exception (function="download_extract()"), timeout = 300s 04/15 15:30:26.854 INFO | client:0570| Attempting refresh to obtain initial access_token 04/15 15:30:26.952 INFO | client:0872| Refreshing access_token 04/15 15:30:32.744 ERROR| autoserv:0765| 1
,
Apr 18 2017
,
Apr 18 2017
Another example: https://luci-milo.appspot.com/buildbot/chromeos/veyron_speedy-paladin/5008 http://cautotest/tko/retrieve_logs.cgi?job=/results/113077836-chromeos-test/ 04/17 00:01:26.629 DEBUG| base_utils:0185| Running 'ssh 100.120.7.236 'curl "http://100.120.7.236:8082/static/veyron_mighty-paladin/R60-9469.0.0-rc3/autotest_server_package.tar.bz2"'' 04/17 00:06:26.684 DEBUG| base_utils:0185| Running 'sudo lxc-ls -P /usr/local/autotest/containers -f -F name,state' 04/17 00:06:30.054 ERROR| traceback:0013| File "/usr/local/autotest/client/common_lib/cros/retry.py", line 240, in func_retry 04/17 00:06:30.055 ERROR| traceback:0013| raise error.TimeoutException(exception_message) 04/17 00:06:30.055 ERROR| traceback:0013| TimeoutException: retry exception (function="download_extract()"), timeout = 30
,
Apr 18 2017
https://luci-milo.appspot.com/buildbot/chromeos/veyron_mighty-paladin/5027 http://cautotest/tko/retrieve_logs.cgi?job=/results/113077836-chromeos-test/ 04/17 00:01:26.629 DEBUG| base_utils:0185| Running 'ssh 100.120.7.236 'curl "http://100.120.7.236:8082/static/veyron_mighty-paladin/R60-9469.0.0-rc3/autotest_server_package.tar.bz2"'' 04/17 00:06:26.684 DEBUG| base_utils:0185| Running 'sudo lxc-ls -P /usr/local/autotest/containers -f -F name,state'
,
Apr 20 2017
another example: https://luci-milo.appspot.com/buildbot/chromeos/x86-alex-paladin/27534 FAIL ---- ---- timestamp=1492649674 localtime=Apr 19 17:54:34 Failed to setup container for test: retry exception (function="download_extract()"), timeout = 300s. Check logs in ssp_logs folder for more details.
,
Apr 21 2017
This looks similar to issue 712274 , we just saw "Failed to setup container for test: retry exception" on the last two cyan-pfq runs (links in issue 712274 ).
,
Apr 21 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/autotest/+/d364662aa4eb1e82174de580ab49a8793eebe857 commit d364662aa4eb1e82174de580ab49a8793eebe857 Author: xixuan <xixuan@chromium.org> Date: Fri Apr 21 20:37:26 2017 autotest: add timeout for downloading files for ssp. Current the retry timeout limit is equal to the timeout for downloading ssp files, which make retry not retryable. So increase the timeout for retry to be 3 times of the timeout of download_file. BUG= chromium:712283 TEST=None Change-Id: I639cfd24e400e6a4e5ab046dddb0d808205fb721 Reviewed-on: https://chromium-review.googlesource.com/480541 Reviewed-by: Ningning Xia <nxia@chromium.org> Commit-Queue: Xixuan Wu <xixuan@chromium.org> Tested-by: Xixuan Wu <xixuan@chromium.org> [modify] https://crrev.com/d364662aa4eb1e82174de580ab49a8793eebe857/site_utils/lxc.py
,
Apr 21 2017
I chump the fix CL for this. But since push-to-prod is blocked for now, it won't take any effect.
,
Apr 21 2017
Issue 712274 has been merged into this issue.
,
Apr 21 2017
,
Apr 21 2017
,
Apr 25 2017
,
May 30 2017
,
Aug 1 2017
,
Jan 22 2018
|
|||||||||
►
Sign in to add a comment |
|||||||||
Comment 1 by nxia@chromium.org
, Apr 17 2017Labels: -Pri-3 OS-Chrome Pri-2