ssp fails to stage container artifact because of restricted_subnet
Issue description
This means that tests in the CQ/BVT pool fail because a devserver that is not even maintained by the cros-infra team is misbehaving. See for example: http://cautotest.corp.google.com/afe/#tab_id=view_job&object_id=86138893
This is a paladin request that failed on:
wget --timeout=300 -nv http://100.107.227.251:8082/static/lumpy-paladin/R56-8998.0.0-rc2/autotest_server_package.tar.bz2 -O /usr/local/autotest/containers/test_86138893_1479400611_7813/delta0/usr/local/autotest_server_package.tar.bz2
where 100.107.227.251 is chromeos1-infra-devserver1. It's a devserver in a restricted_subnet that should not be used for staging artifacts except for some specialized, team-specific needs.
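The core rule is that SSP should not stage from a devserver in a restricted subnet. A check for that can be sketched in Python 3 with the standard ipaddress module. This is a hedged illustration, not autotest's devserver selection code. RESTRICTED_SUBNETS is hypothetical: the real restricted_subnet ranges live in the lab configuration and are not reproduced here. The /21 below was chosen only so that the example address from this report falls inside it.

```python
import ipaddress
from urllib.parse import urlsplit

# HYPOTHETICAL restricted range for illustration only; the real
# restricted_subnet values come from lab configuration, not from this sketch.
RESTRICTED_SUBNETS = [ipaddress.ip_network("100.107.224.0/21")]


def devserver_in_restricted_subnet(url, restricted=RESTRICTED_SUBNETS):
    """Return True if the host in a devserver URL lies in a restricted subnet."""
    host = urlsplit(url).hostname
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal (e.g. a hostname); resolving it is out of scope.
        return False
    return any(addr in net for net in restricted)


def pick_unrestricted(candidates, restricted=RESTRICTED_SUBNETS):
    """Return the first candidate devserver URL outside every restricted subnet."""
    for url in candidates:
        if not devserver_in_restricted_subnet(url, restricted):
            return url
    return None  # every candidate is restricted
```

With the two devservers named in this thread, pick_unrestricted(["http://100.107.227.251:8082", "http://100.115.219.133:8082"]) skips the first one and returns the second.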
Nov 17 2016
Issue 666351 has been merged into this issue.
Nov 17 2016
+dshi: Is there a workaround that doesn't require a push-to-prod?
Nov 17 2016
This also affects the test push. The powerwash test keeps failing because of it. Similar log:
32_1479413308_26829/delta0/usr/local/autotest_server_package.tar.bz2'
11/17 12:08:31.290 WARNI| retry:0218| <class 'autotest_lib.client.common_lib.error.CmdError'>(Command <sudo wget --timeout=300 -nv http://100.107.227.251:8082/static/quawks-release/R54-8743.44.0/autotest_server_package.tar.bz2 -O /usr/local/autotest/containers/test_1832_1479413308_26829/delta0/usr/local/autotest_server_package.tar.bz2> failed, rc=4, Command returned non-zero exit status
* Command:
    sudo wget --timeout=300 -nv http://100.107.227.251:8082/static/quawks-release/R54-8743.44.0/autotest_server_package.tar.bz2 -O /usr/local/autotest/containers/test_1832_1479413308_26829/delta0/usr/local/autotest_server_package.tar.bz2
Exit status: 4
Duration: 0.243041992188
)
11/17 12:08:31.293 WARNI| retry:0173| Retrying in 3.777172 seconds...
11/17 12:08:35.087 DEBUG| base_utils:0185| Running 'sudo wget --timeout=300 -nv http://100.107.227.251:8082/static/quawks-release/R54-8743.44.0/autotest_server_package.tar.bz2 -O /usr/local/autotest/containers/test_1832_1479413308_26829/delta0/usr/local/autotest_server_package.tar.bz2'
11/17 12:10:44.656 WARNI| retry:0218| <class 'autotest_lib.client.common_lib.error.CmdError'>(Command <sudo wget --timeout=300 -nv http://100.107.227.251:8082/static/quawks-release/R54-8743.44.0/autotest_server_package.tar.bz2 -O /usr/local/autotest/containers/test_1832_1479413308_26829/delta0/usr/local/autotest_server_package.tar.bz2> failed, rc=4, Command returned non-zero exit status
* Command:
    sudo wget --timeout=300 -nv http://100.107.227.251:8082/static/quawks-release/R54-8743.44.0/autotest_server_package.tar.bz2 -O /usr/local/autotest/containers/test_1832_1479413308_26829/delta0/usr/local/autotest_server_package.tar.bz2
Exit status: 4
Duration: 0.243041992188
)
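Both attempts in the log end with rc=4. According to the GNU Wget manual, exit status 4 means "Network failure". The quick failure (Duration: 0.24 s) suggests the host running autoserv could not reach the devserver at all, rather than stalling mid-transfer, though the log alone does not prove which. The Python sketch below pulls the code out of such a log line and maps it to its documented meaning. The helper names are hypothetical, and the regex assumes only the "failed, rc=N" wording visible above.

```python
import re

# Exit statuses as documented in the GNU Wget manual ("Exit Status" section).
WGET_EXIT_STATUS = {
    0: "No problems occurred",
    1: "Generic error code",
    2: "Parse error (command-line options, .wgetrc or .netrc)",
    3: "File I/O error",
    4: "Network failure",
    5: "SSL verification failure",
    6: "Username/password authentication failure",
    7: "Protocol errors",
    8: "Server issued an error response",
}

# Matches the "failed, rc=N" fragment seen in the CmdError log lines above.
RC_PATTERN = re.compile(r"failed, rc=(\d+)")


def rc_from_log_line(line):
    """Return the integer rc from a CmdError log line, or None if absent."""
    match = RC_PATTERN.search(line)
    return int(match.group(1)) if match else None


def explain_wget_rc(rc):
    """Map a wget exit status to its documented meaning."""
    return WGET_EXIT_STATUS.get(rc, "Unknown wget exit status %d" % rc)
```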
Nov 17 2016
The job_repo_url for one of the affected tests (https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/86168234-chromeos-test/chromeos4-row6-rack11-host17/ssp_logs/debug/) is:
job_repo_url : http://100.115.219.133:8082/static/veyron_mighty-release/R54-8743.44.0/autotest/packages
That is a devserver from the correct subnet, so only the SSP step is using the wrong devserver.
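The observation above compares the devserver in job_repo_url with the one SSP fetched its package from. That comparison can be sketched in Python 3 with urllib.parse. The helper names are hypothetical and not part of autotest. The URL pairing in the checks is illustrative only: the two URLs come from different tests in this thread (veyron_mighty and quawks).

```python
from urllib.parse import urlsplit


def devserver_of(url):
    """Return the (host, port) of the devserver serving a URL."""
    parts = urlsplit(url)
    return parts.hostname, parts.port


def ssp_uses_job_devserver(job_repo_url, ssp_package_url):
    """True if the SSP package URL points at the same devserver as job_repo_url."""
    return devserver_of(job_repo_url) == devserver_of(ssp_package_url)
```

For example, devserver_of("http://100.115.219.133:8082/static/veyron_mighty-release/R54-8743.44.0/autotest/packages") gives ("100.115.219.133", 8082). Any SSP URL on 100.107.227.251 would then fail the check.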
Nov 17 2016
Issue 666498 has been merged into this issue.
Nov 17 2016
Some workarounds are in flight:
https://chromium-review.googlesource.com/#/c/412028/
https://chromium-review.googlesource.com/#/c/412384/
https://chromium-review.googlesource.com/#/c/412385/
The root cause is still unknown.
Nov 17 2016
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/autotest/+/4e116826369f95645151f157e07b8256f7602329

commit 4e116826369f95645151f157e07b8256f7602329
Author: xixuan <xixuan@chromium.org>
Date: Thu Nov 17 23:32:10 2016

autotest: fix bugs for devserver function: download_file

BUG= chromium:666414
TEST=Call lxc.download_extract() in hot, successfully download files.

Change-Id: Ibbc70333c14cbfca0f7e4db3899eb49979b3a726
Reviewed-on: https://chromium-review.googlesource.com/412331
Reviewed-by: Dan Shi <dshi@google.com>
Commit-Queue: Xixuan Wu <xixuan@chromium.org>
Tested-by: Xixuan Wu <xixuan@chromium.org>

[modify] https://crrev.com/4e116826369f95645151f157e07b8256f7602329/server/hosts/cros_host.py
[modify] https://crrev.com/4e116826369f95645151f157e07b8256f7602329/site_utils/lxc.py
[modify] https://crrev.com/4e116826369f95645151f157e07b8256f7602329/client/common_lib/cros/dev_server.py
Nov 18 2016
Nov 18 2016
We're still hitting issues in test_push. In particular, we get permission denied errors when trying to write the downloaded file with the new download_file path:
11/17 16:24:43.998 INFO | ts_mon_config:0150| Waiting for ts_mon flushing process to finish...
11/17 16:24:44.015 ERROR| traceback:0013| Traceback (most recent call last):
11/17 16:24:44.015 ERROR| traceback:0013|   File "/usr/local/autotest/server/autoserv", line 549, in run_autoserv
11/17 16:24:44.015 ERROR| traceback:0013|     machines)
11/17 16:24:44.016 ERROR| traceback:0013|   File "/usr/local/autotest/server/autoserv", line 183, in _run_with_ssp
11/17 16:24:44.016 ERROR| traceback:0013|     dut_name=dut_name)
11/17 16:24:44.016 ERROR| traceback:0013|   File "/usr/local/autotest/site-packages/statsd/timer.py", line 95, in _decorator
11/17 16:24:44.017 ERROR| traceback:0013|     return function(*args, **kwargs)
11/17 16:24:44.017 ERROR| traceback:0013|   File "/usr/local/autotest/site_utils/lxc.py", line 235, in func_cleanup_if_fail
11/17 16:24:44.018 ERROR| traceback:0013|     return func(*args, **kwargs)
11/17 16:24:44.018 ERROR| traceback:0013|   File "/usr/local/autotest/site_utils/lxc.py", line 917, in setup_test
11/17 16:24:44.019 ERROR| traceback:0013|     download_extract(server_package_url, autotest_pkg_path, usr_local_path)
11/17 16:24:44.019 ERROR| traceback:0013|   File "/usr/local/autotest/client/common_lib/cros/retry.py", line 208, in func_retry
11/17 16:24:44.019 ERROR| traceback:0013|     remaining_time)
11/17 16:24:44.020 ERROR| traceback:0013|   File "/usr/local/autotest/client/common_lib/cros/retry.py", line 114, in timeout
11/17 16:24:44.020 ERROR| traceback:0013|     default_result = func(*args, **kwargs)
11/17 16:24:44.020 ERROR| traceback:0013|   File "/usr/local/autotest/site_utils/lxc.py", line 276, in download_extract
11/17 16:24:44.021 ERROR| traceback:0013|     dev_server.ImageServerBase.download_file(url, target, timeout=300)
11/17 16:24:44.021 ERROR| traceback:0013|   File "/usr/local/autotest/client/common_lib/cros/dev_server.py", line 891, in download_file
11/17 16:24:44.022 ERROR| traceback:0013|     with open(local_file, 'w') as out_log:
11/17 16:24:44.022 ERROR| traceback:0013| IOError: [Errno 13] Permission denied: '/usr/local/autotest/containers/test_1882_1479428678_27175/delta0/usr/local/autotest_server_package.tar.bz2'
I tried forcing creation of the parent directories first, but no luck:
11/17 17:01:33.661 ERROR| traceback:0013| Traceback (most recent call last):
11/17 17:01:33.661 ERROR| traceback:0013|   File "/usr/local/autotest/server/autoserv", line 549, in run_autoserv
11/17 17:01:33.662 ERROR| traceback:0013|     machines)
11/17 17:01:33.662 ERROR| traceback:0013|   File "/usr/local/autotest/server/autoserv", line 183, in _run_with_ssp
11/17 17:01:33.662 ERROR| traceback:0013|     dut_name=dut_name)
11/17 17:01:33.662 ERROR| traceback:0013|   File "/usr/local/autotest/site-packages/statsd/timer.py", line 95, in _decorator
11/17 17:01:33.663 ERROR| traceback:0013|     return function(*args, **kwargs)
11/17 17:01:33.663 ERROR| traceback:0013|   File "/usr/local/autotest/site_utils/lxc.py", line 235, in func_cleanup_if_fail
11/17 17:01:33.664 ERROR| traceback:0013|     return func(*args, **kwargs)
11/17 17:01:33.664 ERROR| traceback:0013|   File "/usr/local/autotest/site_utils/lxc.py", line 921, in setup_test
11/17 17:01:33.665 ERROR| traceback:0013|     download_extract(server_package_url, autotest_pkg_path, usr_local_path)
11/17 17:01:33.665 ERROR| traceback:0013|   File "/usr/local/autotest/client/common_lib/cros/retry.py", line 208, in func_retry
11/17 17:01:33.665 ERROR| traceback:0013|     remaining_time)
11/17 17:01:33.665 ERROR| traceback:0013|   File "/usr/local/autotest/client/common_lib/cros/retry.py", line 114, in timeout
11/17 17:01:33.666 ERROR| traceback:0013|     default_result = func(*args, **kwargs)
11/17 17:01:33.666 ERROR| traceback:0013|   File "/usr/local/autotest/site_utils/lxc.py", line 278, in download_extract
11/17 17:01:33.666 ERROR| traceback:0013|     os.makedirs(dirpath)
11/17 17:01:33.666 ERROR| traceback:0013|   File "/usr/lib/python2.7/os.py", line 150, in makedirs
11/17 17:01:33.666 ERROR| traceback:0013|     makedirs(head, mode)
11/17 17:01:33.666 ERROR| traceback:0013|   File "/usr/lib/python2.7/os.py", line 150, in makedirs
11/17 17:01:33.667 ERROR| traceback:0013|     makedirs(head, mode)
11/17 17:01:33.667 ERROR| traceback:0013|   File "/usr/lib/python2.7/os.py", line 157, in makedirs
11/17 17:01:33.667 ERROR| traceback:0013|     mkdir(name, mode)
11/17 17:01:33.667 ERROR| traceback:0013| OSError: [Errno 13] Permission denied: '/usr/local/autotest/containers/test_1894_1479430890_2466/delta0'
I have no issues creating those directories directly on the server. The SSP setup code runs via Python's exec, and may be running with reduced permissions.
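Both tracebacks fail at the first path the process tries to create inside a directory it cannot write to. The Python 3 diagnostic below uses hypothetical helpers, not part of autotest. It walks up from a target path to the nearest existing ancestor and checks whether the current user has the write and search permission needed to create entries there. Note that os.access checks the real UID/GID. The check only shows *which* directory blocks the write; it does not explain *why* the process's privileges were reduced.

```python
import os


def first_existing_ancestor(path):
    """Walk up from path until reaching something that already exists."""
    path = os.path.abspath(path)
    while not os.path.exists(path):
        parent = os.path.dirname(path)
        if parent == path:  # reached the filesystem root
            break
        path = parent
    return path


def can_create(path):
    """Return (ok, anchor): whether the current user could create the missing
    part of path, judged by write+search access on the nearest existing
    ancestor, and which ancestor that is."""
    anchor = first_existing_ancestor(path)
    ok = os.access(anchor, os.W_OK | os.X_OK)
    return ok, anchor
```

Run as the autoserv user against '/usr/local/autotest/containers/test_1894_1479430890_2466/delta0', this would report which existing directory is refusing the write.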
Nov 18 2016
The real issue is that everything executed in '/usr/local/autotest/containers' requires 'sudo' privilege.
Originally the download used 'sudo wget ...'.
I manually changed the code to:
dev_server.ImageServerBase.download_file(url, '/tmp/tmp.tar.bz2', timeout=300)
utils.run('sudo mv /tmp/tmp.tar.bz2 %s' % target)
With that change, the dummy_PassServer:ssp test passes.
Since we may need this as a backup plan in case devservers outside the subnet become unreachable in the future, should I write a CL to add this fix?
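The workaround above (download as the unprivileged user, then `sudo mv` into the container tree) can be sketched as a standalone Python 3 function. This is a hedged illustration, not the CL that landed. urllib.request stands in for dev_server.ImageServerBase.download_file, the name download_then_privileged_move is hypothetical, and the sketch assumes passwordless sudo on the server. One deliberate change: the fixed '/tmp/tmp.tar.bz2' is replaced with tempfile.mkstemp, so two SSP jobs downloading at the same time cannot overwrite each other's temporary file.

```python
import os
import shutil
import subprocess
import tempfile
import urllib.request


def download_then_privileged_move(url, target, timeout=300,
                                  run=subprocess.check_call,
                                  opener=urllib.request.urlopen):
    """Download url to a private temp file as the current user, then move it
    to target with sudo, since target sits in a root-owned container tree.

    `run` and `opener` are injectable so the flow can be tested without
    network access or sudo. Assumes passwordless sudo is configured.
    """
    fd, tmp_path = tempfile.mkstemp(suffix=".tar.bz2")
    try:
        # If opener() raises, the temp file handle is still closed by `with`.
        with os.fdopen(fd, "wb") as out, opener(url, timeout=timeout) as resp:
            shutil.copyfileobj(resp, out)
        run(["sudo", "mv", tmp_path, target])
    except BaseException:
        # Don't leave partial downloads behind in /tmp.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return target
```

Passing the command to `run` as an argument list avoids shell quoting issues with the long container paths.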
Nov 18 2016
Thanks xixuan! Please write that CL. I'll try to do a push tonight to fix this.
Nov 18 2016
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/autotest/+/017ac5279707712475db35657ca1adcdfef6c7cf

commit 017ac5279707712475db35657ca1adcdfef6c7cf
Author: xixuan <xixuan@chromium.org>
Date: Fri Nov 18 04:14:48 2016

autotest: Fix bugs in fetching packages for ssp.

This CL fix two things:
1. Only use ssh for devserver-related package download, continue using wget for other server urls, like 'http://storage.googleapis.com'.
2. When using ssh for downloading, first download it as a temporary file, then mv it to target file with sudo privilege. This is due to 'sudo' is required in all operations inside container.

BUG= chromium:666414
TEST=Run 'python ./site_utils/lxc_functional_test.py -s -v' on hot. Run test dummy_PassServer.ssp on hot.

Change-Id: I98307768923809e02b2e559dec9064d67c686563
Reviewed-on: https://chromium-review.googlesource.com/412427
Commit-Queue: Xixuan Wu <xixuan@chromium.org>
Tested-by: Xixuan Wu <xixuan@chromium.org>
Reviewed-by: Dan Shi <dshi@google.com>

[modify] https://crrev.com/017ac5279707712475db35657ca1adcdfef6c7cf/site_utils/lxc.py
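Point 1 of the commit message is a dispatch rule on the package URL: devserver URLs go through ssh, everything else through wget. A hedged Python 3 sketch of that rule follows. devserver_hosts is a hypothetical stand-in for however lxc.py actually recognizes devserver URLs, which is not reproduced here. The function only picks a method; it does not perform the download.

```python
from urllib.parse import urlsplit


def choose_download_method(url, devserver_hosts):
    """Return 'ssh' for URLs served by a known devserver, 'wget' otherwise.

    devserver_hosts is a HYPOTHETICAL set of devserver hostnames/IPs.
    """
    host = urlsplit(url).hostname
    return "ssh" if host in devserver_hosts else "wget"
```

Keying on the parsed hostname rather than a substring match keeps a URL such as 'http://storage.googleapis.com/...' from being misrouted just because it contains a devserver-like string in its path.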
Nov 18 2016
Starting a push for this now.
Nov 18 2016
A manual run of the consistently failing provision_AutoUpdate.double test just passed: http://chromeos-server13.cbf.corp.google.com/afe/#tab_id=view_job&object_id=86234559
We may be out of the woods here. Let's see what happens with this paladin run.
Nov 18 2016
The storm has passed. I still don't know why we started seeing these suddenly yesterday.
Nov 19 2016
Dec 13 2016
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/autotest/+/d72455b37f5b3cdea815e6f48e7b94e7e3b4cb51

commit d72455b37f5b3cdea815e6f48e7b94e7e3b4cb51
Author: xixuan <xixuan@chromium.org>
Date: Wed Nov 23 19:17:47 2016

autotest: improve docstring of func download_file in devserver.

BUG= chromium:666414
TEST=None

Change-Id: I6921c7ec63ae222ee4c4ff2c9beb5f7ff69e1658
Reviewed-on: https://chromium-review.googlesource.com/414287
Commit-Ready: Xixuan Wu <xixuan@chromium.org>
Tested-by: Xixuan Wu <xixuan@chromium.org>
Reviewed-by: Wai-Hong Tam <waihong@google.com>

[modify] https://crrev.com/d72455b37f5b3cdea815e6f48e7b94e7e3b4cb51/client/common_lib/cros/dev_server.py
Mar 4 2017
Apr 17 2017
May 30 2017
Aug 1 2017
Oct 14 2017
Comment 1 by pprabhu@chromium.org, Nov 17 2016
Status: Started (was: Available)