Starred by 7 users
Status: Archived
Owner:
Closed: Nov 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug



ssp fails to stage container artifact because of restricted_subnet
Project Member Reported by pprabhu@chromium.org, Nov 17 2016
This means that tests on the CQ/BVT pool fail because some devserver that is not even maintained by the cros-infra team is misbehaving.

See for example: http://cautotest.corp.google.com/afe/#tab_id=view_job&object_id=86138893

This is a paladin request that failed because of:
wget --timeout=300 -nv http://100.107.227.251:8082/static/lumpy-paladin/R56-8998.0.0-rc2/autotest_server_package.tar.bz2 -O /usr/local/autotest/containers/test_86138893_1479400611_7813/delta0/usr/local/autotest_server_package.tar.bz2


where 100.107.227.251 is chromeos1-infra-devserver1, a devserver in a restricted_subnet that should not be used for staging artifacts except for some specialized, team-specific needs.
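
The restricted_subnet rule described above can be sketched as a simple filter. This is only an illustration, not autotest's actual devserver selection code: the function name pick_staging_devservers and the CIDR block are hypothetical, chosen so that 100.107.227.251 falls inside the block while 100.115.219.133 (a devserver address that appears later in this thread) does not.

```python
import ipaddress


def pick_staging_devservers(candidates, restricted_subnets):
    """Return the candidate IPs that lie outside every restricted subnet."""
    networks = [ipaddress.ip_network(cidr) for cidr in restricted_subnets]
    allowed = []
    for ip in candidates:
        address = ipaddress.ip_address(ip)
        if not any(address in network for network in networks):
            allowed.append(ip)
    return allowed


# Hypothetical CIDR, for illustration only; the real subnet is not given here.
RESTRICTED = ["100.107.224.0/22"]
candidates = ["100.107.227.251", "100.115.219.133"]
print(pick_staging_devservers(candidates, RESTRICTED))
```

With a filter like this applied before staging, a restricted devserver would never be offered to SSP in the first place, whether or not it happens to be healthy.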
 
Labels: -Pri-1 Pri-0
Status: Started
chromeos7-infra-devserver1.cros is still getting picked for ssp staging AND it is still not working correctly.
This just causes a paladin failure.
Cc: pgeorgi@chromium.org pprabhu@chromium.org skau@chromium.org ntang@chromium.org
 Issue 666351  has been merged into this issue.
Cc: dshi@chromium.org
Owner: pprabhu@chromium.org
+dshi: Is there a workaround that doesn't require a push-to-prod?
This also affects the testing push. The powerwash test keeps failing because of it. Similar log:
32_1479413308_26829/delta0/usr/local/autotest_server_package.tar.bz2'
11/17 12:08:31.290 WARNI|             retry:0218| <class 'autotest_lib.client.common_lib.error.CmdError'>(Comman
d <sudo wget --timeout=300 -nv http://100.107.227.251:8082/static/quawks-release/R54-8743.44.0/autotest_server_p
ackage.tar.bz2 -O /usr/local/autotest/containers/test_1832_1479413308_26829/delta0/usr/local/autotest_server_pac
kage.tar.bz2> failed, rc=4, Command returned non-zero exit status
* Command: 
    sudo wget --timeout=300 -nv http://100.107.227.251:8082/static/quawks-
    release/R54-8743.44.0/autotest_server_package.tar.bz2 -O /usr/local/autote
    st/containers/test_1832_1479413308_26829/delta0/usr/local/autotest_server_
    package.tar.bz2
Exit status: 4
Duration: 0.243041992188
)
11/17 12:08:31.293 WARNI|             retry:0173| Retrying in 3.777172 seconds...
11/17 12:08:35.087 DEBUG|        base_utils:0185| Running 'sudo wget --timeout=300 -nv http://100.107.227.251:80
82/static/quawks-release/R54-8743.44.0/autotest_server_package.tar.bz2 -O /usr/local/autotest/containers/test_18
32_1479413308_26829/delta0/usr/local/autotest_server_package.tar.bz2'
11/17 12:10:44.656 WARNI|             retry:0218| <class 'autotest_lib.client.common_lib.error.CmdError'>(Comman
d <sudo wget --timeout=300 -nv http://100.107.227.251:8082/static/quawks-release/R54-8743.44.0/autotest_server_p
ackage.tar.bz2 -O /usr/local/autotest/containers/test_1832_1479413308_26829/delta0/usr/local/autotest_server_pac
kage.tar.bz2> failed, rc=4, Command returned non-zero exit status
* Command: 
    sudo wget --timeout=300 -nv http://100.107.227.251:8082/static/quawks-
    release/R54-8743.44.0/autotest_server_package.tar.bz2 -O /usr/local/autote
    st/containers/test_1832_1479413308_26829/delta0/usr/local/autotest_server_
    package.tar.bz2
Exit status: 4
Duration: 0.243041992188
)
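
The rc=4 in the log above is wget's documented exit status for a network failure. That suggests the container could not reach the devserver at all, rather than the file being missing (an error response from the server would be status 8). A small lookup like the one below can make such logs quicker to triage. The helper name describe_wget_exit is hypothetical; the table follows the exit statuses listed in the GNU Wget manual.

```python
# Exit statuses as documented in the GNU Wget manual.
WGET_EXIT_STATUS = {
    0: "no problems occurred",
    1: "generic error",
    2: "parse error (command line or .wgetrc)",
    3: "file I/O error",
    4: "network failure",
    5: "SSL verification failure",
    6: "username/password authentication failure",
    7: "protocol error",
    8: "server issued an error response",
}


def describe_wget_exit(rc):
    """Translate a wget return code into a short human-readable reason."""
    return WGET_EXIT_STATUS.get(rc, "unknown exit status %d" % rc)


print(describe_wget_exit(4))
```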
The job_repo_url on one of the affected tests:
https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/86168234-chromeos-test/chromeos4-row6-rack11-host17/ssp_logs/debug/

is job_repo_url : http://100.115.219.133:8082/static/veyron_mighty-release/R54-8743.44.0/autotest/packages

That is a devserver from the correct subnet, so SSP alone is using the incorrect devserver.
Cc: rohi...@chromium.org dhadd...@chromium.org josa...@chromium.org
 Issue 666498  has been merged into this issue.
Project Member Comment 8 by bugdroid1@chromium.org, Nov 17 2016
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/4e116826369f95645151f157e07b8256f7602329

commit 4e116826369f95645151f157e07b8256f7602329
Author: xixuan <xixuan@chromium.org>
Date: Thu Nov 17 23:32:10 2016

autotest: fix bugs for devserver function: download_file

BUG= chromium:666414 
TEST=Call lxc.download_extract() in hot, successfully download files.

Change-Id: Ibbc70333c14cbfca0f7e4db3899eb49979b3a726
Reviewed-on: https://chromium-review.googlesource.com/412331
Reviewed-by: Dan Shi <dshi@google.com>
Commit-Queue: Xixuan Wu <xixuan@chromium.org>
Tested-by: Xixuan Wu <xixuan@chromium.org>

[modify] https://crrev.com/4e116826369f95645151f157e07b8256f7602329/server/hosts/cros_host.py
[modify] https://crrev.com/4e116826369f95645151f157e07b8256f7602329/site_utils/lxc.py
[modify] https://crrev.com/4e116826369f95645151f157e07b8256f7602329/client/common_lib/cros/dev_server.py

Cc: ihf@chromium.org kinaba@chromium.org
 Issue 666545  has been merged into this issue.
We're still hitting issues in test_push.

In particular: we get permission denied errors when trying to write the downloaded file in the new way:
11/17 16:24:43.998 INFO |     ts_mon_config:0150| Waiting for ts_mon flushing process to finish...
11/17 16:24:44.015 ERROR|         traceback:0013| Traceback (most recent call last):
11/17 16:24:44.015 ERROR|         traceback:0013|   File "/usr/local/autotest/server/autoserv", line 549, in run_autoserv
11/17 16:24:44.015 ERROR|         traceback:0013|     machines)
11/17 16:24:44.016 ERROR|         traceback:0013|   File "/usr/local/autotest/server/autoserv", line 183, in _run_with_ssp
11/17 16:24:44.016 ERROR|         traceback:0013|     dut_name=dut_name)
11/17 16:24:44.016 ERROR|         traceback:0013|   File "/usr/local/autotest/site-packages/statsd/timer.py", line 95, in _decorator
11/17 16:24:44.017 ERROR|         traceback:0013|     return function(*args, **kwargs)
11/17 16:24:44.017 ERROR|         traceback:0013|   File "/usr/local/autotest/site_utils/lxc.py", line 235, in func_cleanup_if_fail
11/17 16:24:44.018 ERROR|         traceback:0013|     return func(*args, **kwargs)
11/17 16:24:44.018 ERROR|         traceback:0013|   File "/usr/local/autotest/site_utils/lxc.py", line 917, in setup_test
11/17 16:24:44.019 ERROR|         traceback:0013|     download_extract(server_package_url, autotest_pkg_path, usr_local_path)
11/17 16:24:44.019 ERROR|         traceback:0013|   File "/usr/local/autotest/client/common_lib/cros/retry.py", line 208, in func_retry
11/17 16:24:44.019 ERROR|         traceback:0013|     remaining_time)
11/17 16:24:44.020 ERROR|         traceback:0013|   File "/usr/local/autotest/client/common_lib/cros/retry.py", line 114, in timeout
11/17 16:24:44.020 ERROR|         traceback:0013|     default_result = func(*args, **kwargs)
11/17 16:24:44.020 ERROR|         traceback:0013|   File "/usr/local/autotest/site_utils/lxc.py", line 276, in download_extract
11/17 16:24:44.021 ERROR|         traceback:0013|     dev_server.ImageServerBase.download_file(url, target, timeout=300)
11/17 16:24:44.021 ERROR|         traceback:0013|   File "/usr/local/autotest/client/common_lib/cros/dev_server.py", line 891, in download_file
11/17 16:24:44.022 ERROR|         traceback:0013|     with open(local_file, 'w') as out_log:
11/17 16:24:44.022 ERROR|         traceback:0013| IOError: [Errno 13] Permission denied: '/usr/local/autotest/containers/test_1882_1479428678_27175/delta0/usr/local/autotest_server_package.tar.bz2'



I tried to force creating the parent directories first, but no go:

11/17 17:01:33.661 ERROR|         traceback:0013| Traceback (most recent call last):
11/17 17:01:33.661 ERROR|         traceback:0013|   File "/usr/local/autotest/server/autoserv", line 549, in run_autoserv
11/17 17:01:33.662 ERROR|         traceback:0013|     machines)
11/17 17:01:33.662 ERROR|         traceback:0013|   File "/usr/local/autotest/server/autoserv", line 183, in _run_with_ssp
11/17 17:01:33.662 ERROR|         traceback:0013|     dut_name=dut_name)
11/17 17:01:33.662 ERROR|         traceback:0013|   File "/usr/local/autotest/site-packages/statsd/timer.py", line 95, in _decorator
11/17 17:01:33.663 ERROR|         traceback:0013|     return function(*args, **kwargs)
11/17 17:01:33.663 ERROR|         traceback:0013|   File "/usr/local/autotest/site_utils/lxc.py", line 235, in func_cleanup_if_fail
11/17 17:01:33.664 ERROR|         traceback:0013|     return func(*args, **kwargs)
11/17 17:01:33.664 ERROR|         traceback:0013|   File "/usr/local/autotest/site_utils/lxc.py", line 921, in setup_test
11/17 17:01:33.665 ERROR|         traceback:0013|     download_extract(server_package_url, autotest_pkg_path, usr_local_path)
11/17 17:01:33.665 ERROR|         traceback:0013|   File "/usr/local/autotest/client/common_lib/cros/retry.py", line 208, in func_retry
11/17 17:01:33.665 ERROR|         traceback:0013|     remaining_time)
11/17 17:01:33.665 ERROR|         traceback:0013|   File "/usr/local/autotest/client/common_lib/cros/retry.py", line 114, in timeout
11/17 17:01:33.666 ERROR|         traceback:0013|     default_result = func(*args, **kwargs)
11/17 17:01:33.666 ERROR|         traceback:0013|   File "/usr/local/autotest/site_utils/lxc.py", line 278, in download_extract
11/17 17:01:33.666 ERROR|         traceback:0013|     os.makedirs(dirpath)
11/17 17:01:33.666 ERROR|         traceback:0013|   File "/usr/lib/python2.7/os.py", line 150, in makedirs
11/17 17:01:33.666 ERROR|         traceback:0013|     makedirs(head, mode)
11/17 17:01:33.666 ERROR|         traceback:0013|   File "/usr/lib/python2.7/os.py", line 150, in makedirs
11/17 17:01:33.667 ERROR|         traceback:0013|     makedirs(head, mode)
11/17 17:01:33.667 ERROR|         traceback:0013|   File "/usr/lib/python2.7/os.py", line 157, in makedirs
11/17 17:01:33.667 ERROR|         traceback:0013|     mkdir(name, mode)
11/17 17:01:33.667 ERROR|         traceback:0013| OSError: [Errno 13] Permission denied: '/usr/local/autotest/containers/test_1894_1479430890_2466/delta0'



I have no issues creating those directories directly on the server.

The SSP setup bit runs via python's exec, and may be running with reduced permissions.
The real issue here is that everything executed in '/usr/local/autotest/container' needs 'sudo' privilege.

Originally it was 'sudo wget ...'.

I manually changed the code to:

dev_server.ImageServerBase.download_file(url, '/tmp/tmp.tar.bz2', timeout=300)
utils.run('sudo mv /tmp/tmp.tar.bz2 %s' % target)

With this change, the dummy_PassServer:ssp test passes.

Since we may need this as a backup plan in case devservers outside the subnet become unreachable in the future, should I write a CL to add this fix?
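
The workaround in the comment above (download as the current user, then move into place with sudo) can be sketched in a self-contained form. This is a sketch under stated assumptions, not the code that landed: stage_with_sudo is a hypothetical name, fetch stands in for dev_server.ImageServerBase.download_file, and a per-call temporary file replaces the fixed /tmp/tmp.tar.bz2 so that concurrent jobs would not overwrite each other.

```python
import os
import subprocess
import tempfile


def stage_with_sudo(fetch, url, target, run=subprocess.check_call):
    """Download url to a private temp file, then move it to target via sudo.

    fetch(url, path) is a stand-in for the real download call and is an
    assumption of this sketch. run is injectable so that the move step can
    be exercised without sudo.
    """
    fd, tmp_path = tempfile.mkstemp(suffix=".tar.bz2")
    os.close(fd)
    try:
        fetch(url, tmp_path)
        # An argument list avoids shell quoting problems in the target path.
        run(["sudo", "mv", tmp_path, target])
    finally:
        # After a successful move the temp file is already gone.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
```

A caller would pass the real download function as fetch and leave run at its default. Passing a recording function as run instead shows the exact command that would be executed without touching the container directory.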

Thanks xixuan!
Please write that CL.
I'll try to do a push tonight to fix this.
Project Member Comment 13 by bugdroid1@chromium.org, Nov 18 2016
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/017ac5279707712475db35657ca1adcdfef6c7cf

commit 017ac5279707712475db35657ca1adcdfef6c7cf
Author: xixuan <xixuan@chromium.org>
Date: Fri Nov 18 04:14:48 2016

autotest: Fix bugs in fetching packages for ssp.

This CL fix two things:
1. Only use ssh for devserver-related package download, continue using wget for
other server urls, like 'http://storage.googleapis.com'.
2. When using ssh for downloading, first download it as a temporary file, then
mv it to target file with sudo privilege. This is due to 'sudo' is required in
all operations inside container.

BUG= chromium:666414 
TEST=Run 'python ./site_utils/lxc_functional_test.py -s -v' on hot.
Run test dummy_PassServer.ssp on hot.

Change-Id: I98307768923809e02b2e559dec9064d67c686563
Reviewed-on: https://chromium-review.googlesource.com/412427
Commit-Queue: Xixuan Wu <xixuan@chromium.org>
Tested-by: Xixuan Wu <xixuan@chromium.org>
Reviewed-by: Dan Shi <dshi@google.com>

[modify] https://crrev.com/017ac5279707712475db35657ca1adcdfef6c7cf/site_utils/lxc.py
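
The first point of the commit message above (ssh for devserver URLs, wget for everything else) amounts to a dispatch on the URL's host. A minimal sketch follows, assuming the set of devserver hosts is known up front. The function name choose_download_method is hypothetical, and how lxc.py actually recognizes a devserver URL is not shown in this thread.

```python
from urllib.parse import urlparse


def choose_download_method(url, devserver_hosts):
    """Return 'ssh' for URLs served by a known devserver, else 'wget'."""
    host = urlparse(url).hostname
    return "ssh" if host in devserver_hosts else "wget"


# Devserver addresses taken from the logs earlier in this thread.
devservers = {"100.107.227.251", "100.115.219.133"}
print(choose_download_method(
    "http://100.107.227.251:8082/static/lumpy-paladin/R56-8998.0.0-rc2/"
    "autotest_server_package.tar.bz2", devservers))
print(choose_download_method("http://storage.googleapis.com", devservers))
```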

Starting a push for this now.
A manual run of the consistently failing provision_AutoUpdate.double test just passed: http://chromeos-server13.cbf.corp.google.com/afe/#tab_id=view_job&object_id=86234559

We may be out of the woods here.
Let's see what happens with this paladin run.
Labels: -Pri-0 Pri-1
The storm has passed.

I still don't know why we started seeing these suddenly yesterday.
Status: Fixed
Summary: ssp fails to stage container artifact because of restricted_subnet (was: ssp picks devserver arbitrarily, does not respect restricted_subnet)
Project Member Comment 18 by bugdroid1@chromium.org, Dec 13 2016
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/d72455b37f5b3cdea815e6f48e7b94e7e3b4cb51

commit d72455b37f5b3cdea815e6f48e7b94e7e3b4cb51
Author: xixuan <xixuan@chromium.org>
Date: Wed Nov 23 19:17:47 2016

autotest: improve docstring of func download_file in devserver.

BUG= chromium:666414 
TEST=None

Change-Id: I6921c7ec63ae222ee4c4ff2c9beb5f7ff69e1658
Reviewed-on: https://chromium-review.googlesource.com/414287
Commit-Ready: Xixuan Wu <xixuan@chromium.org>
Tested-by: Xixuan Wu <xixuan@chromium.org>
Reviewed-by: Wai-Hong Tam <waihong@google.com>

[modify] https://crrev.com/d72455b37f5b3cdea815e6f48e7b94e7e3b4cb51/client/common_lib/cros/dev_server.py

Comment 19 by dchan@google.com, Mar 4 2017
Labels: VerifyIn-58
Labels: VerifyIn-59
Labels: VerifyIn-60
Labels: VerifyIn-61
Comment 23 by dchan@chromium.org, Oct 14
Status: Archived