[chaos] "All devservers are currently down"
Issue description
05/14 18:38:47.052 DEBUG| dev_server:0554| The host chromeos3-row1-rack2-host16 (172.22.38.62) is in a restricted subnet. Try to locate a devserver inside subnet 172.22.38.0:23.
05/14 18:38:47.054 DEBUG| base_utils:0176| Running 'ssh 172.22.39.238 'curl "http://172.22.39.238:8082/check_health?"''
05/14 18:38:53.055 ERROR| dev_server:0286| Devserver call failed: "http://172.22.39.238:8082/check_health?", timeout: 6.0 seconds, Error: Call is timed out.
05/14 18:38:53.057 ERROR| dev_server:0593| All devservers are currently down: []
05/14 18:38:53.057 WARNI| test:0606| Autotest caught exception when running test:
Traceback (most recent call last):
File "/usr/local/autotest/client/common_lib/test.py", line 600, in _exec
_call_test_function(self.execute, *p_args, **p_dargs)
File "/usr/local/autotest/client/common_lib/test.py", line 804, in _call_test_function
return func(*args, **dargs)
File "/usr/local/autotest/client/common_lib/test.py", line 461, in execute
dargs)
File "/usr/local/autotest/client/common_lib/test.py", line 347, in _call_run_once_with_retry
postprocess_profiled_run, args, dargs)
File "/usr/local/autotest/client/common_lib/test.py", line 376, in _call_run_once
self.run_once(*args, **dargs)
File "/usr/local/autotest/server/site_tests/provision_AutoUpdate/provision_AutoUpdate.py", line 125, in run_once
raise error.TestFail(str(e))
TestFail: All devservers are currently down: []
The devserver *is* pingable (I can't ssh to it with the testing_rsa key from CRD).
Sample logs from this job: http://cautotest/afe/#tab_id=view_job&object_id=63343297
Pri-1 as this blocks interop testing.
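The first log line says the DUT is in a restricted subnet and that a devserver has to be found inside "172.22.38.0:23". Assuming that notation means the CIDR block 172.22.38.0/23 (an assumption; the log format is not explained in this report), a quick check with Python's standard `ipaddress` module suggests that both the DUT and the one devserver being probed sit inside that range. This is a standalone sketch, not the autotest code, and `in_restricted_subnet` is a hypothetical helper name:

```python
import ipaddress

# Assumption: "172.22.38.0:23" in the log means the CIDR block 172.22.38.0/23.
RESTRICTED_SUBNET = ipaddress.ip_network("172.22.38.0/23")


def in_restricted_subnet(ip, subnet=RESTRICTED_SUBNET):
    """Return True if the IPv4 address string lies inside the subnet."""
    return ipaddress.ip_address(ip) in subnet


dut_ip = "172.22.38.62"         # chromeos3-row1-rack2-host16, from the log
devserver_ip = "172.22.39.238"  # the devserver probed with check_health

print(in_restricted_subnet(dut_ip))
print(in_restricted_subnet(devserver_ip))
```

If that reading is right, the DUT had exactly one candidate devserver in its subnet, so a single timed-out health check was enough to produce the "all devservers are down" failure.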
,
May 16 2016
I encountered this error once, but on a re-run the test was able to pick up a devserver. Try re-running.
,
May 16 2016
,
May 16 2016
A run from 2016-05-15 17:08:17 (http://cautotest/afe/#tab_id=view_job&object_id=63438627) shows this new failure message:
Traceback (most recent call last):
File "/usr/local/autotest/client/common_lib/test.py", line 600, in _exec
_call_test_function(self.execute, *p_args, **p_dargs)
File "/usr/local/autotest/client/common_lib/test.py", line 804, in _call_test_function
return func(*args, **dargs)
File "/usr/local/autotest/client/common_lib/test.py", line 461, in execute
dargs)
File "/usr/local/autotest/client/common_lib/test.py", line 347, in _call_run_once_with_retry
postprocess_profiled_run, args, dargs)
File "/usr/local/autotest/client/common_lib/test.py", line 376, in _call_run_once
self.run_once(*args, **dargs)
File "/usr/local/autotest/server/site_tests/provision_AutoUpdate/provision_AutoUpdate.py", line 139, in run_once
raise error.TestFail(str(e))
TestFail: Update server at 172.22.39.238:8082 not available
A manually invoked test_that :lab: run "passed" but really the test didn't run. http://cautotest/afe/#tab_id=view_job&object_id=63514052
,
May 16 2016
,
May 17 2016
Somehow the devserver "disappeared" when attempting to provision.
05/16 23:48:15.073 DEBUG| dev_server:0554| The host chromeos3-row1-rack1-host13 (172.22.38.43) is in a restricted subnet. Try to locate a devserver inside subnet 172.22.38.0:23.
05/16 23:48:15.074 DEBUG| base_utils:0176| Running 'ssh 172.22.39.238 'curl "http://172.22.39.238:8082/check_health?"''
05/16 23:48:18.723 INFO | dev_server:0931| Staging artifacts on devserver http://172.22.39.238:8082: build=nyan_big-release/R52-8334.0.0, artifacts=['full_payload', 'stateful', 'autotest_packages'], files=, archive_url=gs://chromeos-image-archive/nyan_big-release/R52-8334.0.0
05/16 23:48:18.724 DEBUG| base_utils:0176| Running 'ssh 172.22.39.238 'curl "http://172.22.39.238:8082/stage?artifacts=full_payload,stateful,autotest_packages&files=&async=True&archive_url=gs://chromeos-image-archive/nyan_big-release/R52-8334.0.0"''
05/16 23:48:22.218 DEBUG| base_utils:0176| Running 'ssh 172.22.39.238 'curl "http://172.22.39.238:8082/is_staged?artifacts=full_payload,stateful,autotest_packages&files=&archive_url=gs://chromeos-image-archive/nyan_big-release/R52-8334.0.0"''
05/16 23:48:35.760 DEBUG| base_utils:0176| Running 'ssh 172.22.39.238 'curl "http://172.22.39.238:8082/is_staged?artifacts=full_payload,stateful,autotest_packages&files=&archive_url=gs://chromeos-image-archive/nyan_big-release/R52-8334.0.0"''
05/16 23:48:44.508 DEBUG| base_utils:0176| Running 'ssh 172.22.39.238 'curl "http://172.22.39.238:8082/is_staged?artifacts=full_payload,stateful,autotest_packages&files=&archive_url=gs://chromeos-image-archive/nyan_big-release/R52-8334.0.0"''
05/16 23:48:53.024 DEBUG| base_utils:0176| Running 'ssh 172.22.39.238 'curl "http://172.22.39.238:8082/is_staged?artifacts=full_payload,stateful,autotest_packages&files=&archive_url=gs://chromeos-image-archive/nyan_big-release/R52-8334.0.0"''
05/16 23:49:01.572 DEBUG| base_utils:0176| Running 'ssh 172.22.39.238 'curl "http://172.22.39.238:8082/is_staged?artifacts=full_payload,stateful,autotest_packages&files=&archive_url=gs://chromeos-image-archive/nyan_big-release/R52-8334.0.0"''
05/16 23:49:10.048 INFO | dev_server:0953| Finished staging artifacts: build=nyan_big-release/R52-8334.0.0, artifacts=['full_payload', 'stateful', 'autotest_packages'], files=, archive_url=gs://chromeos-image-archive/nyan_big-release/R52-8334.0.0
05/16 23:49:10.086 DEBUG|provision_AutoUpda:0131| Installing image
================CUT========================
05/16 23:49:44.961 INFO | autoupdater:0483| Updating from version 8120.0.0 to R52-8334.0.0.
05/16 23:49:44.963 WARNI| cros_host:0755| Autoupdate did not complete.
05/16 23:49:44.963 ERROR|provision_AutoUpda:0138| Update server at 172.22.39.238:8082 not available
05/16 23:49:44.964 WARNI| test:0606| Autotest caught exception when running test:
Traceback (most recent call last):
File "/usr/local/autotest/client/common_lib/test.py", line 600, in _exec
_call_test_function(self.execute, *p_args, **p_dargs)
File "/usr/local/autotest/client/common_lib/test.py", line 804, in _call_test_function
return func(*args, **dargs)
File "/usr/local/autotest/client/common_lib/test.py", line 461, in execute
dargs)
File "/usr/local/autotest/client/common_lib/test.py", line 347, in _call_run_once_with_retry
postprocess_profiled_run, args, dargs)
File "/usr/local/autotest/client/common_lib/test.py", line 376, in _call_run_once
self.run_once(*args, **dargs)
File "/usr/local/autotest/server/site_tests/provision_AutoUpdate/provision_AutoUpdate.py", line 139, in run_once
raise error.TestFail(str(e))
TestFail: Update server at 172.22.39.238:8082 not available
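In the log above, staging is requested with `async=True` and the client then repeats the `is_staged` call every several seconds until the devserver reports that staging has finished. That pattern can be sketched as a generic polling loop. This is not the autotest implementation: `wait_until_staged`, its parameters, and the default interval and timeout are all hypothetical, and the `is_staged` argument stands in for whatever actually issues the request.

```python
import time


def wait_until_staged(is_staged, timeout_s=120.0, poll_interval_s=9.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll is_staged() until it returns True or timeout_s elapses.

    is_staged is a zero-argument callable (hypothetical) that wraps the
    devserver's is_staged request and returns True once staging is done.
    Returns the number of polls made; raises TimeoutError on timeout.
    """
    deadline = clock() + timeout_s
    polls = 0
    while True:
        polls += 1
        if is_staged():
            return polls
        if clock() >= deadline:
            raise TimeoutError("artifacts not staged after %d polls" % polls)
        sleep(poll_interval_s)


# Usage with a fake devserver that reports "staged" on the third poll.
responses = iter([False, False, True])
print(wait_until_staged(lambda: next(responses), sleep=lambda s: None))
```

Note that a loop like this only covers the staging phase. In the log, staging completed normally and the devserver became unavailable later, during the update itself, which polling at this stage would not catch.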
,
May 17 2016
dschimmels, can we have a second devserver in chromeos3? A single point of failure is rather risky. Let me know when I can set it up.
,
May 22 2016
I have added 3 more autotest devserver hosts to this lab.
chromeos3-infra-devserver - address has changed to 172.22.39.233
chromeos3-infra-devserver1 - address 172.22.39.234
chromeos3-infra-devserver2 - address 172.22.39.235
chromeos3-infra-devserver3 - address 172.22.39.236
Dan, can you add these to your setting/environment so they can be used by the DUTs in their tests?
Thanks,
David
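With several devservers in the subnet, a client can fall back to the next one when a health check fails instead of giving up after a single timeout. The following is a minimal sketch of that idea, not the autotest selection logic. `pick_healthy_devserver` and `http_health_check` are hypothetical names, port 8082 is assumed for the new hosts because the original devserver used it, and the direct HTTP request is a simplification of the `ssh ... curl` call shown in the logs. The 6-second timeout matches the one reported in the original error.

```python
import urllib.request

# Addresses from the comment above; port 8082 is an assumption.
DEVSERVERS = [
    "http://172.22.39.233:8082",
    "http://172.22.39.234:8082",
    "http://172.22.39.235:8082",
    "http://172.22.39.236:8082",
]


def http_health_check(url, timeout_s=6.0):
    """Return True if <url>/check_health answers with HTTP 200 in time."""
    try:
        with urllib.request.urlopen(url + "/check_health", timeout=timeout_s) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        return False


def pick_healthy_devserver(devservers, is_healthy=http_health_check):
    """Return the first devserver for which is_healthy(url) is True."""
    down = []
    for url in devservers:
        if is_healthy(url):
            return url
        down.append(url)
    raise RuntimeError("no healthy devserver; tried: %s" % ", ".join(down))


# Usage with a fake check that treats the first host as down.
print(pick_healthy_devserver(DEVSERVERS, lambda url: "39.233" not in url))
```

Passing the health check in as a function keeps the selection logic testable without network access, and would let the real `ssh`-wrapped check be swapped in.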
,
Jun 2 2016
This appears to be fixed, judging from this job's results (as in, the test no longer fails with this error). http://cautotest/afe/#tab_id=view_job&object_id=65278117 The fix included adding new devservers and updating all devservers to the latest devserver code. I'll mark this as Fixed. Once we get a wifi_interop run without this problem, I will mark this as Verified.
,
Aug 12 2016
Closing. Please reopen if it's not fixed.
Comment 1 by dschimmels@chromium.org
, May 16 2016