
Issue 612059 link

Starred by 1 user

Issue metadata

Status: Verified
Owner:
Closed: Jun 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug




[chaos] "All devservers are currently down"

Project Member Reported by tienchang@chromium.org, May 15 2016

Issue description

05/14 18:38:47.052 DEBUG|        dev_server:0554| The host chromeos3-row1-rack2-host16 (172.22.38.62) is in a restricted subnet. Try to locate a devserver inside subnet 172.22.38.0:23.
05/14 18:38:47.054 DEBUG|        base_utils:0176| Running 'ssh 172.22.39.238 'curl "http://172.22.39.238:8082/check_health?"''
05/14 18:38:53.055 ERROR|        dev_server:0286| Devserver call failed: "http://172.22.39.238:8082/check_health?", timeout: 6.0 seconds, Error: Call is timed out.
05/14 18:38:53.057 ERROR|        dev_server:0593| All devservers are currently down: []
05/14 18:38:53.057 WARNI|              test:0606| Autotest caught exception when running test:
Traceback (most recent call last):
  File "/usr/local/autotest/client/common_lib/test.py", line 600, in _exec
    _call_test_function(self.execute, *p_args, **p_dargs)
  File "/usr/local/autotest/client/common_lib/test.py", line 804, in _call_test_function
    return func(*args, **dargs)
  File "/usr/local/autotest/client/common_lib/test.py", line 461, in execute
    dargs)
  File "/usr/local/autotest/client/common_lib/test.py", line 347, in _call_run_once_with_retry
    postprocess_profiled_run, args, dargs)
  File "/usr/local/autotest/client/common_lib/test.py", line 376, in _call_run_once
    self.run_once(*args, **dargs)
  File "/usr/local/autotest/server/site_tests/provision_AutoUpdate/provision_AutoUpdate.py", line 125, in run_once
    raise error.TestFail(str(e))
TestFail: All devservers are currently down: []

devserver *is* pingable (I can't ssh with testing_rsa key from CRD).

Sample logs from this job: http://cautotest/afe/#tab_id=view_job&object_id=63343297

Pri-1 as this blocks interop testing.
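
For reference, a minimal sketch of the subnet restriction the first log line describes. It is not the autotest implementation. It reads "172.22.38.0:23" from the log as the CIDR block 172.22.38.0/23, which is an assumption about the log's notation, and the function name is hypothetical. It shows that the one devserver at 172.22.39.238 is inside the restricted subnet, so if that server fails its health check the candidate list is empty.

```python
import ipaddress

# Assumption: the log's "172.22.38.0:23" denotes the CIDR block 172.22.38.0/23.
RESTRICTED_SUBNET = ipaddress.ip_network("172.22.38.0/23")


def devservers_in_subnet(devserver_ips, subnet=RESTRICTED_SUBNET):
    """Return the devserver IPs that fall inside the given subnet.

    Hypothetical helper for illustration; the name does not come from autotest.
    """
    return [ip for ip in devserver_ips if ipaddress.ip_address(ip) in subnet]


# The DUT (172.22.38.62) and the devserver (172.22.39.238) from the log are
# both inside the /23, which spans 172.22.38.0 through 172.22.39.255.
candidates = devservers_in_subnet(["172.22.39.238"])
```

With only one candidate in the subnet, a single failed health check leaves nothing to fall back to, which matches the empty list in "All devservers are currently down: []".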
 
Not sure about this error message.

The system reports an uptime of 42 days, and I had no issues logging in or pinging it.

chromeos-test@chromeos3-infra-devserver:~$ uptime
 11:04:10 up 42 days, 20 min,  2 users,  load average: 0.00, 0.02, 0.05
chromeos-test@chromeos3-infra-devserver:~$ 

Pinging out from the devserver works:

chromeos-test@chromeos3-infra-devserver:~$ ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=56 time=2.30 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=56 time=2.25 ms
64 bytes from 8.8.8.8: icmp_seq=3 ttl=56 time=2.24 ms
^C
--- 8.8.8.8 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2002ms
rtt min/avg/max/mdev = 2.249/2.271/2.309/0.047 ms

I can also ping it by address and by hostname:

ping 172.22.39.238
PING 172.22.39.238 (172.22.39.238) 56(84) bytes of data.
64 bytes from 172.22.39.238: icmp_seq=1 ttl=60 time=0.622 ms
64 bytes from 172.22.39.238: icmp_seq=2 ttl=60 time=0.627 ms
64 bytes from 172.22.39.238: icmp_seq=3 ttl=60 time=0.744 ms
^C
--- 172.22.39.238 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 0.622/0.664/0.744/0.060 ms
dschimmels@dschimmels3:~$ ping chromeos3-infra-devserver
PING chromeos3-infra-devserver.cros.corp.google.com (172.22.39.238) 56(84) bytes of data.
64 bytes from 172.22.39.238: icmp_seq=1 ttl=60 time=0.688 ms
64 bytes from 172.22.39.238: icmp_seq=2 ttl=60 time=0.733 ms
64 bytes from 172.22.39.238: icmp_seq=3 ttl=60 time=0.742 ms
^C
--- chromeos3-infra-devserver.cros.corp.google.com ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2001ms
rtt min/avg/max/mdev = 0.688/0.721/0.742/0.023 ms
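
Note that a successful ping only shows the host answers ICMP. The check that failed in the log is an HTTP request to /check_health on port 8082, which can time out while ping still works. Below is a minimal sketch of an HTTP-level check using the 6-second timeout from the log. The function name and the injectable `opener` parameter are hypothetical. The sketch sends the request directly, whereas the log shows autotest running curl over ssh on the devserver itself.

```python
import urllib.request


def devserver_responds(url, timeout=6.0, opener=urllib.request.urlopen):
    """Return True if an HTTP GET to url succeeds within timeout seconds.

    Hypothetical helper; opener is injectable so the function can be
    exercised without network access.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False
```

For example, `devserver_responds("http://172.22.39.238:8082/check_health?")` would exercise the same endpoint as the log, provided the caller can reach that address directly.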




I encountered this error once, but on a re-run the test was able to pick up a devserver. Try re-running.
Cc: akes...@chromium.org
A run from 2016-05-15 17:08:17 (http://cautotest/afe/#tab_id=view_job&object_id=63438627)
shows this new failure message:

Traceback (most recent call last):
  File "/usr/local/autotest/client/common_lib/test.py", line 600, in _exec
    _call_test_function(self.execute, *p_args, **p_dargs)
  File "/usr/local/autotest/client/common_lib/test.py", line 804, in _call_test_function
    return func(*args, **dargs)
  File "/usr/local/autotest/client/common_lib/test.py", line 461, in execute
    dargs)
  File "/usr/local/autotest/client/common_lib/test.py", line 347, in _call_run_once_with_retry
    postprocess_profiled_run, args, dargs)
  File "/usr/local/autotest/client/common_lib/test.py", line 376, in _call_run_once
    self.run_once(*args, **dargs)
  File "/usr/local/autotest/server/site_tests/provision_AutoUpdate/provision_AutoUpdate.py", line 139, in run_once
    raise error.TestFail(str(e))
TestFail: Update server at 172.22.39.238:8082 not available


A manually invoked test_that :lab: run "passed" but really the test didn't run.
http://cautotest/afe/#tab_id=view_job&object_id=63514052

Comment 5 by autumn@chromium.org, May 16 2016

Cc: dshi@chromium.org
Owner: akes...@chromium.org
Status: Assigned (was: Untriaged)
Somehow the devserver "disappeared" when attempting to provision.

05/16 23:48:15.073 DEBUG|        dev_server:0554| The host chromeos3-row1-rack1-host13 (172.22.38.43) is in a restricted subnet. Try to locate a devserver inside subnet 172.22.38.0:23.
05/16 23:48:15.074 DEBUG|        base_utils:0176| Running 'ssh 172.22.39.238 'curl "http://172.22.39.238:8082/check_health?"''
05/16 23:48:18.723 INFO |        dev_server:0931| Staging artifacts on devserver http://172.22.39.238:8082: build=nyan_big-release/R52-8334.0.0, artifacts=['full_payload', 'stateful', 'autotest_packages'], files=, archive_url=gs://chromeos-image-archive/nyan_big-release/R52-8334.0.0
05/16 23:48:18.724 DEBUG|        base_utils:0176| Running 'ssh 172.22.39.238 'curl "http://172.22.39.238:8082/stage?artifacts=full_payload,stateful,autotest_packages&files=&async=True&archive_url=gs://chromeos-image-archive/nyan_big-release/R52-8334.0.0"''
05/16 23:48:22.218 DEBUG|        base_utils:0176| Running 'ssh 172.22.39.238 'curl "http://172.22.39.238:8082/is_staged?artifacts=full_payload,stateful,autotest_packages&files=&archive_url=gs://chromeos-image-archive/nyan_big-release/R52-8334.0.0"''
05/16 23:48:35.760 DEBUG|        base_utils:0176| Running 'ssh 172.22.39.238 'curl "http://172.22.39.238:8082/is_staged?artifacts=full_payload,stateful,autotest_packages&files=&archive_url=gs://chromeos-image-archive/nyan_big-release/R52-8334.0.0"''
05/16 23:48:44.508 DEBUG|        base_utils:0176| Running 'ssh 172.22.39.238 'curl "http://172.22.39.238:8082/is_staged?artifacts=full_payload,stateful,autotest_packages&files=&archive_url=gs://chromeos-image-archive/nyan_big-release/R52-8334.0.0"''
05/16 23:48:53.024 DEBUG|        base_utils:0176| Running 'ssh 172.22.39.238 'curl "http://172.22.39.238:8082/is_staged?artifacts=full_payload,stateful,autotest_packages&files=&archive_url=gs://chromeos-image-archive/nyan_big-release/R52-8334.0.0"''
05/16 23:49:01.572 DEBUG|        base_utils:0176| Running 'ssh 172.22.39.238 'curl "http://172.22.39.238:8082/is_staged?artifacts=full_payload,stateful,autotest_packages&files=&archive_url=gs://chromeos-image-archive/nyan_big-release/R52-8334.0.0"''
05/16 23:49:10.048 INFO |        dev_server:0953| Finished staging artifacts: build=nyan_big-release/R52-8334.0.0, artifacts=['full_payload', 'stateful', 'autotest_packages'], files=, archive_url=gs://chromeos-image-archive/nyan_big-release/R52-8334.0.0
05/16 23:49:10.086 DEBUG|provision_AutoUpda:0131| Installing image

================CUT========================

05/16 23:49:44.961 INFO |       autoupdater:0483| Updating from version 8120.0.0 to R52-8334.0.0.
05/16 23:49:44.963 WARNI|         cros_host:0755| Autoupdate did not complete.
05/16 23:49:44.963 ERROR|provision_AutoUpda:0138| Update server at 172.22.39.238:8082 not available
05/16 23:49:44.964 WARNI|              test:0606| Autotest caught exception when running test:
Traceback (most recent call last):
  File "/usr/local/autotest/client/common_lib/test.py", line 600, in _exec
    _call_test_function(self.execute, *p_args, **p_dargs)
  File "/usr/local/autotest/client/common_lib/test.py", line 804, in _call_test_function
    return func(*args, **dargs)
  File "/usr/local/autotest/client/common_lib/test.py", line 461, in execute
    dargs)
  File "/usr/local/autotest/client/common_lib/test.py", line 347, in _call_run_once_with_retry
    postprocess_profiled_run, args, dargs)
  File "/usr/local/autotest/client/common_lib/test.py", line 376, in _call_run_once
    self.run_once(*args, **dargs)
  File "/usr/local/autotest/server/site_tests/provision_AutoUpdate/provision_AutoUpdate.py", line 139, in run_once
    raise error.TestFail(str(e))
TestFail: Update server at 172.22.39.238:8082 not available
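
The log above shows the staging pattern: one stage call with async=True, then repeated is_staged calls every several seconds until staging finishes. A minimal sketch of that polling loop follows. It is not the autotest code. The function name, the timeout and interval values, and the `is_staged` callable are all assumptions for illustration.

```python
import time


def wait_until_staged(is_staged, timeout=120.0, interval=8.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll is_staged() until it returns True or timeout seconds elapse.

    Hypothetical helper: is_staged is a caller-supplied callable that asks
    the devserver whether the artifacts are ready. Returns True on success
    and False if the deadline passes first.
    """
    deadline = clock() + timeout
    while True:
        if is_staged():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

Staging succeeded in this run, so the failure came later, during the update itself, when the DUT could no longer reach the devserver.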

Comment 7 by dshi@chromium.org, May 17 2016

dschimmels, can we have a second devserver in chromeos3? A single point of failure is rather risky. Let me know when I can set it up.

I have added 3 more autotest devserver hosts to this lab.

chromeos3-infra-devserver: address has changed to 172.22.39.233
chromeos3-infra-devserver1: address 172.22.39.234
chromeos3-infra-devserver2: address 172.22.39.235
chromeos3-infra-devserver3: address 172.22.39.236

Dan, can you add these to your settings/environment so the DUTs can use them in their tests?

Thanks
David
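
With several devservers available, a caller can fall back to the next one when a health check fails, so there is no longer a single point of failure. A minimal sketch of that selection, using the addresses listed above. The function name and the `is_healthy` callable are hypothetical, and this is not how autotest itself chooses a devserver.

```python
# Hostnames and addresses as listed in the comment above.
DEVSERVERS = {
    "chromeos3-infra-devserver": "172.22.39.233",
    "chromeos3-infra-devserver1": "172.22.39.234",
    "chromeos3-infra-devserver2": "172.22.39.235",
    "chromeos3-infra-devserver3": "172.22.39.236",
}


def first_healthy(devservers, is_healthy):
    """Return (name, ip) of the first devserver passing is_healthy, else None.

    Hypothetical helper; is_healthy is a caller-supplied callable taking an
    IP address string and returning a bool.
    """
    for name, ip in devservers.items():
        if is_healthy(ip):
            return name, ip
    return None
```

All four addresses fall inside 172.22.38.0/23, assuming that is what the log's "172.22.38.0:23" denotes, so DUTs in the restricted subnet should be able to use any of them.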
Status: Fixed (was: Assigned)
This appears to be fixed, judging from this job's results (the test no longer fails with this error): http://cautotest/afe/#tab_id=view_job&object_id=65278117

The fix included adding new devservers and updating all devservers to the latest devserver code.

I'll mark this as Fixed. Once we get a wifi_interop run without this problem, I will mark this as Verified.
Status: Verified (was: Fixed)
Closing. Please reopen if it's not fixed.
