Starred by 2 users
Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug



master-paladin build 16475 failed due to DNS failures
Project Member Reported by djkurtz@chromium.org, Oct 3
https://uberchromegw.corp.google.com/i/chromeos/builders/master-paladin/builds/16475

All failures are in HWTest [bvt-inline] stage:
---------------------

veyron_mighty-paladin: The HWTest [bvt-inline] stage failed: ** HWTest failed (code 1) **@https://luci-milo.appspot.com/buildbot/chromeos/veyron_mighty-paladin/6806
reef-paladin: The HWTest [bvt-inline] stage failed: ** HWTest failed (code 1) **@https://luci-milo.appspot.com/buildbot/chromeos/reef-paladin/3842
elm-paladin: The HWTest [bvt-inline] stage failed: ** HWTest failed (code 1) **@https://luci-milo.appspot.com/buildbot/chromeos/elm-paladin/4261
wolf-paladin: The HWTest [bvt-inline] stage failed: ** HWTest failed (code 1) **@https://luci-milo.appspot.com/buildbot/chromeos/wolf-paladin/15852
winky-paladin: The HWTest [bvt-inline] stage failed: ** HWTest failed (code 1) **@https://luci-milo.appspot.com/buildbot/chromeos/winky-paladin/3150
cave-paladin: The HWTest [bvt-inline] stage failed: ** HWTest did not complete due to infrastructure issues (code 3) **@https://luci-milo.appspot.com/buildbot/chromeos/cave-paladin/1771
nyan_big-paladin: The HWTest [bvt-inline] stage failed: ** HWTest failed (code 1) **@https://luci-milo.appspot.com/buildbot/chromeos/nyan_big-paladin/3151
kevin-paladin: The HWTest [bvt-inline] stage failed: ** HWTest did not complete due to infrastructure issues (code 3) **@https://luci-milo.appspot.com/buildbot/chromeos/kevin-paladin/2612
lumpy-paladin: The HWTest [bvt-inline] stage failed: ** HWTest failed (code 1) **@https://luci-milo.appspot.com/buildbot/chromeos/lumpy-paladin/29812
reef-uni-paladin: The HWTest [bvt-inline] [snappy] stage failed: ** HWTest did not complete due to infrastructure issues (code 3) **@https://luci-milo.appspot.com/buildbot/chromeos/reef-uni-paladin/582
peach_pit-paladin: The HWTest [bvt-inline] stage failed: ** HWTest failed (code 1) **@https://luci-milo.appspot.com/buildbot/chromeos/peach_pit-paladin/17253
link-paladin: The HWTest [bvt-inline] stage failed: ** HWTest failed (code 1) **@https://luci-milo.appspot.com/buildbot/chromeos/link-paladin/29843

---------------------
All build failures are in provision_AutoUpdate.double:

veyron_mighty-paladin/6806
** HWTest failed (code 1) **
provision_AutoUpdate.double: retry_count: 2, FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos4-row6-rack11-host17: SSHConnectionError: ssh: Could not resolve hostname chromeos4-row6-rack11-host17: Name or service not known

reef-paladin/3842
** HWTest failed (code 1) **
[Test-Logs]: provision_AutoUpdate.double: retry_count: 2, FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos2-row7-rack6-host21: SSHConnectionError: ssh: Could not resolve hostname chromeos2-row7-rack6-host21: Name or service not known

elm-paladin/4261
** HWTest failed (code 1) **
provision_AutoUpdate.double: retry_count: 1, FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos2-row7-rack7-host1: SSHConnectionError: ssh: Could not resolve hostname chromeos2-row7-rack7-host1: Name or service not known

wolf-paladin/15852
** HWTest failed (code 1) **
provision_AutoUpdate.double: retry_count: 2, FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos4-row1-rack4-host1: SSHConnectionError: ssh: Could not resolve hostname chromeos4-row1-rack4-host1: Name or service not known

winky-paladin/3150
** HWTest failed (code 1) **
[Test-Logs]: provision_AutoUpdate.double: retry_count: 2, FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos4-row3-rack13-host5: SSHConnectionError: ssh: Could not resolve hostname chromeos4-row3-rack13-host5: Name or service not known

cave-paladin/1771
** HWTest did not complete due to infrastructure issues (code 3) **
[Test-Logs]: provision_AutoUpdate.double: retry_count: 2, ABORT: None

nyan_big-paladin/3151
** HWTest failed (code 1) **
[Test-Logs]: provision_AutoUpdate.double: retry_count: 2, FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos4-row5-rack10-host5: SSHConnectionError: ssh: Could not resolve hostname chromeos4-row5-rack10-host5: Name or service not known

kevin-paladin/2612
** HWTest did not complete due to infrastructure issues (code 3) **
[Test-Logs]: provision_AutoUpdate.double: retry_count: 2, ABORT: None

lumpy-paladin/29812
[Test-Logs]: provision_AutoUpdate.double: retry_count: 1, FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos6-row2-rack8-host10: SSHConnectionError: ssh: Could not resolve hostname chromeos6-row2-rack8-host10: Name or service not known

reef-uni-paladin/582
** HWTest did not complete due to infrastructure issues (code 3) **
[Test-Logs]: provision_AutoUpdate.double: retry_count: 2, ABORT: None

peach_pit-paladin/17253
** HWTest failed (code 1) **
[Test-Logs]: provision_AutoUpdate.double: retry_count: 2, FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos6-row2-rack10-host8: SSHConnectionError: ssh: Could not resolve hostname chromeos6-row2-rack10-host8: Name or service not known

link-paladin/29843
** HWTest failed (code 1) **
[Test-Logs]: provision_AutoUpdate.double: retry_count: 2, FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos4-row5-rack13-host9: SSHConnectionError: ssh: Could not resolve hostname chromeos4-row5-rack13-host9: Name or service not known
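To check quickly whether the unresolved names cluster in one place, the summaries above can be parsed mechanically. This is a minimal sketch, not part of autotest or chromite: `unresolved_hosts` and `hosts_by_prefix` are hypothetical helper names, and treating the leading `chromeosN` component as a meaningful grouping key is an assumption.

```python
import re
from collections import Counter

# Matches the ssh resolver error quoted in the failure summaries above.
_UNRESOLVED = re.compile(
    r"Could not resolve hostname ([\w.-]+): Name or service not known")


def unresolved_hosts(summary_lines):
    """Return hostnames that ssh could not resolve, in order of appearance."""
    hosts = []
    for line in summary_lines:
        match = _UNRESOLVED.search(line)
        if match:
            hosts.append(match.group(1))
    return hosts


def hosts_by_prefix(hosts):
    """Count hosts by leading name component (assumed grouping key)."""
    return Counter(host.split("-", 1)[0] for host in hosts)
```

Run over the nine FAIL summaries above, the unresolved hosts carry the prefixes chromeos2, chromeos4 and chromeos6, so the failures are not confined to a single prefix. The three ABORT summaries contain no hostname and are skipped.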


---------------------
Details from veyron_mighty-paladin/6806 autoserv.DEBUG:

https://storage.cloud.google.com/chromeos-autotest-results/146703469-chromeos-test/chromeos4-row6-rack11-host17/debug/autoserv.DEBUG?_ga=2.31926264.-734044362.1501703718


10/03 14:02:57.897 DEBUG|        dev_server:2124| Start CrOS auto-update for host chromeos4-row6-rack11-host17 at 1 time(s).
10/03 14:02:57.900 DEBUG|             utils:0212| Running 'ssh 100.115.219.134 'curl "http://100.115.219.134:8082/cros_au?full_update=True&force_update=True&build_name=veyron_mighty-paladin/R63-9999.0.0-rc1&host_name=chromeos4-row6-rack11-host17&async=True&clobber_stateful=True"''
10/03 14:03:06.426 INFO |        dev_server:1852| Received response from devserver for cros_au call: '[true, 31124]'
10/03 14:03:06.428 DEBUG|        dev_server:1975| start process 31124 for auto_update in devserver
10/03 14:03:06.429 DEBUG|        dev_server:1873| Check the progress for auto-update process 31124
10/03 14:03:06.430 DEBUG|             utils:0212| Running 'ssh 100.115.219.134 'curl "http://100.115.219.134:8082/get_au_status?full_update=True&force_update=True&pid=31124&build_name=veyron_mighty-paladin/R63-9999.0.0-rc1&host_name=chromeos4-row6-rack11-host17&clobber_stateful=True"''
10/03 14:03:14.952 DEBUG|        dev_server:1909| Current CrOS auto-update status: CrOS update is just started.
10/03 14:03:25.001 DEBUG|             utils:0212| Running 'ssh 100.115.219.134 'curl "http://100.115.219.134:8082/get_au_status?full_update=True&force_update=True&pid=31124&build_name=veyron_mighty-paladin/R63-9999.0.0-rc1&host_name=chromeos4-row6-rack11-host17&clobber_stateful=True"''
10/03 14:03:33.546 DEBUG|        dev_server:1978| Failed to trigger auto-update process on devserver
10/03 14:03:33.549 DEBUG|             utils:0212| Running 'ssh 100.115.219.134 'curl "http://100.115.219.134:8082/collect_cros_au_log?pid=31124&host_name=chromeos4-row6-rack11-host17"''
10/03 14:03:42.045 DEBUG|        dev_server:1789| Saving auto-update logs into /usr/local/autotest/results/146703469-chromeos-test/autoupdate_logs/CrOS_update_chromeos4-row6-rack11-host17_31124.log
10/03 14:03:42.049 DEBUG|             utils:0212| Running 'ssh 100.115.219.134 'curl "http://100.115.219.134:8082/handler_cleanup?pid=31124&host_name=chromeos4-row6-rack11-host17"''
10/03 14:03:42.096 DEBUG|        dev_server:0936| Error occurred with exit_code 255 when executing the ssh call: ssh_exchange_identification: Connection closed by remote host
...


10/03 14:04:55.712 DEBUG|             utils:0212| Running 'ssh 100.115.219.134 'curl "http://100.115.219.134:8082/kill_au_proc?pid=31124&host_name=chromeos4-row6-rack11-host17"''
10/03 14:05:04.189 DEBUG|        dev_server:2189| Exception raised on auto_update attempt #1:
 Traceback (most recent call last):
  File "/home/chromeos-test/chromiumos/src/platform/dev/cros_update.py", line 219, in TriggerAU
    clobber_stateful=self.clobber_stateful)
  File "/home/chromeos-test/chromiumos/chromite/lib/auto_updater.py", line 1001, in __init__
    payload_filename=payload_filename)
  File "/home/chromeos-test/chromiumos/chromite/lib/auto_updater.py", line 244, in __init__
    self.device_dev_dir = os.path.join(self.device.work_dir, 'src')
  File "/home/chromeos-test/chromiumos/chromite/lib/remote_access.py", line 674, in work_dir
    capture_output=True).output.strip()
  File "/home/chromeos-test/chromiumos/chromite/lib/remote_access.py", line 904, in BaseRunCommand
    return self.GetAgent().RemoteSh(cmd, **kwargs)
  File "/home/chromeos-test/chromiumos/chromite/lib/remote_access.py", line 345, in RemoteSh
    raise SSHConnectionError(e.result.error)
SSHConnectionError: ssh: Could not resolve hostname chromeos4-row6-rack11-host17: Name or service not known
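The traceback ends in a name-resolution error rather than a refused or timed-out connection, so a standalone lookup run on the machine that raised it can confirm whether that machine's resolver is at fault. A minimal sketch using only the Python standard library follows. `can_resolve` is a hypothetical helper, not a chromite or autotest function, and the default port of 22 is an assumption.

```python
import socket


def can_resolve(hostname, port=22):
    """Return True if the local resolver can resolve `hostname`."""
    try:
        # getaddrinfo uses the system resolver configuration
        # (e.g. /etc/hosts and the configured DNS servers).
        socket.getaddrinfo(hostname, port)
    except socket.gaierror:
        # Raised for resolution failures such as
        # "Name or service not known".
        return False
    return True
```

For example, `can_resolve("chromeos4-row6-rack11-host17")` returning False on a devserver but True elsewhere would point at that devserver's DNS setup rather than at the DUT.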




johndhong: is this the same DNS failure that you just discovered in https://crbug.com/770632#c12?
 
Probably, considering that I only just fixed the devservers:
https://b.corp.google.com/issues/67379413
Owner: pho...@chromium.org
So, in terms of timeline: from Sept. 28-29 until Oct. 3 15:40 there was a high probability of devserver DNS issues.

I'm not sure whether I should be the owner now, since I did the fix, or whether it should be someone who can monitor the next run, so I'm kicking it over to the Infra deputy.
So dnsmasq was just coincidental? I've been seeing failures since, and I thought before too...
Is the DNS fixed? 
There was one DNS error in the latest run (https://luci-milo.appspot.com/buildbot/chromeos/elm-paladin/4270)