provision failure "Device XXX is not pingable"
Issue description
Example build: https://uberchromegw.corp.google.com/i/chromeos/builders/wolf-tot-paladin/builds/7988
Reading from the CrOS_XXX autoupdate log for both attempts:
2016/10/03 01:27:10.698 INFO | cros_build_lib:0565| RunCommand: ping -c 1 -w 20 chromeos4-row1-rack5-host17
2016/10/03 01:27:30.769 DEBUG| cros_update:0224| Error happens in CrOS auto-update: DeviceNotPingableError('Device chromeos4-row1-rack5-host17 is not pingable.',)
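For readers unfamiliar with the check behind this error, here is a minimal sketch of what the two log lines describe: one `ping -c 1 -w 20 <host>` attempt, with an error raised if it gets no reply. This is not the actual chromite code. `build_ping_command`, `check_pingable`, and the injectable `run` parameter are hypothetical names chosen for illustration, and the exception class is a stand-in that only reuses the name and message seen in the log.

```python
import subprocess


class DeviceNotPingableError(Exception):
    """Stand-in for the error named in the log; not the real chromite class."""


def build_ping_command(hostname, count=1, deadline=20):
    # Mirrors the logged command: ping -c 1 -w 20 <hostname>
    return ['ping', '-c', str(count), '-w', str(deadline), hostname]


def check_pingable(hostname, run=subprocess.call, attempts=1):
    """Raise DeviceNotPingableError if no ping attempt succeeds.

    `run` takes the command list and returns the exit code; it is
    injectable (an assumption of this sketch) so the logic can be
    exercised without touching the network.
    """
    cmd = build_ping_command(hostname)
    for _ in range(attempts):
        # ping exits with 0 when a reply arrived before the deadline.
        if run(cmd) == 0:
            return
    raise DeviceNotPingableError('Device %s is not pingable.' % hostname)
```

With the defaults, a device that stays silent for the 20-second deadline produces the same message as in the log. Passing `attempts=2` would give a slow-booting device a second chance before the error is raised.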
,
Oct 3 2016
According to the firmware event log, the device rebooted into Recovery mode (because it thought something or somebody was pressing the recovery button); that's why it was not accessible over the network:
205 | 2016-10-02 23:17:10 | System Reset
206 | 2016-10-02 23:35:06 | System boot | 36058
207 | 2016-10-02 23:35:06 | SUS Power Fail
208 | 2016-10-02 23:35:06 | System Reset
209 | 2016-10-02 23:35:06 | ACPI Wake | S5
210 | 2016-10-02 23:35:06 | Wake Source | Power Button | 0
211 | 2016-10-02 23:35:07 | Chrome OS Recovery Mode | Recovery Button Pressed <<<<<<
212 | 2016-10-03 01:45:28 | System boot | 36059
213 | 2016-10-03 01:45:28 | SUS Power Fail
214 | 2016-10-03 01:45:28 | System Reset
215 | 2016-10-03 01:45:28 | ACPI Wake | S5
216 | 2016-10-03 01:45:28 | Wake Source | Power Button | 0
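The reasoning in this comment can be checked mechanically: find the interval between the "Chrome OS Recovery Mode" entry and the next "System boot", then test whether the ping failure time (01:27:30 in the autoupdate log) falls inside it. The sketch below does that for three of the entries above. The function names are hypothetical, and it assumes the firmware event log and the autoupdate log use the same clock and time zone, which the thread does not confirm.

```python
from datetime import datetime

# Three entries copied from the event log above (marker arrows removed).
LOG = """\
210 | 2016-10-02 23:35:06 | Wake Source | Power Button | 0
211 | 2016-10-02 23:35:07 | Chrome OS Recovery Mode | Recovery Button Pressed
212 | 2016-10-03 01:45:28 | System boot | 36059
"""


def parse_event(line):
    """Split one 'index | timestamp | event | details...' line."""
    fields = [f.strip() for f in line.split('|')]
    when = datetime.strptime(fields[1], '%Y-%m-%d %H:%M:%S')
    return int(fields[0]), when, fields[2], fields[3:]


def recovery_windows(lines):
    """Return (entered, next_boot) pairs; next_boot is None if still open."""
    windows = []
    entered = None
    for line in lines:
        if not line.strip():
            continue
        _, when, event, _ = parse_event(line)
        if event == 'Chrome OS Recovery Mode':
            entered = when
        elif event == 'System boot' and entered is not None:
            windows.append((entered, when))
            entered = None
    if entered is not None:
        windows.append((entered, None))
    return windows


def in_recovery(windows, when):
    """True if `when` falls inside any recovery window."""
    return any(start <= when and (end is None or when < end)
               for start, end in windows)


windows = recovery_windows(LOG.splitlines())
ping_failure = datetime(2016, 10, 3, 1, 27, 30)  # from the autoupdate log
print(in_recovery(windows, ping_failure))  # True
```

Under the same-clock assumption, the device sat in recovery mode from 23:35:07 until the boot at 01:45:28, which covers the failed ping.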
,
Oct 3 2016
,
Oct 4 2016
Similar issue in https://uberchromegw.corp.google.com/i/chromeos/builders/cyan-paladin/builds/546
The follow-up repair job is: https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/hosts/chromeos2-row6-rack6-host12/1733148-repair/20160310043232/
vpalatin@, which exact log file are you finding that in? Is this the same issue?
,
Oct 4 2016
Note I actually meant this build https://uberchromegw.corp.google.com/i/chromeos/builders/x86-alex-paladin/builds/25758
,
Oct 4 2016
Another form of "device not pingable" happened in https://uberchromegw.corp.google.com/i/chromeos/builders/cyan-paladin/builds/550
Provision job: https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/79143018-chromeos-test/chromeos4-row6-rack9-host2/sysinfo/
It looks like the ssh connection hung or was dropped partway through the first provision attempt. Then, on the second attempt, the device was not pingable.
Repair job: https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/hosts/chromeos4-row6-rack9-host2/209100-repair/20160310181654/
,
Oct 4 2016
,
Oct 4 2016
,
Oct 11 2016
,
Oct 11 2016
This appears to have been repeated today on falco-chrome-pfq https://uberchromegw.corp.google.com/i/chromeos/builders/falco-chrome-pfq/builds/4827 The provision job: https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/80285076-chromeos-test/chromeos2-row4-rack6-host9/ The repair job: https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/hosts/chromeos2-row4-rack6-host9/1003388-repair/20161110020839/
,
Oct 12 2016
Latest run on falco-chrome-pfq died due to this failure https://uberchromegw.corp.google.com/i/chromeos/builders/falco-chrome-pfq/builds/4828 Raising the priority since this is now the single blocker of Chrome PFQ.
,
Oct 12 2016
For #6, that's just the classic cyan reboot issue, i.e. crbug.com/639301.
For #10 and #11, they seem to be totally unrelated auto-update errors; I haven't found the ping failure there (and the board was actually already up and running properly when the repair happened).
This is yet another catch-all bug for all the possible issues that can happen when you update a machine, then reboot and try to connect over ssh (and there are dozens of possible causes). Even at P0 priority, there is nothing to do about such a bug (or mixed bag of random unanalyzed events).
,
Oct 12 2016
From the latest failure: https://uberchromegw.corp.google.com/i/chromeos/builders/falco-chrome-pfq/builds/4829
This provision failed because the devserver couldn't ping the DUT: https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/80452190-chromeos-test/chromeos2-row4-rack6-host7/
From sysinfo/CrOS_update_chromeos2-row4-rack6-host7_12381.log:
2016/10/12 00:10:53.349 DEBUG| cros_update:0225| Error happens in CrOS auto-update: DeviceNotPingableError('Device chromeos2-row4-rack6-host7 is not pingable.',)
The devserver trying to reach the DUT is 100.115.245.198.
If I try to ping the DUT (chromeos2-row4-rack6-host7 or 100.115.226.238) from the devserver, there is no response, but I can ping both successfully from my workstation. Maybe a network switch misconfiguration?
Haowei/John, do you know if it's expected that the devserver (100.115.245.198) can't ping the DUT (chromeos2-row4-rack6-host7 or 100.115.226.238)?
,
Oct 12 2016
100.115.245.198 -> chromeos2-devserver6
,
Oct 12 2016
,
Oct 12 2016
,
Oct 12 2016
Not sure how you tested kevcheng....
chromeos-test@chromeos2-devserver6:~$ ping chromeos2-row4-rack6-host7
PING chromeos2-row4-rack6-host7.cros.corp.google.com (100.115.226.238) 56(84) bytes of data.
64 bytes from 100.115.226.238: icmp_seq=1 ttl=63 time=3.93 ms
64 bytes from 100.115.226.238: icmp_seq=2 ttl=63 time=0.631 ms
,
Oct 12 2016
hmm... luckily I still have the screen:
chromeos-test@chromeos2-devserver6:~$ ping chromeos2-row4-rack6-host7
PING chromeos2-row4-rack6-host7.cros.corp.google.com (100.115.226.238) 56(84) bytes of data.
^C
--- chromeos2-row4-rack6-host7.cros.corp.google.com ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2015ms
chromeos-test@chromeos2-devserver6:~$ ping 100.115.226.238
PING 100.115.226.238 (100.115.226.238) 56(84) bytes of data.
^C
--- 100.115.226.238 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1006ms
but at the same time:
kevcheng@bmwm3:~/work/chromiumos-internal/src/third_party/hdctools$ ping chromeos2-row4-rack6-host7
PING chromeos2-row4-rack6-host7.cros.corp.google.com (100.115.226.238) 56(84) bytes of data.
64 bytes from 100.115.226.238: icmp_seq=1 ttl=59 time=0.762 ms
64 bytes from 100.115.226.238: icmp_seq=2 ttl=59 time=0.849 ms
^C
--- chromeos2-row4-rack6-host7.cros.corp.google.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 0.762/0.805/0.849/0.051 ms
network flakiness?
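Since the two vantage points disagree only intermittently, it may help to script the comparison rather than rely on saved terminal output. A minimal sketch, assuming the iputils-style summary line shown above: `packet_loss` is a hypothetical helper that extracts the transmitted, received, and loss figures from captured ping output.

```python
import re

# Matches the iputils summary line, e.g.
# "3 packets transmitted, 0 received, 100% packet loss, time 2015ms"
SUMMARY_RE = re.compile(
    r'(\d+) packets transmitted, (\d+) received, '
    r'(\d+(?:\.\d+)?)% packet loss')


def packet_loss(output):
    """Return (transmitted, received, loss_percent), or None if no summary."""
    match = SUMMARY_RE.search(output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2)), float(match.group(3))


# Summary lines copied from the two sessions above.
from_devserver = '3 packets transmitted, 0 received, 100% packet loss, time 2015ms'
from_workstation = '2 packets transmitted, 2 received, 0% packet loss, time 1001ms'

print(packet_loss(from_devserver))    # (3, 0, 100.0)
print(packet_loss(from_workstation))  # (2, 2, 0.0)
```

Running the same probe from the devserver and from a workstation on a timer, and logging both tuples with timestamps, would show whether the loss is tied to one path or to the state of the DUT.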
,
Oct 12 2016
Maybe, or the DUT wasn't up, or was in a weird state?
,
Oct 12 2016
A symptom like "sometimes the devserver can't reach a working DUT" also shows up in bug 619728.
,
Oct 13 2016
,
Oct 13 2016
https://uberchromegw.corp.google.com/i/chromeos/builders/falco-chrome-pfq/builds/4833 Still occurring every time on falco-chrome-pfq, and the failure here has blocked Chrome rolls.
,
Oct 13 2016
Hmm... I cloned a previously failed job (on cautotest; cloning doesn't work on the shard, apparently) and it passed...
falco-chrome-pfq/R56-8892.0.0-rc1/bvt-inline/login_UserPolicyKeys
* Passed
  * http://cautotest/afe/#tab_id=view_job&object_id=80741453
* Original failed job
  * http://chromeos-server12.cbf.corp.google.com/afe/#tab_id=view_job&object_id=80659092
,
Oct 14 2016
,
Oct 14 2016
,
Jan 26 2017
falco-chrome-pfq failed twice on this last night, which blocked chrome uprev: https://uberchromegw.corp.google.com/i/chromeos/builders/falco-chrome-pfq/builds/5249 https://uberchromegw.corp.google.com/i/chromeos/builders/falco-chrome-pfq/builds/5250 Does vpalatin still work on these sorts of CrOS issues? Is there another bug to dupe this to?
,
Jan 27 2017
semenzato@ has a CL for this.
,
Jan 27 2017
https://chromium-review.googlesource.com/#/c/433370/ https://chromium-review.googlesource.com/#/c/433389/
,
Jan 27 2017
Comment 1 by akes...@chromium.org
, Oct 3 2016