
Issue 652207

Starred by 1 user

Issue metadata

Status: Duplicate
Merged: issue 683304
Owner:
Closed: Jan 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug

Blocking:
issue 653443
issue 655338
issue 655350




provision failure "Device XXX is not pingable"

Project Member Reported by akes...@chromium.org, Oct 3 2016

Issue description

Example build: https://uberchromegw.corp.google.com/i/chromeos/builders/wolf-tot-paladin/builds/7988

Reading from the CrOS_XXX autoupdate log for both attempts:

2016/10/03 01:27:10.698 INFO |    cros_build_lib:0565| RunCommand: ping -c 1 -w 20 chromeos4-row1-rack5-host17
2016/10/03 01:27:30.769 DEBUG|       cros_update:0224| Error happens in CrOS auto-update: DeviceNotPingableError('Device chromeos4-row1-rack5-host17 is not pingable.',)
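
For anyone trying to reproduce the check by hand: the two log lines above are 20 seconds apart, which matches the `-w 20` deadline on the ping command. Below is a minimal sketch of that kind of pingability check. It is not the actual chromite/cros_update code; the function names, the `run` parameter, and the `attempts` parameter are hypothetical and exist only for illustration. Only the `ping -c 1 -w 20 <host>` command line comes from the log.

```python
import subprocess
from typing import Callable, List


def build_ping_command(host: str, count: int = 1, deadline_s: int = 20) -> List[str]:
    """Build the same command line that appears in the autoupdate log."""
    return ["ping", "-c", str(count), "-w", str(deadline_s), host]


def is_pingable(
    host: str,
    run: Callable[[List[str]], int] = subprocess.call,
    attempts: int = 1,
) -> bool:
    """Return True if any attempt exits with status 0.

    `run` is injectable (hypothetical design choice) so the logic can be
    exercised without touching the network. ping exits 0 when at least
    one reply was received before the deadline.
    """
    for _ in range(attempts):
        if run(build_ping_command(host)) == 0:
            return True
    return False
```

Calling `is_pingable("chromeos4-row1-rack5-host17")` from the devserver would run the same command as the log; raising `attempts` is one way to tell a transient drop apart from a DUT that is really down.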
 
Cc: vpalatin@chromium.org jrbarnette@chromium.org
Have not had a chance to dig into why this was not pingable. Seems to have been fixed by powerwash though. 

http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row1-rack5-host17/1732425-repair/

Anyone want to dig into the crash logs there?
According to the firmware event log, the device rebooted into Recovery mode (because it thought something or somebody was pressing the recovery button), which is why it was not accessible over the network:
205 | 2016-10-02 23:17:10 | System Reset
206 | 2016-10-02 23:35:06 | System boot | 36058
207 | 2016-10-02 23:35:06 | SUS Power Fail
208 | 2016-10-02 23:35:06 | System Reset
209 | 2016-10-02 23:35:06 | ACPI Wake | S5
210 | 2016-10-02 23:35:06 | Wake Source | Power Button | 0
211 | 2016-10-02 23:35:07 | Chrome OS Recovery Mode | Recovery Button Pressed <<<<<<
212 | 2016-10-03 01:45:28 | System boot | 36059
213 | 2016-10-03 01:45:28 | SUS Power Fail
214 | 2016-10-03 01:45:28 | System Reset
215 | 2016-10-03 01:45:28 | ACPI Wake | S5
216 | 2016-10-03 01:45:28 | Wake Source | Power Button | 0
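
To answer "is this the same issue?" for other hosts, it may help to scan their event logs for the same entry. The following is a rough sketch that assumes the pipe-separated `index | timestamp | event | details...` layout shown above; the function names are hypothetical and the trailing `<<<<<<` is treated as a hand-added annotation rather than part of the log.

```python
from typing import Dict, List

SAMPLE_LOG = """\
210 | 2016-10-02 23:35:06 | Wake Source | Power Button | 0
211 | 2016-10-02 23:35:07 | Chrome OS Recovery Mode | Recovery Button Pressed <<<<<<
212 | 2016-10-03 01:45:28 | System boot | 36059
"""


def parse_eventlog(text: str) -> List[Dict[str, object]]:
    """Split each 'index | time | event | details...' line into a dict."""
    events = []
    for raw in text.splitlines():
        # Drop a trailing '<<<<<<' marker added by hand in the bug comment.
        line = raw.strip().rstrip("<").rstrip()
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 3 or not fields[0].isdigit():
            continue  # not an event line
        events.append({
            "index": int(fields[0]),
            "time": fields[1],
            "event": fields[2],
            "details": fields[3:],
        })
    return events


def recovery_boots(events: List[Dict[str, object]]) -> List[Dict[str, object]]:
    """Return only the entries recording a boot into recovery mode."""
    return [e for e in events if e["event"] == "Chrome OS Recovery Mode"]


if __name__ == "__main__":
    for entry in recovery_boots(parse_eventlog(SAMPLE_LOG)):
        print(entry["index"], entry["time"], entry["details"])
```

A recovery-mode entry shortly before a failed provision would point at this cause rather than at the network.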


Cc: cywang@chromium.org kevcheng@chromium.org
Owner: vpalatin@chromium.org
Similar issue in https://uberchromegw.corp.google.com/i/chromeos/builders/cyan-paladin/builds/546

The follow up repair job is: https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/hosts/chromeos2-row6-rack6-host12/1733148-repair/20160310043232/

vpalatin@, which exact log file did you find that in? Is this the same issue?


Another form of device not pingable happened in https://uberchromegw.corp.google.com/i/chromeos/builders/cyan-paladin/builds/550

provision job: https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/79143018-chromeos-test/chromeos4-row6-rack9-host2/sysinfo/

Looks like during the first provision attempt, the ssh connection hung or was dropped somewhere halfway through the process. Then, on the second attempt, the device was not pingable.

repair job: https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/hosts/chromeos4-row6-rack9-host2/209100-repair/20160310181654/
Status: Assigned (was: Untriaged)
Labels: -current-issue

Comment 9 by lpique@chromium.org, Oct 11 2016

Blockedon: 653443
Labels: -Pri-2 Pri-1
Latest run on falco-chrome-pfq died due to this failure https://uberchromegw.corp.google.com/i/chromeos/builders/falco-chrome-pfq/builds/4828


Raising the priority since this is now the single blocker of Chrome PFQ.
For #6, that's just the classical cyan reboot issue, i.e. crbug.com/639301
For #10 and #11, they seem to be totally unrelated auto-update errors. I haven't found the ping failure there (and the board was actually already up and running properly when the repair happened).

That's yet another catch-all bug for all the possible issues that happen when you update a machine, then reboot and try to connect over ssh (and there are dozens of possible causes).
Even at P0 priority level, there is nothing to do about such a bug (or mixed bag of random unanalyzed events).


Cc: haoweiw@chromium.org dshi@chromium.org johndhong@chromium.org
From the latest failure: https://uberchromegw.corp.google.com/i/chromeos/builders/falco-chrome-pfq/builds/4829

This provision failed because the devserver couldn't ping the dut: https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/80452190-chromeos-test/chromeos2-row4-rack6-host7/

From sysinfo/CrOS_update_chromeos2-row4-rack6-host7_12381.log:
2016/10/12 00:10:53.349 DEBUG|       cros_update:0225| Error happens in CrOS auto-update: DeviceNotPingableError('Device chromeos2-row4-rack6-host7 is not pingable.',)

The devserver trying to call the dut is 100.115.245.198

If I try to ping the dut (chromeos2-row4-rack6-host7 or 100.115.226.238) from the devserver, there is no response, but I can ping both successfully from my workstation.  Maybe a network switch misconfiguration? 

Haowei/John,
Do you know if it's expected that the devserver (100.115.245.198) can't ping the dut (chromeos2-row4-rack6-host7 or 100.115.226.238)?
100.115.245.198 -> chromeos2-devserver6
Blocking: 653443
Blockedon: -653443
Not sure how you tested kevcheng....

chromeos-test@chromeos2-devserver6:~$ ping chromeos2-row4-rack6-host7
PING chromeos2-row4-rack6-host7.cros.corp.google.com (100.115.226.238) 56(84) bytes of data.
64 bytes from 100.115.226.238: icmp_seq=1 ttl=63 time=3.93 ms
64 bytes from 100.115.226.238: icmp_seq=2 ttl=63 time=0.631 ms

hmm... luckily I still have the screen:

chromeos-test@chromeos2-devserver6:~$ ping chromeos2-row4-rack6-host7
PING chromeos2-row4-rack6-host7.cros.corp.google.com (100.115.226.238) 56(84) bytes of data.
^C
--- chromeos2-row4-rack6-host7.cros.corp.google.com ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2015ms

chromeos-test@chromeos2-devserver6:~$ ping 100.115.226.238
PING 100.115.226.238 (100.115.226.238) 56(84) bytes of data.
^C
--- 100.115.226.238 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1006ms


but at the same time:
kevcheng@bmwm3:~/work/chromiumos-internal/src/third_party/hdctools$ ping chromeos2-row4-rack6-host7
PING chromeos2-row4-rack6-host7.cros.corp.google.com (100.115.226.238) 56(84) bytes of data.
64 bytes from 100.115.226.238: icmp_seq=1 ttl=59 time=0.762 ms
64 bytes from 100.115.226.238: icmp_seq=2 ttl=59 time=0.849 ms
^C
--- chromeos2-row4-rack6-host7.cros.corp.google.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 0.762/0.805/0.849/0.051 ms
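
To compare runs like the ones above from both vantage points over time, the summary line can be pulled out of the ping output. This is a small sketch that only assumes the summary format shown in the transcripts above; the function name is hypothetical.

```python
import re
from typing import Optional, Tuple

# Matches e.g. "3 packets transmitted, 0 received, 100% packet loss, time 2015ms"
_SUMMARY = re.compile(
    r"(\d+) packets transmitted, (\d+) received, (\d+(?:\.\d+)?)% packet loss"
)


def parse_ping_summary(output: str) -> Optional[Tuple[int, int, float]]:
    """Return (transmitted, received, loss_percent), or None if no summary line."""
    match = _SUMMARY.search(output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2)), float(match.group(3))
```

Logging this tuple from the devserver and from a workstation at the same moment would show whether the loss is specific to the devserver's path to the DUT.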


network flakiness?
Maybe, or the DUT wasn't up or was in a weird state?
A symptom like "sometimes the devserver can't reach a working DUT"
also shows up in bug 619728.

Comment 21 by bccheng@google.com, Oct 13 2016

Cc: bccheng@chromium.org
https://uberchromegw.corp.google.com/i/chromeos/builders/falco-chrome-pfq/builds/4833

Still occurring every time on falco-chrome-pfq, and the failure here has blocked Chrome rolls.
Hmm... I cloned a previously failed job (on cautotest, since cloning apparently doesn't work on the shard) and it passed...

falco-chrome-pfq/R56-8892.0.0-rc1/bvt-inline/login_UserPolicyKeys

* Passed *
http://cautotest/afe/#tab_id=view_job&object_id=80741453


* Original failed job *
http://chromeos-server12.cbf.corp.google.com/afe/#tab_id=view_job&object_id=80659092
Blocking: 655338
Blocking: 655350
Cc: pprabhu@chromium.org warx@chromium.org
falco-chrome-pfq failed twice on this last night, which blocked chrome uprev:
https://uberchromegw.corp.google.com/i/chromeos/builders/falco-chrome-pfq/builds/5249
https://uberchromegw.corp.google.com/i/chromeos/builders/falco-chrome-pfq/builds/5250

Does vpalatin still work on these sorts of CrOS issues? Is there another bug to dupe this to?

Owner: semenzato@chromium.org
semenzato@ has a CL for this.
Cc: -johndhong@chromium.org englab-sys-cros@google.com
