kip shard (chromeos-server42.cbf) is down
Reported by
jrbarnette@chromium.org,
Jan 12 2017
Issue description
The shard serving board:kip is down. The problem caused the
daily lab inventory to fail to send e-mail. Attempts to get
status of kip boards (with dut-status) fail similarly.
Test login to the server times out:
$ become chromeos-test@chromeos-server42.cbf
ssh: connect to host chromeos-server42.cbf.corp.google.com port 22: Connection timed out
I haven't checked the waterfall status, but this is bound
to be affecting the CQ and the kip canary and release builders.
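The failed test login above amounts to a TCP connect to port 22 that never completes. A minimal sketch of the same check in Python follows; the function name `port_reachable` and the 5-second default timeout are assumptions for illustration, not part of any lab tooling.

```python
import socket


def port_reachable(host, port=22, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical helper: it only tests that the port accepts connections,
    not that sshd or the shard services behind it are healthy.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and DNS lookup failures.
        return False


if __name__ == "__main__":
    # Host name taken from the ssh error message in the description.
    host = "chromeos-server42.cbf.corp.google.com"
    state = "reachable" if port_reachable(host) else "unreachable"
    print(f"{host} port 22 is {state}")
```

A False result here only says the host did not accept a connection in time; it does not distinguish a hung machine from a network problem.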
,
Jan 12 2017
Please remember to add sheriffs to these bugs!
,
Jan 12 2017
How rude of me... and thank you for opening it!
,
Jan 12 2017
By the way, how would the sheriff tell that the shard is down? The logs show mostly timeouts. Are we missing an opportunity for a clearer error message? Thanks!
,
Jan 12 2017
That's an excellent question. This is my first duty shift in which this has been a problem.
,
Jan 12 2017
,
Jan 12 2017
,
Jan 12 2017
I don't understand WHY it was locked up, but the machine wasn't responding to ping or anything else. I used "cham --off <host>" followed by "cham --on <host>" to reset it, and it seems to be recovering. I'm running reverify against all kip DUTs now, and have seen some of them succeed. I'm calling this fixed.
,
Jan 12 2017
It's up and running, but appears to be really slow. "balance-pool cq kip" timed out once, and finished the second time but took multiple minutes to run. Also, we are still seeing CQ failures because test suites on kip are timing out.
,
Jan 13 2017
Re-marking as fixed after investigation.
,
Mar 4 2017
,
Apr 17 2017
,
May 30 2017
,
Aug 1 2017
,
Oct 14 2017
Comment 1 by jrbarnette@chromium.org, Jan 12 2017
The CQ has been failing because the kip-paladin can't test. This has been going on since this build last night: https://uberchromegw.corp.google.com/i/chromeos/builders/master-paladin/builds/13332