Starred by 3 users
Status: Archived
Owner:
Closed: Jan 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 0
Type: Bug



kip shard (chromeos-server42.cbf) is down
Project Member Reported by jrbarnette@chromium.org, Jan 12 2017
The shard serving board:kip is down.  The problem caused the
daily lab inventory to fail to send e-mail.  Attempts to get
status of kip boards (with dut-status) fail similarly.

Test login to the server times out:
    $ become chromeos-test@chromeos-server42.cbf
    ssh: connect to host chromeos-server42.cbf.corp.google.com port 22: Connection timed out
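
The timeout above can also be checked without an interactive login. The following is a minimal sketch, assuming only that a healthy shard accepts TCP connections on port 22. The function name `ssh_port_reachable` and the 5-second default timeout are illustrative and not part of any lab tool.

```python
import socket


def ssh_port_reachable(host, port=22, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration only. Timeouts, refused
    connections, and DNS failures are all reported as False.
    """
    try:
        # create_connection resolves the name and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # socket.timeout, ConnectionRefusedError, and socket.gaierror
        # are all subclasses of OSError.
        return False


if __name__ == "__main__":
    host = "chromeos-server42.cbf.corp.google.com"
    state = "reachable" if ssh_port_reachable(host) else "unreachable"
    print(f"{host}: port 22 {state}")
```

A False result only says the port could not be reached. It does not distinguish a hung machine from a network or DNS problem.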

I haven't checked the waterfall status, but this is bound
to be affecting the CQ and the kip canary and release builders.

 
The CQ has been failing because the kip-paladin can't test.
This has been going on since this build last night:
    https://uberchromegw.corp.google.com/i/chromeos/builders/master-paladin/builds/13332

Cc: rspangler@chromium.org semenzato@chromium.org snanda@chromium.org kinaba@chromium.org
Please remember to add sheriffs to these bugs!

How rude of me... and thank you for opening it!

By the way, how would the sheriff tell that the shard is down?  The logs show mostly timeouts.  Are we missing an opportunity for a clearer error message?

Thanks!

That's an excellent question.

This is my first duty shift in which this has been a problem.
Comment 6 by sjg@chromium.org, Jan 12 2017
Cc: sjg@chromium.org
Labels: Hotlist-TreeCloser
Status: Fixed
I don't understand WHY it was locked up, but the machine wasn't responding to ping or anything else.

I used "cham --off <host>", "cham --on <host>" to reset it, and it seems to be recovering. Reverifying the duts now.

I'm running reverify against all kip duts, and have seen some of them succeed. I'm calling this fixed.
Cc: shuqianz@chromium.org
Status: Started
It's up and running, but appears to be really slow.

"balance-pool cq kip" timed out once, and finished the second time but took multiple minutes to run.

Also we are still seeing CQ failures because test suites on kip are timing out.
Status: Fixed
Re-marking as fixed after investigation.
Comment 11 by dchan@google.com, Mar 4 2017
Labels: VerifyIn-58
Labels: VerifyIn-59
Labels: VerifyIn-60
Labels: VerifyIn-61
Comment 15 by dchan@chromium.org, Oct 14
Status: Archived