
Issue 720358

Starred by 2 users

Issue metadata

Status: Archived
Owner:
Closed: May 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 0
Type: Bug




lab swarming proxy server is down, all builder suite requests failing

Project Member Reported by akes...@chromium.org, May 10 2017

Issue description

$ become chromeos-test@chromeos-server22.cbf
ssh: connect to host chromeos-server22.cbf.corp.google.com port 22: Connection timed out
It's responding to ping, but still not ssh-able.
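The symptom above (host answers ping, but port 22 times out) can be checked without an interactive ssh attempt. The following is a minimal sketch, not the tooling used in this incident: `tcp_port_open` is a hypothetical helper name, and the 5-second timeout is an arbitrary assumption.

```python
import socket


def tcp_port_open(host: str, port: int = 22, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`.

    Hypothetical helper for illustration. A host can answer ICMP ping while
    its sshd is unreachable, so this probes the TCP port directly.
    """
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and name-resolution failures.
        return False
```

Calling `tcp_port_open("chromeos-server22.cbf.corp.google.com")` from a machine on the corp network would be expected to return `False` while the server is in the state described here, and `True` once sshd accepts connections again.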
Cc: dshi@chromium.org
If we can't ssh into it soon, we'll need to bring up a replacement.

https://sites.google.com/a/google.com/chromeos/for-team-members/infrastructure/chromeos-admin/chromeos-proxy

This claims that we have an existing puppet rule to do this.
Specifically the steps "Provision bot server that runs the swarming bots."
It's ssh'able now, and it has reappeared on sysmon.
This problem may be fixed, but we really need to investigate what brought this server down. We have atop logs in /var/log/atop and probably other useful evidence around.

Handing off.
Status: Assigned (was: Untriaged)
Exactly when did the server go down and what is the expected impact on the various waterfalls?

I need to know which of the various Chrome PFQ and PFQ informational breakages I can safely ignore. Otherwise I'm going to waste a bunch of time investigating more-or-less known issues.

The failure mode is all HWTests failing before being able to create the test suite:

https://uberchromegw.corp.google.com/i/chromeos/builders/master-paladin/builds/14537

It's pretty distinct.

Comment 10 Deleted

Looks like the server dropped around 01:00 due to a CPU and memory spike:

https://viceroy.corp.google.com/chromeos/machines?duration=1d&hostname=chromeos-server22&refresh=-1&utc_end=1494436696.14

Comment 12 by aut...@google.com, May 16 2017

Labels: -current-issue

Comment 13 by sheriffbot@chromium.org, May 24 2017

Pri-0 bugs are critical regressions or serious emergencies, and this bug has not been updated in three days. Could you please provide an update, or adjust the priority to a more appropriate level if applicable?

If a fix is in active development, please set the status to Started.

Thanks for your time! To disable nags, add the Disable-Nags label.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Status: Fixed (was: Assigned)
Labels: VerifyIn-61

Comment 16 by dchan@chromium.org, Jan 22 2018

Status: Archived (was: Fixed)
