lab swarming proxy server is down, all builder suite requests failing
Issue description
,
May 10 2017
Trying a reboot on https://portal.corp.google.com/devices/search?q=chromeos-server22&ent=prefix
,
May 10 2017
It's responding to ping, but still not ssh-able.
,
May 10 2017
If we can't ssh into it soon, we'll need to bring up a replacement. https://sites.google.com/a/google.com/chromeos/for-team-members/infrastructure/chromeos-admin/chromeos-proxy This claims that we have an existing puppet rule to do this.
,
May 10 2017
Specifically the steps "Provision bot server that runs the swarming bots."
,
May 10 2017
It's ssh-able now, and it has reappeared on sysmon.
,
May 10 2017
This problem may be fixed, but we really need to investigate what brought this server down. We have atop logs in /var/log/atop and probably other useful evidence around. Handing off.
,
May 10 2017
Exactly when did the server go down and what is the expected impact on the various waterfalls? I need to know which of the various Chrome PFQ and PFQ informational breakage I can safely ignore. Otherwise I'm going to waste a bunch of time investigating more-or-less known issues.
,
May 10 2017
The failure mode is all HWTests failing before being able to create the test suite: https://uberchromegw.corp.google.com/i/chromeos/builders/master-paladin/builds/14537 It's pretty distinct.
,
May 10 2017
Looks like the server dropped around 01:00 due to a CPU and memory spike: https://viceroy.corp.google.com/chromeos/machines?duration=1d&hostname=chromeos-server22&refresh=-1&utc_end=1494436696.14
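For anyone lining this up against the waterfall failures, here is a minimal sketch that decodes the time window encoded in the graph link above. It assumes `utc_end` is a Unix epoch timestamp in seconds and that `duration=1d` means a one-day window ending at that time; both are readings of the parameter names, not confirmed behavior of the dashboard. The helper name `graph_window` is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlparse

# Graph link quoted in the comment above.
URL = (
    "https://viceroy.corp.google.com/chromeos/machines"
    "?duration=1d&hostname=chromeos-server22&refresh=-1&utc_end=1494436696.14"
)


def graph_window(url):
    """Return (start, end) of the graph window as timezone-aware UTC datetimes.

    Assumptions (not confirmed by the thread): utc_end is Unix epoch seconds,
    and duration is written as "<N>d" for a window of N days.
    """
    query = parse_qs(urlparse(url).query)
    end = datetime.fromtimestamp(float(query["utc_end"][0]), tz=timezone.utc)
    days = int(query["duration"][0].rstrip("d"))
    return end - timedelta(days=days), end


start, end = graph_window(URL)
print("window start (UTC):", start.isoformat())
print("window end   (UTC):", end.isoformat())
```

Under those assumptions the window ends on May 10 2017 at 17:18 UTC, so the "around 01:00" drop falls inside it. The thread does not say which time zone 01:00 refers to, so that still needs checking against the build timestamps.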
,
May 16 2017
,
May 24 2017
Pri-0 bugs are critical regressions or serious emergencies, and this bug has not been updated in three days. Could you please provide an update, or adjust the priority to a more appropriate level if applicable? If a fix is in active development, please set the status to Started. Thanks for your time! To disable nags, add the Disable-Nags label. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
May 24 2017
,
Aug 1 2017
,
Jan 22 2018
Comment 1 by akes...@chromium.org
, May 10 2017