
Issue 873326

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Oct 15
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug




CBF Ganeti instances will shut down for maintenance

Project Member Reported by jkop@chromium.org, Aug 10

Issue description

Hello Ganeti user!

This message is regarding the following machine(s):
  * chromeos-golo-server-test.cbf.corp.google.com
  * chromeos-golo-server1.cbf.corp.google.com
  * chromeos-golo-server3.cbf.corp.google.com
  * chromeos-gt-devserver11.cbf.corp.google.com
  * chromeos-gt-devserver12.cbf.corp.google.com
  * chromeos-gt-devserver13.cbf.corp.google.com
  * chromeos-gt-devserver14.cbf.corp.google.com
  * chromeos-gt-devserver15.cbf.corp.google.com
  * chromeos-gt-devserver16.cbf.corp.google.com
  * chromeos-gt-devserver17.cbf.corp.google.com
  * chromeos-gt-devserver18.cbf.corp.google.com
  * chromeos-server151.cbf.corp.google.com
  * chromeos-server155.cbf.corp.google.com
  * chromeos-server156.cbf.corp.google.com
  * chromeos-server158.cbf.corp.google.com
  * chromeos-server159.cbf.corp.google.com
  * chromeos-server160.cbf.corp.google.com
  * chromeos-staging-shard2.cbf.corp.google.com
  * cros-skylab-drone-10.cbf.corp.google.com
  * cros-skylab-drone-12.cbf.corp.google.com
  * cros-skylab-drone-14.cbf.corp.google.com
  * cros-skylab-drone-16.cbf.corp.google.com
  * cros-skylab-drone-2.cbf.corp.google.com
  * cros-skylab-drone-4.cbf.corp.google.com
  * cros-skylab-drone-6.cbf.corp.google.com
  * cros-skylab-drone-8.cbf.corp.google.com
  * cros-skylab-suite-server2.cbf.corp.google.com
  * cros-skylab-suite-server4.cbf.corp.google.com
  * guocb-dev-autotest2.cbf.corp.google.com
  * pprabhu-rpmserver-test-1.cbf.corp.google.com
  * pprabhu-skylab-drone-2.cbf.corp.google.com
  * puppet-testing.cbf.corp.google.com

Your instance(s) listed above will be shut down between 2018-08-24 21:00:00 US/Pacific and
2018-08-25 20:00:00 US/Pacific.

Why?
Your instance (virtual machine) runs on a physical machine. That physical machine requires service that will take hours or days. We will move your instance to a different physical machine so that you continue to receive service. The move will not be visible to you except that the machine will be rebooted. The outage will effectively last only as long as a reboot of your instance. This is needed for Electrical Maintenance. See t/33619050 for details.

Your instance(s) listed above will be started again after the maintenance is finished.
 
Labels: -Pri-2 -Chase-Pending Chase Pri-1
Owner: xixuan@chromium.org
Status: Assigned (was: Untriaged)
-> xixuan to determine which role each of these hosts serves today (shard for which board, or other purpose).

Then we can evaluate scope and take a planned outage if necessary.
------------------------------------------------------------------------------------------

Golo proxy server:

chromeos-golo-server1.cbf.corp.google.com  (For prod)
chromeos-golo-server3.cbf.corp.google.com  (For suite-scheduler)
chromeos-golo-server-test.cbf.corp.google.com  (For suite-scheduler)

What we still have:

chromeos-golo-server2.hot.corp.google.com  (For prod)
chromeos-golo-server4.hot.corp.google.com  (For suite-scheduler)

------------------------------------------------------------------------------------------

Devservers:

chromeos-gt-devserver<11-13>.cbf.corp.google.com

What we still have:

chromeos-gt-devserver21.hot.corp.google.com
chromeos-gt-devserver7.hot.corp.google.com

------------------------------------------------------------------------------------------

Crash_server:

chromeos-gt-devserver<14-16>.cbf.corp.google.com

What we still have:

chromeos-gt-devserver5.hot.corp.google.com
chromeos-gt-devserver6.hot.corp.google.com
chromeos-gt-devserver20.hot.corp.google.com

------------------------------------------------------------------------------------------

AFE:

chromeos-server151.cbf.corp.google.com (Not cautotest/cautotest-prod)
chromeos-server158.cbf.corp.google.com (Dedicated afe server for suite-scheduler)

What we still have:

cros-full-0034.mtv.corp.google.com (cautotest)
cros-full-0035.mtv.corp.google.com
cros-full-0036.mtv.corp.google.com (cautotest-prod)
cros-full-0037.mtv.corp.google.com

------------------------------------------------------------------------------------------

Shard:

chromeos-server155.cbf.corp.google.com       board:falco_li, board:whirlwind, board:storm, board:earth, board:eureka, board:glados

3 CQ boards: falco_li, whirlwind, glados

------------------------------------------------------------------------------------------

Sentinel Server:

chromeos-server156.cbf.corp.google.com

------------------------------------------------------------------------------------------

Staging lab:

chromeos-staging-shard2.cbf.corp.google.com (staging shard)

------------------------------------------------------------------------------------------

Doesn't-matter Servers:

chromeos-gt-devserver17.cbf.corp.google.com
chromeos-gt-devserver18.cbf.corp.google.com
chromeos-server159.cbf.corp.google.com
chromeos-server160.cbf.corp.google.com
cros-skylab-drone-*.cbf.corp.google.com
cros-skylab-suite-server*.cbf.corp.google.com
guocb-dev-autotest2.cbf.corp.google.com
pprabhu-rpmserver-test-1.cbf.corp.google.com
pprabhu-skylab-drone-2.cbf.corp.google.com
puppet-testing.cbf.corp.google.com
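
The Devservers and Crash_server entries above use a `<11-13>` style shorthand for a run of hosts. A minimal sketch of expanding that shorthand into full hostnames, for pasting into an outage announcement, could look like this. The function name `expand_hosts` is illustrative and not part of any lab tooling.

```python
import re
from typing import List

# Matches the "<start-end>" range shorthand used in the role breakdown.
_RANGE = re.compile(r"<(\d+)-(\d+)>")


def expand_hosts(pattern: str) -> List[str]:
    """Expand the first "<start-end>" range in a pattern into hostnames."""
    match = _RANGE.search(pattern)
    if match is None:
        return [pattern]  # no range shorthand, nothing to expand
    start, end = int(match.group(1)), int(match.group(2))
    prefix, suffix = pattern[:match.start()], pattern[match.end():]
    # The range is inclusive on both ends, as in the breakdown.
    return [f"{prefix}{n}{suffix}" for n in range(start, end + 1)]


devservers = expand_hosts("chromeos-gt-devserver<11-13>.cbf.corp.google.com")
crash_servers = expand_hosts("chromeos-gt-devserver<14-16>.cbf.corp.google.com")
```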


Owner: akes...@chromium.org
-> akeshet to announce outage
Status: Fixed (was: Assigned)
Cc: jrbarnette@chromium.org jkop@chromium.org
Status: Available (was: Fixed)
Reopening, as the maintenance was postponed. Email notification:

"http://t/33619050 has been postponed. You may have seen a previous notification about this planned maintenance affecting the Ganeti instances cited above on 2018-08-24. This is no longer scheduled for 2018-08-24. The new schedule for this maintenance is TBD. Apologies for any inconvenience."
Status: WontFix (was: Available)
No need to chase this until there is something to chase; add the Chase label back when the new maintenance window is announced.
Labels: -Pri-1 -Chase Pri-2
Status: Assigned (was: WontFix)
Reopening this bug, as it has more context on it than the new one. I still think we should just take the downtime. I can announce that.
Issue 883132 has been merged into this issue.
>  Issue 883132  has been merged into this issue.

Wait.  Double checking, I see that the hosts in the merged issue
are all in .hot, whereas the hosts considered here are all in .cbf.

So, I think this isn't actually a duplicate.
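
The .hot versus .cbf check can be done mechanically before merging issues. This is a rough sketch, assuming every hostname follows the `<name>.<site>.corp.google.com` form seen in this thread. The helper names are illustrative, and the sample `.hot` hosts are taken from the "What we still have" lists above, not from the merged issue.

```python
from collections import defaultdict

CORP_SUFFIX = ".corp.google.com"


def site_of(hostname):
    """Return the label just before ".corp.google.com", e.g. "cbf" or "hot"."""
    if not hostname.endswith(CORP_SUFFIX):
        return "unknown"
    return hostname[: -len(CORP_SUFFIX)].rsplit(".", 1)[-1]


def group_by_site(hostnames):
    """Map each site label to the list of hostnames located there."""
    groups = defaultdict(list)
    for host in hostnames:
        groups[site_of(host)].append(host)
    return dict(groups)


sample = [
    "chromeos-golo-server1.cbf.corp.google.com",
    "chromeos-golo-server2.hot.corp.google.com",
    "chromeos-gt-devserver21.hot.corp.google.com",
]
groups = group_by_site(sample)
```

Two issues whose host lists fall into different groups, as here, describe different maintenance events.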

Labels: Chase-Pending
The latest notification:

Hello Ganeti user!

This message is regarding the following machine(s):
  * chromeos-golo-server-test.cbf.corp.google.com
  * chromeos-golo-server1.cbf.corp.google.com
  * chromeos-golo-server3.cbf.corp.google.com
  * chromeos-gt-devserver11.cbf.corp.google.com
  * chromeos-gt-devserver12.cbf.corp.google.com
  * chromeos-gt-devserver13.cbf.corp.google.com
  * chromeos-gt-devserver14.cbf.corp.google.com
  * chromeos-gt-devserver15.cbf.corp.google.com
  * chromeos-gt-devserver16.cbf.corp.google.com
  * chromeos-gt-devserver17.cbf.corp.google.com
  * chromeos-gt-devserver18.cbf.corp.google.com
  * chromeos-server151.cbf.corp.google.com
  * chromeos-server155.cbf.corp.google.com
  * chromeos-server156.cbf.corp.google.com
  * chromeos-server158.cbf.corp.google.com
  * chromeos-server159.cbf.corp.google.com
  * chromeos-server160.cbf.corp.google.com
  * chromeos-staging-shard2.cbf.corp.google.com
  * cros-skylab-drone-10.cbf.corp.google.com
  * cros-skylab-drone-12.cbf.corp.google.com
  * cros-skylab-drone-14.cbf.corp.google.com
  * cros-skylab-drone-16.cbf.corp.google.com
  * cros-skylab-drone-2.cbf.corp.google.com
  * cros-skylab-drone-4.cbf.corp.google.com
  * cros-skylab-drone-6.cbf.corp.google.com
  * cros-skylab-drone-8.cbf.corp.google.com
  * cros-skylab-staging-2.cbf.corp.google.com
  * cros-skylab-suite-server2.cbf.corp.google.com
  * cros-skylab-suite-server4.cbf.corp.google.com
  * guocb-dev-autotest2.cbf.corp.google.com
  * pprabhu-rpmserver-test-1.cbf.corp.google.com
  * pprabhu-skylab-drone-2.cbf.corp.google.com
  * puppet-testing.cbf.corp.google.com

The machines listed above will be SHUTDOWN sometime after 2018-10-12 18:00 PST and started back up sometime before 2018-10-13 16:00 PST.

Why?
CBF is scheduled to undergo electrical system maintenance at this time. See http://t/33619050. Apologies for the short notice. Due to this work having been rescheduled the short notice was unavoidable.

Questions?
If you have any questions please open a GUTS ticket for 'sre-ganeti': go/ganeti-ticket

Thanks,
Ganeti Virtualization SRE

Why did I get this email?
We send notifications to the individuals/groups listed in the service_contacts Rigging property associated with the affected machines, or the MDB group who owns the machines.
Please see: http://google3/ops/sysops/cmt/scripts/get_machine_contacts.py
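
To check whether the role breakdown earlier in this issue still covers every affected host, the machine list in this notification can be compared with the one in the original report. The sketch below copies both lists as short names, with the shared `.cbf.corp.google.com` suffix dropped for brevity. The variable names are illustrative.

```python
# Short names from the original (Aug 10) notification.
ORIGINAL = """
chromeos-golo-server-test chromeos-golo-server1 chromeos-golo-server3
chromeos-gt-devserver11 chromeos-gt-devserver12 chromeos-gt-devserver13
chromeos-gt-devserver14 chromeos-gt-devserver15 chromeos-gt-devserver16
chromeos-gt-devserver17 chromeos-gt-devserver18
chromeos-server151 chromeos-server155 chromeos-server156
chromeos-server158 chromeos-server159 chromeos-server160
chromeos-staging-shard2
cros-skylab-drone-10 cros-skylab-drone-12 cros-skylab-drone-14
cros-skylab-drone-16 cros-skylab-drone-2 cros-skylab-drone-4
cros-skylab-drone-6 cros-skylab-drone-8
cros-skylab-suite-server2 cros-skylab-suite-server4
guocb-dev-autotest2 pprabhu-rpmserver-test-1 pprabhu-skylab-drone-2
puppet-testing
""".split()

# Short names from the latest notification.
LATEST = """
chromeos-golo-server-test chromeos-golo-server1 chromeos-golo-server3
chromeos-gt-devserver11 chromeos-gt-devserver12 chromeos-gt-devserver13
chromeos-gt-devserver14 chromeos-gt-devserver15 chromeos-gt-devserver16
chromeos-gt-devserver17 chromeos-gt-devserver18
chromeos-server151 chromeos-server155 chromeos-server156
chromeos-server158 chromeos-server159 chromeos-server160
chromeos-staging-shard2
cros-skylab-drone-10 cros-skylab-drone-12 cros-skylab-drone-14
cros-skylab-drone-16 cros-skylab-drone-2 cros-skylab-drone-4
cros-skylab-drone-6 cros-skylab-drone-8
cros-skylab-staging-2
cros-skylab-suite-server2 cros-skylab-suite-server4
guocb-dev-autotest2 pprabhu-rpmserver-test-1 pprabhu-skylab-drone-2
puppet-testing
""".split()

# Hosts that appear in only one of the two notifications.
added = sorted(set(LATEST) - set(ORIGINAL))
removed = sorted(set(ORIGINAL) - set(LATEST))
```

Here `added` holds the single host `cros-skylab-staging-2` and `removed` is empty, so that host is the only one the earlier role breakdown does not classify.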
Status: Fixed (was: Assigned)
