chromeos-server18.cbf: Resurrect sentinel_service, add ts_mon metrics.
Issue description
***** Nagios *****
Notification Type: PROBLEM
Host: chromeos-server18.cbf
State: DOWN
Address: 172.25.66.231
Info: PING CRITICAL - Packet loss = 100%
Date/Time: Tue Mar 14 12:32:15 PDT 2017

Should we stop checking to see if this server is down? Or do we need to fix it?
,
Mar 16 2017
Prathmesh restarted the Nagios server, and no new alerts have been generated.
,
Mar 16 2017
Oh, smart. How do you do that? Is there any documentation for Nagios?
,
Mar 16 2017
The landing page for admin tasks has a link to the Nagios instructions: https://sites.google.com/a/google.com/chromeos/for-team-members/infrastructure/chromeos-admin#TOC-Lab-server-and-infrastructure-tasks links to https://sites.google.com/a/google.com/chromeos/for-team-members/infrastructure/chromeos-admin/server-monitoring. I've updated the Nagios page, as it was a little bit-rotted.
,
Mar 16 2017
Just saw this bug. Do we no longer care about DB consistency? The sentinel service is running on that server.
,
Mar 16 2017
,
Mar 16 2017
,
Mar 16 2017
,
Mar 17 2017
The Nagios alerts for this service are still being generated, though they did stop for a while.
***** Nagios *****
Notification Type: PROBLEM
Host: chromeos-server18.cbf
State: DOWN
Address: 172.25.66.231
Info: PING CRITICAL - Packet loss = 100%
Date/Time: Fri Mar 17 09:00:30 PDT 2017
,
Mar 21 2017
,
Mar 24 2017
Comment 1 by shuqianz@chromium.org
, Mar 15 2017
Owner: xixuan@chromium.org