Alerting
We have a variety of automated alerts at UGCS that tell us when things are breaking or already broken.
Notes on alerts
Some alerts are critical, so it helps to have them sent to a cell phone (a "paging device"). Most carriers let you send an SMS to a phone through an email address. As of 2011, Verizon's gateway is tendigitnumber@vtext.com (e.g. 6505555555@vtext.com) and AT&T's is tendigitnumber@txt.att.net.
Nagios alerts can sometimes get very spammy, and you may receive hundreds of texts (Nagios's notification rate limiting still needs some tuning). Before you put your phone's address in these config files, make sure that many messages will not bankrupt you.
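A small helper can turn a phone number, written however people usually write it, into a gateway address. This is a hypothetical sketch: the name `sms_address` is ours, the gateway domain is passed in rather than hardcoded because carriers change these, and dropping a leading 1 assumes US numbers.

```python
def sms_address(number: str, gateway_domain: str) -> str:
    """Return the email address that relays mail to `number` as an SMS.

    Spaces and punctuation in `number` are ignored. A leading US country
    code (1) is dropped, and exactly ten digits must remain.
    """
    digits = "".join(ch for ch in number if ch in "0123456789")
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the +1 country code
    if len(digits) != 10:
        raise ValueError(f"expected a ten-digit number, got {number!r}")
    return f"{digits}@{gateway_domain}"


# AT&T's gateway domain as listed above.
print(sms_address("(650) 555-5555", "txt.att.net"))  # 6505555555@txt.att.net
```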
Nagios Alerting
Most of our alerts come from Nagios, which reports problems such as a host going down or a service not running. To add yourself to the notification lists, edit nagios3/conf.d/contacts.cfg and nagios3/conf.d/critical_notices.cfg in cfengine. There is also an 'sms-all' list that contains the mail aliases for pagers.
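For reference, a Nagios contact definition has roughly the shape printed below. This hypothetical helper (`nagios_contact` is our name, and `jdoe` is a made-up user) fills in the directives modeled on the sample contact that ships with Debian's nagios3 package; the notification periods, options, and command names are assumptions, so match them to the entries already in contacts.cfg before adding anything through cfengine.

```python
def nagios_contact(name: str, alias: str, email: str) -> str:
    """Render a Nagios `define contact` block as text for contacts.cfg."""
    # Periods, options, and commands modeled on Debian's sample contact;
    # these are assumptions, so match them to our existing entries.
    directives = [
        ("contact_name", name),
        ("alias", alias),
        ("service_notification_period", "24x7"),
        ("host_notification_period", "24x7"),
        ("service_notification_options", "w,u,c,r"),  # warning, unknown, critical, recovery
        ("host_notification_options", "d,r"),  # down, recovery
        ("service_notification_commands", "notify-service-by-email"),
        ("host_notification_commands", "notify-host-by-email"),
        ("email", email),
    ]
    # Nagios object files use one "directive value" pair per line.
    lines = [f"        {key:<32}{value}" for key, value in directives]
    return "define contact{\n" + "\n".join(lines) + "\n        }\n"


print(nagios_contact("jdoe", "Jane Doe", "jdoe@example.com"))
```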
Some Nagios services also have a separate critical alert, such as "Critical load". These are the alerts sent to paging devices, so they typically have higher thresholds or longer hold times before they fire. By default, they go through IMSS's mail servers instead of ours, so we still get notified if our mail system is down.
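The "longer hold time" idea can be illustrated with a check that fires only after a threshold has been exceeded on several consecutive readings. This is a hypothetical sketch, not how our Nagios config is written; in Nagios itself the comparable setting is a service's `max_check_attempts`, which sets how many failed checks it takes before a problem becomes a hard state and notifications go out.

```python
from dataclasses import dataclass, field


@dataclass
class HeldAlert:
    """Hypothetical alert that fires only after `hold` consecutive bad checks."""

    threshold: float  # a reading above this counts as a bad check
    hold: int  # consecutive bad checks required before firing
    _streak: int = field(default=0, init=False, repr=False)

    def check(self, value: float) -> bool:
        """Record one reading; return True while the alert is firing."""
        if value > self.threshold:
            self._streak += 1
        else:
            self._streak = 0  # any good reading resets the hold
        return self._streak >= self.hold


load_alert = HeldAlert(threshold=10.0, hold=3)
print([load_alert.check(v) for v in [12, 15, 11, 9, 20, 20, 20]])
# [False, False, True, False, False, False, True]
```

A brief spike (the single reading of 9 in between) resets the streak, which is what keeps a paging alert quieter than its non-critical counterpart.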
Splunk Alerts
Splunk regularly scans all of our logs and can raise alerts based on the log messages it sees. See Splunk Alerts for more information.
Kabta ping test
A script on Kabta pings UGCS and complains if it cannot reach it. These alerts should go to a paging device if at all possible. To edit the script, ssh to kabta.
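The actual script on kabta may be structured differently. As a hypothetical sketch of the same idea, the check below pings a host a few times and reports it down only if every attempt fails, so a single dropped packet does not page anyone. It shells out to the system `ping` with Linux iputils flags (`-c` count, `-W` timeout in seconds), and the `probe` parameter exists so the decision logic can be tested without a network.

```python
import subprocess
from typing import Callable


def ping_once(host: str, timeout_s: int = 5) -> bool:
    """Send one ICMP echo with the system ping; True if the host replied."""
    # Linux iputils flags: -c 1 sends one packet, -W is the reply timeout in seconds.
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def host_is_down(host: str, attempts: int = 3,
                 probe: Callable[[str], bool] = ping_once) -> bool:
    """Report `host` down only if all `attempts` probes fail.

    any() stops at the first successful probe, so a healthy host is
    usually pinged just once.
    """
    return not any(probe(host) for _ in range(attempts))
```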
Email Heartbeat
We have an end-to-end email test that sends a message through UGCS once every 5 minutes and complains if it arrives too late. To add yourself, edit the config file in hermes:/etc/email_heartbeat.
By default, these alerts go through IMSS's mail server (the config file makes this clear). You should probably send them to a paging device and to a non-UGCS email address, since a UGCS mailbox may be unreachable when this alert fires.
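The "too late" test comes down to comparing the time since the last heartbeat arrived against the send interval plus some slack. A minimal sketch, assuming a hypothetical 5-minute grace period; the real system's threshold is not documented here and may differ.

```python
from datetime import datetime, timedelta

SEND_INTERVAL = timedelta(minutes=5)  # per this page: one message every 5 minutes
GRACE = timedelta(minutes=5)  # hypothetical allowance for normal delivery delay


def heartbeat_is_late(last_received: datetime, now: datetime,
                      grace: timedelta = GRACE) -> bool:
    """True if no heartbeat has arrived within one send interval plus `grace`."""
    return now - last_received > SEND_INTERVAL + grace


t0 = datetime(2011, 1, 1, 12, 0)
print(heartbeat_is_late(t0, t0 + timedelta(minutes=9)))   # False
print(heartbeat_is_late(t0, t0 + timedelta(minutes=11)))  # True
```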