Splunk Alerts

From UGCS

Latest revision as of 06:32, 12 September 2011

Splunk runs a number of saved searches that can trigger alerts. To see them, log in to Splunk and go to "admin" and then "saved searches". Splunk saved search scripts are located at charon:/opt/splunk/bin/scripts. These alerts are designed to let us know about problems so they can be fixed quickly and we can improve our overall service level. If you find a problem that can be detected through a log search, please add a saved search.

Users logging into coreservers

This saved search runs once a minute and scans the window between one and two minutes ago for invalid users logging into coreservers. It then runs a script (muggles_trying_coreservers.py) that sends the user a helpful email. It has a couple of safeguards to keep people from getting notices about ssh brute-force attempts. This doesn't actually work and doesn't currently run.
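The "between one and two minutes ago" window can be sketched in Python as below. This is only an illustration of the time arithmetic, not the contents of muggles_trying_coreservers.py; the function name search_window and the choice to snap to minute boundaries are assumptions.

```python
from datetime import datetime, timedelta


def search_window(now):
    """Return (earliest, latest) for the minute between two and one minutes ago.

    Hypothetical helper: snapping to the start of the minute is an assumption,
    made so that consecutive once-a-minute runs cover adjacent windows.
    """
    current_minute = now.replace(second=0, microsecond=0)
    earliest = current_minute - timedelta(minutes=2)
    latest = current_minute - timedelta(minutes=1)
    return earliest, latest
```

Scanning a window that ended a minute ago, rather than the most recent minute, gives log lines time to arrive and be indexed before the search looks for them.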

LDAP server down

This alert sends notices to sysadmins and to the sysadmins' non-UGCS addresses if there are too many "ldap server down" messages. If you get it, double-check that Hera and Zeus are up and running correctly.

Mail forwarded in past 15 min

This alert checks whether we've forwarded any mail in the past 15 minutes. If we haven't for a few periods, that likely indicates a problem. It emails sysadmins and external sysadmins if it finds a problem. If you get one in the middle of the night, it's not a big deal. If you get 3 or 4 in a row, look through the postfix logs for errors.
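The "3 or 4 in a row" rule of thumb can be written down as a small check. This is a sketch for illustration only, not part of the Splunk saved search; the function name should_investigate and the default threshold of 3 are assumptions.

```python
def should_investigate(forwarded_counts, threshold=3):
    """Return True if the most recent `threshold` periods all forwarded no mail.

    forwarded_counts: number of forwarded messages per 15-minute period,
    oldest first. Both the name and the threshold are hypothetical.
    """
    if len(forwarded_counts) < threshold:
        # Not enough history yet to call it a streak.
        return False
    return all(count == 0 for count in forwarded_counts[-threshold:])
```

A single empty period, such as a quiet stretch overnight, does not trip the check; only an unbroken run of empty periods does.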

Client key expired

This alert lets you know if a Kerberos principal has expired. If one has, you should go reset its expiration date. This is especially important for server principals but also causes a lot of user pain.

IMAP Folder too full

When a user's mailbox fills up, they can't check their mailbox through IMAP. This alert lets us know if we need to increase their quota a little bit so they can check their mailbox and clean it up.

Email heartbeat

Hermes runs a cron job that tries sending an email through the system and checks whether it gets all the way through. If there is too much delay, it sends alerts. The code is in hermes:/usr/local/sbin/email_tester.py. See Email Heartbeat.
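The delay check at the heart of a heartbeat like this can be sketched as follows. This is not the code in email_tester.py; the function name heartbeat_status, the status strings, and the five-minute MAX_DELAY are all assumptions chosen for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical threshold; the real script's limit is not documented here.
MAX_DELAY = timedelta(minutes=5)


def heartbeat_status(sent_at, received_at, now, max_delay=MAX_DELAY):
    """Classify one test message as "ok", "pending", or "alert".

    received_at is None while the test message has not arrived yet.
    """
    if received_at is None:
        # Still in flight: alert only once the wait exceeds the limit.
        return "alert" if now - sent_at > max_delay else "pending"
    # Delivered: alert if the end-to-end delay was too long.
    return "alert" if received_at - sent_at > max_delay else "ok"
```

Treating a message that never arrives the same as one that arrives late means a completely stalled mail system still produces an alert.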
