Nagios


We currently use Nagios as a service monitor. It periodically checks nearly every service in UGCS and verifies that each one is at least responsive. It currently runs on Dionysus.

Website

You can view the current status of Nagios at https://nagios.ugcs.caltech.edu. Access requires a valid UGCS login, since the Nagios web interface is notorious for vulnerabilities. If you are a sysadmin and your name appears in the appropriate places in /etc/nagios3/cgi.conf, you can run commands from the website, such as ignoring problems or rescheduling the next check of a host. To find out where to add your name (in the form username@UGCS.CALTECH.EDU), look at where the current sysadmins are listed in that file.

check_ldap

check_ldap will not work unless you modify its plugin command. Edit /etc/nagios-plugins/config/ldap.conf so that the command passes $HOSTNAME$.ugcs.caltech.edu instead of $HOSTADDRESS$.
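To make the change concrete, here is a minimal Python sketch of the substitution. The stock command line shown is an assumption about what the packaged plugin definition looks like, and use_fqdn is a hypothetical helper name; check the actual contents of the file on Dionysus before editing.

```python
# ASSUMPTION: this is roughly what the packaged check_ldap command line
# looks like; the real file on the server may differ.
STOCK_COMMAND_LINE = (
    "/usr/lib/nagios/plugins/check_ldap -H '$HOSTADDRESS$' -b '$ARG1$'"
)


def use_fqdn(command_line, domain="ugcs.caltech.edu"):
    """Replace the $HOSTADDRESS$ macro with $HOSTNAME$ plus the domain.

    use_fqdn is a hypothetical helper written for this page; it is not
    part of Nagios or of configurator.
    """
    return command_line.replace("$HOSTADDRESS$", "$HOSTNAME$." + domain)


if __name__ == "__main__":
    # Print the command line as it should read after the edit.
    print(use_fqdn(STOCK_COMMAND_LINE))
```

The only change is the macro handed to -H; the rest of the command line is left untouched.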


Nagios configurator

There is a set of scripts that automatically generates Nagios servicegroups and hostgroups. They live in the configurator directory. The base Python file is NagiosInfo.py, which contains two classes, Hostgroup and Servicegroup, that hold information on the appropriate services.

A hostgroup is a collection of similar hosts. Classes (from configurator) can be added to a hostgroup with hgroup.add_class(classname). You can also include or exclude specific machines by adding their names to the lists hgroup.includes or hgroup.excludes.
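The following is a rough sketch of how such a hostgroup might behave. It is not the actual code from NagiosInfo.py: only add_class, includes, and excludes come from the description above. The members method, the CLASS_HOSTS mapping, the host names, and the rule that excludes take priority over includes are all assumptions made for illustration.

```python
class Hostgroup:
    """Illustrative stand-in for the Hostgroup class in NagiosInfo.py."""

    def __init__(self, name):
        self.name = name
        self.classes = []   # configurator class names added via add_class
        self.includes = []  # extra machines to add by name
        self.excludes = []  # machines to leave out by name

    def add_class(self, classname):
        # Record the class once, even if it is added repeatedly.
        if classname not in self.classes:
            self.classes.append(classname)

    def members(self, class_hosts):
        """Resolve the final host list (hypothetical method).

        class_hosts maps a class name to the hosts in that class.
        ASSUMPTION: excludes are applied last, so they override includes.
        """
        hosts = set()
        for classname in self.classes:
            hosts.update(class_hosts.get(classname, []))
        hosts.update(self.includes)
        hosts.difference_update(self.excludes)
        return sorted(hosts)


# Hypothetical class-to-host mapping; the host names are placeholders.
CLASS_HOSTS = {"shell": ["hostA", "hostB"], "core": ["hostC"]}

hgroup = Hostgroup("shellservers")
hgroup.add_class("shell")
hgroup.includes.append("hostD")
hgroup.excludes.append("hostB")
print(hgroup.members(CLASS_HOSTS))  # ['hostA', 'hostD']
```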

New Nagios configuration

We will be overhauling the Nagios configuration setup to make it easier to manage and add servers/services. In particular, we will (at first) only be using configurator to generate the basic hostgroups coreservers, shellservers, and mortals. We will do this instead of generating all obvious hostgroups from configurator, as we don't add servers often enough to require quick reconfiguration.
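As an illustration of what generating the three basic hostgroups could look like, the sketch below renders standard Nagios hostgroup object definitions from a dictionary. The function render_hostgroup, the BASIC_GROUPS mapping, and the member host names are hypothetical; the real configurator output may be structured differently.

```python
def render_hostgroup(name, members, alias=None):
    """Return a Nagios 'define hostgroup' stanza as a string.

    Hypothetical helper: it uses only the hostgroup_name, alias, and
    members directives of the Nagios hostgroup object.
    """
    lines = [
        "define hostgroup {",
        "    hostgroup_name  " + name,
        "    alias           " + (alias or name),
        "    members         " + ",".join(members),
        "}",
    ]
    return "\n".join(lines) + "\n"


# ASSUMPTION: placeholder member names, not real UGCS hosts.
BASIC_GROUPS = {
    "coreservers": ["core1"],
    "shellservers": ["shell1", "shell2"],
    "mortals": ["mortal1"],
}

if __name__ == "__main__":
    # Emit one stanza per basic hostgroup.
    for group_name, hosts in BASIC_GROUPS.items():
        print(render_hostgroup(group_name, hosts))
```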

Each group of hosts and its associated services (such as the standard load checks, or all NFS checks) will be placed in its own file. A separate servicegroups file will then hold any services not covered by the other files.
