Heartbeat
From UGCS
Heartbeat is the daemon that handles all failover in the cluster. For an introduction, see http://linux-ha.org/. We are running Heartbeat V2.
Basics
Heartbeat works by managing resources on nodes. A node is a computer in the cluster that can run resources. A resource is any type of service that gets moved around. Examples include failover IPs, DRBD disks (which node is primary and which is secondary), filesystems (used to mount DRBD devices), and ordinary services. Resources are usually put into "Resource Groups". All the resources in a resource group run on the same host; they are started in the order listed and stopped in the reverse order.
There are also rules (constraints) that specify where resources can run. The most common is a "location" rule, which gives a resource a preference for or against a particular node.
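As a rough illustration of a location rule, the sketch below prints a Heartbeat V2 style rsc_location constraint that gives a resource (or group) a preference for one node. The helper name location_rule_xml, the generated IDs, and the example names group_web and node1 are made up for illustration; check the XML against the CIB DTD shipped with your Heartbeat version before loading it.

```shell
# Print a location constraint that prefers one node for a resource or group.
# Usage: location_rule_xml <resource-id> <node-uname> [score]
# The IDs are derived from the arguments and are purely illustrative.
location_rule_xml() {
    rsc="$1"
    node="$2"
    score="${3:-100}"
    cat <<EOF
<rsc_location id="loc_${rsc}" rsc="${rsc}">
  <rule id="loc_${rsc}_rule" score="${score}">
    <expression id="loc_${rsc}_expr" attribute="#uname" operation="eq" value="${node}"/>
  </rule>
</rsc_location>
EOF
}

# Example (not run here): load the rule into the constraints section.
# cibadmin -C -o constraints -X "$(location_rule_xml group_web node1 INFINITY)"
```

A score of INFINITY means "always run here if possible", and -INFINITY means "never run here"; a finite score is only a preference that other rules can outweigh.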
Quick reference to resource types
drbddisk
Parameter: 1, value: <name of DRBD resource>
Filesystem
- Parameter: device, value: /dev/whatever
- Parameter: directory, value: /mountpoint
- Parameter: fstype, value: xfs|reiserfs|ext3|etc
IPaddr
Parameter: 1, value: <ip address>
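Putting the three resource types above together, here is a hedged sketch of how a resource group might look in the V2 CIB: the DRBD disk first, then the filesystem on it, then the service IP. The helper name group_xml, all IDs, and the example values are hypothetical, and the class/provider attributes (heartbeat class for drbddisk and IPaddr, OCF for Filesystem) are assumptions; compare with what hb_gui generates on your cluster.

```shell
# Print a resource group: DRBD primary, then its filesystem, then the IP.
# Usage: group_xml <group-id> <drbd-resource> <device> <mountpoint> <fstype> <ip>
group_xml() {
    gid="$1"; drbd="$2"; dev="$3"; dir="$4"; fs="$5"; ip="$6"
    cat <<EOF
<group id="${gid}">
  <primitive id="${gid}_drbd" class="heartbeat" provider="heartbeat" type="drbddisk">
    <instance_attributes id="${gid}_drbd_ia">
      <attributes>
        <nvpair id="${gid}_drbd_1" name="1" value="${drbd}"/>
      </attributes>
    </instance_attributes>
  </primitive>
  <primitive id="${gid}_fs" class="ocf" provider="heartbeat" type="Filesystem">
    <instance_attributes id="${gid}_fs_ia">
      <attributes>
        <nvpair id="${gid}_fs_device" name="device" value="${dev}"/>
        <nvpair id="${gid}_fs_directory" name="directory" value="${dir}"/>
        <nvpair id="${gid}_fs_fstype" name="fstype" value="${fs}"/>
      </attributes>
    </instance_attributes>
  </primitive>
  <primitive id="${gid}_ip" class="heartbeat" provider="heartbeat" type="IPaddr">
    <instance_attributes id="${gid}_ip_ia">
      <attributes>
        <nvpair id="${gid}_ip_1" name="1" value="${ip}"/>
      </attributes>
    </instance_attributes>
  </primitive>
</group>
EOF
}

# Example (not run here):
# group_xml group_home r0 /dev/drbd0 /home ext3 192.0.2.10 > group_home.xml
# cibadmin -C -o resources -x group_home.xml
```

The order of the primitives matters, since the group starts them top to bottom: the filesystem cannot be mounted until the DRBD device is primary.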
Commands
crm_resource
crm_resource is a command that lets you manage resources in the cluster. To use it, you must be a member of the haclient group.
crm_resource -W -r <resource>
Tells you where the specified resource is running.
crm_resource -M -r <resource> [-H host]
Migrates the specified resource off of its current host. If -H is specified, the resource is moved to that host. Migrating adds a location constraint with a score of -INFINITY for the resource and its current host (translation: the resource will not run on that host again while the constraint exists), so you probably want to run
crm_resource -U -r <resource>
afterwards to remove this rule.
crm_resource -r <resource> -H <host> -C
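"Cleans up" a resource on the given host, clearing its recorded state (such as failed operations) so the cluster checks it again. You must use a real resource name, not a resource group.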
crm_resource -r <resource> -p target_role -v (started|stopped)
Sets a resource's target role to either started or stopped.
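The migrate-then-unmigrate sequence described above can be wrapped in a small helper. This is only a sketch: the function name move_and_release, the CRM_RESOURCE override variable, and the fixed wait are assumptions for illustration, and the host is passed with -H as in the cleanup command. Setting CRM_RESOURCE="echo crm_resource" prints the commands instead of running them.

```shell
# Command to run; override with "echo crm_resource" for a dry run.
CRM_RESOURCE="${CRM_RESOURCE:-crm_resource}"

# Move a resource, wait, show where it is, then drop the migration constraint.
# Usage: move_and_release <resource> [target-host] [seconds-to-wait]
move_and_release() {
    rsc="$1"
    host="$2"
    wait_s="${3:-30}"
    if [ -n "$host" ]; then
        $CRM_RESOURCE -M -r "$rsc" -H "$host" || return 1
    else
        $CRM_RESOURCE -M -r "$rsc" || return 1
    fi
    sleep "$wait_s"               # crude: give the cluster time to move it
    $CRM_RESOURCE -W -r "$rsc"    # report where the resource is running now
    $CRM_RESOURCE -U -r "$rsc"    # remove the constraint that -M left behind
}

# Example (not run here):
# move_and_release fs_home node2
```

Once the constraint is removed, the cluster is free to place the resource according to its other rules, so it may move back if a location rule prefers the original host.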
hb_gui
hb_gui is a graphical interface to the heartbeat cluster. It's quite nice, and is also very useful for configuring services.
crm_mon
crm_mon is a command-line program that pretty-prints the current cluster status. You may also want to try crm_mon -n to group resources by node, or crm_mon -1 to print the status once and exit (instead of refreshing it every 15 seconds or so).
crm_standby
crm_standby allows you to set or clear the standby status of a machine. To put a machine into standby, run
crm_standby -U <host> -v on
To take it out of standby, use either of the following commands:
crm_standby -U <host> -D
crm_standby -U <host> -v off
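For maintenance work it can be handy to wrap a command between the two standby calls. The sketch below does that; the function name with_standby and the CRM_STANDBY override variable are hypothetical, and it does not wait for resources to finish moving off the node, which you may want to confirm with crm_mon -1 first.

```shell
# Command to run; override with "echo crm_standby" for a dry run.
CRM_STANDBY="${CRM_STANDBY:-crm_standby}"

# Put a host into standby, run a command, then take the host out of standby.
# Returns the exit status of the wrapped command.
# Usage: with_standby <host> <command> [args...]
with_standby() {
    host="$1"
    shift
    $CRM_STANDBY -U "$host" -v on || return 1
    "$@"
    rc=$?
    $CRM_STANDBY -U "$host" -v off
    return "$rc"
}

# Example (not run here):
# with_standby node1 ssh node1 apt-get upgrade
```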
Notes
- The raw configuration file is /var/lib/heartbeat/crm/cib.xml. Never edit this file by hand; use cibadmin to make changes. Better yet, use the GUI (hb_gui).
- When configuring DRBD resources, use "drbddisk" instead of "drbd". The "drbd" resource is a V2 resource agent that uses the more complex master/slave mechanism. Supposedly it can do cool stuff if it's set up correctly, but otherwise it's just confusing. "drbddisk" is much simpler and Just Works.
AFS
Running AFS under Heartbeat is kinda tricky. You need a shared IP, and the VLDB must reference that shared IP. See the ha-openafs scripts to see what's going on.
Postgres
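You need to patch the init scripts so that status returns 3 instead of 4 if no clusters are defined: in /usr/share/postgresql-common/init.d-functions, status() should exit 3 instead of exit 4 in that case. Under the LSB convention, 3 means "not running" and 4 means "status unknown", and Heartbeat needs the former to treat the resource as cleanly stopped.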
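To check what an init script's status action actually returns, a small helper that translates LSB status exit codes can help. The function name lsb_status_meaning is made up for this sketch, and the init script path in the example is a placeholder for whatever your Postgres package installs.

```shell
# Translate an LSB init-script "status" exit code into words.
# Usage: lsb_status_meaning <exit-code>
lsb_status_meaning() {
    case "$1" in
        0) echo "running" ;;
        1) echo "dead, but pid file exists" ;;
        2) echo "dead, but lock file exists" ;;
        3) echo "not running" ;;
        4) echo "status unknown" ;;
        *) echo "reserved or application-specific" ;;
    esac
}

# Example (not run here; substitute your actual init script):
# /etc/init.d/<postgres init script> status; lsb_status_meaning $?
```

With no clusters defined, the patched script should report "not running" here rather than "status unknown".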

