Heartbeat

Heartbeat is the daemon that handles all failover in the cluster. For an introduction, see http://linux-ha.org/. We are running Heartbeat V2.

Basics

Heartbeat works by managing resources on nodes. A node is a machine in the cluster that can run resources. A resource is any service that can be moved between nodes. Examples include failover IPs, DRBD disks (which node is primary and which is secondary), filesystems (used to mount DRBD devices), and ordinary services. Resources are usually put into "Resource Groups". All the resources in a group run on the same host, and they are started and stopped sequentially.

There are also rules (constraints) that specify where resources may run. The most common is a "location" rule, which ties a resource to a particular node or keeps it off one.

Quick reference to resource types

drbddisk

Parameter: 1, value: <name of DRBD resource>

Filesystem

  • Parameter: device, value: /dev/whatever
  • Parameter: directory, value: /mountpoint
  • Parameter: fstype, value: xfs|reiserfs|ext3|etc

IPaddr

Parameter: 1, value: <ip address>

Commands

crm_resource

crm_resource is a command that lets you manage resources in the cluster. To use it, you must be a member of the haclient group.

crm_resource -W -r <resource>

Reports which node the specified resource is currently running on.
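
When scripting around this command it can help to pull just the host name out of the output. The following is a minimal sketch, not part of the cluster tools: `running_host` is a hypothetical helper, and it assumes the output line has the form "resource <name> is running on: <host>", which may differ between Heartbeat versions.

```shell
# running_host: read "crm_resource -W" output on stdin and print only the host.
# ASSUMPTION: the output line looks like
#   resource <name> is running on: <host>
# Check this against what your crm_resource actually prints.
running_host() {
    sed -n 's/^resource .* is running on: *\([^ ]*\).*$/\1/p'
}

# Usage on a cluster node (not executed here):
#   crm_resource -W -r <resource> | running_host
```

If the line does not match the assumed format (for example when the resource is not running), the helper prints nothing, so an empty result should be treated as "unknown".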

crm_resource -M -r <resource> [-H host]

Migrates the specified resource off its current host. If -H is specified, the resource is moved to that host. This adds a location constraint with a score of -INFINITY for the resource and its current host (translation: the resource will never be run on its current host again), so once the move has finished you probably want to run

crm_resource -U -r <resource>

to remove this rule.
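
The migrate-then-clean-up sequence above can be wrapped in a small function so the second step is not forgotten. This is only a sketch under stated assumptions: `run`, `move_resource`, `DRY_RUN` and `SETTLE_SECONDS` are hypothetical names, the 30-second wait is an arbitrary placeholder, and the target host is passed with -H as in the clean-up (-C) form of crm_resource.

```shell
# run: execute a command, or only print it when DRY_RUN=1 (hypothetical helper).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# move_resource <resource> <target-host>  (hypothetical wrapper)
move_resource() {
    resource=$1
    target=$2
    # Step 1: ask the cluster to migrate the resource to the target host.
    run crm_resource -M -r "$resource" -H "$target" || return 1
    # Step 2: give the cluster time to finish the move.
    # SETTLE_SECONDS is an arbitrary placeholder; tune it for your services.
    if [ "${DRY_RUN:-0}" != "1" ]; then
        sleep "${SETTLE_SECONDS:-30}"
    fi
    # Step 3: remove the constraint that the migration left behind.
    run crm_resource -U -r "$resource"
}

# Usage (prints the commands without touching the cluster):
#   DRY_RUN=1 move_resource <resource> <host>
```

Running it with DRY_RUN=1 first is a cheap way to check the commands before letting them act on the live cluster.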

crm_resource -r <resource> -H <host> -C 

"Cleans up" a resource on the given host, clearing its recorded state there so that the cluster re-checks it. You must name an individual resource, not a resource group.

crm_resource -r <resource> -p target_role -v (started|stopped)

Sets a resource's target role to either started or stopped.

hb_gui

hb_gui is a graphical interface to the heartbeat cluster. It's quite nice, and is also very useful for configuring services.

crm_mon

crm_mon is a command-line program that pretty-prints the current cluster status. Use crm_mon -n to show resources grouped by node, or crm_mon -1 to print the status once and exit instead of refreshing every 15 seconds or so.

crm_standby

crm_standby allows you to set or clear the standby status of a machine. To put a machine into standby:

crm_standby -U <host> -v on

To take it out of standby, use either of the following commands:

crm_standby -U <host> -D
crm_standby -U <host> -v off
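
For maintenance scripts, the on/off forms above can be put behind one function that rejects anything else. A minimal sketch, assuming only the crm_standby options shown on this page; `run`, `set_standby` and `DRY_RUN` are hypothetical names.

```shell
# run: execute a command, or only print it when DRY_RUN=1 (hypothetical helper).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# set_standby <host> on|off  (hypothetical wrapper around crm_standby)
set_standby() {
    host=$1
    state=$2
    case "$state" in
        on|off)
            run crm_standby -U "$host" -v "$state"
            ;;
        *)
            # Refuse anything other than the two documented values.
            echo "usage: set_standby <host> on|off" >&2
            return 2
            ;;
    esac
}

# Usage (prints the command without touching the cluster):
#   DRY_RUN=1 set_standby <host> on
```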

Notes

  • The raw configuration file is /var/lib/heartbeat/crm/cib.xml. Never edit this file by hand; use cibadmin to change it. Better yet, use the GUI.
  • When configuring DRBD, use the "drbddisk" resource type instead of "drbd". The "drbd" resource is a V2 one that uses the more complex master/slave mechanism. Supposedly it can do more when it is set up correctly, but otherwise it is just confusing. "drbddisk" is much simpler and Just Works.
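
Since cib.xml should not be opened directly, the usual way to look at the configuration is to query it through cibadmin. The sketch below assumes cibadmin's -Q (query) and -o (section) options; `run`, `show_cib` and `DRY_RUN` are hypothetical names, and the list of section names should be checked against your version.

```shell
# run: execute a command, or only print it when DRY_RUN=1 (hypothetical helper).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# show_cib [section]  (hypothetical helper)
# With no argument, dump the whole CIB; otherwise dump one section.
show_cib() {
    section=${1:-}
    case "$section" in
        "")
            run cibadmin -Q
            ;;
        nodes|resources|constraints|crm_config|status)
            run cibadmin -Q -o "$section"
            ;;
        *)
            echo "unknown CIB section: $section" >&2
            return 2
            ;;
    esac
}

# Usage on a cluster node, e.g. to keep a copy before making changes:
#   show_cib > cib-backup.xml
```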

AFS

Running AFS under Heartbeat is somewhat tricky. You need a shared IP, and the VLDB must reference that shared IP. See the ha-openafs scripts to see what's going on.

Postgres

You need to patch the init script so that status returns 3 instead of 4 when no clusters are defined: in the status() function of /usr/share/postgresql-common/init.d-functions, the no-clusters case should exit 3 instead of exit 4. Exit code 3 is the LSB status code for "not running", while 4 means "status unknown".
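
The intended behaviour can be illustrated with a stand-in function. This is not the actual contents of init.d-functions: `pg_status` and its argument are hypothetical, and it uses return rather than exit so it can be tried in an interactive shell.

```shell
# pg_status <cluster-list>  (hypothetical stand-in for the init script's status())
# <cluster-list> is a space-separated list of defined clusters; empty means none.
pg_status() {
    clusters=$1
    if [ -z "$clusters" ]; then
        # No clusters defined: report "not running" (LSB code 3),
        # not "status unknown" (LSB code 4).
        echo "no clusters defined"
        return 3
    fi
    # A real script would check each cluster here; this sketch only lists them.
    echo "would check: $clusters"
    return 0
}

# Usage:
#   pg_status ""; echo "exit code: $?"
```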
