Heartbeat
Revision as of 00:34, 8 June 2008
Heartbeat is a daemon that handles all failover in the cluster. For starters, see http://linux-ha.org/. We are running Heartbeat V2.
Basics
Heartbeat works by managing resources on nodes. A node is a computer that runs stuff. A resource is any type of service that gets moved around. Examples of resources include failover IPs, drbd disks (which node is primary/secondary), filesystems (you use these to mount drbd devices), and services. Resources are usually put into "Resource Groups". All the resources in a resource group run on the same host, and they are started and stopped sequentially.
There are also rules that help specify where services can be run. The most common is a "location" rule.
crm_resource
crm_resource is a command that lets you manage resources in the cluster. To use it, you must be a member of the haclient group.
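A quick way to check the group requirement before running anything is sketched below. The function name in_haclient is made up for this page; it only relies on the standard id command.

```shell
# Sketch: succeed if the given user (default: the current user) is in the
# haclient group. in_haclient is a hypothetical helper, not part of Heartbeat.
in_haclient() {
    user="${1:-$(id -un)}"
    # id -Gn prints the user's group names separated by spaces
    id -Gn "$user" | tr ' ' '\n' | grep -qx haclient
}

# Example:
# in_haclient || echo "not in haclient; crm_resource will refuse to work"
```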
crm_resource -W -r <resource>
Tells you where the specified resource is running
crm_resource -M -r <resource> [-H host]
Migrates the specified resource off of its current host. If -H is specified, it moves the resource to that host (-H takes the host name; lowercase -h expects the host UUID). This adds a location constraint with a score of -INFINITY for the resource and its current host (translation: the resource will never be run on its current host again), so you probably want to run
crm_resource -U -r <resource>
to remove this rule.
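The migrate-then-clean-up sequence above can be wrapped in a small shell function. This is only a sketch: the name move_resource, the example resource name, and the 10-second pause are assumptions for illustration, not part of our setup. It uses only the -W, -M, and -U options described above.

```shell
# Sketch: push a resource off its current host, then remove the constraint
# that the migration leaves behind.
# move_resource and the default pause are hypothetical, not part of Heartbeat.
move_resource() {
    rsc="$1"
    pause="${2:-10}"   # seconds to wait for the move; an arbitrary guess

    crm_resource -W -r "$rsc" || return 1   # show where it runs now
    crm_resource -M -r "$rsc" || return 1   # migrate it off the current host
    sleep "$pause"                          # give the cluster time to move it
    crm_resource -W -r "$rsc"               # show where it ended up
    crm_resource -U -r "$rsc"               # drop the migration constraint
}

# Example (hypothetical resource name):
# move_resource afs_group
```

A fixed sleep is the crudest possible way to wait; checking the -W output by hand before running -U is safer if the resource is slow to start.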
hb_gui
hb_gui is a graphical interface to the heartbeat cluster. It's quite nice, and is also very useful for configuring services.
Notes
- The raw configuration file is in /var/lib/heartbeat/crm/cib.xml. Never edit this file by hand; use cibadmin to add stuff to it. Better yet, use the gui.
- When configuring drbd stuff, use "drbddisk" instead of "drbd". The "drbd" resource is a V2 one that uses some complex master-slave stuff. Supposedly it can do cool stuff if it's set up correctly, but otherwise it's just confusing. "drbddisk" is much simpler and Just Works.
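To look at the configuration without touching cib.xml, cibadmin can dump it from the running cluster. A minimal sketch, assuming the -Q (query) and -o (section) options of the Heartbeat V2 cibadmin; the helper name show_cib_section is made up.

```shell
# Sketch: print one section of the live CIB instead of reading cib.xml.
# show_cib_section is a hypothetical helper name.
show_cib_section() {
    section="${1:-resources}"   # e.g. resources, constraints, nodes, status
    cibadmin -Q -o "$section"
}

# Example: list the location rules currently in effect
# show_cib_section constraints
```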
AFS
Doing stuff with AFS on heartbeat is kinda tricky. The problem is that the VLDB expects the volumes to be on the first server (server1, the primary). If you suddenly move them to the backup (server2), the VLDB doesn't know about it, and clients will still try to talk to server1. The simple way to solve this is to run "vos syncvldb" on both servers, and then run "vos syncserv" on both. However, you can't do this if one of the servers is down, and running it on just one server doesn't fix it.
Moving AFS server1 -> server2
This is the easy case. server1 has failed and cannot be contacted (bad hardware failure, network cable got unplugged, etc). heartbeat sees this, makes server2 the drbd primary, mounts the filesystem, and restarts the fs. Then you have to run (with actual IP addresses):
vos changeaddr -old server1 -new server2
The VLDB will now look at server2 for everything that was on server1. You should also run 'vos syncvldb' and 'vos syncserv' on server2 (it is yet to be determined whether this is necessary).
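The steps for this case can be collected into one function. This is a sketch under assumptions: the name afs_failover is made up, the example addresses are placeholders, and the -server arguments to the two sync commands are my guess at how they would be invoked here. Whether the syncs are needed at all is still an open question.

```shell
# Sketch of the server1 -> server2 steps; run it only after Heartbeat has
# brought the fileserver up on server2. afs_failover is a hypothetical name.
# Both arguments must be IP addresses, not host names.
afs_failover() {
    if [ "$#" -ne 2 ]; then
        echo "usage: afs_failover <old-ip> <new-ip>" >&2
        return 2
    fi
    old_ip="$1"   # address of the failed server (server1)
    new_ip="$2"   # address of the server now holding the volumes (server2)

    # Point the VLDB at the new address
    vos changeaddr -old "$old_ip" -new "$new_ip" || return 1
    # Possibly unnecessary; see the note above
    vos syncvldb -server "$new_ip"
    vos syncserv -server "$new_ip"
}

# Example (placeholder addresses):
# afs_failover 192.0.2.1 192.0.2.2
```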
Moving back
This is the trickier part, as we are assuming that server2 has other stuff on it. First, get drbd back on server1, and mount the partition.
Then, you will have to run
vos changeaddr -old server2 -remove
bos restart server1 fs
bos restart server2 fs
vos syncvldb ...
vos syncserv ...
This procedure is still under development. Jdhutchin@ugcs.caltech.edu 17:33, 7 June 2008 (PDT)