June 2008 Downtime Plan
From UGCS
We need to start planning for our downtime on June 27-29. I would like to start work early Friday evening (maybe 6 or 7pm?) and then work through the night. By Saturday morning, we should have the UPSes in place and working with network shutdowns.
Don't forget to bring in some laptops with wifi so we can look stuff up on the internet when our connection is down.
Before this, I will set up a mail relay off-site (probably somewhere in Ruddock) to handle mail. We'll add it as a backup MX at least a week early so everyone will know about it. It will hold mail until hermes is back up.
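To sanity-check the backup MX idea, here is a minimal sketch of how a sending server picks a target from our MX records. The preference values and full hostnames are assumptions for illustration, as is the guess that kabta is the off-site relay; only the ordering rule (lowest preference first) is standard.

```python
from typing import Iterable, List, Optional, Set, Tuple

# Hypothetical MX records as (preference, host). The preference values and
# FQDNs are placeholders, not our real DNS data.
MX_RECORDS: List[Tuple[int, str]] = [
    (20, "kabta.ugcs.caltech.edu"),   # assumed off-site backup relay
    (10, "hermes.ugcs.caltech.edu"),  # primary mail host
]


def delivery_order(mx_records: Iterable[Tuple[int, str]]) -> List[str]:
    """Return hosts in the order a sender tries them: lowest preference first.

    Real MTAs pick randomly among hosts with equal preference; this sketch
    just breaks ties alphabetically.
    """
    return [host for _pref, host in sorted(mx_records)]


def reachable_target(mx_records: Iterable[Tuple[int, str]],
                     down: Set[str]) -> Optional[str]:
    """Return the first MX host that is not in `down`, or None if all are down."""
    for host in delivery_order(mx_records):
        if host not in down:
            return host
    return None


if __name__ == "__main__":
    # During the downtime hermes is off, so mail should land on the backup.
    print(reachable_target(MX_RECORDS, down={"hermes.ugcs.caltech.edu"}))
```

With hermes marked as down, the function falls through to the backup host, which is the behavior we are counting on while the relay holds mail.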
Stuff to do beforehand
- Plug in UPS (and its battery) to charge it
- Install apcupsd on all machines
- Set up kabta
- Add dionysus, kabta to our MX records
- Figure out how to get postfix to forward list stuff to lists.ugcs.caltech.edu
- Set up authenticated nsupdate on demeter
- Print out this page so we can refer to it when power is out
Power stuff
- Install the new UPS by moving zeus up and putting the new UPS where zeus currently is
- Move one of the power strips over to the new UPS
- Connect the new UPS to apollo
- Reset the BIOS password on apollo and athena
- Test unplugging the UPSes
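When we test unplugging the UPSes, it may help to check what apcupsd reports on each machine. The sketch below parses the `KEY : value` lines that `apcaccess` prints and checks whether the UPS is on battery. The sample text is made up for illustration, and the function names are hypothetical.

```python
from typing import Dict


def parse_apcaccess(text: str) -> Dict[str, str]:
    """Parse apcaccess-style 'KEY : value' lines into a dict.

    Splits on the first colon only, so values containing colons
    (such as timestamps) stay intact.
    """
    status: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            status[key.strip()] = value.strip()
    return status


def on_battery(status: Dict[str, str]) -> bool:
    """True if the STATUS field says the UPS is running on battery."""
    return "ONBATT" in status.get("STATUS", "")


# Made-up sample of what the output could look like after pulling the plug.
SAMPLE = """\
STATUS   : ONBATT
BCHARGE  : 87.0 Percent
TIMELEFT : 23.5 Minutes
"""

if __name__ == "__main__":
    parsed = parse_apcaccess(SAMPLE)
    print(parsed["STATUS"], parsed["BCHARGE"], on_battery(parsed))
```

On a real machine the text would come from running `apcaccess` rather than from the sample string.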
Heartbeat
- Install heartbeat and basic config files on hermes, poseidon, demeter, athena, hestia
NFS Redundancy
- Use the existing DRBD images, keys, nfsstate
- Set up existing stuff as per haresources on hestia, athena
  - IP nfs.ugcs.caltech.edu
  - Filesystems
  - NFS server
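For reference, a heartbeat haresources entry is one line: the preferred node, then the resources in start order, with arguments joined by `::`. Here is a rough sketch of assembling such a line. The IP address, DRBD resource name, device, mount point, filesystem type, and init script name are all placeholders, not our actual config.

```python
from typing import Iterable, Sequence


def haresources_line(node: str, resources: Iterable[Sequence[str]]) -> str:
    """Build one haresources line: '<node> <res>[::arg...] <res>[::arg...] ...'.

    Each resource is a sequence whose first element is the resource script
    name and whose remaining elements are its arguments.
    """
    parts = [node]
    for name, *args in resources:
        parts.append("::".join([name, *args]))
    return " ".join(parts)


# All values below are hypothetical placeholders for illustration.
NFS_LINE = haresources_line(
    "hestia",
    [
        ("IPaddr", "192.0.2.10"),                         # service IP (placeholder)
        ("drbddisk", "nfs"),                              # DRBD resource name (assumed)
        ("Filesystem", "/dev/drbd0", "/export", "ext3"),  # device, mount point, fstype
        ("nfs-kernel-server",),                           # init script name (assumed)
    ],
)

if __name__ == "__main__":
    print(NFS_LINE)
```

Heartbeat starts the resources left to right and stops them right to left, which is why the IP and disk come before the NFS server in this ordering.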
Postgres redundancy
- Shrink / on poseidon to 25G
- Set up DRBD with dionysus
- Create 8G LV for database stuff, DRBD, and format as reiserfs
- Set up resources
  - DRBD, filesystem, postgres, <script to update postgres CNAME>
- Make sure appropriate keytabs are in place (move stuff from /etc/postgres/postgres.keytab to /etc/krb5.keytab)
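The CNAME update script in the resource list could be little more than a wrapper that feeds a few commands to nsupdate. The sketch below only builds the command text. The alias name, the target hostnames, the TTL, and the choice of demeter as the DNS server to update are assumptions for illustration.

```python
from typing import Optional


def cname_update_script(alias: str, target: str, ttl: int = 60,
                        server: Optional[str] = None) -> str:
    """Return nsupdate input that repoints the CNAME `alias` at `target`.

    Deletes any existing CNAME for the alias, adds the new one, and sends
    the update. Names should be fully qualified (trailing dot).
    """
    lines = []
    if server:
        lines.append(f"server {server}")
    lines.append(f"update delete {alias} CNAME")
    lines.append(f"update add {alias} {ttl} CNAME {target}")
    lines.append("send")
    return "\n".join(lines) + "\n"


# Hypothetical names: the alias and the server are guesses, not our real setup.
SCRIPT = cname_update_script(
    alias="postgres.ugcs.caltech.edu.",
    target="dionysus.ugcs.caltech.edu.",
    ttl=60,
    server="demeter.ugcs.caltech.edu",
)

if __name__ == "__main__":
    print(SCRIPT, end="")
```

The resulting text could be piped into `nsupdate -k <keyfile>` once authenticated updates are working. A short TTL keeps clients from caching the old target for long after a failover.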
Postfix redundancy
- Patch a new version of postfix and recompile
- Set up postfix on dionysus
- Put config files in cfengine
- Do something about mailman aliases?
- Use heartbeat to failover postfix between dionysus, hermes
- Try to get ldap failover on postfix to work again
Mailman redundancy
- Create LV for mailman stuff (10G?)
- Set up DRBD, etc. with dionysus, format as reiserfs
- Set up resources
  - DRBD, filesystem, mailman
- We might want to set it up so that postfix knows to forward list emails to lists.ugcs.caltech.edu; otherwise we have to shut down postfix wherever mailman isn't running
AFS redundancy
- Set up AFS failover stuff on hermes/athena
- Test, test, test, especially since hermes is a DB server
Charon
- Add a second hard drive and move to lvm/raid1