Shellserver Systemimager Decision

From UGCS

I think we should start moving our shellservers to an automated, imaged setup instead of our current read-only root over NFS. This suggestion was triggered by having to reboot almost all the shellservers because of NFS-hung processes, which appear to be unkillable even though the filesystem is mounted with "intr".

Pros

  • Read-only root over NFS has never worked perfectly (VT switching, etc.)
  • It would give us more responsive shellservers (no more waiting forever for "man" to work)
  • Some nice things would work, like hald automounting
  • It would mostly eliminate the need for a new NFS server. Since the load would be low, we could use any old machine; we would probably set up something with more disk space and build a 64-bit image.

And some problems with the current setup that are harder to fix:

  • I'm not sure that aufs (which we use for unionfs because it supports the right stuff with NFS) is going to be maintained in a state we can use. Patching the kernel to get it to work has always been dicey, which makes kernel upgrades next to impossible. Luckily I was able to get it to compile quickly the last time there was a serious kernel vulnerability.
  • We can't keep shellservers up for more than 50-60 days before they develop hung processes that require a reboot.

Cons

  • No more read-only root, so security wouldn't be as good. This will be partially mitigated by blocking module loads.
  • I still haven't seen a convincing way to do apt updates automatically. Some debconf trickery with some custom scripting (remctl perhaps?) may solve the problem, or we'll just use mssh.
  • It takes about an hour and a half to rsync the image, since we don't have a gigabit switch for the shellservers (and because there are lots of little files, we don't quite reach 10 MB/s). This could be cut to maybe an hour if some things (like Mathematica and MATLAB) are placed on NFS. If we wanted to image multiple machines at once we'd need SystemImager's BitTorrent engine. However, we wouldn't have to do this often, and I think it would be worth it.

Pieces of the puzzle

SystemImager

  • One option is to have SystemImager set up the machines when they netboot.
  • Another option is to do it ourselves. This wouldn't be that much work (mostly setting the right options for rsync) and would let us use our current setup instead of dealing with SystemImager, which looks like it's designed for a different environment than ours. SystemImager is meant for one-time installs, while we need something that runs every time the machine is rebooted. We could also use this to ensure system integrity upon boot.

Steps needed:

  1. Have a hook so that rsync, etc get added to the initramfs
  2. In init-premount, create all the LVM volumes. We might want a nice Python or Perl script to do it intelligently
  3. After / is mounted, mount other filesystems as necessary (?)
  4. Rsync the image on. If we already have an image, check critical parts with MD5 checksums and have rsync look at mtime for the rest
  5. Run cfengine in /etc/rcS.d
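
Step 4 mostly comes down to picking rsync options. As a rough sketch (the image source `imageserver::shellserver`, the target `/root`, and the list of critical directories are placeholders, not our real names), the two passes could be built like this: one quick pass that relies on rsync's default size-and-mtime comparison, then a `--checksum` pass over each critical directory. Note that `--checksum` uses rsync's own whole-file checksum rather than a separate MD5 list, so this swaps rsync's mechanism in for the md5 idea in step 4.

```python
def build_rsync_commands(source, target, critical_dirs):
    """Return rsync argument lists: one quick pass over the whole image,
    then one content-checksum pass per critical directory."""
    source = source.rstrip("/")
    target = target.rstrip("/")
    # --archive keeps permissions, owners, times and symlinks;
    # --delete removes files on the target that are not in the image.
    base = ["rsync", "--archive", "--hard-links", "--numeric-ids", "--delete"]

    # Pass 1: default quick check (size and mtime) for the whole tree.
    commands = [base + [source + "/", target + "/"]]

    # Pass 2: compare file contents for directories we care most about.
    for directory in critical_dirs:
        rel = directory.strip("/")
        commands.append(
            base + ["--checksum", f"{source}/{rel}/", f"{target}/{rel}/"]
        )
    return commands
```

Each returned list can be handed to `subprocess.run(cmd, check=True)` from the initramfs script. For example, `build_rsync_commands("imageserver::shellserver", "/root", ["/bin", "/sbin", "/usr/sbin"])` yields four commands, the last three with `--checksum`.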

System configuration

SystemImager comes with System Configurator to configure systems after they have been installed. We may just use cfengine since we already have it, and the network will already be set up by the netboot.

Automated Apt

We should be able to mostly automate apt by pre-seeding debconf. However, some packages don't behave well and still ask questions.
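
A minimal sketch of the pre-seeding approach, assuming the standard `debconf-set-selections` tool and apt's `DEBIAN_FRONTEND=noninteractive` setting. The function names are made up for illustration, and the `Dpkg::Options` choices (keep existing config files) are one possible policy, not a decision we have made. Packages that ignore debconf would still need separate handling.

```python
import os
import subprocess


def preseed(selections):
    """Feed pre-seeded answers to debconf.

    `selections` is text in debconf-set-selections format, one answer per
    line: "<package> <question> <type> <value>". The tool reads stdin when
    no file argument is given.
    """
    subprocess.run(
        ["debconf-set-selections"], input=selections, text=True, check=True
    )


def noninteractive_env(base_env=None):
    """Copy the environment and tell debconf not to prompt."""
    env = dict(os.environ if base_env is None else base_env)
    env["DEBIAN_FRONTEND"] = "noninteractive"
    return env


def upgrade_command():
    """apt-get invocation that answers yes and keeps existing config files."""
    return [
        "apt-get", "--yes",
        "-o", "Dpkg::Options::=--force-confdef",
        "-o", "Dpkg::Options::=--force-confold",
        "dist-upgrade",
    ]


def run_upgrade(selections):
    """Pre-seed, then upgrade without prompting (needs root on a Debian box)."""
    preseed(selections)
    subprocess.run(upgrade_command(), env=noninteractive_env(), check=True)
```

Something like remctl or mssh would then only need to trigger `run_upgrade` on each shellserver.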

System integrity

We need a system integrity process to make sure that these machines don't get compromised. In particular, we need to check for a few things that attackers do to cover their tracks and keep a system under their control:

  • Load a rootkit, through modules or a trojaned kernel
  • Trojan executables, such as ssh or various suid programs
  • Block system logging
  • Key logging

Boot-up

Netbooting protects us against someone placing a trojaned kernel in /boot. We will set the machines to never boot off of disk, since a trojaned kernel would make all the other efforts worthless.

Rootkit modules

This is much tougher. We can (hopefully) set the kernel to refuse future module loads. However, we have to load all the modules we might want before doing so. We need to:

  • Make a list of modules to load for the system to be usable for users (things so USB sticks will mount, for instance)
  • Make sure the modules being loaded aren't trojaned. This could be a problem if someone trojans a module that gets loaded early. To solve this problem, we can put a list of module md5sums in the initrd, which gets loaded through netboot. Although there are ~2000 modules, md5summing them took only 20 seconds on a shellserver (and this was over NFS)
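
The two points above could fit together roughly as follows. This is only a sketch under assumptions: the manifest is taken to be a dict of module path to MD5 hex digest read from the initrd, the module list is whatever we end up choosing, and locking uses the `kernel.modules_disabled` sysctl, which only exists on kernels that provide it (2.6.31 and later). The function names are invented for illustration.

```python
import hashlib
import subprocess
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """MD5 of a file, read in chunks so large files don't fill memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tampered_modules(manifest, module_root):
    """Return module paths that are missing or whose MD5 differs.

    `manifest` maps paths relative to `module_root` to expected MD5 hex.
    """
    bad = []
    for rel, expected in manifest.items():
        path = Path(module_root) / rel
        if not path.is_file() or md5_of(path) != expected.lower():
            bad.append(rel)
    return sorted(bad)


def load_and_lock(module_names, manifest, module_root):
    """Verify every module first, then load the wanted ones, then lock."""
    bad = tampered_modules(manifest, module_root)
    if bad:
        raise RuntimeError("refusing to load modules, bad checksums: %s" % bad)
    for name in module_names:
        subprocess.run(["modprobe", name], check=True)
    # One-way switch: once set to 1 it cannot be cleared without a reboot.
    Path("/proc/sys/kernel/modules_disabled").write_text("1\n")
```

Verifying the whole manifest before loading anything matters because modprobe also pulls in dependencies, which may not be on our short list of wanted modules.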

Trojaned executables

We need a system to make sure things like ssh aren't trojaned. This system should routinely md5sum executables and check them against a known-good list. Luckily, dpkg comes with md5sums of most of its files, but these can't be trusted if they're sitting on the system itself. We can load the known-good list over NFS and check regularly. Md5summing each file in /usr/bin took 108 seconds on a shellserver while loading them over NFS (there are 4818 files taking up 586 MB of space!)
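
A sketch of what such a routine check might look like, assuming the known-good list uses the same "digest, two spaces, path without leading slash" layout as dpkg's md5sums files and has already been fetched from a trusted location. `parse_md5sums` and `audit` are hypothetical names, not existing tools.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """MD5 of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_md5sums(text):
    """Parse '<md5>  <relative path>' lines into {path: md5}."""
    known = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        digest, path = line.split(None, 1)
        known[path.strip()] = digest.lower()
    return known


def audit(root, subdir, known):
    """Compare regular files under root/subdir with the known-good list.

    Returns a dict with three sorted lists: files whose digest differs,
    files not in the list at all, and listed files that are absent.
    """
    root = Path(root)
    prefix = subdir.strip("/") + "/"
    report = {"modified": [], "unexpected": [], "missing": []}
    seen = set()
    for path in sorted((root / prefix).rglob("*")):
        # Symlinks are skipped; dpkg's lists only cover regular files.
        if path.is_symlink() or not path.is_file():
            continue
        rel = path.relative_to(root).as_posix()
        seen.add(rel)
        if rel not in known:
            report["unexpected"].append(rel)
        elif md5_of(path) != known[rel]:
            report["modified"].append(rel)
    report["missing"] = sorted(
        p for p in known if p.startswith(prefix) and p not in seen
    )
    return report
```

Run from cron as `audit("/", "usr/bin", known)`, any non-empty list is worth an alert. Files installed outside dpkg will show up as unexpected, so the known-good list would need to cover them too.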

System logging

We have all our machines log over the network, so as long as the syslog configuration doesn't get messed up, we're OK.
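
One way to notice a messed-up syslog configuration is to check that it still forwards to the log host. The sketch below assumes the traditional "selector, whitespace, action" syntax, where an action starting with `@` (or `@@` in rsyslog for TCP) means remote forwarding; `loghost` in the usage note is a placeholder name, and the function names are invented.

```python
def remote_log_targets(conf_text):
    """Return (selector, host) pairs for every rule that forwards remotely."""
    targets = []
    for line in conf_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(None, 1)
        if len(fields) == 2 and fields[1].startswith("@"):
            targets.append((fields[0], fields[1].lstrip("@").strip()))
    return targets


def forwards_everything_to(conf_text, host):
    """True if some rule sends all facilities and priorities to `host`."""
    return ("*.*", host) in remote_log_targets(conf_text)
```

For example, `forwards_everything_to(open("/etc/syslog.conf").read(), "loghost")` could run alongside the executable checks. It only inspects the file, so it would not catch a syslog daemon that has been stopped or replaced.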

Key logging

We need to check to make sure that keyloggers don't get inserted into default shell startup scripts.
