Wish List
This page lists improvements we would like to make to the cluster. Ask jdhutchin if you have any questions about them.
Good starter Projects
Squeeze Upgrade
The following computers need to be upgraded to Squeeze:
- Hermes (complicated: Postfix needs to be rebuilt with a small patch)
- Hera (not too bad: DNS CNAMEs need to be changed ahead of time)
- Charon, enlil, kabta
Fix Splunk
- The upgrade to 4.x broke it.
Fix backups
- The drives on persephone are too small, so backups run out of space
- Also, tape backups need to be run more often than never
Migrate to PostgreSQL 8.4
- Not too bad
Upgrade ugcs_libs
- The package is mostly built, just needs some testing
- Needs to be deployed to get rid of the deprecation warnings
Audit mailing lists
- People have subscribed random accounts to them and are reading our mail
Write auto-scanner for malware
- We need to scan the content we serve on the web and automatically detect when spam is being served from it.
MediaWiki upgrader
- Upgrading users' installations to 1.16 may break, especially if they installed more than one MediaWiki on their account
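A minimal sketch of such a scanner, assuming the web-served directories can be walked from a single root. The signatures, file suffixes, and the names `scan_file`/`scan_tree` are hypothetical illustrations, not an existing UGCS tool; it only does naive pattern matching (pharma spam keywords and the `eval(base64_decode(` idiom common in injected PHP).

```python
import os
import re
from typing import Iterator, List, Tuple

# Hypothetical signatures; a real scanner needs a curated, regularly updated list.
SPAM_PATTERNS = [
    re.compile(rb"viagra|cialis", re.IGNORECASE),  # pharma spam keywords
    re.compile(rb"eval\s*\(\s*base64_decode\s*\("),  # obfuscated PHP payloads
]

# Only look at files a web server is likely to serve or execute.
SCANNED_SUFFIXES = (".html", ".htm", ".php", ".js", ".txt")


def scan_file(path: str) -> List[str]:
    """Return the patterns that match the file's contents (empty if unreadable)."""
    try:
        with open(path, "rb") as f:
            data = f.read()
    except OSError:
        return []
    return [p.pattern.decode() for p in SPAM_PATTERNS if p.search(data)]


def scan_tree(root: str) -> Iterator[Tuple[str, List[str]]]:
    """Walk root and yield (path, matched patterns) for every suspicious file."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(SCANNED_SUFFIXES):
                path = os.path.join(dirpath, name)
                hits = scan_file(path)
                if hits:
                    yield path, hits
```

Run periodically (e.g. from cron) over the web roots and mail any output to the sysadmins; keyword matching will produce false positives, so treat hits as leads for a human to check.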
Add news / tip of the day
Even better, write your own nice little utilities and then let people know about them.
Upgrade the juniper switch
There is a new version of JOS out that we should upgrade to.
Autofixers
Set up Nagios to restart services automatically, and more aggressively, when they go down.
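Nagios supports this through event handlers: a command run on state changes that receives macros such as `$SERVICESTATE$`, `$SERVICESTATETYPE$` and `$SERVICEATTEMPT$`. A minimal sketch of a restart handler, assuming services are controlled by init scripts; the policy constant and function names are hypothetical. "More aggressive" here means restarting on the first soft CRITICAL check instead of waiting for a hard state.

```python
import subprocess
import sys
from typing import List

# Hypothetical policy: restart on the first soft CRITICAL result instead of
# waiting for the service to reach a HARD state.
RESTART_ON_SOFT_ATTEMPT = 1


def should_restart(state: str, state_type: str, attempt: int) -> bool:
    """Decide from Nagios' $SERVICESTATE$, $SERVICESTATETYPE$ and $SERVICEATTEMPT$."""
    if state != "CRITICAL":
        return False
    if state_type == "HARD":
        # Fallback in case the soft-state restart did not fix it.
        return True
    return state_type == "SOFT" and attempt == RESTART_ON_SOFT_ATTEMPT


def main(argv: List[str]) -> int:
    """Entry point: argv = [prog, STATE, STATETYPE, ATTEMPT, INIT_SCRIPT]."""
    if len(argv) != 5:
        print("usage: restart-service STATE STATETYPE ATTEMPT INIT_SCRIPT", file=sys.stderr)
        return 2
    state, state_type, attempt, init_script = argv[1], argv[2], int(argv[3]), argv[4]
    if should_restart(state, state_type, attempt):
        # e.g. an /etc/init.d script; the nagios user needs permission to run it.
        return subprocess.call([init_script, "restart"])
    return 0
```

To use it, end the script with `sys.exit(main(sys.argv))` and point the service's `event_handler` at a command that passes `$SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPT$` followed by the init script path.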
Maintenance
These are things that we have to do even if there aren't full-time student sysadmins.
- Account requests and password resets: SLA 1 day
  - How do we know: we get emails
- Fix it when it breaks (server down): SLA 1 hour
  - How do we know: email alerts for most things, SMS to jdhutchin's phone for really urgent things
  - Owner: jdhutchin
- Fix minor support requests for things that are broken: SLA 5 days
  - Sooner would be better
- Answer user questions: SLA best-effort
  - It would be nice if we could do this, but it isn't a top priority
Software
Fix mex (the MATLAB MEX compiler)
Add support for distributed Mathematica on mortals
Small fixes
Small things that need to be fixed across various services/machines:
- Email heartbeat
- Hestia SSL cert
- Switch kabta back to SSH key authentication after Alex and Raymond add their keys
- Find the sysadmins' PGP key
- Fix the backup schedules to something sensible
Mail System
Automatic group creation/management
See ugcs groups
Large file hosting
Almost done! See NFS servers. The server is running and exporting correctly; all we need now is disk quotas.
Account creator / password reset
- Re-work as necessary to ensure robustness
- Add exception reporting system (email to sysadmins)
- Write full test suite to ensure quality
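For the exception reporting item, a minimal sketch assuming the account creator is (or can be wrapped in) Python: the standard library's `logging.handlers.SMTPHandler` mails ERROR-level records, and `logger.exception` attaches the traceback. The addresses, mail host, and the names `make_logger`/`run_reported` are hypothetical placeholders.

```python
import logging
import logging.handlers
from typing import Any, Callable

# Hypothetical addresses; replace with the real sysadmin alias and mail relay.
MAIL_HOST = "localhost"
FROM_ADDR = "account-creator@example.org"
SYSADMIN_ADDRS = ["sysadmins@example.org"]


def make_logger(name: str = "account_creator") -> logging.Logger:
    """Return a logger that emails ERROR-and-above records to the sysadmins."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid attaching duplicate handlers on repeat calls
        mail = logging.handlers.SMTPHandler(
            mailhost=MAIL_HOST,
            fromaddr=FROM_ADDR,
            toaddrs=SYSADMIN_ADDRS,
            subject="Account creator: unhandled exception",
        )
        mail.setLevel(logging.ERROR)
        logger.addHandler(mail)
    return logger


def run_reported(logger: logging.Logger, func: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
    """Call func; if it raises, email the traceback to the sysadmins and re-raise."""
    try:
        return func(*args, **kwargs)
    except Exception:
        logger.exception("Unhandled error in %s", getattr(func, "__name__", repr(func)))
        raise
```

Re-raising keeps the existing failure behavior (the request still fails visibly) while guaranteeing that the sysadmins hear about it.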
Network
- Write a system that maps MAC addresses to IP addresses and switch port numbers
- Add port mirroring to charon for desirable traffic
- Improve firewalls
- Enable switch port security
- Fix switch names
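The MAC/IP/port mapping is essentially a join of two tables: the ARP table (IP to MAC, e.g. from `ip neigh` on the gateway) and the switch's forwarding table (MAC to port, e.g. from `show ethernet-switching table` on the Juniper). A minimal sketch of the join, assuming both have already been collected into dictionaries; the function names and sample port name are hypothetical.

```python
from typing import Dict, List, Tuple


def normalize_mac(mac: str) -> str:
    """Normalize 00:1A:.., 00-1a-.. or 001a.2b3c.4d5e forms to aa:bb:cc:dd:ee:ff."""
    hexdigits = "".join(c for c in mac.lower() if c in "0123456789abcdef")
    if len(hexdigits) != 12:
        raise ValueError(f"not a MAC address: {mac!r}")
    return ":".join(hexdigits[i:i + 2] for i in range(0, 12, 2))


def build_table(arp: Dict[str, str], fdb: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Join ARP entries (IP -> MAC) with the switch forwarding table (MAC -> port).

    Returns (mac, ip, port) rows sorted by MAC; port is "unknown" when the
    switch has not learned that MAC.
    """
    ports = {normalize_mac(mac): port for mac, port in fdb.items()}
    rows = []
    for ip, mac in arp.items():
        m = normalize_mac(mac)
        rows.append((m, ip, ports.get(m, "unknown")))
    return sorted(rows)
```

Normalizing MACs first matters because Linux, Cisco-style, and Juniper output format them differently, and a naive string join would silently miss matches.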
Hardware
- Set up hestia to take over for dionysus - in progress
- network card flip: put one of the single gigabit cards into charon, and move its two-port gigabit cards into poseidon and persephone
Web hosting
- Add a failover web server
Global login records
We need to implement something with LDAP so that we have global login records.
Documentation
- We need a printed-out copy of critical wiki stuff
- We need to write more documentation about our services for disaster recovery.
- We need to update all of the core server pages with correct disk setups and currently running services.