Wish List
Revision as of 16:13, 15 September 2011
This page aims to list current improvements we would like to make to the cluster. Ask jdhutchin if you have any questions about them.
Good starter Projects
Fix splunk
- The upgrade to 4.x broke it.
- Requires setting up access to charon (useful for other stuff too)
- Fixing it lets us set up our log alerting, etc. again.
Squeeze Upgrade
The following computers need to be upgraded to squeeze:
- Hermes (complicated, postfix needs to be rebuilt with a small patch)
- Hera (not too bad, dns CNAMES need to be changed ahead of time)
- Charon, enlil, kabta
Fix backups
- The drives on persephone are too small, so we run out of space
- This is urgent as we can't currently run a full backup cycle
- We might want to run bacula "Base" backups to save space and time (see the bacula documentation)
- Also, tape backups need to be run more often than never
Migrate to postgres 8.4
- Not too bad
- User notification required
- Test mediawiki with it.
Upgrade ugcs_libs
- The package is mostly built, just needs some testing
- Needs to be deployed to get rid of deprecation warnings
Audit mailing lists
- People have signed up random accounts on them and are spying on our mail
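A first pass at the audit could simply compare each list's subscribers against the accounts we know about. The sketch below assumes we can export the subscriber addresses and the account names as plain lists; the function name and the idea that legitimate subscribers use a single local mail domain are assumptions made for illustration.

```python
def find_unknown_subscribers(subscribers, known_accounts, local_domain):
    """Return subscriber addresses that do not map to a known local account.

    An address is treated as known only if its domain equals local_domain
    and its local part is in known_accounts (case-insensitive).
    """
    known = {name.lower() for name in known_accounts}
    domain = local_domain.lower()
    unknown = []
    for address in subscribers:
        cleaned = address.strip()
        local, _, addr_domain = cleaned.lower().rpartition("@")
        if addr_domain != domain or local not in known:
            unknown.append(cleaned)
    return sorted(unknown)
```

Anything this returns still needs a human look, since some outside addresses may be legitimate.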
Write auto-scanner for malware
- We need to look at our web serving and auto-detect when we are serving spam off of it.
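A minimal sketch of such a scanner, assuming user web content sits in a directory tree we can walk and that a short list of regular expressions is a good enough first signal. The patterns, the file extensions, and the function names are illustrative assumptions, not an existing UGCS tool.

```python
import os
import re

# Hypothetical indicators of injected spam or malware; a real list needs tuning.
SUSPICIOUS_PATTERNS = [
    re.compile(rb"eval\s*\(\s*base64_decode", re.IGNORECASE),
    re.compile(rb"<iframe[^>]+display\s*:\s*none", re.IGNORECASE),
    re.compile(rb"viagra|cialis", re.IGNORECASE),
]

# Only look at file types a web server is likely to serve or execute.
SCANNED_EXTENSIONS = (".php", ".html", ".htm", ".js")


def scan_file(path, max_bytes=1_000_000):
    """Return the patterns (as strings) found in the first max_bytes of path."""
    try:
        with open(path, "rb") as handle:
            data = handle.read(max_bytes)
    except OSError:
        # Unreadable files are skipped rather than reported.
        return []
    return [p.pattern.decode() for p in SUSPICIOUS_PATTERNS if p.search(data)]


def scan_tree(root):
    """Walk root and return {path: [matched patterns]} for suspicious files."""
    findings = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(SCANNED_EXTENSIONS):
                continue
            path = os.path.join(dirpath, name)
            hits = scan_file(path)
            if hits:
                findings[path] = hits
    return findings
```

Run from cron, the result of `scan_tree` could be mailed to the sysadmins whenever it is non-empty.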
Mediawiki upgrader
- The upgrade to 1.16 may break for users, especially if they tried to install more than one mediawiki on their account
More website auto-setup
- Write more utilities like setup-mediawiki for django, etc so people can easily set up their own web stuff under UGCS.
Add news / tip of the day
Even better, write your own nice little utilities and then let people know about them.
Upgrade the juniper switch
There is a new version of JOS out that we should upgrade to.
Autofixers
Set up nagios so it more aggressively auto-restarts stuff when it is down.
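One way to do this is a Nagios event handler, which Nagios runs whenever a service changes state and which can be passed the `$SERVICESTATE$`, `$SERVICESTATETYPE$`, and `$SERVICEATTEMPT$` macros. The sketch below shows only the decision logic; the restart-on-third-soft-failure policy, the argument order, and the function names are assumptions, and it only makes sense if the service's max check attempts is above 3.

```python
import subprocess


def should_restart(state, state_type, attempt, soft_attempt=3):
    """Decide whether to restart a service from the Nagios state macros.

    Restart once on the Nth SOFT failure (before notifications go out),
    and again when the problem goes HARD as a last resort.
    """
    if state != "CRITICAL":
        return False
    if state_type == "SOFT":
        return attempt == soft_attempt
    return state_type == "HARD"


def main(argv):
    """argv: [script, state, state_type, attempt, restart command...]."""
    if len(argv) < 5:
        return 2  # usage error
    state, state_type, attempt = argv[1], argv[2], int(argv[3])
    restart_command = argv[4:]
    if should_restart(state, state_type, attempt):
        # Run the restart command and pass its exit status back to Nagios.
        return subprocess.call(restart_command)
    return 0
```

A wrapper script would end with `sys.exit(main(sys.argv))` and be registered as the service's event handler command.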
Maintenance
These are things that we have to do even if there aren't full-time student sysadmins.
- Account requests and password resets: SLA: 1 day
  - How do we know: We get emails
- Fix it when it breaks (server down)
  - SLA: 1 hr
  - How do we know: Email alerts for most things, SMS to jdhutchin's phone for really urgent things.
  - Owner: jdhutchin
- Fix minor support requests for things that are broken: SLA: 5 days
  - Sooner would be better
- Answer user questions: SLA: Best-effort
  - It would be nice if we could do this but it isn't a top priority
Software
Fix mex (matlab compiler)
Add support for distributed Mathematica on mortals
Small fixes
Small things that need to be fixed across various services/machines:
- Email heartbeat
- Hestia SSL cert
- Change kabta back to ssh keys after Alex/Raymond add theirs
- Find the sysadmins' PGP key
- Fix the backup schedules to something sensible
Mail System
Automatic group creation/management
See ugcs groups
Large file hosting
Almost done! See NFS servers. The server is running and exporting things correctly. All we need now is disk quotas.
Account creator / password reset
- Re-work as necessary to ensure robustness
- Add exception reporting system (email to sysadmins)
- Write full test suite to ensure quality
- Fix bug where if you mis-enter your krb pw it half-creates the account anyway and is a pain to straighten out.
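The half-created account problem could be avoided by checking the password before touching anything and by undoing completed steps if a later one fails. This is a sketch of that pattern only, not the current account creator: the `verify_password` callable and the `(do, undo)` step pairs are hypothetical stand-ins for the real Kerberos, LDAP, and home directory operations.

```python
class AccountCreationError(Exception):
    """Raised when an account could not be created cleanly."""


def create_account(username, steps, verify_password):
    """Create an account all-or-nothing.

    steps is a list of (do, undo) callables that each take the username;
    verify_password is a callable returning True if the password is valid.
    """
    # Fail before any state is changed if the password is wrong.
    if not verify_password():
        raise AccountCreationError("password check failed; nothing was created")

    completed = []
    try:
        for do, undo in steps:
            do(username)
            completed.append(undo)
    except Exception as exc:
        # Undo in reverse order so later steps are removed first.
        for undo in reversed(completed):
            undo(username)
        raise AccountCreationError(
            f"creation of {username} failed and was rolled back"
        ) from exc
```

The exception raised here is also a natural place to hook in the email report to the sysadmins.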
Network
- Write a system that shows us mac/ip/port number
- Add port mirroring to charon for desirable traffic
- Improve firewalls
- Enable switch port security
- Fix switch names
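For the mac/ip/port item, the core of the system is a join between an ARP table (IP to MAC) and the switch's forwarding table (MAC to port). The sketch below assumes Linux `arp -an` style input and assumes the switch data has already been collected into a dictionary; how to pull that from the switch is left open, and the port names in the usage example are made up.

```python
import re

# Matches lines like: ? (10.0.0.5) at 00:11:22:33:44:55 [ether] on eth0
ARP_LINE = re.compile(r"\((?P<ip>[\d.]+)\) at (?P<mac>[0-9A-Fa-f:]{17})")


def parse_arp(text):
    """Return {mac: ip} from `arp -an` style output, skipping incomplete entries."""
    table = {}
    for line in text.splitlines():
        match = ARP_LINE.search(line)
        if match:
            table[match.group("mac").lower()] = match.group("ip")
    return table


def join_tables(ip_by_mac, port_by_mac):
    """Return sorted (mac, ip, port) rows; '?' marks a value we do not know."""
    ports = {mac.lower(): port for mac, port in port_by_mac.items()}
    rows = []
    for mac in sorted(set(ip_by_mac) | set(ports)):
        rows.append((mac, ip_by_mac.get(mac, "?"), ports.get(mac, "?")))
    return rows
```

For example, `join_tables(parse_arp(arp_output), {"00:11:22:33:44:55": "ge-0/0/1"})` gives one row per MAC address seen on either side, which makes hosts with no known port (or ports with no known IP) easy to spot.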
Hardware
- Set up hestia to take over for dionysus - in progress
- network card flip: put one of the single gigabit cards into charon, and move its two-port gigabit cards into poseidon and persephone
Web hosting
- Add a failover web server
Global login records
We need to implement some stuff with ldap so we have global login records
Documentation
- We need a printed-out copy of critical wiki stuff
- We need to make more documentation about our services for disaster recovery.
- We need to update all of the core server pages with correct disk setups and currently running services.