Wish List
Revision as of 18:10, 10 September 2011
This page aims to list current improvements we would like to make to the cluster. Ask jdhutchin if you have any questions about them.
Good starter Projects
Squeeze Upgrade
The following computers need to be upgraded to squeeze:
- Hermes (complicated, postfix needs to be rebuilt with a small patch)
- Hera (not too bad, DNS CNAMEs need to be changed ahead of time)
- Charon, enlil, kabta
Fix splunk
- The upgrade to 4.x broke it.
Fix backups
- The drives on persephone are too small, so we run out of space
- Also, tape backups need to be run more often than never
Migrate to postgres 8.4
- Not too bad
Upgrade ugcs_libs
- The package is mostly built, just needs some testing
- Needs to be deployed to get rid of deprecation warnings
Audit mailing lists
- People have signed up random accounts on them and are spying on our mail
Write auto-scanner for malware
- We need to monitor the content our web servers host and automatically detect when we are serving spam or malware from it.
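A minimal sketch of what such a scanner could look like, assuming it runs over the web root on the serving host. The patterns, the file suffixes, and the function names `scan_text` and `scan_tree` are hypothetical placeholders, not an existing tool; real indicators would need to be tuned against incidents we have actually seen.

```python
import re
from pathlib import Path

# Hypothetical indicators, chosen only for illustration.
SPAM_PATTERNS = [
    re.compile(r"eval\s*\(\s*base64_decode", re.IGNORECASE),
    re.compile(r"\b(viagra|cialis)\b", re.IGNORECASE),
]
# Only scan file types that are likely to be served as pages or scripts.
SCANNED_SUFFIXES = {".php", ".html", ".htm", ".js"}


def scan_text(text):
    """Return the source of every pattern that matches the given text."""
    return [p.pattern for p in SPAM_PATTERNS if p.search(text)]


def scan_tree(root):
    """Walk root and map each suspicious file path to its matched patterns."""
    hits = {}
    for path in Path(root).rglob("*"):
        if not path.is_file() or path.suffix.lower() not in SCANNED_SUFFIXES:
            continue
        try:
            text = path.read_text(errors="replace")
        except OSError:
            continue  # unreadable file: skip it rather than abort the scan
        matched = scan_text(text)
        if matched:
            hits[str(path)] = matched
    return hits
```

Run from cron, the result of `scan_tree` could be mailed to the sysadmins whenever it is non-empty. Pattern matching like this only catches known signatures, so it would complement rather than replace manual review.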
Mediawiki upgrader
- The upgrade to 1.16 may break for users, especially if they installed more than one MediaWiki on their account
Upgrade the juniper switch
There is a new version of JOS out that we should upgrade to.
Autofixers
Set up Nagios so that it more aggressively auto-restarts services when they go down.
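One way to do this is a Nagios event handler, which Nagios invokes on state changes and which can be passed the `$SERVICESTATE$`, `$SERVICESTATETYPE$` and `$SERVICEATTEMPT$` macros as arguments. The sketch below is an assumption about how we might wire it up, not our current configuration: the names `should_restart` and `main`, the argument order, and the init-script restart command are all hypothetical. "More aggressive" here means restarting on the first SOFT CRITICAL result instead of waiting for a HARD state.

```python
import subprocess


def should_restart(state, state_type, attempt, soft_attempt_threshold=1):
    """Decide whether to restart, given the Nagios state macros."""
    if state != "CRITICAL":
        return False
    if state_type == "HARD":
        return True
    # SOFT state: restart once the check has failed enough times in a row.
    return state_type == "SOFT" and int(attempt) >= soft_attempt_threshold


def main(argv):
    # Assumed arguments:
    #   $SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPT$ <path to init script>
    state, state_type, attempt, init_script = argv[1:5]
    if should_restart(state, state_type, attempt):
        # Hypothetical restart command; adjust to how the service is managed.
        return subprocess.call([init_script, "restart"])
    return 0
```

A small wrapper script would call `sys.exit(main(sys.argv))`. Raising `soft_attempt_threshold` makes the handler less trigger-happy for services that flap.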
Maintenance
These are things that we have to do even if there aren't full-time student sysadmins.
- Account requests and password resets
  - SLA: 1 day
  - How do we know: We get emails
- Fix it when it breaks (server down)
  - SLA: 1 hour
  - How do we know: Email alerts for most things, SMS to jdhutchin's phone for really urgent things
  - Owner: jdhutchin
- Fix minor support requests for things that are broken
  - SLA: 5 days
  - Sooner would be better
- Answer user questions
  - SLA: Best-effort
  - It would be nice if we could do this, but it isn't a top priority
Software
Fix mex (matlab compiler)
Add support for distributed Mathematica on mortals
Small fixes
Small things that need to be fixed across various services/machines:
- Email heartbeat
- Hestia SSL cert
- Change kabta back to ssh keys after Alex/Raymond add theirs
- Find the sysadmins' PGP key
- Fix the backup schedules to something sensible
Mail System
Automatic group creation/management
See ugcs groups
Large file hosting
Almost done! See NFS servers. The server is running and exporting things correctly. All we need now is disk quotas.
Account creator / password reset
- Re-work as necessary to ensure robustness
- Add exception reporting system (email to sysadmins)
- Write full test suite to ensure quality
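The exception reporting item could be sketched as a decorator that mails the traceback to the sysadmins and then re-raises. This is an illustration under assumptions: `report_exceptions`, the `send` callable, and the address `sysadmins@example.org` are hypothetical and not part of the existing account creator.

```python
import functools
import traceback
from email.message import EmailMessage


def report_exceptions(send, to_addr="sysadmins@example.org"):
    """Wrap a function so unhandled exceptions are mailed via send(msg)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except Exception:
                msg = EmailMessage()
                msg["Subject"] = "Unhandled exception in %s" % func.__name__
                msg["From"] = to_addr  # placeholder sender address
                msg["To"] = to_addr
                msg.set_content(traceback.format_exc())
                send(msg)
                raise  # still fail loudly after reporting
        return wrapper
    return decorator
```

In production `send` could be something like `smtplib.SMTP("localhost").send_message`. Passing it in as a parameter also lets the test suite substitute a list's `append` and assert on what would have been sent.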
Network
- Write a system that shows us the MAC address, IP address, and switch port number of each host
- Add port mirroring to charon for desirable traffic
- Improve firewalls
- Enable switch port security
- Fix switch names
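For the MAC/IP/port item, the core of such a system is a join between an ARP table (IP to MAC) and the switch's forwarding table (MAC to port). The sketch below assumes both tables have already been collected as plain tuples; how they are gathered (for example over SNMP or from the switch CLI) is left open, and the function name `build_host_table` and the sample addresses in the usage note are made up for illustration.

```python
def build_host_table(arp_entries, switch_entries):
    """Join (ip, mac) pairs with (mac, port) pairs on the MAC address.

    Returns (mac, ip, port) rows sorted by IP string; port is None when
    the switch table has no entry for that MAC.
    """
    def norm(mac):
        # Normalize case and separators so both sources compare equal.
        return mac.lower().replace("-", ":")

    port_by_mac = {norm(mac): port for mac, port in switch_entries}
    rows = []
    for ip, mac in arp_entries:
        key = norm(mac)
        rows.append((key, ip, port_by_mac.get(key)))
    return sorted(rows, key=lambda row: row[1])
```

For example, calling it with `[("10.0.0.1", "AA-BB-CC-00-00-01")]` and `[("aa:bb:cc:00:00:01", "ge-0/0/1")]` yields a single row tying that address to port `ge-0/0/1`. Rows with a `None` port are worth flagging, since they indicate hosts seen on the network that the switch table does not account for.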
Hardware
- Set up hestia to take over for dionysus - in progress
- Network card flip: put one of the single-port gigabit cards into charon, and move its two-port gigabit cards into poseidon and persephone
Web hosting
- Add a failover web server
Global login records
We need to implement something with LDAP so that we have global login records.
Documentation
- We need a printed-out copy of critical wiki stuff
- We need to make more documentation about our services for disaster recovery.
- We need to update all of the core server pages with correct disk setups and currently running services.