Wish List
From UGCS
Revision as of 01:28, 8 October 2010
This page aims to list current improvements we would like to make to the cluster.
Maintenance
These are things that we have to do even if there aren't full-time student sysadmins.
- Account requests and password resets
  - SLA: 1 day
  - How do we know: we get emails
- Fix it when it breaks (server down)
  - SLA: 1 hour
  - How do we know: email alerts for most things, SMS to jdhutchin's phone for really urgent things
  - Owner: jdhutchin
- Fix minor support requests for things that are broken
  - SLA: 5 days (sooner would be better)
- Answer user questions
  - SLA: best-effort
  - It would be nice if we could do this, but it isn't a top priority
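The targets above could be kept in one machine-readable place so that a script can flag overdue requests. This is only a minimal sketch: the category keys and the function names are hypothetical labels made up for illustration, not anything that exists on the cluster.

```python
from datetime import datetime, timedelta
from typing import Optional

# SLA targets copied from the list above; None means best-effort (no deadline).
# The category keys are hypothetical labels chosen for this sketch.
SLA_TARGETS = {
    "account_request": timedelta(days=1),
    "server_down": timedelta(hours=1),
    "minor_support": timedelta(days=5),
    "user_question": None,
}


def sla_deadline(category: str, opened_at: datetime) -> Optional[datetime]:
    """Return the time by which a request should be handled, or None for best-effort."""
    target = SLA_TARGETS[category]
    return None if target is None else opened_at + target


def is_overdue(category: str, opened_at: datetime, now: datetime) -> bool:
    """True only when the category has a deadline and that deadline has passed."""
    deadline = sla_deadline(category, opened_at)
    return deadline is not None and now > deadline
```

Best-effort items never count as overdue here, which matches the "nice to have" status of user questions.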
Software
Fix mex (MATLAB compiler)
Add support for distributed Mathematica on mortals
Small fixes
Small things that need to be fixed across various services/machines:
- Email heartbeat
- Hestia SSL cert
- Change kabta back to SSH keys after Alex/Raymond add theirs
- Find the sysadmins' PGP key
- Change the backup schedules to something sensible
Mail System
Automatic group creation/management
See ugcs groups
Large file hosting
Almost done! See NFS servers. The server is running and exporting things correctly. All we need now is disk quotas.
Account creator / password reset
- Re-work as necessary to ensure robustness
- Add exception reporting system (email to sysadmins)
- Write full test suite to ensure quality
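One possible shape for the exception reporting item is sketched below, using only the Python standard library. The function name, the `[account-creator]` subject prefix, and the addresses in the usage note are assumptions for illustration; the existing account creator may be structured quite differently.

```python
import traceback
from email.message import EmailMessage


def build_exception_report(exc: BaseException, sender: str, recipient: str) -> EmailMessage:
    """Build (but do not send) an email describing an unhandled exception."""
    # Header values must be a single line, so keep only the first line of the message.
    first_line = (str(exc).splitlines() or [""])[0]

    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    # The subject prefix is a hypothetical tag to make mail filtering easier.
    msg["Subject"] = f"[account-creator] {type(exc).__name__}: {first_line}"
    # The body is the full traceback, formatted the way the interpreter prints it.
    msg.set_content(
        "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))
    )
    return msg
```

A caller would wrap the risky operation in `try`/`except Exception as exc`, build the report, and hand it to `smtplib.SMTP(...).send_message(msg)`, assuming a mail relay is reachable from the machine. Keeping message construction separate from sending makes it easy to cover in the test suite.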
Network
- Write a system that shows us MAC/IP/port number
- Add port mirroring to charon for desirable traffic
- Improve firewalls
- Enable switch port security
- Fix switch names
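The MAC/IP/port item is essentially a join between two tables: an ARP-style mapping from IP to MAC, and each switch's forwarding table from MAC to port. The sketch below assumes both tables have already been collected into dictionaries (how they are gathered, for example over SNMP or from switch CLI output, is left open), and every name in it is hypothetical.

```python
from typing import Dict, List, NamedTuple, Optional


class HostLocation(NamedTuple):
    mac: str
    ip: Optional[str]  # None when the MAC was seen on a switch but has no ARP entry
    switch: str
    port: int


def normalize_mac(mac: str) -> str:
    """Lower-case a MAC and use ':' separators so tables from different sources match."""
    digits = "".join(c for c in mac.lower() if c in "0123456789abcdef")
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


def locate_hosts(
    arp_table: Dict[str, str],
    forwarding_table: Dict[str, Dict[str, int]],
) -> List[HostLocation]:
    """Join ip -> mac with switch -> {mac -> port} into one sorted report."""
    ip_by_mac = {normalize_mac(mac): ip for ip, mac in arp_table.items()}
    rows = []
    for switch, macs in forwarding_table.items():
        for mac, port in macs.items():
            key = normalize_mac(mac)
            rows.append(HostLocation(key, ip_by_mac.get(key), switch, port))
    # Sort by switch and port so the report reads like a walk down each switch.
    return sorted(rows, key=lambda r: (r.switch, r.port, r.mac))
```

Normalizing MAC addresses first matters because different tools print them with different separators and letter case. Rows with no IP are kept rather than dropped, since an unknown device on a port is exactly what such a report should surface.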
Hardware
- Set up hestia to take over for dionysus - in progress
- Network card flip: put one of the single gigabit cards into charon, and move its two-port gigabit cards into poseidon and persephone
Web hosting
- Add a failover web server
Global login records
We need to build something on top of LDAP so that we have global login records.
Documentation
- We need a printed-out copy of the critical wiki pages.
- We need to make more documentation about our services for disaster recovery.
- We need to update all of the core server pages with correct disk setups and currently running services.