Jan25 Kerberos Incident

From UGCS
Revision as of 04:17, 26 January 2010 by Jdhutchin@ugcs.caltech.edu (Talk | contribs)

On January 25, 2010, many of the shellservers were unable to complete any Kerberos operations. The cause was an upgraded Kerberos library from Debian testing that was incompatible with our existing Kerberos libraries. The problem was fixed about an hour after users and UGCS admins first noticed it.

Symptoms

Kerberos operations failed, as did getting AFS tokens. A common error message was:

tobin@melpomene:~$ kinit
kinit: relocation error: /usr/lib/libdes425.so.3: symbol des_IP_table,
version k5crypto_3_MIT not defined in file libk5crypto.so.3 with link
time reference

Users were generally able to log in but could not get AFS tokens, and therefore couldn't use their home directories.

Cause

The problem arose because Debian testing upgraded its Kerberos libraries to krb1.8-alpha while the rest of the cluster, including userspace programs, was still using Kerberos 1.6.

Solution

We downgraded the affected packages (libkrb5support, libk5crypto3, libkrb5-3, libgssapi-krb5) to 1.7+dfsg4 using the .deb archives cached in /var/cache/apt/archives.
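The downgrade can be sketched roughly as follows. This is an illustration rather than a record of the exact commands that were run: the helper find_cached_debs is hypothetical, and it assumes the cached files follow the usual name_version_arch.deb naming. The package names are treated as prefixes because the real Debian package names may carry a suffix. The script only prints the dpkg command instead of executing it.

```shell
#!/bin/sh
# Sketch only: locate cached .deb files for a given version so they can be
# passed to dpkg -i. find_cached_debs is a hypothetical helper, not an
# existing tool.

# Usage: find_cached_debs DIR VERSION PKG_PREFIX...
# Prints one matching .deb path per line.
find_cached_debs() {
    dir=$1
    version=$2
    shift 2
    for pkg in "$@"; do
        # Assumed file naming: <package>_<version>_<arch>.deb
        for deb in "$dir/$pkg"*"_${version}_"*.deb; do
            # An unmatched glob stays literal, so check the file exists.
            if [ -e "$deb" ]; then
                printf '%s\n' "$deb"
            fi
        done
    done
}

# Dry run: show the command rather than running it.
debs=$(find_cached_debs /var/cache/apt/archives '1.7+dfsg4' \
    libkrb5support libk5crypto3 libkrb5-3 libgssapi-krb5)
if [ -n "$debs" ]; then
    echo "would run: dpkg -i" $debs
else
    echo "no cached 1.7+dfsg4 packages found"
fi
```

Installing all of the libraries in a single dpkg -i invocation lets dpkg resolve the dependencies among them together, instead of failing on the first package whose siblings are still at the newer version.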

Prevention

To prevent this problem from happening again, we added pin lines in /etc/apt/preferences to pin those packages to 1.7+dfsg4. We verified with an aptitude safe-upgrade that the pins prevent newer versions from being installed. We found no obvious log messages that we could alert on.
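For illustration, pin stanzas of this kind could be generated as below. The package names and version are those given above; the Pin-Priority value of 1001 is an assumption (the actual pin lines are not recorded here), chosen because apt treats priorities above 1000 as strong enough to keep a version even when that means holding a downgrade. The exact Debian package names may carry a suffix and should be checked with dpkg -l before use.

```shell
#!/bin/sh
# Sketch only: print apt pin stanzas for the downgraded Kerberos libraries.
# The priority 1001 is an assumed value, not taken from the incident notes.
PIN_VERSION='1.7+dfsg4'
PIN_PACKAGES='libkrb5support libk5crypto3 libkrb5-3 libgssapi-krb5'

print_krb5_pins() {
    for pkg in $PIN_PACKAGES; do
        # One stanza per package, separated by a blank line.
        printf 'Package: %s\nPin: version %s\nPin-Priority: 1001\n\n' \
            "$pkg" "$PIN_VERSION"
    done
}

# Write to stdout; review the output before appending it to
# /etc/apt/preferences.
print_krb5_pins
```

After the stanzas are in place, apt-cache policy followed by a package name shows which version apt now considers the installation candidate.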
