We ran some diagnostic software on the misbehaving server (Santa), and it's reporting some motherboard/processor errors that may be false errors. To verify the correctness of the diagnostic suite, I want to take Chef down for about 10 minutes, run that same part of the test on it, and see if it passes or fails. If it fails on Chef too, the reported errors are probably bogus, and then I hope the real problem is bad memory in Santa (which I'll also find out when I go down to the NOC... those tests should be done by then).
Basically, we want to make Santa the new Cluster 3 master, but I can't do that until I know what's been causing its few random crashes.
Santa's twin, Chef, is the Cluster 2 master. While it's down, you'll still be able to read Cluster 2 journals, but you won't be able to comment in them or post new entries (if you're on that cluster). Right now the userinfo page doesn't say what cluster you're on, so if you're curious, go to this temporary page.
I'll let you all know what I find out. In the meantime, help out my wrists: reply to anybody who's confused here and point them at /support/. Thanks!