Brad Fitzpatrick (bradfitz) wrote in lj_maintenance,

Cluster 2 going read-only for a few...

We have a new server, not yet in production, that's been giving us a few problems since we got it. Its identical twin, which we bought at the same time, is working fine and is in production (Chef; Cluster 2 master).

We ran some diagnostic software on the misbehaving server (Santa), and it's reporting some motherboard/processor errors, which may be false errors. To verify the correctness of the diagnostic suite, I want to take down Chef for about 10 minutes to run that part of the test and see if it passes or fails. If it fails on Chef too, then the reported errors are false and I hope the real problem is bad memory in Santa (which I'll also find out when I go down to the NOC... those tests should be done).

Basically, we want to make Santa the new Cluster 3 master, but I can't do that until I know what's been causing its few random crashes.

Santa's twin, Chef, is the Cluster 2 master. While it's down, you'll still be able to read Cluster 2 journals, but not comment in them or post new entries (if you're on that cluster). Right now the userinfo page doesn't say what cluster you're on, so if you're curious, go to this temporary page.

I'll let you all know what I find out. In the meantime, help out my wrists and reply to anybody that's confused here and point them at /support/. Thanks!