Lisa Phillips (lisa) wrote in lj_maintenance,
Update on Santa cluster outage

The original downtime earlier today with Santa unfortunately led to some further database issues, which have caused a number of other problems for users on that cluster. We are continuing to work on the problem, and further downtime for that cluster will be required for a final fix. We're going to do each subcluster one at a time, and while we're working on a subcluster, its users may be unable to log in or post, and their journals may be unavailable for other users to view or post in.

Also, while I'd love to be able to tell you this was due to some catastrophic error and not a preventable problem, unfortunately this next round of downtime is the result of human error (mine). You have my (and our) sincerest apologies and assurances that we really do take this downtime seriously and we're doing everything we can to put measures in place to keep downtime like this as rare as possible.

If you're interested in finding out about the work we're doing to help prevent long downtimes, please follow news and lj_backend, where we post fairly regularly about new hardware and network infrastructure changes.
