Brad Fitzpatrick (bradfitz) wrote in lj_maintenance,


Had a few minutes of downtime earlier while I was installing another new webserver... apparently Kenny's ethernet cable wasn't plugged in tight (wasn't clicked in), so it fell out when I was moving some things around. Took a while to figure out what it was, since the cable was still hanging in place.

Any one of the 4 web servers can die at any moment and LJ won't miss a beat, but Kenny and Cartman aren't fully redundant... Kenny is the load balancer for all the other machines, so if it dies, everything dies. That sounds bad, but we can make any one of the other machines the main load balancer within a few minutes, so it's not like we have one central point of failure.

Well, the database is kinda a central point of failure now, but we're ordering new hard drives for Kenny so it can run an up-to-date replicated database and become the master database should Cartman ever die.

We're getting enough hardware lately that we should be able to build a really reliable setup.

