December 17th, 2003

Code update

New code is going live on the site. Nothing too exciting:

-- rearranging code from some of our bigger libraries into new, smaller ones
-- memcache/database optimizations

You may notice some errors for a few seconds while we put the code live, but then everything should be the same. If anything's messed up, assume it's widespread and give us 10-15 minutes to fix it. Then check again, and if it's still happening, report it to support. Thanks!

Another crash

The same crash happened again, taking down 25% of the memcaches and slowing the site. Prior to these crashes, this machine had been running fine for years. Very sad.

Anyway, things are speeding up as the LJ code finds new homes for all the lost memory objects, re-fetching things from the database and putting them into the 75% of memcaches that are still alive.
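
For the curious, here's a rough sketch of that recovery idea in Python. This is not the actual LJ code: the CachePool class, the fetch function, the stats counter, and the rehash-with-an-attempt-number scheme are all made up for illustration, and plain dicts stand in for the real memcaches and database.

```python
import zlib


class CachePool:
    """Toy pool of cache servers. Plain dicts stand in for real memcaches."""

    def __init__(self, names):
        self.names = list(names)
        self.stores = {name: {} for name in self.names}  # per-server data
        self.alive = set(self.names)

    def kill(self, name):
        # A crashed server loses everything it was holding in memory.
        self.stores[name].clear()
        self.alive.discard(name)

    def server_for(self, key):
        # Try the key's usual server first (attempt 0). If that one is
        # down, rehash with the attempt number until a live server turns up.
        for attempt in range(100):
            digest = zlib.crc32(f"{attempt}:{key}".encode())
            name = self.names[digest % len(self.names)]
            if name in self.alive:
                return name
        raise RuntimeError("no live cache server found")

    def get(self, key):
        return self.stores[self.server_for(key)].get(key)

    def set(self, key, value):
        self.stores[self.server_for(key)][key] = value


def fetch(pool, db, key, stats):
    """Return the value for key, going to the database only on a cache miss."""
    value = pool.get(key)
    if value is None:
        stats["db_reads"] += 1   # slow path: the cache didn't have it
        value = db[key]
        pool.set(key, value)     # store it on whichever server is alive
    return value
```

In this sketch, killing one server out of four only costs a database read for the keys that lived on the dead server. Everything else is still served from cache, and once each lost key has been re-fetched once, the extra database reads stop. That's the "speeding up" part.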

We're going to investigate this crash more. It's no longer an isolated incident and we can't trust this machine again until we know what the problem is.