April 9th, 2010

  • dwell

Website Performance - UPDATE

Hello everyone, I've got the latest update on what we're doing to help LJ's performance.

Speed up the way our user information is pulled out of the database.
The code release we were originally planning for yesterday has been pushed back to at least Monday. dnewhall asked for, and received, the SQL patches separately from the main release and pushed them out around 60 minutes ago (12:40PM PDT). Load times have improved since we applied the fix, and we're pleased that we've gotten at least 1 thing off our checklist.

Out of the 10 servers where we store our data, the first one is really struggling to keep up with the demands on it. We plan to move some of the users on that first server to other servers so that requests for information are spread more evenly.
We are proceeding with moving ONTD to the new cluster tonight, starting on April 10 at 04:00. We're expecting this to take 12 hours, possibly more. Unless you are in the ONTD community, your individual journal will NOT be affected.
ONTD is not the direct cause of their/our problems. Some of you expressed frustration that the Ops team didn't see this coming and/or move ONTD to their own server long before now. I understand where you're coming from. This move is not going to be a magic solution. It will help, just like the code changes helped earlier today, but there is still a lot of work we have to do for the entire site -- code, architecture and hardware. We drastically accelerated this move because of the severe and sudden performance problems, but we don't think it will completely eliminate ALL of our recent problems, simply because our problems are systemic, not just specific to 1 community, 1 server or 1 type of hardware.
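For the technically curious, here is a rough sketch of the general idea behind spreading users off an overloaded server. It is not our actual tooling: the function name, the load numbers and the "move the biggest users that fit" rule are all made up for illustration.

```python
def plan_moves(cluster_loads, user_loads, source):
    """Plan which users to move off `source` (hypothetical helper).

    cluster_loads: dict of cluster id -> current total load (made-up units)
    user_loads:    dict of user name -> that user's load on `source`
    Returns (moves, new_loads), where moves is a list of (user, destination).
    """
    loads = dict(cluster_loads)  # work on a copy, leave the input untouched
    target = sum(loads.values()) / len(loads)  # the ideal even share
    moves = []
    # Consider the heaviest users first: fewer moves for the same relief.
    for user, load in sorted(user_loads.items(), key=lambda kv: kv[1], reverse=True):
        if loads[source] <= target:
            break  # the source is no busier than average, stop here
        # Pick the currently least-loaded cluster other than the source.
        dest = min((c for c in loads if c != source), key=lambda c: loads[c])
        if loads[dest] + load > target:
            continue  # this user would just overload the destination instead
        loads[source] -= load
        loads[dest] += load
        moves.append((user, dest))
    return moves, loads


# Illustrative numbers only: cluster 1 is the struggling one.
example_clusters = {1: 100, 2: 40, 3: 50}
example_users = {"a": 30, "b": 20, "c": 10, "d": 40}
example_moves, example_loads = plan_moves(example_clusters, example_users, source=1)
```

Even in this toy example the busy server ends up better off but not perfectly balanced, because the largest users don't fit anywhere without overloading another server. That is the same reason a very large community gets a cluster of its own rather than being shuffled between existing ones.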

Finally, for at least 5-7 days after we post an lj-maintenance entry, I do personally scan and read every comment you leave. We may not be responding individually, but we do try to change our current or future behavior when it's in our power and when it makes sense. For instance, 1 person in the last post suggested creating an entirely new post for the update so that everyone would get the notification. That made more sense than my reasons for just updating the previous post. So here we are.

One thing that is not in our (and by our/we/me, I mean the "Operations team") control is the subject of reimbursement or compensation for degraded performance. You can continue to leave your thoughts and suggestions on this, or any other topic. I'd just like to ask for your understanding if non-technical questions or comments are not always answered or addressed in *this* community. Thank you.