dormando (dormando) wrote in lj_maintenance,

long-winded details of current progress.

--------
English

Fewer errors on the site. Friends and journal pages still don't work for a lot of free users. We have several leads and ideas toward fixing the problems within a matter of days.

--------
Geek

-- The other day I split the usage of the free DB slaves between hits that go to /users/ and the rest of the site. So each DB server is handling fewer kinds of hits, has less blocking its replication updates, and has a higher chance of caching return data. It turns out that we can handle all BML and client accesses from one of our slave DB servers. However, all of the rest combined still cannot keep up with /users/ hits. This means that the DB server handling everything but /users/ is fast and is not behind in replication anymore. This fixes a myriad of annoying problems on the site related to DB replication. As time goes on, we will make the splitting more granular. This also means that people can usually do anything on the site aside from visiting /users/, which still sucks :)
Also, the DBs handling /users/ requests are not getting as far behind in replication anymore. Things are no longer hours behind, but at most ~20 minutes.
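
For the curious, the split boils down to something like the sketch below. This is only an illustration in Python, not the site's actual code: the pool names, the hostnames, and the helpers pool_for_path and pick_slave are all made up for the example.

```python
import random

# Hypothetical pools and hostnames; the real server names and counts
# are not given in this post.
SLAVE_POOLS = {
    "users": ["db-users-1", "db-users-2", "db-users-3"],
    "general": ["db-general-1"],
}


def pool_for_path(path):
    """Return the name of the slave pool that should serve this request path."""
    if path == "/users" or path.startswith("/users/"):
        return "users"
    # Everything else (BML pages, client accesses) shares the general pool.
    return "general"


def pick_slave(path, rng=random):
    """Pick one slave at random from the pool assigned to this path."""
    return rng.choice(SLAVE_POOLS[pool_for_path(path)])
```

Making the split more granular later would just mean adding more prefixes and pools to the lookup.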

-- We upgraded our support contract with MySQL to include InnoDB support. Then I wrote a ~five page paper explaining why it makes me sad inside and begging for some insider methods on how to make it faster. Included were big charts of running query information and the like. This morning, we got good replies from the devs stating that our worst problems were known and will be fixed in a new version of MySQL due out in a few days! They gave us suggestions for the other ones, and requested more data for further optimization.

-- We tried a long-shot experiment last night, and it failed miserably. Brad had come up with the idea of using a middleware cache, and sticking files full of sorted data into a squid cache. The friends view process would fetch the data from the squid cache, which would fetch the full data live if not in the cache, and then use that to build the friends view. It didn't go as planned, and was just as slow. We might give it another go after some tuning, but I'm not so sure about that.
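
Roughly, the idea behind the experiment looks like the sketch below. It is Python with an in-memory dict standing in for squid, and every name in it (ReadThroughCache, fetch_live, build_friends_view) is invented for illustration. It also assumes each friend's cached file is a list of (timestamp, itemid) pairs sorted newest first, which is a guess at the layout rather than what we actually stored.

```python
import heapq
from itertools import islice


class ReadThroughCache:
    """Serve from the cache if present; otherwise fetch live and remember it."""

    def __init__(self, fetch_live):
        self._fetch_live = fetch_live  # callable: key -> data (the slow path)
        self._store = {}               # stand-in for the squid cache
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        value = self._fetch_live(key)
        self._store[key] = value
        return value


def build_friends_view(cache, friend_ids, limit):
    """Merge each friend's newest-first entries and keep the newest `limit`."""
    per_friend = [cache.get(friend_id) for friend_id in friend_ids]
    # Inputs are sorted newest first, so merge in descending order.
    merged = heapq.merge(*per_friend, reverse=True)
    return list(islice(merged, limit))
```

A cache like this only pays off when the same friends' data gets requested repeatedly before it goes stale.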

-- We're going to totally redo the way the log information is stored. Brad is going to re-think the data layout and all of the indexes. Then we will have (at the very least) indexes on the security info on posts. Then we will change the reading to a two-pass system, once to grab public posts, once to grab hidden posts. Then the perl processes will filter the mass of returned data and create the view. Right now the process needs to keep scanning back further in time until it gathers enough itemids visible to the remote user to build the page. Once everything is indexed, hopefully the cost of this query will become completely negligible.
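
To make the two-pass idea concrete, here is a minimal sketch in Python (the real thing would be perl plus SQL). The new layout isn't designed yet, so everything here is an assumption: the Post fields, the two-value "public"/"hidden" security flag, the allowed set, and the helper names latest and build_view.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    itemid: int
    posted_at: int
    security: str                     # simplified: "public" or "hidden"
    allowed: frozenset = frozenset()  # users who may see a hidden post


def latest(posts, security, limit):
    """Stand-in for an indexed query: newest `limit` posts with this security."""
    matching = [p for p in posts if p.security == security]
    matching.sort(key=lambda p: p.posted_at, reverse=True)
    return matching[:limit]


def build_view(posts, remote_user, limit):
    """Two passes over the data, then filter and merge in the application."""
    public = latest(posts, "public", limit)   # pass one: public posts
    hidden = latest(posts, "hidden", limit)   # pass two: hidden posts
    # Keep only the hidden posts the remote user is allowed to see.
    visible_hidden = [p for p in hidden if remote_user in p.allowed]
    merged = sorted(public + visible_hidden,
                    key=lambda p: p.posted_at, reverse=True)
    return [p.itemid for p in merged[:limit]]
```

One catch with this naive version: if the newest hidden posts are mostly invisible to the remote user, the second pass can miss older hidden posts that should have made the page, so a real version would probably need to fetch extra rows in that pass.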

-- Various other things. What I listed is just what has happened in the last three days, and is not a full recount. Tons of brainstorming. We fully intend to get this site back on its feet as soon as possible and keep it that way. We've had enough of the site being unusable.