Brad Fitzpatrick (bradfitz) wrote in lj_maintenance,

Summary of last two problems

I'm about to post in news about all the good things going on, but first I'll write here about the two annoying problems everybody noticed today and explain why they happened.

1) $user must be declared while commenting
I was sloppily copy/pasting and didn't test. My bad. It was 3am in a hotel room. (I could've tested, but a bug in libdbd-mysql-perl for Debian on PowerPC made that difficult. I fixed that, though.)

2) ljlogs.user login problem
Web nodes are now diskless, which means they can't log locally, so they log to another db. We have our own internal db load balancing system. I added a new db role type to it: "logs". But since it doesn't replicate from any other server (it's not a slave), I marked its master serverid as 0. But... it'd been so long since I'd made that system that I forgot a master serverid of 0 is the heuristic used to find the master server, which is a special case in a few ways. So suddenly everything was randomly picking one of the two "master" servers, one of which was wrong.
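To make that concrete, here's a rough Python sketch of the heuristic as described above. It is not the real load balancer code: the table layout, the function names, and every serverid except 8 are made up for illustration.

```python
import random

# Hypothetical server table: serverid -> record. Only "serverid 8 is the
# real master" comes from the post; the other ids and fields are invented.
SERVERS = {
    8: {"role": "master", "masterid": 0},
    9: {"role": "slave", "masterid": 8},
    12: {"role": "logs", "masterid": 0},  # the new, non-replicating "logs" db
}


def master_candidates(servers):
    """Return the serverids the heuristic treats as master (masterid == 0)."""
    return sorted(sid for sid, rec in servers.items() if rec["masterid"] == 0)


def pick_master(servers, rng=random):
    """Pick one candidate at random, which is wrong when there are two."""
    return rng.choice(master_candidates(servers))
```

With this table, `master_candidates(SERVERS)` returns both 8 and 12, so `pick_master` lands on the logs db some of the time.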

So, I thought I'd fixed it by making its masterid 8 (the serverid of the real master), and incrementing the db weight serial number. But... master db handles aren't affected by a bump in serial number. So the wrong handle stayed cached.
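A toy Python sketch of that caching behavior, assuming a per-process cache that drops handles when the serial changes but always keeps the master handle. The class, the config shape, and the server ids other than 8 are hypothetical, not the actual code.

```python
class HandleCache:
    """Toy per-process cache of db handles, keyed by role (names hypothetical)."""

    def __init__(self):
        self.serial = 0
        self.handles = {}

    def get(self, role, config):
        # A changed serial invalidates cached handles, except the master's.
        if config["serial"] != self.serial:
            self.handles = {
                r: h for r, h in self.handles.items() if r == "master"
            }
            self.serial = config["serial"]
        if role not in self.handles:
            # Stand-in for opening a real connection to the chosen server.
            self.handles[role] = config["roles"][role]
        return self.handles[role]
```

In this sketch a process that cached the wrong master before the fix keeps returning it after the serial bump, while its other handles get refreshed.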

Now, normally that'd be okay, and the problem would still have worked itself out, but the protocol handler doesn't roll over and die periodically like everything else does.

Further, it was only a problem on the FastCGI nodes. All the new hosts (and an increasing number of the old ones) are running mod_perl, and we use a MaxRequestsPerChild of 1000, so they still roll over in time.
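Here's a small Python simulation of why the recycling matters, under the assumption that each process looks up the master once at startup and caches it for life. The `Worker` class and its parameters are invented for illustration; only the 1000-request limit comes from the setup described above.

```python
class Worker:
    """Toy worker process that caches the master id at startup (hypothetical)."""

    def __init__(self, lookup, max_requests=None):
        self.lookup = lookup              # callable returning the configured master
        self.max_requests = max_requests  # None means the process never recycles
        self._start()

    def _start(self):
        self.served = 0
        self.cached_master = self.lookup()  # cached for the process lifetime

    def handle_request(self):
        if self.max_requests is not None and self.served >= self.max_requests:
            self._start()  # simulate the child exiting and a fresh one starting
        self.served += 1
        return self.cached_master
```

A worker created with `max_requests=1000` picks up a corrected master on its 1001st request. One created without a limit keeps the stale value until something restarts it.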

If I'd finished making all the machines netboot (which basically just involves copy/pasting their MAC addresses into something and running a command), then this would've been fine.

So basically, this second problem was like a four-layer accident.

...

Now, wait for the news post with the good news. :-)