Brad Fitzpatrick (bradfitz) wrote in lj_maintenance,
Problem Summary

Problem 1
With HSRP, you're not supposed to leave your switch in auto-sensing mode on the ports the two gateways are plugged into. Normally we could telnet into the switch and fix it, but the recent reset of the switch (when we installed) wiped those settings, so we only have serial port access. Sub-problem: I'm in Portland and Evan doesn't have a car (or a cellphone, so I can never get ahold of him). I think Evan will be going today, and then our connection to the net will be fast again. Right now the switch is auto-detecting the port speed wrong, so we're not actually getting the 100 Mbps full-duplex connection we're supposed to have.

Problem 2
There is 1 master database and 4 slave databases. The master records a log of everything it does in a file. When that log file gets to be a gig or so, it increments its count and moves on to the next file. Slave databases read those files in order. When all slaves are done with a file, we have a script to detect that and purge those old logs so that partition on the master database doesn't fill up.
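
To make the purge rule concrete, here's a rough sketch in Python. It is not our actual script. The file naming ("binlog.000042"), the function names, and the idea of passing in a list of expected slaves are all assumptions made up for illustration. The rule is the one described above: a log on the master is safe to purge only once every slave has moved past it.

```python
import re


def binlog_index(name):
    """Return the numeric suffix of a log file name, e.g. 'binlog.000042' -> 42.

    The 'name.NNNNNN' pattern is an assumed naming scheme for this sketch.
    """
    match = re.search(r"\.(\d+)$", name)
    if match is None:
        raise ValueError(f"not a numbered log file name: {name!r}")
    return int(match.group(1))


def purgeable_logs(master_logs, slave_positions, expected_slaves):
    """Return the master log files that every slave has finished reading.

    master_logs:     log file names present on the master
    slave_positions: dict of slave name -> log file that slave is reading now
    expected_slaves: names of all slaves that must report before we purge
    """
    # Hypothetical safety rule: if any expected slave did not report,
    # purge nothing rather than guess where it is.
    missing = set(expected_slaves) - set(slave_positions)
    if missing or not slave_positions:
        return []

    # The slowest slave decides: everything strictly older than the file
    # it is currently reading is no longer needed by anyone.
    oldest_needed = min(binlog_index(f) for f in slave_positions.values())
    done = [f for f in master_logs if binlog_index(f) < oldest_needed]
    return sorted(done, key=binlog_index)
```

With logs 1 through 5 on the master and the slowest slave still reading file 3, only files 1 and 2 come back as purgeable. If a slave is missing from the report, nothing is purged.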

Sub problems:
-- there was a bug in the script (my fault), so it didn't purge old logs in a certain scenario that shouldn't have happened but did. I should've anticipated it.
-- dormando hadn't gotten netsaint running on the new bitch box since kenny (our old bitch box) became a slave db server. Thus, we didn't even know the binlog partition on cartman was almost full.
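
For the monitoring side, the check we were missing is about as simple as they come. Here's a minimal sketch in Python of a disk-fullness check. The thresholds, the function names, and the example path are assumptions for illustration, not our netsaint config. The exit codes follow the usual monitoring-plugin convention (0 = OK, 1 = WARNING, 2 = CRITICAL).

```python
import shutil
import sys


def classify(used_pct, warn_pct=80.0, crit_pct=90.0):
    """Map a usage percentage to a (label, exit_code) pair.

    The 80/90 thresholds are assumed defaults for this sketch.
    """
    if used_pct >= crit_pct:
        return "CRITICAL", 2
    if used_pct >= warn_pct:
        return "WARNING", 1
    return "OK", 0


def check_partition(path, warn_pct=80.0, crit_pct=90.0):
    """Measure how full the filesystem holding `path` is and classify it."""
    usage = shutil.disk_usage(path)
    used_pct = 100.0 * usage.used / usage.total if usage.total else 0.0
    label, code = classify(used_pct, warn_pct, crit_pct)
    return label, code, used_pct


if __name__ == "__main__":
    # Hypothetical usage: pass the log partition's mount point as the argument.
    target = sys.argv[1] if len(sys.argv) > 1 else "/"
    label, code, used_pct = check_partition(target)
    print(f"DISK {label} - {target} is {used_pct:.1f}% full")
    sys.exit(code)
```

Run from cron or a monitoring daemon against the log partition, something like this would have flagged the problem well before the disk filled.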

The solution to problem 2 was that we had to stop everything, resync everything (slow as hell), and then start everything back up. What a pain.

Problem 1 will be fixed as soon as Evan gets down to Internap. I'm going to get Sherm an Internap badge too so he can do these sorts of things in the future (and he has a car too).
