Brad Fitzpatrick (bradfitz) wrote in lj_maintenance,
We have to replace some network cards in the load balancer. They're defective and were recalled a long time ago, but the company that owned the BIG/ip before we bought it never replaced them. As a result, the site has been going down once or twice a day for about 2 minutes each time while the machine reboots.

I guess these NICs have a round PCI clock signal, so DMA fails often and overwrites kernel memory with zeros, especially under high load.

Anyway, we're going to replace them with good cards.

We could switch everything to use mod_backhand again during the downtime, but that's enough work that it's not worth it. If I had tools already written to switch all the settings, I'd do it, but I don't.

Anyway, the site will be down in about an hour for maybe 5 minutes. I'd do it at night if I could, but this time is best for the guy helping me.

