Most people seem to think posts are getting lost... they're not. They're making it to the main database fine. From there they normally replicate out to both of the slave database servers. However, one of the slaves stopped replicating, so the other one is doing all the work, and it's serving so many reads that it doesn't have time to get its replicated writes in.
We've mailed MySQL support about the slave that stopped and expect an answer shortly (we have a support contract with them).
This couldn't have happened at a crappier time ... we've been waiting for the place we order servers from to open on Tuesday. (They were closed Friday and will be closed tomorrow too while they move to a new building.) Anyway, come Tuesday we'll be getting two more database servers and another couple of web servers. We'll also be getting a Cisco 24-port switch since we're outgrowing our existing ones (both in number of ports and in traffic, which is more than their backplane can handle ...)
I'm reaching to come up with an interim hack solution while we wait to hear from MySQL, but it really just comes down to not having the hardware right now to keep up. Suck.
Fun fun as always, right? Anyway, no posts are getting lost. It's just lagged right now.
Update: Well, a hack solution does exist.... I changed a line in ljconfig.pl and diverted 40% of the slave traffic to the master. So now the master's busy and slow, but the slave is almost caught up. I'll back it down to 30% once the slave is caught up.
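For the curious, the idea behind that kind of diversion looks roughly like the sketch below. It's Python for illustration only, not the actual line from ljconfig.pl; the READ_WEIGHTS name and the pick_read_db helper are made up here, and the 40/60 split just mirrors the numbers above.

```python
import random

# Hypothetical weights: the share of read traffic sent to each database role.
# Illustrative names only; these are not the real ljconfig.pl settings.
READ_WEIGHTS = {"master": 40, "slave": 60}

def pick_read_db(weights, rng=random):
    """Pick a role for one read, with probability proportional to its weight."""
    roles = list(weights)
    return rng.choices(roles, weights=[weights[r] for r in roles], k=1)[0]

if __name__ == "__main__":
    rng = random.Random(0)  # seeded so repeated runs give the same tallies
    counts = {role: 0 for role in READ_WEIGHTS}
    for _ in range(10000):
        counts[pick_read_db(READ_WEIGHTS, rng)] += 1
    print(counts)  # roughly 40% of reads should land on the master
```

Backing the master down to 30% would then just mean changing the weights to 30 and 70; nothing else has to move.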