1) We rebooted mackey (one of chef's two slaves), and it should be operational in an hour or two, at which point things should stop sucking. Until mackey dies again, of course. :-/
2) Chef is maxing out its number of connections. Poking around, we found that chef has a lower max-connections setting than the other two masters, even though it's hosting the most crowded cluster. Unfortunately, we can't change max-connections without a restart, and we can't restart without totally killing performance for a few hours. One of these nights we'll do it, though.
3) New servers in two weeks, or maybe even within a week if the Fates smile upon us. Or the server vendor and UPS trucks smile upon us. Whichever.