The issues were:
1) our load balancer (running a new version, as of last week) ran into a problem. the vendor was notified and things were resolved. normally, traffic would fail over to the redundant load balancer, but our redundant one is disconnected at this point because we kept the old software on it, in case we ran into problems with the new software and wanted to revert. the problem was that between versions, a default value changed (fin_wait_timeout on outbound NAT) and we were filling up its tables. changing the value, as well as adding more source NAT addresses, more than fixes the problem.
2) one of our slave databases seems to have corrupted its index on one table, causing certain simple queries to stall. a quick index repair fixed it. (if anybody can find my old post where I describe in simple terms indexes vs. data, I'd appreciate it!)
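For a rough sense of why the changed default in issue #1 mattered: every outbound connection holds a table entry until it times out, so the number of entries held at once is roughly the connection rate multiplied by how long each one lingers. Below is a back-of-the-envelope sketch of that arithmetic. Every number and name in it is made up for illustration (not our real traffic, timeouts, or address counts), and it assumes the simplest model, where each source NAT address offers one fixed pool of ports.

```python
# Rough model of outbound source-NAT table usage:
# entries held at once ~= connection rate * time each entry is held.
# All names and numbers here are hypothetical, not measured values.

# Assumed pool size: ports 1024-65535 on each source NAT address.
PORTS_PER_ADDRESS = 65535 - 1024 + 1  # 64512


def entries_in_use(conns_per_sec, active_secs, fin_wait_timeout_secs):
    """Steady-state number of entries held at the same time."""
    return conns_per_sec * (active_secs + fin_wait_timeout_secs)


def capacity(nat_addresses, ports_per_address=PORTS_PER_ADDRESS):
    """Upper bound on simultaneous entries across all source NAT addresses."""
    return nat_addresses * ports_per_address


def table_fills(conns_per_sec, active_secs, fin_wait_timeout_secs, nat_addresses):
    """True if the steady-state load exceeds what the addresses can hold."""
    held = entries_in_use(conns_per_sec, active_secs, fin_wait_timeout_secs)
    return held > capacity(nat_addresses)


if __name__ == "__main__":
    # Long timeout, one address: the table fills.
    print(table_fills(2000, 1, 60, nat_addresses=1))
    # Shorter timeout, same address: it fits.
    print(table_fills(2000, 1, 5, nat_addresses=1))
    # Long timeout, but more source addresses: it also fits.
    print(table_fills(2000, 1, 60, nat_addresses=4))
```

With these made-up numbers, 2000 connections/sec and a 60-second timeout hold 122,000 entries, which is more than one address's 64,512 ports. Cutting the timeout to 5 seconds, or going to four addresses, each brings it back under the limit, which is why doing both leaves plenty of headroom.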
Issue #1 won't happen again.
Issue #2 happens from time to time, but we were notified (in a roundabout way) so it was easy to fix. We'll change our monitoring to make the notification more explicit.
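Purely as an illustration of what a more explicit notification for issue #2 could look like (this is not our actual monitoring code): run one cheap query against each slave with a time limit, and if it doesn't come back in time, say so in a message that names the host and the table. The names `run_query`, `host`, and `table` are hypothetical placeholders; `run_query` stands in for whatever the database client does to run the query.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError


def query_finishes_in_time(run_query, max_secs):
    """Return True if run_query() completes within max_secs seconds.

    run_query is a hypothetical zero-argument callable that runs one cheap
    query against a single slave. A stalled query leaves its worker thread
    behind; a real check would prefer the database client's own timeout.
    """
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(run_query)
    try:
        future.result(timeout=max_secs)  # re-raises any error from run_query
        return True
    except TimeoutError:
        return False
    finally:
        pool.shutdown(wait=False)  # don't block on a stuck query


def alert_message(host, table, finished, max_secs):
    """Build an explicit alert, or None when the query finished in time."""
    if finished:
        return None
    return (f"{host}: simple query on {table} did not finish within "
            f"{max_secs}s; the index may need a repair")
```

The point of the message format is that whoever gets paged sees which slave and which table to look at, instead of having to work it out from a side effect.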