http://blog.reddit.com/2011/03/why-reddit-was-down-for-6-of-last-24.html
Reddit was down for about 6 of the last 24 hours, and they had two
failures. One was EBS failing (they run on Amazon EC2 and EBS).
The second failure was that their slave PG databases somehow got ahead
of the master. They are using Londiste for replication, and the only
explanation I can think of is that EBS was lying about fsync on the
master: transactions were reported as committed and replicated to the
slaves, but never actually reached disk, so they were lost on the master.
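
To make that guess concrete, here is a toy model of the failure mode. It is
only a sketch under my own assumptions, not how EBS or Londiste actually
work: LyingDisk, lazy_batch and run are made-up names, and "replication" is
just a list append once the master believes a transaction is committed.

```python
class LyingDisk:
    """Toy disk. When honest is False, fsync() reports success but only
    persists data once lazy_batch records have piled up in the cache."""

    def __init__(self, honest, lazy_batch=3):
        self.honest = honest
        self.lazy_batch = lazy_batch  # hypothetical write-back threshold
        self.durable = []             # survives a crash
        self.cache = []               # lost on a crash

    def write(self, record):
        self.cache.append(record)

    def fsync(self):
        # An honest disk flushes on every fsync; a lying one flushes lazily.
        if self.honest or len(self.cache) >= self.lazy_batch:
            self.durable.extend(self.cache)
            self.cache.clear()
        return True  # success is always reported to the caller

    def crash(self):
        # Anything still in the cache is gone after the crash.
        self.cache.clear()
        return list(self.durable)


def run(honest, txn_count=5):
    """Commit txn_count transactions on the master, replicate each one as
    soon as fsync reports success, then crash the master's disk."""
    disk = LyingDisk(honest)
    replica = []
    for txid in range(1, txn_count + 1):
        disk.write(txid)
        if disk.fsync():           # master now considers txid committed
            replica.append(txid)   # so it gets shipped to the replica
    master_after_crash = disk.crash()
    return master_after_crash, replica


if __name__ == "__main__":
    for honest in (True, False):
        master, replica = run(honest)
        ahead = [t for t in replica if t not in master]
        print(f"honest={honest} master={master} replica={replica} ahead={ahead}")
```

With an honest disk the master and replica end up identical. With the lying
disk the master comes back missing its last transactions while the replica
still has them, which is the "slave ahead of master" state they described.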
I don't see them posting on the lists much. Maybe we should reach out
to them, as Reddit is a rather popular site nowadays and it could be
some good exposure for PG. (They are also using Cassandra.)
--
Jeff Trout <jeff@jefftrout.com>
http://www.stuarthamm.net/
http://www.dellsmartexitin.com/