All,

In my effort to make the discussion around the design decisions of synch
rep less opaque, I'm starting a separate thread about what has developed
into one of the more contentious issues: timeouts.

I'm going to champion timeouts because I plan to use them. In fact, if
synch rep with a timeout is available, I plan to deploy it within 2 weeks
of 9.1 being released. Without a timeout (i.e., if "wait forever" is the
only mode), that project will probably never use synch rep.

Let me describe my use case so that you can understand why I want a timeout.

The client is a telecommunications service provider. They have a primary
server and a failover server for data updates. They also have two async
slaves on older machines for reporting purposes. The failover currently
does NOT accept any queries, in order to keep it as current as possible.

They would like the failover to be synchronous so that they can
guarantee no data loss in the event of a master failure. However, zero
data loss is less important to them than uptime: they have a five-nines
SLA with their clients, and the hardware on the master is very good.

So, if something happens to the standby and it cannot return an ack
within 30 seconds, they would like replication to degrade to asynch mode.
At that point, they would also like to trigger a Nagios alert that will
wake up the sysadmin with flashing red lights. Once he has resolved the
problem, he would like to promote the now-asynch standby back to synch
standby.

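To make the requested behavior concrete, here is a rough toy model of the degrade-on-timeout logic. It's a sketch under my own assumptions, not how PostgreSQL implements (or would implement) synch rep: the class and method names, the `alert` hook standing in for Nagios, and the use of a Python `queue.Queue` to stand in for the standby's ack are all hypothetical.

```python
import enum
import queue


class Mode(enum.Enum):
    SYNC = "sync"
    ASYNC = "async"


class DegradingSyncRep:
    """Hypothetical model of synch rep with a timeout; not a PostgreSQL API."""

    def __init__(self, ack_timeout_s=30.0, alert=print):
        self.ack_timeout_s = ack_timeout_s  # how long each commit waits for the standby
        self.alert = alert                  # hypothetical stand-in for a Nagios alert hook
        self.mode = Mode.SYNC

    def commit(self, standby_acks):
        """Return how the commit completed; standby_acks is a queue.Queue of acks."""
        if self.mode is Mode.ASYNC:
            # Already degraded: don't wait for the standby at all.
            return "async"
        try:
            # Block until the standby acknowledges, or give up after the timeout.
            standby_acks.get(timeout=self.ack_timeout_s)
            return "sync"
        except queue.Empty:
            # No ack in time: stop waiting on this and later commits, and alert.
            self.mode = Mode.ASYNC
            self.alert(f"standby did not ack within {self.ack_timeout_s}s; now async")
            return "degraded"

    def promote_to_sync(self):
        """Called by the sysadmin once the standby problem has been fixed."""
        self.mode = Mode.SYNC
```

In this model the commit that hits the timeout still completes; it just completes without the standby's guarantee, and every commit after it runs async until someone explicitly calls `promote_to_sync()`. Promotion back to synch is deliberately manual, matching the sysadmin step above.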
Yes, this means that, in the event of a standby failure, they have a
window during which any failure on the master will mean data loss. The
user regards this risk as acceptable, given that the master and the
failover are located in the same data center anyway, so there is always
a risk of a large enough disaster wiping out all data back to the daily
backup.

--
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com