On 10/06/2010 04:31 AM, Simon Riggs wrote:
> That situation would require two things
> * First, you have set up async replication and you're not monitoring it
> properly. Shame on you.
The way I read it, Jeff is complaining about the timeout you propose
that effectively turns sync into async replication in case of a failure.
With a master that waits forever, a standby that's newly required for
quorum certainly still needs time to catch up. But it wouldn't be in
danger of being "optimized away" for availability if it cannot catch up
within the given timeout. It's a tradeoff between availability and
durability.
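
To make the tradeoff concrete, here is a minimal sketch of the two
policies as I understand them. None of these names exist in PostgreSQL
or in the proposed patch; commit_state, quorum, and timeout_s are
hypothetical and only model the decision the master has to make for a
single pending commit.

```python
from typing import Optional


def commit_state(acks: int, quorum: int, waited_s: float,
                 timeout_s: Optional[float]) -> str:
    """Classify one pending commit on the master (illustrative model only).

    acks      -- number of standbys that have acknowledged the commit
    quorum    -- number of acknowledgements requested for sync replication
    waited_s  -- how long the master has been waiting, in seconds
    timeout_s -- None means "wait forever"; otherwise give up after this long
    """
    if acks >= quorum:
        # The requested durability level has been reached.
        return "durable"
    if timeout_s is not None and waited_s >= timeout_s:
        # Timeout policy: the commit is released without the requested
        # acknowledgements, i.e. sync has silently become async.
        return "degraded"
    # Wait-forever policy (or timeout not yet reached): availability
    # suffers, but the durability promise is never weakened.
    return "waiting"


if __name__ == "__main__":
    # One of two required standbys is lagging for 60 seconds.
    print(commit_state(acks=1, quorum=2, waited_s=60.0, timeout_s=30.0))
    print(commit_state(acks=1, quorum=2, waited_s=60.0, timeout_s=None))
```

With a 30 second timeout the first call reports "degraded", while the
wait-forever variant stays in "waiting" until the second standby has
caught up.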
> So it can occur in both cases, though it now looks to me that it's less
> important an issue in either case. So I think this doesn't rate the term
> dangerous to describe it any longer.
The proposed timeout certainly still sounds dangerous to me. I'd rather
recommend setting it to an incredibly huge value to minimize its dangers
and get sync replication when that is what has been asked for. Use async
replication for increased availability.
Or do you envision any use case that requires a quorum of X standbys
for normal operation but is just fine with anywhere from zero to (X-1)
standbys in case of failures? IMO that's when sync replication is most
needed and when it absolutely should hold to its promises, even if that
means stopping the system.
There's no point in continuing operation if you cannot guarantee the
minimum requirements for durability. If you do want the system to keep
running in that case, you had better rethink your minimum requirement
(performance for normal operations might benefit from a lower minimum
as well).
Regards
Markus Wanner