On Sat, 2011-01-01 at 18:13 +0100, Stefan Kaltenbrunner wrote:
> On 01/01/2011 05:55 PM, Simon Riggs wrote:
> >
> > It appears to me there has been substantial confusion over alternatives,
> > because of a misunderstanding about how synchronisation works. Requiring
> > confirmation that standbys are in sync is *not* the same thing as them
> > actually being in sync. Every single proposal made by anybody here on
> > hackers that supports multiple standby servers suffers from the same
> > issue: when the primary crashes you need to work out which standby
> > server is ahead.
>
> aaah that was exactly what I was after - so the problem is that when you
> have a sync standby it will technically always be "in front" of the
> master (because it needs to fsync/apply/whatever before the master).
> In the end the question boils down to what is "the bigger problem" in
> the case of a lost master:
> a) a transaction that was confirmed on the master but might not be on
> any of the surviving sync standbys (or you will never know if it is) -
> this is how I understand the proposal so far

No, that cannot happen. The current situation is that we fsync WAL
on the master, then fsync WAL on the standby, then reply to the master.
The standby is never ahead of the master at any point.
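
To make the ordering concrete, here is a minimal sketch in Python. It is
a toy model, not PostgreSQL code: Node, sync_commit and the crash_after
switch are names invented for illustration, and an LSN is reduced to a
plain integer.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    flushed_lsn: int = 0  # highest WAL position on disk (illustrative integer)

    def fsync_wal(self, lsn: int) -> None:
        # Pretend to flush WAL up to lsn; the flushed position never moves back.
        self.flushed_lsn = max(self.flushed_lsn, lsn)


def sync_commit(master: Node, standby: Node, lsn: int, crash_after=None) -> bool:
    """Return True only if the commit would be confirmed to the client.

    crash_after is a hypothetical switch that simulates the master dying
    after a given step: "master_fsync" or "standby_fsync".
    """
    master.fsync_wal(lsn)              # step 1: fsync WAL on the master
    if crash_after == "master_fsync":
        return False                   # the standby never saw the record
    standby.fsync_wal(lsn)             # step 2: fsync WAL on the standby
    # The standby has caught up at most to the master, never past it.
    assert standby.flushed_lsn <= master.flushed_lsn
    if crash_after == "standby_fsync":
        return False                   # case (b): on the standby, never confirmed
    return True                        # step 3: reply reached the master
```

Calling sync_commit(m, s, 200, crash_after="standby_fsync") leaves
s.flushed_lsn at 200 although the client never got a confirmation, which
is case (b) below.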

> b) a transaction that was not yet confirmed on the master but might have
> been applied on the surviving standby before the disaster - this is what I
> understand "confirm from all sync standbys" could result in.

Yes, that is described in the docs changes I published.

(a) was discussed, but ruled out, since it would require any crash or
immediate shutdown of the master to become a failover, or need some kind
of weird back channel to give the missing data back.

There hasn't been any difference of opinion in this area that I am
aware of. All proposals have offered (b).
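
Since every proposal ends up at (b), failover with multiple standbys
still needs the "which standby is ahead" step. A rough sketch of that
comparison, assuming each standby reports its last received WAL location
as "hi/lo" hexadecimal text; the standby names and locations are made up
and parse_lsn and most_advanced are hypothetical helpers:

```python
def parse_lsn(text: str) -> tuple:
    # Split a "hi/lo" hexadecimal WAL location into a comparable tuple.
    hi, lo = text.split("/")
    return (int(hi, 16), int(lo, 16))


def most_advanced(standbys: dict) -> str:
    # Return the name of the standby whose reported location is furthest ahead.
    return max(standbys, key=lambda name: parse_lsn(standbys[name]))


# Made-up locations, as collected from each standby after the master is lost.
reported = {"standby_a": "0/3000060", "standby_b": "0/3000148", "standby_c": "0/2FFFFFF"}
candidate = most_advanced(reported)
```

Comparing the parsed tuples rather than the raw strings matters, because
"0/2FFFFFF" sorts after "0/1000000A" as text even though it is behind.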

--
 Simon Riggs           http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services