Re: Issues with Quorum Commit - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Issues with Quorum Commit
Date
Msg-id 1286571909.2304.1026.camel@ebony
In response to Re: Issues with Quorum Commit  (Greg Smith <greg@2ndquadrant.com>)
List pgsql-hackers
On Fri, 2010-10-08 at 16:34 -0400, Greg Smith wrote:
> Tom Lane wrote:
> > How are you going to "mark the standby as degraded"?  The
> > standby can't keep that information, because it's not even connected
> > when the master makes the decision.
> 
> From a high level, I'm assuming only that the master has a list in 
> memory of the standby system(s) it believes are up to date, and that it 
> is supposed to commit to synchronously.  When I say mark as degraded, I 
> mean that the master merely closes whatever communications channel it 
> had open with that system and removes the standby from that list.

My current code handles this with two separate mechanisms:

The "master marks standby as degraded" part is handled by TCP keepalives.
When the master gets no response, it kicks out the standby. We already had
this, so I never mentioned it before as part of the solution.

The second part is synchronous_replication_timeout, a user-settable
parameter that defines how long the application is prepared to wait for
the standby's acknowledgement. It can be set to more or less time than
the keepalives allow.
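A minimal Python sketch of why the two limits are independent, assuming a simulated clock in seconds. The class and method names (SyncRepMaster, evict_silent_standbys, commit_wait) and keepalive_timeout are hypothetical illustrations, not PostgreSQL code; sync_rep_timeout stands in for synchronous_replication_timeout. What a backend does once its wait times out is deliberately left out.

```python
class SyncRepMaster:
    """Hypothetical model of the master's in-memory list of sync standbys."""

    def __init__(self, keepalive_timeout, sync_rep_timeout):
        self.keepalive_timeout = keepalive_timeout  # hypothetical keepalive limit
        self.sync_rep_timeout = sync_rep_timeout    # models synchronous_replication_timeout
        self.sync_standbys = {}                     # standby name -> last time it answered

    def on_keepalive_reply(self, name, now):
        self.sync_standbys[name] = now

    def evict_silent_standbys(self, now):
        """Mechanism 1: drop standbys silent for longer than keepalive_timeout."""
        silent = [name for name, last in self.sync_standbys.items()
                  if now - last > self.keepalive_timeout]
        for name in silent:
            del self.sync_standbys[name]
        return silent

    def commit_wait(self, commit_time, ack_arrivals):
        """Mechanism 2: wait at most sync_rep_timeout for an ack from a listed standby.

        ack_arrivals maps standby name -> time its ack reaches the master.
        Returns (time the wait ends, whether an ack arrived in time).
        """
        deadline = commit_time + self.sync_rep_timeout
        in_time = [t for name, t in ack_arrivals.items()
                   if name in self.sync_standbys and t <= deadline]
        if in_time:
            return min(in_time), True
        return deadline, False


m = SyncRepMaster(keepalive_timeout=10, sync_rep_timeout=2)
m.on_keepalive_reply("s1", 0)
m.on_keepalive_reply("s2", 5)
print(m.evict_silent_standbys(12))                # ['s1']
print(m.commit_wait(12, {"s1": 12.5, "s2": 13}))  # (13, True)
print(m.commit_wait(20, {"s2": 23}))              # (22, False)
```

In the last call the wait ends at its own deadline even though s2 is still on the list, because eviction is driven only by the keepalive limit.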

> If that standby now reconnects again, I don't see how resolving what 
> happens at that point is any different from when a standby is first 
> started after both systems were turned off.  If the standby is current 
> with the data available on the master when it has an initial 
> conversation, great; it's now available for synchronous commit too 
> then.  If it's not, it goes into a catchup mode first instead.  When the 
> master sees you're back to current again, if you're on the list of sync 
> servers too you go back onto the list of active sync systems.
> 
> There shouldn't be any state information to save here.  If the master 
> and standby can't figure out if they are in or out of sync with one 
> another based on the conversation they have when they first connect to 
> one another, that suggests to me there needs to be improvements made in 
> the communications protocol they use to exchange messages. 

Agreed.
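The "no saved state" point can be sketched in Python as a pure function of what the two servers exchange when the standby connects. This assumes the standby reports the WAL position it has flushed and the master compares it with its own current position; StandbyState, classify_standby and the LSN arguments are hypothetical names, not the actual walsender protocol, and treating a standby that is ahead of the master as an error is also an assumption.

```python
from enum import Enum


class StandbyState(Enum):
    CATCHUP = "catchup"      # behind the master: stream WAL until current
    STREAMING = "streaming"  # current, but not configured as a sync server
    SYNC = "sync"            # current and on the list of active sync systems


def classify_standby(standby_flush_lsn, master_lsn, is_sync_server):
    """Decide a standby's role from the connection-time exchange alone.

    Nothing about earlier disconnects is consulted, so the same rule covers
    a first start and a reconnect.
    """
    if standby_flush_lsn > master_lsn:
        # Assumption: a standby ahead of the master has diverged and is rejected.
        raise ValueError("standby is ahead of master; histories have diverged")
    if standby_flush_lsn < master_lsn:
        return StandbyState.CATCHUP
    return StandbyState.SYNC if is_sync_server else StandbyState.STREAMING


# A reconnecting standby that is behind first catches up ...
print(classify_standby(0x3000, 0x5000, True))  # StandbyState.CATCHUP
# ... and rejoins the sync list once it reports the master's position.
print(classify_standby(0x5000, 0x5000, True))  # StandbyState.SYNC
```

Re-running the same check as the standby reports progress is all that "the master sees you're back to current" requires in this model.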

-- 
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Development, 24x7 Support, Training and Services