Re: Issues with Quorum Commit - Mailing list pgsql-hackers

From: Heikki Linnakangas
Subject: Re: Issues with Quorum Commit
Msg-id: 4CAD9BF4.90706@enterprisedb.com
In response to: Re: Issues with Quorum Commit (Dimitri Fontaine <dimitri@2ndQuadrant.fr>)
Responses: Re: Issues with Quorum Commit
List: pgsql-hackers
On 07.10.2010 12:52, Dimitri Fontaine wrote:
> Markus Wanner<markus@bluegap.ch>  writes:
>>> I'm just saying that this should be an option, not the only choice.
>>
>> I'm sorry, I just don't see the use case for a mode that drops
>> guarantees when they are most needed. People who don't need those
>> guarantees should definitely go for async replication instead.
>
> We're still talking about freezing the master and all the applications
> when the first standby still has to do a base backup and catch up to
> where the master currently is, right?

Either that, or you configure your system for asynchronous replication 
first, and flip the switch to synchronous only after the standby has 
caught up. Setting up the first standby happens only once, when you 
initially set up the system, or when you're recovering from a catastrophic 
loss of the standby.

>> What does a synchronous replication mode that falls back to async upon
>> failure give you, except for a severe degradation in performance during
>> normal operation? Why not use async right away in such a case?
>
> It's all about the standard case you're building, sync rep, and how to
> manage errors. In most cases I want flexibility. The alert says the standby
> is down and I've lost my durability guarantees, so now I'm building a new
> standby. Does that mean my applications are all off and the master
> refuses to work?

Yes. That's why you want to have at least two standbys if you care about 
availability. Or if durability isn't that important to you after all, 
use asynchronous replication.
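To make the availability argument concrete, here is a minimal Python sketch under simplified assumptions: a synchronous commit completes only once at least `quorum` connected standbys can acknowledge it. The function name `commit_can_complete` and the dictionary of standby states are hypothetical illustrations, not PostgreSQL code.

```python
# Hypothetical model, not PostgreSQL internals: a synchronous commit is
# acknowledged to the client only once at least `quorum` standbys confirm it.
def commit_can_complete(standby_up: dict[str, bool], quorum: int = 1) -> bool:
    """Return True if enough standbys are connected to acknowledge a commit."""
    acks = sum(1 for up in standby_up.values() if up)
    return acks >= quorum


# A single standby: losing it blocks every commit on the master.
print(commit_can_complete({"standby1": False}))                    # False
# Two standbys with a quorum of one: losing either one still lets commits proceed.
print(commit_can_complete({"standby1": False, "standby2": True}))  # True
```

With two standbys and a quorum of one, a single standby failure costs you redundancy but not availability, which is the point of running more than one.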

Of course, if in the heat of the moment the admin is willing to forge 
ahead without the standby, he can temporarily change the configuration 
in the master. If you want the standby to be rebuilt automatically, you 
can even incorporate that configuration change into your scripts. The 
important point is that you or your scripts are in control, and you know 
at all times whether you can trust the standby or not. If the master 
makes such decisions automatically, you don't know whether the standby is 
trustworthy (i.e., guaranteed to be up to date) or not.
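The trust argument can be illustrated with a small, hypothetical Python model: a master with one synchronous standby, either strict or with the automatic fallback to async being debated. The `Master` class and its fields are illustrative assumptions; a real master would block the waiting commit rather than return `False`, and the `standby_trustworthy` check is "omniscient", which is exactly the knowledge an admin lacks when the fallback happens silently.

```python
from dataclasses import dataclass, field


@dataclass
class Master:
    """Hypothetical master with one synchronous standby (not PostgreSQL code).

    fallback_to_async=True models automatic degradation: while the standby
    is down, commits are acknowledged to clients anyway.
    """
    fallback_to_async: bool
    standby_up: bool = True
    committed: list[int] = field(default_factory=list)   # acknowledged to clients
    on_standby: list[int] = field(default_factory=list)  # confirmed by the standby

    def commit(self, txid: int) -> bool:
        if self.standby_up:
            self.on_standby.append(txid)
            self.committed.append(txid)
            return True
        if self.fallback_to_async:
            # Acknowledged without ever reaching the standby.
            self.committed.append(txid)
            return True
        # Strict mode: a real master would wait; modeled here as "not acknowledged".
        return False

    def standby_trustworthy(self) -> bool:
        # The standby is up to date only if it holds every acknowledged transaction.
        return set(self.committed) <= set(self.on_standby)


strict = Master(fallback_to_async=False)
strict.commit(1)
strict.standby_up = False
print(strict.commit(2), strict.standby_trustworthy())    # False True

lenient = Master(fallback_to_async=True)
lenient.commit(1)
lenient.standby_up = False
print(lenient.commit(2), lenient.standby_trustworthy())  # True False
```

In strict mode, availability suffers but the standby stays trustworthy; with silent fallback, the application keeps running, but the standby has quietly fallen behind.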

>>> so opening a
>>> superuser connection to act on the currently waiting transaction is
>>> still possible (pass/fail, but what does fail mean at this point? Shut
>>> down and wait some more offline?).
>>
>> Not sure I'm following here. The admin will be busy re-establishing
>> (connections to) standbys; killing transactions on the master doesn't
>> help anything, whether or not the master waits forever.
>
> The idea here would be to be able to manually ACK a transaction that's
> waiting forever, because you know it won't have an answer and you'd
> prefer the application to just continue. But I see that's not a valid
> use case for you.

I don't see anything wrong with having tools for admins to deal with the 
unexpected. I'm not sure overriding individual transactions is very 
useful, though; more likely you'll want to take the whole server offline, 
or change the configuration to allow all transactions to continue 
without the synchronous standby.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com