Re: Synchronization levels in SR - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Synchronization levels in SR
Msg-id 4BFDAC34.2060202@enterprisedb.com
In response to Re: Synchronization levels in SR  (Simon Riggs <simon@2ndQuadrant.com>)
Responses Re: Synchronization levels in SR
List pgsql-hackers
On 27/05/10 01:23, Simon Riggs wrote:
> On Thu, 2010-05-27 at 00:21 +0300, Heikki Linnakangas wrote:
>> On 26/05/10 23:31, Dimitri Fontaine wrote:
>>>    d. choice of commit or rollback at timeout
>>
>> Rollback is not an option. There is no going back after the commit
>> record has been flushed to disk or sent to a standby.
>
> There's definitely no going back after the xid has been removed from
> procarray because other transactions will then depend upon the final
> state. Currently we PANIC if we abort after we've marked clog, though
> that happens after XLogFlush(), which is where we're planning to wait
> for synch rep. If we abort after having written a commit record to disk
> we can still successfully generate an abort record as well. (Luckily, I
> note HS does actually cope with that. Phew!)
>
> So actually, an abort is a reasonable possibility, though I know it
> doesn't sound like it could be at first thought.

Hmm, that's an interesting thought. Interesting, as in crazy ;-).

I don't understand how HS could handle that. As soon as it sees the 
commit record, the transaction becomes visible to readers.

>> The choice is to either commit anyway after the timeout, or wait forever.
>
> Hmm, wait forever. What happens if we try to shutdown fast while there
> is a transaction that is waiting forever? Is that then a commit, even
> though it never made it to the standby? How would we know it was safe to
> switchover or not? Hmm.

Refuse to shut down until the standby acknowledges the commit. That's 
the only way to be sure.

In practice, hard synchronous "don't return ever until the commit hits 
the standby" behavior is rarely what admins actually want, because it's 
disastrous from an availability point of view. More likely, admins want 
"wait for ack from standby, unless it's not responding, in which case to 
hell with redundancy and just act like a single server". It makes sense 
if you just want to make sure that the standby doesn't return stale 
results when it's working properly and you're not worried about 
durability, but I'm not sure it's very sound otherwise.
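A minimal sketch of the two timeout policies discussed above, under stated assumptions. It is written in Python purely for illustration and is not PostgreSQL code. `SyncRepWaiter`, `TimeoutPolicy`, `standby_ack` and `wait_for_commit` are hypothetical names, and LSNs are modelled as plain integers. The commit record is assumed to be already flushed locally, so rollback is deliberately not offered. The only choice is whether to return to the client after a timeout ("act like a single server") or keep waiting forever.

```python
import threading
from enum import Enum


class TimeoutPolicy(Enum):
    # Hypothetical policy names, for illustration only.
    WAIT_FOREVER = "wait_forever"
    COMMIT_AFTER_TIMEOUT = "commit_after_timeout"


class SyncRepWaiter:
    """Hypothetical model of a primary waiting for a standby's ack.

    Assumes the commit record is already flushed to local disk, so the
    transaction is committed either way; we only decide when to return.
    """

    def __init__(self):
        self._cond = threading.Condition()
        self._acked_lsn = 0  # highest LSN the standby has confirmed

    def standby_ack(self, lsn):
        # Called when the standby reports it has received WAL up to `lsn`.
        with self._cond:
            if lsn > self._acked_lsn:
                self._acked_lsn = lsn
                self._cond.notify_all()

    def wait_for_commit(self, commit_lsn, policy, timeout=None):
        """Return "replicated" or "local_only" for the commit at commit_lsn."""
        acked = lambda: self._acked_lsn >= commit_lsn
        with self._cond:
            if policy is TimeoutPolicy.WAIT_FOREVER:
                # Hard synchronous: block until the standby confirms.
                self._cond.wait_for(acked)
                return "replicated"
            # Soft synchronous: give up on redundancy after `timeout` seconds.
            if self._cond.wait_for(acked, timeout=timeout):
                return "replicated"
            return "local_only"


if __name__ == "__main__":
    waiter = SyncRepWaiter()
    waiter.standby_ack(100)  # standby has already confirmed LSN 100
    print(waiter.wait_for_commit(100, TimeoutPolicy.COMMIT_AFTER_TIMEOUT, timeout=1.0))
    # No ack for LSN 200 ever arrives, so the primary carries on alone.
    print(waiter.wait_for_commit(200, TimeoutPolicy.COMMIT_AFTER_TIMEOUT, timeout=0.1))
```

Run as a script, this prints `replicated` and then `local_only`. Under `WAIT_FOREVER` the call cannot return until an ack arrives. That is why a fast shutdown would have to refuse to proceed while such a waiter exists.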

-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com

