Re: Sync Rep v17 - Mailing list pgsql-hackers

From Fujii Masao
Subject Re: Sync Rep v17
Msg-id AANLkTinuyuG9ku36f-2pWBjFFrfybGTrvRi0G8vr=BZy@mail.gmail.com
In response to Re: Sync Rep v17  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
On Tue, Mar 1, 2011 at 5:29 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> >> What if fast shutdown is requested while RecordTransactionCommit
>> >> is waiting in SyncRepWaitForLSN? ISTM fast shutdown cannot complete
>> >> until replication has been successfully done (i.e., until at least one
>> >> synchronous standby has connected to the master especially if
>> >> allow_standalone_primary is disabled). Is this OK?
>> >
>> > A "behaviour" - important, though needs further discussion.
>>
>> One of the scenarios I'm concerned about is:
>>
>> 1. The primary is running with allow_standalone_primary = on.
>> 2. While some backends are waiting for replication, the user requests
>> fast shutdown.
>> 3. When the timeout expires, those backends stop waiting and return the
>>     success indication to the client (although the commit has not been
>>     replicated to the standby).
>> 4. Since there is no backend waiting for replication, fast shutdown completes.
>> 5. Clusterware such as Pacemaker detects the death of the primary and
>>     triggers a failover.
>> 6. The new primary is missing some transactions whose commit was already
>>     reported to the client, i.e., transactions are lost!!
>>
>> To avoid such transaction loss, we should prevent the primary from
>> returning the success indication to the client while fast shutdown is
>> being executed, even if allow_standalone_primary is enabled, I think.
>> Thoughts?
>
> Walsenders don't shutdown until after they have sent the shutdown
> checkpoint.
>
> We could make them wait for the reply from the standby at that point.

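Right. But what if the replication connection is closed before step 3 of
the scenario above? In that case the walsender obviously cannot send WAL
up to the shutdown checkpoint, so making it wait for the standby's reply
at that point would not protect those commits.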
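
To make the scenario concrete, here is a toy Python model of the decision a
backend makes when its replication wait times out. This is not PostgreSQL
code. PrimaryState, on_wait_timeout, lost_after_failover and the
hold_during_shutdown switch are all hypothetical names invented for this
sketch; only allow_standalone_primary comes from the patch under discussion.
It simply contrasts the behaviour in the scenario with the rule I proposed
(no success indication for an unreplicated commit while fast shutdown is in
progress).

```python
from dataclasses import dataclass


@dataclass
class PrimaryState:
    """Hypothetical snapshot of the primary when a commit wait times out."""
    allow_standalone_primary: bool
    fast_shutdown_requested: bool
    standby_acked: bool  # True once the standby has confirmed the commit's WAL


def on_wait_timeout(state: PrimaryState, hold_during_shutdown: bool) -> str:
    """Return "ack_client" or "keep_waiting" for a timed-out backend."""
    if state.standby_acked:
        # The commit is already on the standby, so success is safe to report.
        return "ack_client"
    if hold_during_shutdown and state.fast_shutdown_requested:
        # Proposed rule: never report an unreplicated commit during shutdown.
        return "keep_waiting"
    if state.allow_standalone_primary:
        # Step 3 of the scenario: success is reported without replication.
        return "ack_client"
    return "keep_waiting"


def lost_after_failover(state: PrimaryState, decision: str) -> bool:
    """A commit is lost if the client saw success but the standby never got it."""
    return decision == "ack_client" and not state.standby_acked
```

With allow_standalone_primary = on, a fast shutdown requested and no reply
from the standby, the model reports the commit as lost when
hold_during_shutdown is off and as not lost when it is on. The cost of the
guard is the one raised at the top of this thread: if the standby never
replies, the backend keeps waiting and fast shutdown cannot complete.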

> I'll think some more, not convinced yet.

Thanks! I'll think more, too, to make sync rep more reliable!

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

