Re: [HACKERS] logical replication and PANIC during shutdowncheckpoint in publisher - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: [HACKERS] logical replication and PANIC during shutdowncheckpoint in publisher
Date
Msg-id CAB7nPqQbH2fxiE8PT21DXAXTi7MwWdKufAUgi25_YZSTTge_QA@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] logical replication and PANIC during shutdowncheckpoint in publisher  (Fujii Masao <masao.fujii@gmail.com>)
List pgsql-hackers
On Fri, Apr 14, 2017 at 3:03 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Thu, Apr 13, 2017 at 12:36 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> On Thu, Apr 13, 2017 at 12:28 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>> On Thu, Apr 13, 2017 at 5:25 AM, Peter Eisentraut
>>> <peter.eisentraut@2ndquadrant.com> wrote:
>>>> On 4/12/17 09:55, Fujii Masao wrote:
>>>>> To fix this issue, we should terminate walsender for logical replication
>>>>> before shutdown checkpoint starts. Of course walsender for physical
>>>>> replication still needs to keep running until shutdown checkpoint ends,
>>>>> though.
>>>>
>>>> Can we turn it into a kind of read-only or no-new-commands mode instead,
>>>> so it can keep streaming but not accept any new actions?
>>>
>>> So we make walsenders switch to that mode and wait for all the already-ongoing
>>> their "write" commands to finish, and then we start a shutdown checkpoint?
>>> This is an idea, but seems a bit complicated. ISTM that it's simpler to
>>> terminate only walsenders for logical rep before shutdown checkpoint.
>>
>> Perhaps my memory is failing me here... But in clean shutdowns we do
>> shut down WAL senders after the checkpoint has completed so as we are
>> sure that they have flushed the LSN corresponding to the checkpoint
>> ending, right?
>
> Yes.
>
>> Why introducing an inconsistency for logical workers?
>> It seems to me that logical workers should fail under the same rules.
>
> Could you tell me why? You think that even walsender for logical rep
> needs to stream the shutdown checkpoint WAL record to the subscriber?
> I was not thinking that's true.

For physical replication, the property to wait that standbys have
flushed the LSN of the shutdown checkpoint can be important for
switchovers. For example, with a primary and a standby, it is possible
to stop cleanly the master, promote the standby, and then connect back
to the cluster the old primary as a standby to the now-new primary
with the guarantee that both are in a consistent state. It seems to me
that having similar guarantees for logical replication is important.

Now, I have not reviewed the code of logirep in details at the level
of Peter, Petr or yourself...
-- 
Michael



pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: [HACKERS] Small patch for pg_basebackup argument parsing
Next
From: Fabien COELHO
Date:
Subject: Re: [HACKERS] \if, \elseif, \else, \endif (was Re: PSQL commands:\quit_if, \quit_unless)