Re: [HACKERS] logical replication and PANIC during shutdowncheckpoint in publisher - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: [HACKERS] logical replication and PANIC during shutdowncheckpoint in publisher
Date
Msg-id CAB7nPqTHQJ7zODLCLmhg9oz03qow6eh27NhEqzT+CT64eku5Ug@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] logical replication and PANIC during shutdowncheckpoint in publisher  (Petr Jelinek <petr.jelinek@2ndquadrant.com>)
List pgsql-hackers
On Sun, Apr 23, 2017 at 10:15 AM, Petr Jelinek
<petr.jelinek@2ndquadrant.com> wrote:
> On 21/04/17 06:11, Michael Paquier wrote:
>> On Fri, Apr 21, 2017 at 12:29 AM, Peter Eisentraut
>> <peter.eisentraut@2ndquadrant.com> wrote:
>> Hmm. I have been actually looking at this solution and I am having
>> doubts regarding its robustness. In short this would need to be
>> roughly a two-step process:
>> - In PostmasterStateMachine(), SIGUSR2 is sent to the checkpoint to
>> make it call ShutdownXLOG(). Prior doing that, a first signal should
>> be sent to all the WAL senders with
>> SignalSomeChildren(BACKEND_TYPE_WALSND). SIGUSR2 or SIGINT could be
>> used.
>> - At reception of this signal, all WAL senders switch to a stopping
>> state, refusing commands that can generate WAL.
>> - Checkpointer looks at the state of all WAL senders, looping with a
>> sleep call of a couple of ms, refusing to launch the shutdown
>> checkpoint as long as all WAL senders have not switched to the
>> stopping state.
>> - In reaper(), once checkpointer is confirmed as stopped, signal again
>> the WAL senders, and tell them to perform the last loop.

OK, I have been hacking that, finishing with the attached. In the
attached I am using SIGUSR2 to instruct the WAL senders to prepare for
stopping, and SIGINT to handle the last WAL flush loop. The shutdown
checkpoint moves on only if all active WAL senders are marked with a
STOPPING state. Reviews as welcome.

>> After that, I got a second, more simple idea.
>> CheckpointerShmem->ckpt_flags holds the information about checkpoints
>> currently running, so we could have the WAL senders look at this data
>> and prevent any commands generating WAL. The checkpointer may be
>> already stopped at the moment the WAL senders finish their loop, so we
>> need also to check if the checkpointer is running or not on those code
>> paths. Such safeguards may actually be enough for the problem of this
>> thread. Thoughts?
>>
>
> Hmm but how do we handle statements that are already in progress by the
> time ckpt_flags changes?

Yup, this does not handle well race conditions.
-- 
Michael

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: [HACKERS] TAP tests - installcheck vs check
Next
From: Michael Paquier
Date:
Subject: Re: [HACKERS] identity columns