Re: [HACKERS] logical replication and PANIC during shutdowncheckpoint in publisher - Mailing list pgsql-hackers

From Petr Jelinek
Subject Re: [HACKERS] logical replication and PANIC during shutdowncheckpoint in publisher
Date
Msg-id 836087bc-548c-b682-f732-b2076b77bf33@2ndquadrant.com
In response to Re: [HACKERS] logical replication and PANIC during shutdowncheckpoint in publisher  (Andres Freund <andres@anarazel.de>)
Responses Re: [HACKERS] logical replication and PANIC during shutdowncheckpoint in publisher  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On 17/04/17 18:02, Andres Freund wrote:
> On 2017-04-15 02:33:59 +0900, Fujii Masao wrote:
>> On Fri, Apr 14, 2017 at 10:33 PM, Petr Jelinek
>> <petr.jelinek@2ndquadrant.com> wrote:
>>> On 12/04/17 15:55, Fujii Masao wrote:
>>>> Hi,
>>>>
>>>> When I shut down the publisher while I repeated creating and dropping
>>>> the subscription in the subscriber, the publisher emitted the following
>>>> PANIC error during shutdown checkpoint.
>>>>
>>>> PANIC:  concurrent transaction log activity while database system is
>>>> shutting down
>>>>
>>>> The cause of this problem is that walsender for logical replication can
>>>> generate WAL records even during shutdown checkpoint.
>>>>
>>>> Firstly walsender keeps running until shutdown checkpoint finishes
>>>> so that all the WAL including shutdown checkpoint record can be
>>>> replicated to the standby. This was safe because previously walsender
>>>> could not generate WAL records. However this assumption became
>>>> invalid because of logical replication. That is, currently walsender for
>>>> logical replication can generate WAL records, for example, by executing
>>>> CREATE_REPLICATION_SLOT command. This is an oversight in
>>>> logical replication patch, I think.
>>>
>>> Hmm, but CREATE_REPLICATION_SLOT should not generate WAL afaik. I agree
>>> that the issue with walsender still exists (since we now allow normal SQL
>>> to run there) but I think it's important to identify what exactly causes
>>> the WAL activity in your case.
>>
>> At least in my case, the following CREATE_REPLICATION_SLOT command
>> generated a WAL record.
>>
>>     BEGIN READ ONLY ISOLATION LEVEL REPEATABLE READ;
>>     CREATE_REPLICATION_SLOT testslot TEMPORARY LOGICAL pgoutput USE_SNAPSHOT;
>>
>> Here is the pg_waldump output of the WAL record that CREATE_REPLICATION_SLOT
>> generated.
>>
>>     rmgr: Standby     len (rec/tot):     24/    50, tx:          0, lsn: 0/01601438, prev 0/01601400, desc: RUNNING_XACTS nextXid 692 latestCompletedXid 691 oldestRunningXid 692
>>
>> So I guess that the CREATE_REPLICATION_SLOT code calls LogStandbySnapshot(),
>> which generates a WAL record containing a snapshot of running transactions.
> 
> Erroring out in these cases sounds easy enough.  Wonder if there's not a
> bigger problem with WAL records generated e.g. by HOT pruning or such,
> during decoding.  Not super likely, but would probably hit exactly the
> same, no?
> 

Sounds possible, yes. Sounds like that's going to be nontrivial to fix
though.
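
As a rough illustration of the failure mode and of the "erroring out" option
discussed above, here is a toy model in Python. It is not PostgreSQL code:
ToyWal, its methods, the refuse_during_shutdown flag and the run() helper are
all invented for this sketch, and the real checkpoint logic is more involved.
It only shows that a record inserted after the shutdown checkpoint has started
leads to the PANIC, while rejecting the insert turns it into an ordinary error.

```python
class WalPanic(Exception):
    """Stands in for the PANIC raised by the shutdown checkpoint."""


class ShutdownInProgress(Exception):
    """Stands in for an ordinary ERROR returned to the offending command."""


class ToyWal:
    # Hypothetical model: the WAL is just a growing list of record names.
    def __init__(self, refuse_during_shutdown=False):
        self.records = []
        self.shutting_down = False
        self.refuse_during_shutdown = refuse_during_shutdown

    def insert(self, record):
        # The "error out" option: reject new records once shutdown has begun.
        if self.shutting_down and self.refuse_during_shutdown:
            raise ShutdownInProgress(record)
        self.records.append(record)

    def shutdown_checkpoint(self, concurrent_activity=None):
        self.shutting_down = True
        end_before = len(self.records)
        if concurrent_activity is not None:
            concurrent_activity(self)  # e.g. a walsender that is still running
        if len(self.records) != end_before:
            raise WalPanic("concurrent transaction log activity "
                           "while database system is shutting down")
        self.records.append("CHECKPOINT_SHUTDOWN")


def run(refuse):
    """Simulate a walsender creating a slot during the shutdown checkpoint."""
    wal = ToyWal(refuse_during_shutdown=refuse)
    outcome = []

    def walsender(w):
        try:
            w.insert("RUNNING_XACTS")  # the record seen in pg_waldump above
            outcome.append("slot created")
        except ShutdownInProgress:
            outcome.append("command failed with ERROR")

    try:
        wal.shutdown_checkpoint(walsender)
        outcome.append("clean shutdown")
    except WalPanic:
        outcome.append("PANIC")
    return outcome
```

Calling run(False) models the current behaviour, and run(True) models the
variant where WAL-generating commands are refused during shutdown.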

Another problem is that queries can now run on a walsender. But it should
be possible to detect that and shut the walsender down just like a regular
backend.
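
One way to picture that idea is the small Python sketch below. It is only an
assumption about how such a rule could look, not a description of what the
postmaster does: the Proc type, the "backend"/"walsender" labels and the
running_sql flag are made up here. The rule it encodes is that ordinary
backends are always stopped before the shutdown checkpoint, and a walsender
is treated the same way only while it is executing SQL.

```python
from dataclasses import dataclass


@dataclass
class Proc:
    name: str
    kind: str                  # "backend" or "walsender" (labels invented here)
    running_sql: bool = False  # hypothetical flag: walsender is executing SQL


def stop_before_checkpoint(procs):
    """Return the names of processes to terminate before the shutdown checkpoint.

    Assumption for illustration: backends always go first, and a walsender
    joins them only while it is running a query.
    """
    return [p.name for p in procs
            if p.kind == "backend"
            or (p.kind == "walsender" and p.running_sql)]


procs = [
    Proc("psql session", "backend"),
    Proc("physical walsender", "walsender"),
    Proc("logical walsender running a query", "walsender", running_sql=True),
]
early = stop_before_checkpoint(procs)
```

In this sketch the physical walsender is left alone, so it can still stream
the shutdown checkpoint record to the standby.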

-- 
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


