Re: Drop replslot after pgstat_shutdown cause assert coredump - Mailing list pgsql-hackers

From Greg Nancarrow
Subject Re: Drop replslot after pgstat_shutdown cause assert coredump
Date
Msg-id CAJcOf-e6pvs45WcEeyAV3hOj1yUK83ED4h4ZL7AGquTKvK_eQA@mail.gmail.com
Whole thread Raw
In response to Drop replslot after pgstat_shutdown cause assert coredump  ("houzj.fnst@fujitsu.com" <houzj.fnst@fujitsu.com>)
Responses Re: Drop replslot after pgstat_shutdown cause assert coredump  (Fujii Masao <masao.fujii@oss.nttdata.com>)
List pgsql-hackers
On Mon, Oct 11, 2021 at 6:55 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> I can see the walsender tried to release a not-quite-ready repliaction slot
> that was created when create a subscription. But the pgstat has been shutdown
> before invoking ReplicationSlotRelease().
>
> The stack is as follows:
>
> #2  in ExceptionalCondition (pgstat_is_initialized && !pgstat_is_shutdown)
> #3  in pgstat_assert_is_up () at pgstat.c:4852
> #4  in pgstat_send (msg=msg@entry=0x7ffe716f7470, len=len@entry=144) at pgstat.c:3075
> #5  in pgstat_report_replslot_drop (slotname=slotname@entry=0x7fbcf57a3c98 "sub") at pgstat.c:1869
> #6  in ReplicationSlotDropPtr (slot=0x7fbcf57a3c80) at slot.c:696
> #7  in ReplicationSlotDropAcquired () at slot.c:585
> #8  in ReplicationSlotRelease () at slot.c:482
> #9  in ProcKill (code=<optimized out>, arg=<optimized out>) at proc.c:852
> #10 in shmem_exit (code=code@entry=0) at ipc.c:272
> #11 in proc_exit_prepare (code=code@entry=0) at ipc.c:194
> #12 in proc_exit (code=code@entry=0) at ipc.c:107
> #13 in ProcessRepliesIfAny () at walsender.c:1807
> #14 in WalSndWaitForWal (loc=loc@entry=22087632) at walsender.c:1417
> #15 in logical_read_xlog_page (state=0x2f8c600, targetPagePtr=22085632,
>     reqLen=, targetRecPtr=, cur_page=0x2f6c1e0 "\016\321\005") at walsender.c:821
> #16 in ReadPageInternal (state=state@entry=0x2f8c600,
>     pageptr=pageptr@entry=22085632, reqLen=reqLen@entry=2000) at xlogreader.c:667
> #17 in XLogReadRecord (state=0x2f8c600,
>     errormsg=errormsg@entry=0x7ffe716f7f98) at xlogreader.c:337
> #18 in DecodingContextFindStartpoint (ctx=ctx@entry=0x2f8c240)
>     at logical.c:606
> #19 in CreateReplicationSlot (cmd=cmd@entry=0x2f1aef0)
>
> Is this behavior expected ?
>

I'd say it's not!

Just looking at the stacktrace, I'm thinking that the following commit
may have had a bearing on this problem, by causing pgstat to be
shutdown earlier:

commit fb2c5028e63589c01fbdf8b031be824766dc7a68
Author: Andres Freund <andres@anarazel.de>
Date:   Fri Aug 6 10:05:57 2021 -0700

    pgstat: Schedule per-backend pgstat shutdown via before_shmem_exit().


Can you see if the problem can be reproduced prior to this commit?


Regards,
Greg Nancarrow
Fujitsu Australia



pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: Rewriting the test of pg_upgrade as a TAP test - take three - remastered set
Next
From: Aleksander Alekseev
Date:
Subject: Re: RFC: compression dictionaries for JSONB