Re: Drop replslot after pgstat_shutdown cause assert coredump - Mailing list pgsql-hackers
From | Fujii Masao |
---|---|
Subject | Re: Drop replslot after pgstat_shutdown cause assert coredump |
Date | |
Msg-id | a40241b7-05f2-206b-c0ae-5f7d7f34734a@oss.nttdata.com Whole thread Raw |
In response to | Re: Drop replslot after pgstat_shutdown cause assert coredump (Greg Nancarrow <gregn4422@gmail.com>) |
Responses |
Re: Drop replslot after pgstat_shutdown cause assert coredump
|
List | pgsql-hackers |
On 2021/10/11 22:15, Greg Nancarrow wrote: > On Mon, Oct 11, 2021 at 6:55 PM houzj.fnst@fujitsu.com > <houzj.fnst@fujitsu.com> wrote: >> >> I can see the walsender tried to release a not-quite-ready repliaction slot >> that was created when create a subscription. But the pgstat has been shutdown >> before invoking ReplicationSlotRelease(). >> >> The stack is as follows: >> >> #2 in ExceptionalCondition (pgstat_is_initialized && !pgstat_is_shutdown) >> #3 in pgstat_assert_is_up () at pgstat.c:4852 >> #4 in pgstat_send (msg=msg@entry=0x7ffe716f7470, len=len@entry=144) at pgstat.c:3075 >> #5 in pgstat_report_replslot_drop (slotname=slotname@entry=0x7fbcf57a3c98 "sub") at pgstat.c:1869 >> #6 in ReplicationSlotDropPtr (slot=0x7fbcf57a3c80) at slot.c:696 >> #7 in ReplicationSlotDropAcquired () at slot.c:585 >> #8 in ReplicationSlotRelease () at slot.c:482 >> #9 in ProcKill (code=<optimized out>, arg=<optimized out>) at proc.c:852 >> #10 in shmem_exit (code=code@entry=0) at ipc.c:272 >> #11 in proc_exit_prepare (code=code@entry=0) at ipc.c:194 >> #12 in proc_exit (code=code@entry=0) at ipc.c:107 >> #13 in ProcessRepliesIfAny () at walsender.c:1807 >> #14 in WalSndWaitForWal (loc=loc@entry=22087632) at walsender.c:1417 >> #15 in logical_read_xlog_page (state=0x2f8c600, targetPagePtr=22085632, >> reqLen=, targetRecPtr=, cur_page=0x2f6c1e0 "\016\321\005") at walsender.c:821 >> #16 in ReadPageInternal (state=state@entry=0x2f8c600, >> pageptr=pageptr@entry=22085632, reqLen=reqLen@entry=2000) at xlogreader.c:667 >> #17 in XLogReadRecord (state=0x2f8c600, >> errormsg=errormsg@entry=0x7ffe716f7f98) at xlogreader.c:337 >> #18 in DecodingContextFindStartpoint (ctx=ctx@entry=0x2f8c240) >> at logical.c:606 >> #19 in CreateReplicationSlot (cmd=cmd@entry=0x2f1aef0) >> >> Is this behavior expected ? >> > > I'd say it's not! Yes. I think this is a bug. > Just looking at the stacktrace, I'm thinking that the following commit > may have had a bearing on this problem, by causing pgstat to be > shutdown earlier: > > commit fb2c5028e63589c01fbdf8b031be824766dc7a68 > Author: Andres Freund <andres@anarazel.de> > Date: Fri Aug 6 10:05:57 2021 -0700 > > pgstat: Schedule per-backend pgstat shutdown via before_shmem_exit(). > > > Can you see if the problem can be reproduced prior to this commit? Even in prior to the commit, pgstat_shutdown_hook() can be called before ProcKill() at the backend exit, so ISTM that the problem can be reproduced. Probably we need to make sure that pgstat_shutdown_hook() is called after ProcKill(), e.g., by registering pgstat_shutdown_hook() into on_proc_exit_list (I'm not sure if this change is safe, though). Or maybe pgstat logic for replication slot drop needs to be overhauled. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
pgsql-hackers by date: