Thread: Assertion failure during walsender exit

Assertion failure during walsender exit

From
vignesh C
Date:
Hi,

I found an assertion failure while testing another logical
replication patch. The backtrace is:
#3  0x000055cc18571e31 in ExceptionalCondition
(conditionName=0x55cc18724d00 "pgstat_is_initialized &&
!pgstat_is_shutdown", errorType=0x55cc187246f8 "FailedAssertion",
fileName=0x55cc1872435a "pgstat.c",
    lineNumber=4852) at assert.c:37
#4  0x000055cc182d72b6 in pgstat_assert_is_up () at pgstat.c:4852
#5  0x000055cc182d3dd0 in pgstat_send (msg=0x7ffd5df8a420, len=144) at
pgstat.c:3075
#6  0x000055cc182d2214 in pgstat_report_replslot_drop
(slotname=0x7f853c58e998 "sub1") at pgstat.c:1869
#7  0x000055cc1832dbe3 in ReplicationSlotDropPtr (slot=0x7f853c58e980)
at slot.c:696
#8  0x000055cc1832d8fe in ReplicationSlotDropAcquired () at slot.c:585
#9  0x000055cc1832d59c in ReplicationSlotRelease () at slot.c:482
#10 0x000055cc183a7be7 in ProcKill (code=0, arg=0) at proc.c:852
#11 0x000055cc18379c8d in shmem_exit (code=0) at ipc.c:272
#12 0x000055cc18379a94 in proc_exit_prepare (code=0) at ipc.c:194
#13 0x000055cc183799e1 in proc_exit (code=0) at ipc.c:107
#14 0x000055cc1833d4ff in ProcessRepliesIfAny () at walsender.c:1911
#15 0x000055cc1833cbbd in WalSndWaitForWal (loc=21662264) at walsender.c:1514
#16 0x000055cc1833b94b in logical_read_xlog_page
(state=0x55cc197b8170, targetPagePtr=21659648, reqLen=2616,
targetRecPtr=21662240, cur_page=0x55cc19797d50 "\016\321\005") at
walsender.c:922
#17 0x000055cc17f8f3a7 in ReadPageInternal (state=0x55cc197b8170,
pageptr=21659648, reqLen=2616) at xlogreader.c:667
#18 0x000055cc17f8eab4 in XLogReadRecord (state=0x55cc197b8170,
errormsg=0x7ffd5df8afd0) at xlogreader.c:337
#19 0x000055cc18302080 in DecodingContextFindStartpoint
(ctx=0x55cc197b7db0) at logical.c:606
#20 0x000055cc1833c12b in CreateReplicationSlot (cmd=0x55cc19745a20)
at walsender.c:1135
#21 0x000055cc1833d107 in exec_replication_command
(cmd_string=0x55cc196bfa00 "CREATE_REPLICATION_SLOT \"sub1\" LOGICAL
pgoutput (SNAPSHOT 'nothing')") at walsender.c:1740
#22 0x000055cc183b8298 in PostgresMain (dbname=0x55cc196ebb38
"postgres", username=0x55cc196ebb18 "vignesh") at postgres.c:4493
#23 0x000055cc182df658 in BackendRun (port=0x55cc196e3250) at postmaster.c:4584

This issue occurs during walsender process exit. During process
shutdown, statistics reporting is shut down (pgstat_is_shutdown is set)
in pgstat_shutdown_hook, which is called before the shared memory exit.
Later, from shmem_exit, ProcKill releases the replication slot, which
drops it and tries to send the slot-drop statistics through
pgstat_report_replslot_drop. Because statistics have already been shut
down by then, pgstat_assert_is_up detects this and the assertion fails.
I think this issue should be fixed.
Thoughts?
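
To make the ordering concrete, here is a minimal standalone sketch in C.
It is not PostgreSQL code: every name in it (model_stats_shutdown_hook,
model_proc_kill, simulate_walsender_exit, and so on) is a hypothetical
stand-in, and it only models the sequence described above, where the
statistics shutdown hook runs before the callback that drops the slot.
Instead of aborting, it counts the sends that would trip the assertion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Standalone model of the exit ordering. Every name here is a hypothetical
 * stand-in; none of this is taken from the PostgreSQL source.
 */
typedef void (*exit_callback) (void);

#define MAX_CALLBACKS 8

static exit_callback before_exit_list[MAX_CALLBACKS];
static int  n_before_exit = 0;
static exit_callback on_exit_list[MAX_CALLBACKS];
static int  n_on_exit = 0;

static bool stats_is_shutdown = false;
static int  late_sends = 0;     /* sends attempted after stats shutdown */

static void
register_before_exit(exit_callback cb)
{
    assert(n_before_exit < MAX_CALLBACKS);
    before_exit_list[n_before_exit++] = cb;
}

static void
register_on_exit(exit_callback cb)
{
    assert(n_on_exit < MAX_CALLBACKS);
    on_exit_list[n_on_exit++] = cb;
}

/* Models the stats shutdown hook: after this, no more stats may be sent. */
static void
model_stats_shutdown_hook(void)
{
    stats_is_shutdown = true;
}

/* Models sending a stats message: count the violation instead of aborting. */
static void
model_stats_send(const char *what)
{
    if (stats_is_shutdown)
    {
        late_sends++;
        printf("would fail assertion: \"%s\" sent after stats shutdown\n", what);
    }
}

/* Models the late exit callback that releases, and so drops, the slot. */
static void
model_proc_kill(void)
{
    model_stats_send("replication slot drop");
}

/* Run the early callbacks first, then the late ones, each in reverse order. */
static void
model_shmem_exit(void)
{
    while (n_before_exit > 0)
        before_exit_list[--n_before_exit] ();
    while (n_on_exit > 0)
        on_exit_list[--n_on_exit] ();
}

/* Returns how many stats messages were attempted after stats shutdown. */
int
simulate_walsender_exit(void)
{
    stats_is_shutdown = false;
    late_sends = 0;
    n_before_exit = n_on_exit = 0;

    register_on_exit(model_proc_kill);
    register_before_exit(model_stats_shutdown_hook);
    model_shmem_exit();
    return late_sends;
}
```

Calling simulate_walsender_exit() from a small main() returns 1 in this
model, since the slot-drop message is attempted only after the shutdown
flag is set. The late callback runs second no matter which order the two
callbacks are registered in.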

Regards,
Vignesh



Re: Assertion failure during walsender exit

From
Kyotaro Horiguchi
Date:
At Wed, 8 Dec 2021 10:02:15 +0530, vignesh C <vignesh21@gmail.com> wrote in 
> fileName=0x55cc1872435a "pgstat.c",
>     lineNumber=4852) at assert.c:37
> #4  0x000055cc182d72b6 in pgstat_assert_is_up () at pgstat.c:4852
> #5  0x000055cc182d3dd0 in pgstat_send (msg=0x7ffd5df8a420, len=144) at
> pgstat.c:3075
> #6  0x000055cc182d2214 in pgstat_report_replslot_drop
> (slotname=0x7f853c58e998 "sub1") at pgstat.c:1869
> #7  0x000055cc1832dbe3 in ReplicationSlotDropPtr (slot=0x7f853c58e980)
> at slot.c:696
> #8  0x000055cc1832d8fe in ReplicationSlotDropAcquired () at slot.c:585
> #9  0x000055cc1832d59c in ReplicationSlotRelease () at slot.c:482
> #10 0x000055cc183a7be7 in ProcKill (code=0, arg=0) at proc.c:852

Yeah, this issue has been reported several times and may already be
under discussion:

https://www.postgresql.org/message-id/CAD21AoBgSTF8gp1SKojKRu9dqzN4p1Ob6Mh%3DQgVhGfLO1NtUYA%40mail.gmail.com
https://www.postgresql.org/message-id/OS0PR01MB571621B206EEB17D8AB171F094B59%40OS0PR01MB5716.jpnprd01.prod.outlook.com
https://www.postgresql.org/message-id/CA%2BHiwqEpGF%3DROEvVOqvvDF%3Dw9iaMBx0g5zBBhP62ZFE7vW6O8w%40mail.gmail.com
regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center