Assertion failure in pgstat_assert_is_up during walsender exit - Mailing list pgsql-hackers

From Tom Lane
Subject Assertion failure in pgstat_assert_is_up during walsender exit
Date
Msg-id 2924589.1644871540@sss.pgh.pa.us
Whole thread Raw
Responses Re: Assertion failure in pgstat_assert_is_up during walsender exit  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
I tried to replicate the problem described in [1] about logical
replication from a v14 source DB to a v11 target.  It fails as
described there; I've not yet tracked down why, but it looks like
the v11 apply worker fails and closes the connection after sending
CREATE_REPLICATION_SLOT.  The v14 walsender doesn't have a problem
with that.  However, if I do the same thing using HEAD as the source,
I see an assertion failure on the way out of the walsender.  The
postmaster log shows

2022-02-14 15:18:29.890 EST [2922064] LOG:  starting logical decoding for slot "sub01"
2022-02-14 15:18:29.890 EST [2922064] DETAIL:  Streaming transactions committing after 0/FEA23B0, reading WAL from
0/FEA23B0.
2022-02-14 15:18:29.890 EST [2922064] STATEMENT:  START_REPLICATION SLOT "sub01" LOGICAL 0/0 (proto_version '1',
publication_names'"pub01"') 
2022-02-14 15:18:29.890 EST [2922064] LOG:  logical decoding found consistent point at 0/FEA23B0
2022-02-14 15:18:29.890 EST [2922064] DETAIL:  There are no running transactions.
2022-02-14 15:18:29.890 EST [2922064] STATEMENT:  START_REPLICATION SLOT "sub01" LOGICAL 0/0 (proto_version '1',
publication_names'"pub01"') 
2022-02-14 15:18:29.895 EST [2922066] LOG:  logical decoding found consistent point at 0/FEA2460
2022-02-14 15:18:29.895 EST [2922066] DETAIL:  There are no running transactions.
2022-02-14 15:18:29.895 EST [2922066] STATEMENT:  CREATE_REPLICATION_SLOT "sub01_129343_sync_129337" TEMPORARY LOGICAL
pgoutputUSE_SNAPSHOT 
TRAP: FailedAssertion("pgstat_is_initialized && !pgstat_is_shutdown", File: "pgstat.c", Line: 5140, PID: 2922066)
postgres: walsender postgres [local] idle in transaction(ExceptionalCondition+0x7c)[0x95126c]
postgres: walsender postgres [local] idle in transaction[0x7a66dd]
postgres: walsender postgres [local] idle in transaction[0x7a68fb]
postgres: walsender postgres [local] idle in transaction(pgstat_report_replslot_drop+0x32)[0x7aad72]
postgres: walsender postgres [local] idle in transaction[0x7e7220]
postgres: walsender postgres [local] idle in transaction(ReplicationSlotCleanup+0xf1)[0x7e7561]
postgres: walsender postgres [local] idle in transaction[0x833982]
postgres: walsender postgres [local] idle in transaction(shmem_exit+0x85)[0x816195]
postgres: walsender postgres [local] idle in transaction[0x816277]
postgres: walsender postgres [local] idle in transaction(proc_exit+0x8)[0x816308]
postgres: walsender postgres [local] idle in transaction(PostgresMain+0x4b4)[0x83e2f4]
postgres: walsender postgres [local] idle in transaction[0x7b1872]
postgres: walsender postgres [local] idle in transaction(PostmasterMain+0xd2c)[0x7b282c]
postgres: walsender postgres [local] idle in transaction(main+0x1e9)[0x509489]
/lib64/libc.so.6(__libc_start_main+0xf3)[0x7fec53a86493]
postgres: walsender postgres [local] idle in transaction(_start+0x2e)[0x50976e]
2022-02-14 15:18:29.974 EST [2921719] LOG:  server process (PID 2922066) was terminated by signal 6: Aborted
2022-02-14 15:18:29.974 EST [2921719] LOG:  terminating any other active server processes

Stack trace looks like

#2  0x000000000095128b in ExceptionalCondition (
    conditionName=conditionName@entry=0xabe798 "pgstat_is_initialized && !pgstat_is_shutdown",
errorType=errorType@entry=0x9a4757"FailedAssertion",  
    fileName=fileName@entry=0xabe554 "pgstat.c",
    lineNumber=lineNumber@entry=5140) at assert.c:69
#3  0x00000000007a66dd in pgstat_assert_is_up () at pgstat.c:5140
#4  0x00000000007a68fb in pgstat_assert_is_up () at pgstat.c:3244
#5  pgstat_send (msg=<optimized out>, len=<optimized out>) at pgstat.c:3228
#6  0x00000000007aad72 in pgstat_report_replslot_drop (
    slotname=slotname@entry=0x7fec53654320 "sub01_129343_sync_129337")
    at pgstat.c:1948
#7  0x00000000007e7220 in ReplicationSlotDropPtr (slot=0x7fec53654308)
    at slot.c:696
#8  0x00000000007e7561 in ReplicationSlotCleanup () at slot.c:547
#9  0x0000000000833982 in ProcKill (code=<optimized out>, arg=<optimized out>)
    at proc.c:839
#10 ProcKill (code=<optimized out>, arg=<optimized out>) at proc.c:804
#11 0x0000000000816195 in shmem_exit (code=code@entry=0) at ipc.c:272
#12 0x0000000000816277 in proc_exit_prepare (code=code@entry=0) at ipc.c:194
#13 0x0000000000816308 in proc_exit (code=code@entry=0) at ipc.c:107
#14 0x000000000083e2f4 in PostgresMain (dbname=<optimized out>,

I guess this indicates that walsenders don't initialize their pgstats
support soon enough?

            regards, tom lane

[1] https://www.postgresql.org/message-id/CANhtRiamAgYt1A-Nh4%3DmU3E1UhG9XPgB%2BX6mW1DWqa93vUXW9A%40mail.gmail.com



pgsql-hackers by date:

Previous
From: Maciek Sakrejda
Date:
Subject: Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
Next
From: Tom Lane
Date:
Subject: Re: Time to drop plpython2?