Thread: BUG #18789: logical replication slots are deleted after failovers
The following bug has been logged on the website:

Bug reference:      18789
Logged by:          Sachin Konde-Deshmukh
Email address:      sachinkonde3@gmail.com
PostgreSQL version: 17.2
Operating system:   Oracle Linux 8.9
Description:

We are using a 2-node PostgreSQL 17 HA setup with Patroni 4.0.4.
When I perform a failover a second or third time (or more), the logical
replication slot fails to transfer to the new primary.

postgres=# select slot_name, slot_type, failover, synced, confirmed_flush_lsn, active from pg_replication_slots;
     slot_name      | slot_type | failover | synced | confirmed_flush_lsn | active
--------------------+-----------+----------+--------+---------------------+--------
 psoel89pgcluster01 | physical  | f        | f      |                     | t
 mysub              | logical   | t        | t      | 0/4000AB8           | t
(2 rows)

After first failover:

postgres=# select slot_name, slot_type, failover, synced, confirmed_flush_lsn, active from pg_replication_slots;
     slot_name      | slot_type | failover | synced | confirmed_flush_lsn | active
--------------------+-----------+----------+--------+---------------------+--------
 psoel89pgcluster02 | physical  | f        | f      |                     | t
 mysub              | logical   | f        | f      | 0/50001E0           | t
(2 rows)

After second failover:

postgres=# select slot_name, slot_type, failover, synced, confirmed_flush_lsn, active from pg_replication_slots;
     slot_name      | slot_type | failover | synced | confirmed_flush_lsn | active
--------------------+-----------+----------+--------+---------------------+--------
 psoel89pgcluster01 | physical  | f        | f      |                     | t
 mysub              | logical   | f        | f      | 0/60002B0           | t
(2 rows)

After third failover:

postgres=# select slot_name, slot_type, failover, synced, confirmed_flush_lsn, active from pg_replication_slots;
     slot_name      | slot_type | failover | synced | confirmed_flush_lsn | active
--------------------+-----------+----------+--------+---------------------+--------
 psoel89pgcluster02 | physical  | f        | f      |                     | t
(1 row)
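For reference, the failover = t marking on the initial 'mysub' slot is what a
subscription created with the failover option produces; a minimal sketch,
using a placeholder connection string and publication name:

-- On the subscriber: enabling the failover option marks the slot created on
-- the publisher as failover = true, so a standby's slot sync worker can pick
-- it up (connection string and publication name below are placeholders).
CREATE SUBSCRIPTION mysub
    CONNECTION 'host=primary.example port=5432 dbname=postgres user=repuser'
    PUBLICATION mypub
    WITH (failover = true);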
Re: BUG #18789: logical replication slots are deleted after failovers

From
Masahiko Sawada
Date:

On Wed, Jan 29, 2025 at 7:01 AM PG Bug reporting form
<noreply@postgresql.org> wrote:
>
> The following bug has been logged on the website:
>
> Bug reference:      18789
> Logged by:          Sachin Konde-Deshmukh
> Email address:      sachinkonde3@gmail.com
> PostgreSQL version: 17.2
> Operating system:   Oracle Linux 8.9
> Description:
>
> We are using a 2-node PostgreSQL 17 HA setup with Patroni 4.0.4.
> When I perform a failover a second or third time (or more), the logical
> replication slot fails to transfer to the new primary.
>
> postgres=# select slot_name, slot_type, failover, synced, confirmed_flush_lsn, active from pg_replication_slots;
>      slot_name      | slot_type | failover | synced | confirmed_flush_lsn | active
> --------------------+-----------+----------+--------+---------------------+--------
>  psoel89pgcluster01 | physical  | f        | f      |                     | t
>  mysub              | logical   | t        | t      | 0/4000AB8           | t
> (2 rows)

I guess that this is the list of slots on the primary.

> After first failover:
>
> postgres=# select slot_name, slot_type, failover, synced, confirmed_flush_lsn, active from pg_replication_slots;
>      slot_name      | slot_type | failover | synced | confirmed_flush_lsn | active
> --------------------+-----------+----------+--------+---------------------+--------
>  psoel89pgcluster02 | physical  | f        | f      |                     | t
>  mysub              | logical   | f        | f      | 0/50001E0           | t
> (2 rows)

I guess that this is the list of slots on the new primary after a
failover. It seems that a subscriber is receiving logical replication
changes from the new primary by using the 'mysub' slot, which makes
sense. However, a problem I can see is that its 'failover' and
'synced' fields were false. Was the slot sync worker running on the
standby before the first failover?

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
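One way to verify that, sketched here on the assumption that the standby
accepts read-only connections, is to check the slot-sync settings on the
standby and look for the synchronized copy of the slot there before failing
over:

-- Run on the standby, before the failover.
SHOW sync_replication_slots;   -- must be 'on' for the slot sync worker to run
SHOW hot_standby_feedback;     -- must be 'on'
SHOW primary_slot_name;        -- physical slot used on the primary

-- A synchronized failover slot should already be visible here:
SELECT slot_name, failover, synced, active
FROM pg_replication_slots
WHERE slot_type = 'logical';
-- 'mysub' is expected with failover = t and synced = t; if it is missing,
-- the slot was never synchronized and will not survive a promotion.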
RE: BUG #18789: logical replication slots are deleted after failovers
From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Sachin,

> We are using a 2-node PostgreSQL 17 HA setup with Patroni 4.0.4.
> When I perform a failover a second or third time (or more), the logical
> replication slot fails to transfer to the new primary.

For a better understanding, can you clarify 1) the network configuration you
created and 2) on which nodes the queries were actually run? Four instances
would be needed to do a failover a third time, but I am not sure how they
were connected.

----------
Best regards,
Hayato Kuroda
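For comparison, this is a rough sketch of the settings PostgreSQL 17 expects
for failover-slot synchronization in a plain two-node primary/standby pair
(the slot name is taken from the report, other values are placeholders, and
Patroni normally manages several of these itself):

-- On the standby (where the slot sync worker runs):
ALTER SYSTEM SET sync_replication_slots = on;
ALTER SYSTEM SET hot_standby_feedback = on;
-- primary_conninfo must also include a dbname, e.g.
--   'host=primary.example dbname=postgres ...'
-- otherwise the slot sync worker cannot connect.

-- On the primary (optional, but keeps logical subscribers from getting
-- ahead of the standby's synchronized slots):
ALTER SYSTEM SET synchronized_standby_slots = 'psoel89pgcluster02';

SELECT pg_reload_conf();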