Thread: BUG #18789: logical replication slots are deleted after failovers

BUG #18789: logical replication slots are deleted after failovers

From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      18789
Logged by:          Sachin Konde-Deshmukh
Email address:      sachinkonde3@gmail.com
PostgreSQL version: 17.2
Operating system:   Oracle Linux 8.9
Description:

We are using a 2-node PostgreSQL 17 HA setup managed by Patroni 4.0.4.
When I perform a failover a second or third time (that is, more than once),
the logical replication slot is not transferred to the new primary.
postgres=# select slot_name,slot_type, failover, synced,confirmed_flush_lsn,active from pg_replication_slots;
     slot_name      | slot_type | failover | synced | confirmed_flush_lsn | active
--------------------+-----------+----------+--------+---------------------+--------
 psoel89pgcluster01 | physical  | f        | f      |                     | t
 mysub              | logical   | t        | t      | 0/4000AB8           | t
(2 rows)
After First Failover -->
postgres=# select slot_name,slot_type, failover, synced,confirmed_flush_lsn,active from pg_replication_slots;
     slot_name      | slot_type | failover | synced | confirmed_flush_lsn | active
--------------------+-----------+----------+--------+---------------------+--------
 psoel89pgcluster02 | physical  | f        | f      |                     | t
 mysub              | logical   | f        | f      | 0/50001E0           | t
(2 rows)
After 2nd Failover -->
select slot_name,slot_type, failover, synced,confirmed_flush_lsn,active from pg_replication_slots;
     slot_name      | slot_type | failover | synced | confirmed_flush_lsn | active
--------------------+-----------+----------+--------+---------------------+--------
 psoel89pgcluster01 | physical  | f        | f      |                     | t
 mysub              | logical   | f        | f      | 0/60002B0           | t
After 3rd failover -->
postgres=# select slot_name,slot_type, failover, synced,confirmed_flush_lsn,active from pg_replication_slots;
     slot_name      | slot_type | failover | synced | confirmed_flush_lsn | active
--------------------+-----------+----------+--------+---------------------+--------
 psoel89pgcluster02 | physical  | f        | f      |                     | t
(1 row)

Re: BUG #18789: logical replication slots are deleted after failovers

From
Masahiko Sawada
Date:
On Wed, Jan 29, 2025 at 7:01 AM PG Bug reporting form
<noreply@postgresql.org> wrote:
>
> The following bug has been logged on the website:
>
> Bug reference:      18789
> Logged by:          Sachin Konde-Deshmukh
> Email address:      sachinkonde3@gmail.com
> PostgreSQL version: 17.2
> Operating system:   Oracle Linux 8.9
> Description:
>
> We are using 2 node PostgreSQL 17 HA setup using Patroni 4.0.4.
> When I do failover 2nd or third time or more than once, it fails to transfer
> or move logical replication slot to new Primary.
> postgres=# select slot_name,slot_type, failover, synced,confirmed_flush_lsn,active from pg_replication_slots;
>      slot_name      | slot_type | failover | synced | confirmed_flush_lsn | active
> --------------------+-----------+----------+--------+---------------------+--------
>  psoel89pgcluster01 | physical  | f        | f      |                     | t
>  mysub              | logical   | t        | t      | 0/4000AB8           | t
> (2 rows)

I guess that this is the list of slots on the primary.

> After First Failover -->
> postgres=# select slot_name,slot_type, failover, synced,confirmed_flush_lsn,active from pg_replication_slots;
>      slot_name      | slot_type | failover | synced | confirmed_flush_lsn | active
> --------------------+-----------+----------+--------+---------------------+--------
>  psoel89pgcluster02 | physical  | f        | f      |                     | t
>  mysub              | logical   | f        | f      | 0/50001E0           | t
> (2 rows)

I guess that this is the list of slots on the new primary after a
failover. It seems that a subscriber is receiving logical replication
changes from the new primary by using the 'mysub' slot, which makes
sense. However, a problem I can see is that its 'failover' and
'synced' fields were false. Was the slot sync worker running on the
standby before the first failover?
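
For reference, one way to check this on the standby is sketched below. It assumes PostgreSQL 17's built-in slot synchronization, which requires sync_replication_slots = on, hot_standby_feedback = on, a configured primary_slot_name, and a dbname in primary_conninfo on the standby. It is not a statement of how Patroni configures these settings.

```sql
-- Run on the standby (sketch, assuming PostgreSQL 17).
-- Settings required for built-in slot synchronization:
SHOW sync_replication_slots;  -- expected to be 'on'
SHOW hot_standby_feedback;    -- expected to be 'on'
SHOW primary_slot_name;       -- physical slot the standby streams from
SHOW primary_conninfo;        -- must include dbname; may require superuser privileges

-- Is the slot sync worker process running?
SELECT pid, backend_type, backend_start
FROM pg_stat_activity
WHERE backend_type = 'slotsync worker';

-- Which logical slots have been synchronized from the primary?
SELECT slot_name, failover, synced, confirmed_flush_lsn
FROM pg_replication_slots
WHERE slot_type = 'logical';
```

If the second query returns no rows, the worker is not running. In that case, failover-enabled slots on the primary would not be carried over to this standby on promotion.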

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



RE: BUG #18789: logical replication slots are deleted after failovers

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Sachin,
> 
> We are using 2 node PostgreSQL 17 HA setup using Patroni 4.0.4.
> When I do failover 2nd or third time or more than once, it fails to transfer
> or move logical replication slot to new Primary.

For better understanding, can you clarify 1) the network configuration you created
and 2) the nodes on which the queries were actually run?
Four instances are needed to fail over a third time, but I am not sure how they were connected.

----------
Best regards,
Hayato Kuroda