Re: failover logical replication slots - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: failover logical replication slots
Msg-id CAA4eK1+wHTNZcODabt53e+1OExc5EoLzdLAWEfbAWPECJVBDFQ@mail.gmail.com
List pgsql-hackers
On Wed, Jun 11, 2025 at 10:17 PM Fabrice Chapuis
<fabrice636861@gmail.com> wrote:
>
> Thanks for your reply.
> The problem I see is that after creating a new subscription, we have:
>
> 1) if a failover occurs, the failover and synced flags are both set to true on the new primary node, so there's no
> problem.
>
> 2) when the old node returns as a secondary in the cluster, the failover flag is set to true and the synced flag is set
> to false, and then the following error is generated: ERROR: exiting from slot synchronization because same name slot "sub_test" already
> exists on the standby
>
> Why not change the value of the synced flag when the standby joins the cluster? If the slot on the primary node
> has the same name as the slot on the secondary node and the failover flag is set to true:
>
> if ((slot = SearchNamedReplicationSlot(remote_slot->name, true)))
> {
>     slot->data.synced = true;
>     ...
> }

IIUC, Hou-san also mentioned the same idea, but it is not that
straightforward because the user may have created a logical slot with
the same name but with different properties, such as two_phase,
slot_type, etc. I think we can try to compare all such slot
properties to ensure that we can overwrite the same-name slot, but
there is still a chance that we may overwrite a slot that the user has
created for some other purpose. Now, we may want to extend this
functionality such that we give the user some knob that allows us to
overwrite existing slots with the same name. The user can then use this
knob (a GUC or something else) when starting the node as a standby after
switchover to allow overwriting the existing slots.

As mentioned by Hou-san and Dilip, I also think it is more important
for the old node that comes back as a standby to remove its logical slots to
avoid WAL accumulation. For example, we can provide a function like
pg_drop_all_slots() with a type parameter indicating logical or
physical, and then utilities like Patroni that provide switchover
functionality can use that function to remove all existing slots
(perhaps keeping the slots that are required for failover) when starting
the node as a standby.

--
With Regards,
Amit Kapila.


