Re: Issue with logical replication slot during switchover - Mailing list pgsql-hackers
From | Fabrice Chapuis |
---|---|
Subject | Re: Issue with logical replication slot during switchover |
Date | |
Msg-id | CAA5-nLDfuTkJ4CY4heuQWwqtCvpDRsNw=+Hie95RO0JnTS9_Lg@mail.gmail.com Whole thread Raw |
In response to | Re: Issue with logical replication slot during switchover (Fabrice Chapuis <fabrice636861@gmail.com>) |
List | pgsql-hackers |
Hi Shveta,
Here is the version 1 of the patch with corrections
1)
In both create_physical_replication_slot() and
create_physical_replication_slot():
+ false, false,false);
,false --> , false (space after comma is recommended)
Here is the version 1 of the patch with corrections
1)
In both create_physical_replication_slot() and
create_physical_replication_slot():
+ false, false,false);
,false --> , false (space after comma is recommended)
2)
+ elog(DEBUG1, "logical replication slot %s created with option
allow_overwrite to %s",
+ NameStr(slot->data.name), slot->data.allow_overwrite ? "true" : "false");
IMO, we don't need this as we do not log other properties of the slot as well.
Point 1-2 done
3)
We can have pg_replication_slots (pg_get_replication_slots) modified
to display the new property. Also we can modify docs to have the new
property defined.
Point 3 Not yet implemented
4)
+ {
+ /* Check if we need to overwrite an existing logical slot */
+ if (allow_overwrite)
+ {
+ /* Get rid of a replication slot that is no longer wanted */
+ ReplicationSlotDrop(remote_slot->name,true);
+
+ /* Get rid of a replication slot that is no longer wanted */
+ ereport(WARNING,
+ errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("slot \"%s\" already exists"
+ " on the standby but it will be dropped because "
+ "flag allow_overwrite is set to true",
+ remote_slot->name));
+
+ /* Going back to the main loop after droping the failover slot */
+ return false;
Currently we are returning after dropping the slot. But I think we
shall drop the slot and proceed with new slot creation of the same
name otherwise we may be left with old slot dropped and no new slot
created specially when API is run.
Scenario:
--On primary, create 2 slots: slot1 and slot2.
--On standby, create one slot slot1 with allow_overwrite=true.
--Run 'SELECT pg_sync_replication_slots();' on standby.
At the end of API, expectation is that both slots will be present with
'synced'=true, but only slot2 is present. if we run this API next
time, slot1 is created. It should have been dropped and recreated (or
say overwritten) in the first run itself.
+ {
+ /* Check if we need to overwrite an existing logical slot */
+ if (allow_overwrite)
+ {
+ /* Get rid of a replication slot that is no longer wanted */
+ ReplicationSlotDrop(remote_slot->name,true);
+
+ /* Get rid of a replication slot that is no longer wanted */
+ ereport(WARNING,
+ errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("slot \"%s\" already exists"
+ " on the standby but it will be dropped because "
+ "flag allow_overwrite is set to true",
+ remote_slot->name));
+
+ /* Going back to the main loop after droping the failover slot */
+ return false;
Currently we are returning after dropping the slot. But I think we
shall drop the slot and proceed with new slot creation of the same
name otherwise we may be left with old slot dropped and no new slot
created specially when API is run.
Scenario:
--On primary, create 2 slots: slot1 and slot2.
--On standby, create one slot slot1 with allow_overwrite=true.
--Run 'SELECT pg_sync_replication_slots();' on standby.
At the end of API, expectation is that both slots will be present with
'synced'=true, but only slot2 is present. if we run this API next
time, slot1 is created. It should have been dropped and recreated (or
say overwritten) in the first run itself.
Point 4:
I put the creation slot under the allow_overwrite condition.
After testing with syn_standby_slots disable on the standby, it works. I think the code for the synchronisation of the new slot could be factorised.
5)
IMO, LOG is sufficient here, as the action aligns with the
user-provided 'allow_overwrite' setting.
After testing with syn_standby_slots disable on the standby, it works. I think the code for the synchronisation of the new slot could be factorised.
5)
IMO, LOG is sufficient here, as the action aligns with the
user-provided 'allow_overwrite' setting.
Point 5
Where is it in the code?
Best regards,
Fabrice
On Fri, Sep 12, 2025 at 2:44 PM Fabrice Chapuis <fabrice636861@gmail.com> wrote:
Thanks Shveta for all your comments. I'll prepare a version 2 of the patch including your proposals.RegardsFabriceOn Thu, 11 Sep 2025, 07:42 shveta malik <shveta.malik@gmail.com> wrote:On Sun, Sep 7, 2025 at 1:32 PM Fabrice Chapuis <fabrice636861@gmail.com> wrote:
>
> Hi,
> Here the version 1 of the patch with the modifications.
> I do a git diff --check until there are no more errors.
> In a next version of the patch, I think I have make change to call ReplicationSlotCreate() function with the value of the flag allow_overwrite be consistent
In which flow? I see all ReplicationSlotCreate() references already
accepting the value.
> and modify the interface ReplicationSlotCreate to display the attribute allow_overwrite.
>
Do you mean to modify pg_replication_slots?
Thank You for the patch. Please find a few comments:
1)
In both create_physical_replication_slot() and
create_physical_replication_slot():
+ false, false,false);
,false --> , false (space after comma is recommended)
2)
+ elog(DEBUG1, "logical replication slot %s created with option
allow_overwrite to %s",
+ NameStr(slot->data.name), slot->data.allow_overwrite ? "true" : "false");
IMO, we don't need this as we do not log other properties of the slot as well.
3)
We can have pg_replication_slots (pg_get_replication_slots) modified
to display the new property. Also we can modify docs to have the new
property defined.
4)
+ {
+ /* Check if we need to overwrite an existing logical slot */
+ if (allow_overwrite)
+ {
+ /* Get rid of a replication slot that is no longer wanted */
+ ReplicationSlotDrop(remote_slot->name,true);
+
+ /* Get rid of a replication slot that is no longer wanted */
+ ereport(WARNING,
+ errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("slot \"%s\" already exists"
+ " on the standby but it will be dropped because "
+ "flag allow_overwrite is set to true",
+ remote_slot->name));
+
+ /* Going back to the main loop after droping the failover slot */
+ return false;
Currently we are returning after dropping the slot. But I think we
shall drop the slot and proceed with new slot creation of the same
name otherwise we may be left with old slot dropped and no new slot
created specially when API is run.
Scenario:
--On primary, create 2 slots: slot1 and slot2.
--On standby, create one slot slot1 with allow_overwrite=true.
--Run 'SELECT pg_sync_replication_slots();' on standby.
At the end of API, expectation is that both slots will be present with
'synced'=true, but only slot2 is present. if we run this API next
time, slot1 is created. It should have been dropped and recreated (or
say overwritten) in the first run itself.
5)
IMO, LOG is sufficient here, as the action aligns with the
user-provided 'allow_overwrite' setting.
thanks
Shveta
Attachment
pgsql-hackers by date: