Re: Logical Replication slot disappeared after promote Standby - Mailing list pgsql-hackers
From | Perumal Raj |
---|---|
Subject | Re: Logical Replication slot disappeared after promote Standby |
Date | |
Msg-id | CALvqh4pcn3bF2AbjiBKbWrTSXSz_2FMe9cRzO=5JP0zhya6RmQ@mail.gmail.com |
In response to | Re: Logical Replication slot disappeared after promote Standby (shveta malik <shveta.malik@gmail.com>) |
Responses | Re: Logical Replication slot disappeared after promote Standby |
List | pgsql-hackers |
Prerequisites for Logical Replication Slot Synchronization in PostgreSQL 17 and Later
To synchronize a logical replication slot to a standby, ensure the following settings are applied:
wal_level = 'logical'
hot_standby = 'on'
hot_standby_feedback = 'on'
sync_replication_slots = 'on'
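One way to confirm these settings on a running server is to query pg_settings. This is only a sketch: it checks the four parameters listed above and nothing else.

```sql
-- Show the current value of each parameter listed above and whether
-- a changed value is still waiting for a server restart.
SELECT name, setting, pending_restart
FROM pg_settings
WHERE name IN ('wal_level',
               'hot_standby',
               'hot_standby_feedback',
               'sync_replication_slots')
ORDER BY name;
```

On a server older than PostgreSQL 17 the sync_replication_slots row will simply be absent.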
Replication Slot Synchronization
Logical replication slots can be synchronized to every standby that connects directly to the primary, but not to cascading standbys.
Temporary Status of New Standby Slots
If a new standby server is created after the logical replication slot already exists, the synced slot on that standby is marked temporary=true until the restart_lsn of the slot on the primary matches the confirmed_flush_lsn on the new standby.
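The state of a synced slot can be inspected on the standby through pg_replication_slots. A minimal sketch, assuming PostgreSQL 17 (where the synced and failover columns exist) and the slot name that appears later in this message:

```sql
-- Run on the standby. temporary = true means slot synchronization has
-- not yet persisted the slot; synced = true marks a slot that was
-- synced from the primary rather than created locally.
SELECT slot_name, temporary, synced, failover,
       restart_lsn, confirmed_flush_lsn, catalog_xmin
FROM pg_replication_slots
WHERE slot_name = 'kafka_logical_slot';
```

A slot that shows temporary = false together with synced = true is the one that can be used after promotion.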
Limitations on Using Logical Replication Slots
Although logical replication slots are synchronized to a direct standby, they cannot be consumed there (for example, by Debezium) until the standby server is promoted to primary. Attempting to use a synchronized logical slot on a standby server results in the following error:
org.postgresql.util.PSQLException: ERROR: cannot use replication slot "kafka_logical_slot" for logical decoding
Detail: This replication slot is being synchronized from the primary server.
replica_test=# SELECT * FROM pg_logical_slot_get_changes('kafka_logical_slot', NULL, NULL);
ERROR: option "proto_version" missing
CONTEXT: slot "kafka_logical_slot", output plugin "pgoutput", in the startup callback
Next, we can create a logical replication slot:
replica_test=# SELECT pg_create_logical_replication_slot('test', 'test_decoding', false, true, true);
pg_create_logical_replication_slot
------------------------------------
(test, 0/7B001AA0)
Now, let's attempt to retrieve changes from the new slot:
replica_test=# SELECT * FROM pg_logical_slot_get_changes('test', NULL, NULL);
WARNING: cannot specify logical replication slot "kafka_logical_slot" in parameter "synchronized_standby_slots"
DETAIL: Logical replication is waiting for correction on replication slot "kafka_logical_slot".
HINT: Remove the logical replication slot "kafka_logical_slot" from parameter "synchronized_standby_slots".
To resolve this, we will alter the system settings:
replica_test=# ALTER SYSTEM SET synchronized_standby_slots = '';
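ALTER SYSTEM only writes the new value to postgresql.auto.conf; it does not take effect until the configuration is reloaded. A short sketch of the follow-up, assuming the session has the privileges to call pg_reload_conf():

```sql
-- Ask the server to re-read its configuration files.
SELECT pg_reload_conf();

-- Confirm the running value; an empty result means no slots are listed.
-- The reload is signalled asynchronously, so repeat SHOW if the old
-- value is still displayed.
SHOW synchronized_standby_slots;
```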
Finally, we can check for changes again:
replica_test=# SELECT * FROM pg_logical_slot_get_changes('test', NULL, NULL);
    lsn     | xid  |                      data
------------+------+------------------------------------------------
 0/7B001AA0 | 1218 | BEGIN 1218
 0/7B00B9D0 | 1218 | table public.customers_1: TRUNCATE: (no-flags)
 0/7B00BB70 | 1218 | COMMIT 1218
Thanks Shveta,
Zhijie Hou
Please correct me if needed.
On Fri, Jun 13, 2025 at 1:00 PM Perumal Raj <perucinci@gmail.com> wrote:
>
> Yes Shveta!
>
> I could see this message repeated on New_replica:
>
> 2025-06-13 06:20:30.146 UTC [277861] LOG: could not synchronize replication slot "kafka_logical_slot" because remote slot precedes local slot
> 2025-06-13 06:20:30.146 UTC [277861] DETAIL: The remote slot has LSN 0/6D0000B8 and catalog xmin 1085, but the local slot has LSN 0/6F000000 and catalog xmin 1088.
> 2025-06-13 06:21:00.176 UTC [277861] LOG: could not synchronize replication slot "kafka_logical_slot" because remote slot precedes local slot
> 2025-06-13 06:21:00.176 UTC [277861] DETAIL: The remote slot has LSN 0/6D0000B8 and catalog xmin 1085, but the local slot has LSN 0/6F000000 and catalog xmin 1088.
> 2025-06-13 06:21:30.207 UTC [277861] LOG: could not synchronize replication slot "kafka_logical_slot" because remote slot precedes local slot
> 2025-06-13 06:21:30.207 UTC [277861] DETAIL: The remote slot has LSN 0/6D0000B8 and catalog xmin 1085, but the local slot has LSN 0/6F000000 and catalog xmin 1088.
> 2025-06-13 06:22:00.238 UTC [277861] LOG: could not synchronize replication slot "kafka_logical_slot" because remote slot precedes local slot
> 2025-06-13 06:22:00.238 UTC [277861] DETAIL: The remote slot has LSN 0/6D0000B8 and catalog xmin 1085, but the local slot has LSN 0/6F000000 and catalog xmin 1088.
> 2025-06-13 06:22:30.268 UTC [277861] LOG: could not synchronize replication slot "kafka_logical_slot" because remote slot precedes local slot
> 2025-06-13 06:22:30.268 UTC [277861] DETAIL: The remote slot has LSN 0/6D0000B8 and catalog xmin 1085, but the local slot has LSN 0/6F000000 and catalog xmin 1088.
> 2025-06-13 06:23:00.299 UTC [277861] LOG: could not synchronize replication slot "kafka_logical_slot" because remote slot precedes local slot
> 2025-06-13 06:23:00.299 UTC [277861] DETAIL: The remote slot has LSN 0/6D0000B8 and catalog xmin 1085, but the local slot has LSN 0/6F000000 and catalog xmin 1088.
> 2025-06-13 06:23:30.329 UTC [277861] LOG: could not synchronize replication slot "kafka_logical_slot" because remote slot precedes local slot
> 2025-06-13 06:23:30.329 UTC [277861] DETAIL: The remote slot has LSN 0/6D0000B8 and catalog xmin 1085, but the local slot has LSN 0/6F000000 and catalog xmin 1088.
> 2025-06-13 06:24:00.360 UTC [277861] LOG: could not synchronize replication slot "kafka_logical_slot" because remote slot precedes local slot
> 2025-06-13 06:24:00.360 UTC [277861] DETAIL: The remote slot has LSN 0/6D0000B8 and catalog xmin 1085, but the local slot has LSN 0/6F000000 and catalog xmin 1088.
> 2025-06-13 06:24:30.391 UTC [277861] LOG: could not synchronize replication slot "kafka_logical_slot" because remote slot precedes local slot
> 2025-06-13 06:24:30.391 UTC [277861] DETAIL: The remote slot has LSN 0/6D0000B8 and catalog xmin 1085, but the local slot has LSN 0/6F000000 and catalog xmin 1088.
> 2025-06-13 06:25:00.421 UTC [277861] LOG: could not synchronize replication slot "kafka_logical_slot" because remote slot precedes local slot
> 2025-06-13 06:25:00.421 UTC [277861] DETAIL: The remote slot has LSN 0/6D0000B8 and catalog xmin 1085, but the local slot has LSN 0/6F000000 and catalog xmin 1088.
> 2025-06-13 06:25:30.452 UTC [277861] LOG: could not synchronize replication slot "kafka_logical_slot" because remote slot precedes local slot
> 2025-06-13 06:25:30.452 UTC [277861] DETAIL: The remote slot has LSN 0/6D0000B8 and catalog xmin 1085, but the local slot has LSN 0/6F000000 and catalog xmin 1088.
>
>
> It appears that my Debezium connectors have stopped consuming data, resulting in an outdated restart_lsn of "0/6D0000B8".
>
Yes, if no consumer is consuming changes from the failover slot on the
primary and slot synchronization is started in the meantime, the
initial sync may leave the synced slot in such a temporary state. This
is done intentionally to keep the synced slot from becoming
inconsistent and to avoid unexpected behaviour if a failover is
performed at that moment.
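To see how far the slot on the primary has fallen behind, its restart_lsn can be compared with the current WAL position. A sketch, assuming PostgreSQL 17 for the failover column:

```sql
-- Run on the primary. A large lag on an inactive failover slot points
-- at a consumer that has stopped advancing the slot.
SELECT slot_name, active, restart_lsn, confirmed_flush_lsn,
       pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS restart_lag_bytes
FROM pg_replication_slots
WHERE failover;
```

The same function applied to the two LSNs from the log above, pg_wal_lsn_diff('0/6F000000', '0/6D0000B8'), gives the gap between the local and the remote slot: 33554248 bytes, just under 32 MB.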
> In contrast, the New_replica has a restart_lsn that matches the primary server's most recent confirmed_flush_lsn, indicating it is up to date.
>
> As soon as I recreated that replication slot, it synced to New_Replica (temporary=false).
>
> 2025-06-13 06:26:00.484 UTC [277861] LOG: dropped replication slot "kafka_logical_slot" of database with OID 16384
>
> 2025-06-13 06:26:30.520 UTC [277861] LOG: starting logical decoding for slot "kafka_logical_slot"
>
> 2025-06-13 06:26:30.520 UTC [277861] DETAIL: Streaming transactions committing after 0/0, reading WAL from 0/76003140.
>
> 2025-06-13 06:26:30.520 UTC [277861] LOG: logical decoding found consistent point at 0/76003140
>
> 2025-06-13 06:26:30.520 UTC [277861] DETAIL: There are no running transactions.
>
> 2025-06-13 06:26:30.526 UTC [277861] LOG: newly created replication slot "kafka_logical_slot" is sync-ready now
>
> 2025-06-13 06:35:39.212 UTC [277857] LOG: restartpoint starting: time
>
> 2025-06-13 06:35:42.022 UTC [277857] LOG: restartpoint complete: wrote 29 buffers (0.2%); 0 WAL file(s) added, 0 removed, 0 recycled; write=2.805 s, sync=0.002 s, total=2.810 s; sync files=26, longest=0.002 s, average=0.001 s; distance=16496 kB, estimate=16496 kB; lsn=0/7701F480, redo lsn=0/7701F428
>
> 2025-06-13 06:35:42.022 UTC [277857] LOG: recovery restart point at 0/7701F428
>
> 2025-06-13 06:35:42.022 UTC [277857] DETAIL: Last completed transaction was at log time 2025-06-13 06:33:31.675341+00.
>
> Until the synchronization is complete, the slot type is marked as temporary=true, as you mentioned.
>
> Is there any manual way to advance the restart_lsn of a logical replication slot? This would ensure slot synchronization.
>
1) The first and recommended option is to get the connector running
again and let it advance the slot by consuming the changes.
2) Another option is to manually advance the slot on the primary by
using pg_logical_slot_get_binary_changes(). However, if the logical
replication setup is meant to consume these changes but is currently
inactive, the slot's consumer will not be able to reprocess those
changes after it restarts. So this API should be used only after
analyzing the current state of the logical replication setup, and
only if it is acceptable that those changes are never shipped to the
logical replication consumers.
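For reference, option 2 might look like the following on the primary. This is a sketch only: the publication name dbz_publication is an assumption (substitute whatever publication the connector actually uses), and because the slot uses the pgoutput plugin, the proto_version and publication_names options have to be passed. Every change returned is consumed from the slot and will not be delivered to the connector afterwards.

```sql
-- Run on the primary, connected to the database the slot belongs to,
-- and only after deciding that the pending changes may be discarded.
-- The slot must not be in use by another session.
-- count(*) avoids pulling the binary payload to the client.
SELECT count(*) AS discarded_changes
FROM pg_logical_slot_get_binary_changes(
       'kafka_logical_slot',  -- slot name from this thread
       NULL,                  -- upto_lsn: no LSN limit
       NULL,                  -- upto_nchanges: no row limit
       'proto_version', '1',
       'publication_names', 'dbz_publication'  -- hypothetical name
     );
```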
thanks
Shveta