Re: Logical Replication slot disappeared after promote Standby - Mailing list pgsql-hackers

From Perumal Raj
Subject Re: Logical Replication slot disappeared after promote Standby
Date
Msg-id CALvqh4pcn3bF2AbjiBKbWrTSXSz_2FMe9cRzO=5JP0zhya6RmQ@mail.gmail.com
In response to Re: Logical Replication slot disappeared after promote Standby  (shveta malik <shveta.malik@gmail.com>)
Responses Re: Logical Replication slot disappeared after promote Standby
List pgsql-hackers
Thanks for the explanation, Shveta!

------------ 
To summarize the original thread:

  1. Prerequisites for Logical Replication Slot Synchronization in PostgreSQL 17 and Later

    To configure synchronization of a logical replication slot, ensure the following settings are applied:

    wal_level = 'logical'
    hot_standby = 'on'
    hot_standby_feedback = 'on'
    sync_replication_slots = 'on'

  2. Replication Slot Synchronization

    Logical replication slots can be synchronized to all direct standby servers of the primary, but not to cascading standby servers.

  3. Temporary Status of New Standby Slots

    If a new standby server is created after the logical replication slot already exists, the synced slot on that standby is marked temporary=true until the slot on the primary has advanced far enough that its restart_lsn no longer precedes that of the local slot on the new standby.

  4. Limitations on Using Logical Replication Slots

    Although logical replication slots are synchronized to direct standbys, they cannot be used there (for example, by Debezium) until the standby server is promoted to primary. Attempting to use a synchronized logical slot on a standby server results in the following error:

    org.postgresql.util.PSQLException: ERROR: cannot use replication slot "kafka_logical_slot" for logical decoding
    Detail: This replication slot is being synchronized from the primary server.
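
The prerequisite list in item 1 can be turned into a quick self-check. The following is only a minimal sketch: the helper name `missing_prerequisites` and the sample values are hypothetical, the expected values are simply the four settings listed above, and the current values are assumed to have been collected by hand (for example from SHOW output). It does not connect to a server.

```python
# Expected values, taken from item 1 of the summary above.
REQUIRED_SETTINGS = {
    "wal_level": "logical",
    "hot_standby": "on",
    "hot_standby_feedback": "on",
    "sync_replication_slots": "on",
}


def missing_prerequisites(current):
    """Return {name: (expected, actual)} for each setting that does not match.

    `current` is a plain dict of setting name -> value; a setting that is
    absent from the dict is reported with an actual value of None.
    """
    problems = {}
    for name, expected in REQUIRED_SETTINGS.items():
        actual = current.get(name)
        if actual != expected:
            problems[name] = (expected, actual)
    return problems


# Made-up example values, as if copied from SHOW output on a standby.
standby_settings = {
    "wal_level": "logical",
    "hot_standby": "on",
    "hot_standby_feedback": "off",
    "sync_replication_slots": "on",
}

print(missing_prerequisites(standby_settings))
# prints {'hot_standby_feedback': ('on', 'off')}
```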
An addition from this thread:

We can advance the restart_lsn of a logical slot using the pg_logical_slot_get_changes function. However, for a slot that uses the pgoutput plugin, the call fails when the options the plugin requires are not supplied:
replica_test=# SELECT * FROM pg_logical_slot_get_changes('kafka_logical_slot', NULL, NULL);
ERROR:  option "proto_version" missing
CONTEXT:  slot "kafka_logical_slot", output plugin "pgoutput", in the startup callback

Next, we can create a logical replication slot:

replica_test=# SELECT pg_create_logical_replication_slot('test', 'test_decoding', false, true, true);
pg_create_logical_replication_slot
------------------------------------
(test, 0/7B001AA0)

Now, let's attempt to retrieve changes from the new slot:

replica_test=# SELECT * FROM pg_logical_slot_get_changes('test', NULL, NULL);
WARNING:  cannot specify logical replication slot "kafka_logical_slot" in parameter "synchronized_standby_slots"
DETAIL:  Logical replication is waiting for correction on replication slot "kafka_logical_slot".
HINT:  Remove the logical replication slot "kafka_logical_slot" from parameter "synchronized_standby_slots".

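The warning indicates that synchronized_standby_slots names a logical slot, which is not allowed there. To resolve this, we clear the setting (the change takes effect after a configuration reload, for example with SELECT pg_reload_conf()):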

replica_test=# ALTER SYSTEM SET synchronized_standby_slots = '';

Finally, we can check for changes again:

replica_test=# SELECT * FROM pg_logical_slot_get_changes('test', NULL, NULL);
     lsn     | xid  |                     data                     
-------------+------+----------------------------------------------
 0/7B001AA0 | 1218 | BEGIN 1218
 0/7B00B9D0 | 1218 | table public.customers_1: TRUNCATE: (no-flags)
 0/7B00BB70 | 1218 | COMMIT 1218

Thanks Shveta, Zhijie Hou 
Please correct me if needed.

On Fri, Jun 13, 2025 at 2:51 AM shveta malik <shveta.malik@gmail.com> wrote:
On Fri, Jun 13, 2025 at 1:00 PM Perumal Raj <perucinci@gmail.com> wrote:
>
> Yes Shveta!
>
> I can see the following message repeated in the log of New_replica.
>
> 2025-06-13 06:20:30.146 UTC [277861] LOG:  could not synchronize replication slot "kafka_logical_slot" because remote slot precedes local slot
> 2025-06-13 06:20:30.146 UTC [277861] DETAIL:  The remote slot has LSN 0/6D0000B8 and catalog xmin 1085, but the local slot has LSN 0/6F000000 and catalog xmin 1088.
> 2025-06-13 06:21:00.176 UTC [277861] LOG:  could not synchronize replication slot "kafka_logical_slot" because remote slot precedes local slot
> 2025-06-13 06:21:00.176 UTC [277861] DETAIL:  The remote slot has LSN 0/6D0000B8 and catalog xmin 1085, but the local slot has LSN 0/6F000000 and catalog xmin 1088.
> 2025-06-13 06:21:30.207 UTC [277861] LOG:  could not synchronize replication slot "kafka_logical_slot" because remote slot precedes local slot
> 2025-06-13 06:21:30.207 UTC [277861] DETAIL:  The remote slot has LSN 0/6D0000B8 and catalog xmin 1085, but the local slot has LSN 0/6F000000 and catalog xmin 1088.
> 2025-06-13 06:22:00.238 UTC [277861] LOG:  could not synchronize replication slot "kafka_logical_slot" because remote slot precedes local slot
> 2025-06-13 06:22:00.238 UTC [277861] DETAIL:  The remote slot has LSN 0/6D0000B8 and catalog xmin 1085, but the local slot has LSN 0/6F000000 and catalog xmin 1088.
> 2025-06-13 06:22:30.268 UTC [277861] LOG:  could not synchronize replication slot "kafka_logical_slot" because remote slot precedes local slot
> 2025-06-13 06:22:30.268 UTC [277861] DETAIL:  The remote slot has LSN 0/6D0000B8 and catalog xmin 1085, but the local slot has LSN 0/6F000000 and catalog xmin 1088.
> 2025-06-13 06:23:00.299 UTC [277861] LOG:  could not synchronize replication slot "kafka_logical_slot" because remote slot precedes local slot
> 2025-06-13 06:23:00.299 UTC [277861] DETAIL:  The remote slot has LSN 0/6D0000B8 and catalog xmin 1085, but the local slot has LSN 0/6F000000 and catalog xmin 1088.
> 2025-06-13 06:23:30.329 UTC [277861] LOG:  could not synchronize replication slot "kafka_logical_slot" because remote slot precedes local slot
> 2025-06-13 06:23:30.329 UTC [277861] DETAIL:  The remote slot has LSN 0/6D0000B8 and catalog xmin 1085, but the local slot has LSN 0/6F000000 and catalog xmin 1088.
> 2025-06-13 06:24:00.360 UTC [277861] LOG:  could not synchronize replication slot "kafka_logical_slot" because remote slot precedes local slot
> 2025-06-13 06:24:00.360 UTC [277861] DETAIL:  The remote slot has LSN 0/6D0000B8 and catalog xmin 1085, but the local slot has LSN 0/6F000000 and catalog xmin 1088.
> 2025-06-13 06:24:30.391 UTC [277861] LOG:  could not synchronize replication slot "kafka_logical_slot" because remote slot precedes local slot
> 2025-06-13 06:24:30.391 UTC [277861] DETAIL:  The remote slot has LSN 0/6D0000B8 and catalog xmin 1085, but the local slot has LSN 0/6F000000 and catalog xmin 1088.
> 2025-06-13 06:25:00.421 UTC [277861] LOG:  could not synchronize replication slot "kafka_logical_slot" because remote slot precedes local slot
> 2025-06-13 06:25:00.421 UTC [277861] DETAIL:  The remote slot has LSN 0/6D0000B8 and catalog xmin 1085, but the local slot has LSN 0/6F000000 and catalog xmin 1088.
> 2025-06-13 06:25:30.452 UTC [277861] LOG:  could not synchronize replication slot "kafka_logical_slot" because remote slot precedes local slot
> 2025-06-13 06:25:30.452 UTC [277861] DETAIL:  The remote slot has LSN 0/6D0000B8 and catalog xmin 1085, but the local slot has LSN 0/6F000000 and catalog xmin 1088.
>
>
> It appears that my Debezium connectors have stopped consuming data, resulting in an outdated restart_lsn of "0/6D0000B8".
>

Yes, if no consumer is consuming changes from the failover
slot on the primary and slot synchronization is started in the
meantime, the initial sync may leave the synced slot in such a
temporary state. This is done intentionally to prevent an
inconsistent synced slot and to avoid unexpected behaviour if a
failover is performed at that moment.
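
To see how far the slot on the primary lags behind the local slot, the LSNs in the DETAIL line quoted above can be compared numerically. This is a minimal sketch, not part of PostgreSQL: the helper names `parse_lsn` and `remote_lag_bytes` are hypothetical, and the regular expression assumes the exact message wording shown in the log above.

```python
import re


def parse_lsn(text):
    """Convert an LSN such as '0/6D0000B8' into a single 64-bit integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


# Matches the DETAIL wording quoted in this thread (assumed to be stable).
DETAIL_RE = re.compile(
    r"The remote slot has LSN (?P<remote>[0-9A-F]+/[0-9A-F]+) "
    r"and catalog xmin (?P<remote_xmin>\d+), "
    r"but the local slot has LSN (?P<local>[0-9A-F]+/[0-9A-F]+) "
    r"and catalog xmin (?P<local_xmin>\d+)\."
)


def remote_lag_bytes(detail_line):
    """Return local LSN minus remote LSN in bytes, or None if the line does not match."""
    match = DETAIL_RE.search(detail_line)
    if match is None:
        return None
    return parse_lsn(match.group("local")) - parse_lsn(match.group("remote"))


line = (
    "2025-06-13 06:20:30.146 UTC [277861] DETAIL:  The remote slot has LSN "
    "0/6D0000B8 and catalog xmin 1085, but the local slot has LSN 0/6F000000 "
    "and catalog xmin 1088."
)

print(remote_lag_bytes(line))
# prints 33554248
```

A positive result means the remote slot precedes the local slot, which is the condition the log message reports; here the gap is just under 32 MiB of WAL.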

> In contrast, the New_replica has a restart_lsn that matches the primary server's most recent confirmed_flush_lsn, indicating it is up to date.
>
> As soon as I recreated that replication slot, it synced to New_replica (temporary=false).
>
> 2025-06-13 06:26:00.484 UTC [277861] LOG:  dropped replication slot "kafka_logical_slot" of database with OID 16384
>
> 2025-06-13 06:26:30.520 UTC [277861] LOG:  starting logical decoding for slot "kafka_logical_slot"
>
> 2025-06-13 06:26:30.520 UTC [277861] DETAIL:  Streaming transactions committing after 0/0, reading WAL from 0/76003140.
>
> 2025-06-13 06:26:30.520 UTC [277861] LOG:  logical decoding found consistent point at 0/76003140
>
> 2025-06-13 06:26:30.520 UTC [277861] DETAIL:  There are no running transactions.
>
> 2025-06-13 06:26:30.526 UTC [277861] LOG:  newly created replication slot "kafka_logical_slot" is sync-ready now
>
> 2025-06-13 06:35:39.212 UTC [277857] LOG:  restartpoint starting: time
>
> 2025-06-13 06:35:42.022 UTC [277857] LOG:  restartpoint complete: wrote 29 buffers (0.2%); 0 WAL file(s) added, 0 removed, 0 recycled; write=2.805 s, sync=0.002 s, total=2.810 s; sync files=26, longest=0.002 s, average=0.001 s; distance=16496 kB, estimate=16496 kB; lsn=0/7701F480, redo lsn=0/7701F428
>
> 2025-06-13 06:35:42.022 UTC [277857] LOG:  recovery restart point at 0/7701F428
>
> 2025-06-13 06:35:42.022 UTC [277857] DETAIL:  Last completed transaction was at log time 2025-06-13 06:33:31.675341+00.
>
> Until the synchronization is complete, the slot type is marked as temporary=true, as you mentioned.
>
> Is there any manual way to advance the "restart_lsn" of a logical replication slot? This is to ensure slot synchronization.
>

1) The first and recommended option is to get the connector running
again and let it advance the slot by consuming the changes.

2) Another option is to manually advance the slot on the primary by
using pg_logical_slot_get_binary_changes(). However, if the logical
replication setup is intended to consume these changes but is
currently inactive, then the slot's consumer will not be able to
reprocess those changes upon restarting. So this API should be used
only after analyzing the current state of the logical replication
setup, and only if we are okay with those changes not being shipped
to the logical replication consumers.

thanks
Shveta
