Home > mailing lists

Re: Synchronizing slots from primary to standby - Mailing list pgsql-hackers

From	Hsu, John
Subject	Re: Synchronizing slots from primary to standby
Date	December 16, 2021 02:15:42
Msg-id	2415E2B4-F79E-4C24-A28E-78D40721D08F@amazon.com Whole thread Raw
In response to	Re: Synchronizing slots from primary to standby (Peter Eisentraut <peter.eisentraut@enterprisedb.com>)
List	pgsql-hackers

Tree view

Hello,

I started taking a brief look at the v2 patch, and it does appear to work for the basic case. Logical slot is
synchronizedacross and I can connect to the promoted standby and stream changes afterwards.
 

It's not clear to me what the correct behavior is when a logical slot that has been synced to the replica and then it
getsdeleted on the writer. Would we expect this to be propagated or leave it up to the end-user to manage?
 

> +       rawname = pstrdup(standby_slot_names);
> +       SplitIdentifierString(rawname, ',', &namelist);
> +
> +       while (true)
> +       {
> +               int                     wait_slots_remaining;
> +               XLogRecPtr      oldest_flush_pos = InvalidXLogRecPtr;
> +               int                     rc;
> +
> +               wait_slots_remaining = list_length(namelist);
> +
> +               LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
> +               for (int i = 0; i < max_replication_slots; i++)
> +               {

Even though standby_slot_names is PGC_SIGHUP, we never reload/re-process the value. If we have a wrong entry in there,
thebackend becomes stuck until we re-establish the logical connection. Adding "postmaster/interrupt.h" with
ConfigReloadPending/ ProcessConfigFile does seem to work.
 

Another thing I noticed is that once it starts waiting in this block, Ctrl+C doesn't seem to terminate the backend?

pg_recvlogical -d postgres -p 5432 --slot regression_slot --start -f -
..
^Cpg_recvlogical: error: unexpected termination of replication stream: 

The logical backend connection is still present:

ps aux | grep 51263
   hsuchen 51263 80.7  0.0 320180 14304 ?        Rs   01:11   3:04 postgres: walsender hsuchen [local]
START_REPLICATION

pstack 51263
#0  0x00007ffee99e79a5 in clock_gettime ()
#1  0x00007f8705e88246 in clock_gettime () from /lib64/libc.so.6
#2  0x000000000075f141 in WaitEventSetWait ()
#3  0x000000000075f565 in WaitLatch ()
#4  0x0000000000720aea in ReorderBufferProcessTXN ()
#5  0x00000000007142a6 in DecodeXactOp ()
#6  0x000000000071460f in LogicalDecodingProcessRecord ()

It can be terminated with a pg_terminate_backend though.

If we have a physical slot with name foo on the standby, and then a logical slot is created on the writer with the same
slot_nameit does error out on the replica although it prevents other slots from being synchronized which is probably
fine.

2021-12-16 02:10:29.709 UTC [73788] LOG:  replication slot synchronization worker for database "postgres" has started
2021-12-16 02:10:29.713 UTC [73788] ERROR:  cannot use physical replication slot for logical decoding
2021-12-16 02:10:29.714 UTC [73037] DEBUG:  unregistering background worker "replication slot synchronization worker"

On 12/14/21, 2:26 PM, "Peter Eisentraut" <peter.eisentraut@enterprisedb.com> wrote:

    CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you
canconfirm the sender and know the content is safe.
 



    On 28.11.21 07:52, Bharath Rupireddy wrote:
    > 1) Instead of a new LIST_SLOT command, can't we use
    > READ_REPLICATION_SLOT (slight modifications needs to be done to make
    > it support logical replication slots and to get more information from
    > the subscriber).

    I looked at that but didn't see an obvious way to consolidate them.
    This is something we could look at again later.

    > 2) How frequently the new bg worker is going to sync the slot info?
    > How can it ensure that the latest information exists say when the
    > subscriber is down/crashed before it picks up the latest slot
    > information?

    The interval is currently hardcoded, but could be a configuration
    setting.  In the v2 patch, there is a new setting that orders physical
    replication before logical so that the logical subscribers cannot get
    ahead of the physical standby.

    > 3) Instead of the subscriber pulling the slot info, why can't the
    > publisher (via the walsender or a new bg worker maybe?) push the
    > latest slot info? I'm not sure we want to add more functionality to
    > the walsender, if yes, isn't it going to be much simpler?

    This sounds like the failover slot feature, which was rejected.

pgsql-hackers by date:

From: Michael Paquier
Date: 16 December 2021, 01:39:05
Subject: Re: pg_upgrade should truncate/remove its logs before running

From: "wangw.fnst@fujitsu.com"
Date: 16 December 2021, 02:27:06
Subject: RE: Confused comment about drop replica identity index

Re: Synchronizing slots from primary to standby - Mailing list pgsql-hackers

Previous

Next