Re: Synchronizing slots from primary to standby - Mailing list pgsql-hackers
| From | Hsu, John |
|---|---|
| Subject | Re: Synchronizing slots from primary to standby |
| Date | |
| Msg-id | 2415E2B4-F79E-4C24-A28E-78D40721D08F@amazon.com |
| In response to | Re: Synchronizing slots from primary to standby (Peter Eisentraut <peter.eisentraut@enterprisedb.com>) |
| List | pgsql-hackers |
Hello,
I took a brief look at the v2 patch, and it does appear to work for the basic case: the logical slot is
synchronized across, and I can connect to the promoted standby and stream changes afterwards.
It's not clear to me what the correct behavior is when a logical slot that has been synced to the replica is then
deleted on the writer. Would we expect the deletion to be propagated, or leave it up to the end-user to manage?
> + rawname = pstrdup(standby_slot_names);
> + SplitIdentifierString(rawname, ',', &namelist);
> +
> + while (true)
> + {
> + int wait_slots_remaining;
> + XLogRecPtr oldest_flush_pos = InvalidXLogRecPtr;
> + int rc;
> +
> + wait_slots_remaining = list_length(namelist);
> +
> + LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
> + for (int i = 0; i < max_replication_slots; i++)
> + {
Even though standby_slot_names is PGC_SIGHUP, we never reload or re-process the value inside this loop. If there is a
wrong entry in it, the backend stays stuck until we re-establish the logical connection. Including
"postmaster/interrupt.h" and handling ConfigReloadPending with ProcessConfigFile in the loop does seem to work.
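To illustrate the reload handling, here is a minimal standalone model in C. It is a sketch, not the patch: the flag, the GUC buffer, the caught_up_slots table, and the helpers wait_state_load, simulate_sighup, slots_remaining and wait_step are all hypothetical stand-ins. In the backend the equivalent step would presumably be to check ConfigReloadPending, reset it, call ProcessConfigFile(PGC_SIGHUP), and split standby_slot_names again.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define NAMELEN 128

/* Stand-ins (assumptions) for ConfigReloadPending and the GUC string. */
static volatile sig_atomic_t config_reload_pending = 0;
static char standby_slot_names[NAMELEN] = "standby1,typo_slot";

/* Slots that exist and have caught up in this toy model. */
static const char *const caught_up_slots[] = {"standby1", "standby2"};

typedef struct WaitState
{
    char namelist[NAMELEN];     /* private copy, like rawname in the patch */
} WaitState;

/* Copy the current GUC value into the loop's private list. */
static void
wait_state_load(WaitState *ws)
{
    snprintf(ws->namelist, sizeof(ws->namelist), "%s", standby_slot_names);
}

/* Hypothetical helper: simulate editing the config and sending SIGHUP. */
static void
simulate_sighup(const char *new_value)
{
    snprintf(standby_slot_names, sizeof(standby_slot_names), "%s", new_value);
    config_reload_pending = 1;
}

/* Count the listed names that have no caught-up slot. */
static int
slots_remaining(const char *namelist)
{
    char buf[NAMELEN];
    int  remaining = 0;

    assert(namelist != NULL);
    snprintf(buf, sizeof(buf), "%s", namelist);
    for (char *tok = strtok(buf, ","); tok != NULL; tok = strtok(NULL, ","))
    {
        bool found = false;

        for (size_t i = 0; i < sizeof(caught_up_slots) / sizeof(caught_up_slots[0]); i++)
            if (strcmp(tok, caught_up_slots[i]) == 0)
                found = true;
        if (!found)
            remaining++;
    }
    return remaining;
}

/* One loop iteration; returns true once nothing is left to wait for. */
static bool
wait_step(WaitState *ws)
{
    if (config_reload_pending)
    {
        config_reload_pending = 0;
        wait_state_load(ws);    /* re-split the refreshed value */
    }
    return slots_remaining(ws->namelist) == 0;
}
```

Without the reload check in wait_step, the list loaded before the loop would keep the wrong entry forever, which matches the stuck backend described above.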
Another thing I noticed is that once it starts waiting in this block, Ctrl+C doesn't seem to terminate the backend?
pg_recvlogical -d postgres -p 5432 --slot regression_slot --start -f -
..
^Cpg_recvlogical: error: unexpected termination of replication stream:
The logical backend connection is still present:
ps aux | grep 51263
hsuchen 51263 80.7 0.0 320180 14304 ? Rs 01:11 3:04 postgres: walsender hsuchen [local] START_REPLICATION
pstack 51263
#0 0x00007ffee99e79a5 in clock_gettime ()
#1 0x00007f8705e88246 in clock_gettime () from /lib64/libc.so.6
#2 0x000000000075f141 in WaitEventSetWait ()
#3 0x000000000075f565 in WaitLatch ()
#4 0x0000000000720aea in ReorderBufferProcessTXN ()
#5 0x00000000007142a6 in DecodeXactOp ()
#6 0x000000000071460f in LogicalDecodingProcessRecord ()
It can be terminated with a pg_terminate_backend though.
If we have a physical slot named foo on the standby, and a logical slot with the same slot_name is then created on
the writer, it does error out on the replica. It also prevents other slots from being synchronized, which is probably
fine.
2021-12-16 02:10:29.709 UTC [73788] LOG: replication slot synchronization worker for database "postgres" has started
2021-12-16 02:10:29.713 UTC [73788] ERROR: cannot use physical replication slot for logical decoding
2021-12-16 02:10:29.714 UTC [73037] DEBUG: unregistering background worker "replication slot synchronization worker"
On 12/14/21, 2:26 PM, "Peter Eisentraut" <peter.eisentraut@enterprisedb.com> wrote:
On 28.11.21 07:52, Bharath Rupireddy wrote:
> 1) Instead of a new LIST_SLOT command, can't we use
> READ_REPLICATION_SLOT (slight modifications needs to be done to make
> it support logical replication slots and to get more information from
> the subscriber).
I looked at that but didn't see an obvious way to consolidate them.
This is something we could look at again later.
> 2) How frequently the new bg worker is going to sync the slot info?
> How can it ensure that the latest information exists say when the
> subscriber is down/crashed before it picks up the latest slot
> information?
The interval is currently hardcoded, but could be a configuration
setting. In the v2 patch, there is a new setting that orders physical
replication before logical so that the logical subscribers cannot get
ahead of the physical standby.
> 3) Instead of the subscriber pulling the slot info, why can't the
> publisher (via the walsender or a new bg worker maybe?) push the
> latest slot info? I'm not sure we want to add more functionality to
> the walsender, if yes, isn't it going to be much simpler?
This sounds like the failover slot feature, which was rejected.