Re: Synchronizing slots from primary to standby - Mailing list pgsql-hackers
From | shveta malik |
---|---|
Subject | Re: Synchronizing slots from primary to standby |
Date | |
Msg-id | CAJpy0uCBHRX-GSKCeVza44kFEC=uTMD_6uzuXXnbUX32Vt-g8Q@mail.gmail.com |
In response to | Re: Synchronizing slots from primary to standby (Nisha Moond <nisha.moond412@gmail.com>) |
Responses |
Re: Synchronizing slots from primary to standby
Re: Synchronizing slots from primary to standby |
List | pgsql-hackers |
On Fri, Dec 1, 2023 at 5:40 PM Nisha Moond <nisha.moond412@gmail.com> wrote:
>
> Review for v41 patch.

Thanks for the feedback.

>
> 1.
> ======
> src/backend/utils/misc/postgresql.conf.sample
>
> +#enable_syncslot = on # enables slot synchronization on the physical
> standby from the primary
>
> enable_syncslot is disabled by default, so, it should be 'off' here.
>

Sure, I will change it.

> ~~~
> 2.
> IIUC, the slotsyncworker's connection to the primary is to execute a
> query. Its aim is not walsender type connection, but at primary when
> queried, the 'backend_type' is set to 'walsender'.
> Snippet from primary db-
>
> datname  | usename     | application_name | wait_event_type | backend_type
> ---------+-------------+------------------+-----------------+--------------
> postgres | replication | slotsyncworker   | Client          | walsender
>
> Is it okay?
>

The slot sync worker connects using 'libpqrcv_connect', which sends the
'replication'-'database' key-value pair as one of the connection
options. On the primary side, 'ProcessStartupPacket' sees this
key-value pair and marks the process as a walsender
(am_walsender = true). That is why it shows up as
backend_type='walsender' in pg_stat_activity. I do not see any harm in
this backend_type for the slot sync worker currently. It is in line with
the connections used for logical replication. And since the slot sync
worker also deals with WAL positions (LSNs), it is okay to keep
backend_type as walsender unless you (or others) see a potential
issue in doing that. So let me know.

> ~~~
> 3.
> As per current logic, If there are slots on primary with disabled
> subscriptions, then, when standby is created it replicates these slots
> but can't make them sync-ready until any activity happens on the
> slots.
> So, such slots stay in 'i' sync-state and get dropped when failover
> happens. Now, if the subscriber tries to enable their existing
> subscription after failover, it gives an error that the slot does not
> exist.
>

Yes, this is expected, as Amit explained in [1]. But let me review
whether we need to document this case for disabled subscriptions, i.e.
a disabled subscription might not work if it is enabled after promotion.

> ~~~
> 4. primary_slot_name GUC value test:
>
> When standby is started with a non-existing primary_slot_name, the
> wal-receiver gives an error but the slot-sync worker does not raise
> any error/warning. It is no-op though as it has a check 'if
> (XLogRecPtrIsInvalid(WalRcv->latestWalEnd)) do nothing'. Is this
> okay or shall the slot-sync worker too raise an error and exit?
>
> In another case, when standby is started with valid primary_slot_name,
> but it is changed to some invalid value in runtime, then walreceiver
> starts giving error but the slot-sync worker keeps on running. In this
> case, unlike the previous case, it even did not go to no-op mode (as
> it sees valid WalRcv->latestWalEnd from the earlier run) and keep
> pinging primary repeatedly for slots. Shall here it should error out
> or at least be no-op until we give a valid primary_slot_name?
>

I reviewed it. There is no way to test the existence/validity of
'primary_slot_name' on the standby without making a connection to the
primary. If primary_slot_name is invalid from the start, the slot sync
worker will be a no-op (as you tested) because WalRcv->latestWalEnd
will be invalid. If 'primary_slot_name' is changed to an invalid value
at runtime, the slot sync worker will still keep pinging the primary.
But that should be okay (in fact, it is needed), as the worker needs to
sync at least the previous slots' positions (in case it was delayed in
doing so earlier for some reason). And once the slots are up to date on
the standby, even if the worker pings the primary, it will not see any
change in the slots' LSNs and will thus go for a longer nap. I think it
is not worth the effort to introduce the complexity of checking the
validity of 'primary_slot_name' on the primary from the standby for
this rare scenario.

It would be good to know the thoughts of others on the above 3 points.

thanks
Shveta
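
To illustrate the reply to point 2, here is a minimal, self-contained C sketch of how a 'replication' key-value pair in the startup options leads to backend_type='walsender'. It is a toy model and not the PostgreSQL source: StartupOption, BackendKind, classify_startup and backend_type_label are hypothetical names, and the accepted boolean spellings are a simplification of what the server actually parses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one key/value pair of a startup packet. */
typedef struct
{
    const char *key;
    const char *value;
} StartupOption;

/* Backend kinds distinguished by this toy model. */
typedef enum
{
    BACKEND_REGULAR,            /* no replication option was sent */
    BACKEND_PHYSICAL_WALSENDER, /* replication=<true-like value> */
    BACKEND_DATABASE_WALSENDER  /* replication=database */
} BackendKind;

/* Simplified assumption: only these spellings count as "true". */
static bool
is_true_literal(const char *v)
{
    return strcmp(v, "true") == 0 || strcmp(v, "on") == 0 ||
           strcmp(v, "yes") == 0 || strcmp(v, "1") == 0;
}

/* Scan the options; a 'replication' pair decides the backend kind. */
BackendKind
classify_startup(const StartupOption *opts, size_t n)
{
    BackendKind kind = BACKEND_REGULAR;

    assert(n == 0 || opts != NULL);
    for (size_t i = 0; i < n; i++)
    {
        if (strcmp(opts[i].key, "replication") != 0)
            continue;
        if (strcmp(opts[i].value, "database") == 0)
            kind = BACKEND_DATABASE_WALSENDER;
        else if (is_true_literal(opts[i].value))
            kind = BACKEND_PHYSICAL_WALSENDER;
        else
            kind = BACKEND_REGULAR;
    }
    return kind;
}

/* Both walsender kinds are reported under the same backend_type label. */
const char *
backend_type_label(BackendKind kind)
{
    return kind == BACKEND_REGULAR ? "client backend" : "walsender";
}
```

In this model, a connection that sends replication=database together with application_name=slotsyncworker is labelled 'walsender' even though it only runs queries, which matches the pg_stat_activity row quoted in point 2.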
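
The behaviour described for point 4 can be sketched in the same way. The following is only an illustrative model of one worker cycle under stated assumptions: CycleResult and sync_cycle are hypothetical names, the "short" and "long" naps are abstract outcomes rather than real durations, the XLogRecPtr definitions are re-declared locally so that the example compiles on its own, and the rule that a local slot position only moves forward is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local re-declarations so the sketch is self-contained. */
typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)
#define XLogRecPtrIsInvalid(r) ((r) == InvalidXLogRecPtr)

/* Hypothetical outcomes of one slot sync cycle. */
typedef enum
{
    CYCLE_NOOP,             /* walreceiver never received WAL: do nothing */
    CYCLE_SYNCED_SHORT_NAP, /* at least one slot advanced: check again soon */
    CYCLE_IDLE_LONG_NAP     /* nothing changed on the primary: nap longer */
} CycleResult;

/*
 * One cycle of the modelled worker. 'latest_wal_end' plays the role of
 * WalRcv->latestWalEnd; 'remote' holds slot LSNs fetched from the primary
 * and 'local' holds the standby's copies, which are updated in place.
 */
CycleResult
sync_cycle(XLogRecPtr latest_wal_end, const XLogRecPtr *remote,
           XLogRecPtr *local, size_t nslots)
{
    bool changed = false;

    assert(nslots == 0 || (remote != NULL && local != NULL));

    /* Case 1: primary_slot_name was invalid from the start. */
    if (XLogRecPtrIsInvalid(latest_wal_end))
        return CYCLE_NOOP;

    /* Case 2: keep syncing, even if primary_slot_name went bad later. */
    for (size_t i = 0; i < nslots; i++)
    {
        if (remote[i] > local[i])
        {
            local[i] = remote[i];
            changed = true;
        }
    }
    return changed ? CYCLE_SYNCED_SHORT_NAP : CYCLE_IDLE_LONG_NAP;
}
```

Calling sync_cycle twice with the same remote positions shows the point made in the reply: the first call catches the standby up, and the second finds nothing to do and falls back to the longer nap, so the repeated pings to the primary stay cheap.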