Re: Synchronizing slots from primary to standby - Mailing list pgsql-hackers
From | shveta malik |
---|---|
Subject | Re: Synchronizing slots from primary to standby |
Date | |
Msg-id | CAJpy0uD5nmxXKDweoLHMKaSEiXs2TnMagi9BtqDp92GFzAfgBw@mail.gmail.com |
In response to | Re: Synchronizing slots from primary to standby ("Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com>) |
Responses | Re: Synchronizing slots from primary to standby |
List | pgsql-hackers |
On Mon, Oct 9, 2023 at 10:51 AM Drouvot, Bertrand
<bertranddrouvot.pg@gmail.com> wrote:
>
> Hi,
>
> On 10/6/23 6:48 PM, Amit Kapila wrote:
> > On Wed, Oct 4, 2023 at 5:34 PM Drouvot, Bertrand
> > <bertranddrouvot.pg@gmail.com> wrote:
> >>
> >> On 10/4/23 1:50 PM, shveta malik wrote:
> >>> On Wed, Oct 4, 2023 at 5:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >>>>
> >>>> On Wed, Oct 4, 2023 at 11:55 AM Drouvot, Bertrand
> >>>> <bertranddrouvot.pg@gmail.com> wrote:
> >>>>>
> >>>>> On 10/4/23 6:26 AM, shveta malik wrote:
> >>>>>> On Wed, Oct 4, 2023 at 5:36 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>> How about an alternate scheme where we define sync_slot_names on
> >>>>>>> standby but then store the physical_slot_name in the corresponding
> >>>>>>> logical slot (ReplicationSlotPersistentData) to be synced? So, the
> >>>>>>> standby will send the list of 'sync_slot_names' and the primary will
> >>>>>>> add the physical standby's slot_name in each of the corresponding
> >>>>>>> sync_slot. Now, if we do this then even after restart, we should be
> >>>>>>> able to know for which physical slot each logical slot needs to wait.
> >>>>>>> We can even provide an SQL API to reset the value of
> >>>>>>> standby_slot_names in logical slots as a way to unblock decoding in
> >>>>>>> case of emergency (for example, corresponding when physical standby
> >>>>>>> never comes up).
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>> Looks like a better approach to me. It solves most of the pain points like:
> >>>>>> 1) Avoids the need of multiple GUCs
> >>>>>> 2) Primary and standby need not to worry to be in sync if we maintain
> >>>>>> sync-slot-names GUC on both
> >>>>
> >>>> As per my understanding of this approach, we don't want
> >>>> 'sync-slot-names' to be set on the primary. Do you have a different
> >>>> understanding?
> >>>>
> >>>
> >>> Same understanding. We do not need it to be set on primary by user. It
> >>> will be GUC on standby and standby will convey it to primary.
> >>
> >> +1, same understanding here.
> >>
> >
> > At PGConf NYC, I had a brief discussion on this topic with Andres
> > where yet another approach to achieve this came up.
>
> Great!
>
> > Have a parameter
> > like enable_failover at the slot level (this will be persistent
> > information). Users can set it during the create/alter subscription or
> > via pg_create_logical_replication_slot(). Also, on physical standby,
> > there will be a parameter like enable_syncslot. All the physical
> > standbys that have set enable_syncslot will receive all the logical
> > slots that are marked as enable_failover. To me, whether to sync a
> > particular slot is a slot-level property, so defining it in this new
> > way seems reasonable.
>
> Yeah, as this is a slot-level property, I agree that this seems reasonable.
>
> Also that sounds more natural to me with this approach. The primary
> is really the one that "drives" which slots can be synced. I like it.
>
> One could also set enable_failover while creating a logical slot on a physical
> standby (so that cascading standbys could also have "extra slot" to sync as
> compare to "level 1" standbys).
> >
> > I think this will simplify the scheme a bit but still, the list of
> > physical standby's for which logical slots wait during decoding needs
> > to be maintained as we thought.
>
> Right.
>
> > But, how about with the above two
> > parameters (enable_failover and enable_syncslot), we have
> > standby_slot_names defined on the primary. That avoids the need to
> > store the list of standby_slot_names in logical slots and simplifies
> > the implementation quite a bit, right?
>
> Agree.
>
> > Now, one can think if we have a
> > parameter like 'standby_slot_names' then why do we need
> > enable_syncslot on physical standby but that will be required to
> > invoke sync worker which will pull logical slot's information?
>
> yes and enable_sync slot on the standby could also be used to "pause"
> the sync on standbys (by disabling the parameter) if one would want to
> (without the need to modify anything on the primary).
>
> > The
> > advantage of having standby_slot_names defined on primary is that we
> > can selectively wait on the subset of physical standbys where we are
> > syncing the slots.
>
> Yeah and this flexibility/filtering looks somehow mandatory to me.
>
> > I think this will be something similar to
> > 'synchronous_standby_names' in the sense that the physical standbys
> > mentioned in standby_slot_names will behave as synchronous copies with
> > respect to slots and after failover user can switch to one of these
> > physical standby and others can start following new master/publisher.
> >
> > Thoughts?
>
> I like the idea and I think that's the one that seems the more reasonable
> to me. I'd vote for this idea with:
>
> - standby_slot_names on the primary (could also be set on standbys in case of
> cascading context)
> - enable_failover at logical slot creation + API to enable/disable it at wish
> - enable_syncslot on the standbys
>

Thank you Amit and Bertrand for the feedback on the new design.

PFA the v23 patch set, which attempts to implement the newly proposed
design for handling sync candidates:

a) The synchronize_slot_names GUC is removed. Instead, a persistent
'enable_failover' property is added at the slot level. The user can set
it with the create-subscription command, e.g.:

create subscription mysub connection '....' publication mypub
WITH (enable_failover = true);

b) A new GUC, enable_syncslot, is added on standbys to enable or disable
slot sync on that standby.

c) standby_slot_names is maintained on the primary.

Patch 002 also addresses Peter's comments dated Oct 6 and Oct 10.

Thank you Ajin for implementing the 'create subscription' command
changes to support the 'enable_failover' syntax.
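To make the division of settings concrete, here is a rough sketch of how the three pieces might be configured together. It is illustrative only: the GUC and option names are the ones proposed in this thread and exist only with the v23 patch set applied, the slot name standby1_slot is hypothetical, and I am assuming that enable_syncslot is a boolean and that standby_slot_names takes a slot name as a string.

```sql
-- Sketch only: needs a server built with the v23 patch set.
-- GUC/option names are from this thread; the values are assumptions.

-- On the primary: a physical slot for the standby (hypothetical name).
SELECT pg_create_physical_replication_slot('standby1_slot');

-- On the primary: logical decoding waits for the standbys listed here.
-- Assumes the GUC accepts a slot name as a string value.
ALTER SYSTEM SET standby_slot_names = 'standby1_slot';

-- On the subscriber: mark the subscription's slot as a sync candidate.
-- The connection string is elided, as in the example above.
CREATE SUBSCRIPTION mysub CONNECTION '....' PUBLICATION mypub
    WITH (enable_failover = true);

-- On the standby: turn on slot synchronization.
-- Assumes enable_syncslot is a boolean GUC.
ALTER SYSTEM SET enable_syncslot = on;
```

With this split, the primary decides which slots are candidates (enable_failover) and which standbys it waits for (standby_slot_names), while each standby only opts in or out of syncing (enable_syncslot).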
This patch does not yet implement the following; it will be done in the
next version:
--Support for setting/altering enable_failover using alter-subscription
and pg_create_logical_replication_slot
--Changes needed to support slot synchronization on cascading standbys
--Display of the "enable_failover" property in pg_replication_slots. I
think it makes sense to do this.

thanks
Shveta