Re: Synchronizing slots from primary to standby - Mailing list pgsql-hackers

From shveta malik
Subject Re: Synchronizing slots from primary to standby
Date
Msg-id CAJpy0uD5nmxXKDweoLHMKaSEiXs2TnMagi9BtqDp92GFzAfgBw@mail.gmail.com
In response to Re: Synchronizing slots from primary to standby  ("Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com>)
Responses Re: Synchronizing slots from primary to standby
List pgsql-hackers
On Mon, Oct 9, 2023 at 10:51 AM Drouvot, Bertrand
<bertranddrouvot.pg@gmail.com> wrote:
>
> Hi,
>
> On 10/6/23 6:48 PM, Amit Kapila wrote:
> > On Wed, Oct 4, 2023 at 5:34 PM Drouvot, Bertrand
> > <bertranddrouvot.pg@gmail.com> wrote:
> >>
> >> On 10/4/23 1:50 PM, shveta malik wrote:
> >>> On Wed, Oct 4, 2023 at 5:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >>>>
> >>>> On Wed, Oct 4, 2023 at 11:55 AM Drouvot, Bertrand
> >>>> <bertranddrouvot.pg@gmail.com> wrote:
> >>>>>
> >>>>> On 10/4/23 6:26 AM, shveta malik wrote:
> >>>>>> On Wed, Oct 4, 2023 at 5:36 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>> How about an alternate scheme where we define sync_slot_names on
> >>>>>>> standby but then store the physical_slot_name in the corresponding
> >>>>>>> logical slot (ReplicationSlotPersistentData) to be synced? So, the
> >>>>>>> standby will send the list of 'sync_slot_names' and the primary will
> >>>>>>> add the physical standby's slot_name in each of the corresponding
> >>>>>>> sync_slot. Now, if we do this then even after restart, we should be
> >>>>>>> able to know for which physical slot each logical slot needs to wait.
> >>>>>>> We can even provide an SQL API to reset the value of
> >>>>>>> standby_slot_names in logical slots as a way to unblock decoding in
> >>>>>>> case of emergency (for example, corresponding when physical standby
> >>>>>>> never comes up).
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>> Looks like a better approach to me. It solves most of the pain points like:
> >>>>>> 1) Avoids the need of multiple GUCs
> >>>>>> 2) Primary and standby need not to worry to be in sync if we maintain
> >>>>>> sync-slot-names GUC on both
> >>>>
> >>>> As per my understanding of this approach, we don't want
> >>>> 'sync-slot-names' to be set on the primary. Do you have a different
> >>>> understanding?
> >>>>
> >>>
> >>> Same understanding. We do not need it to be set on primary by user. It
> >>> will be GUC on standby and standby will convey it to primary.
> >>
> >> +1, same understanding here.
> >>
> >
> > At PGConf NYC, I had a brief discussion on this topic with Andres
> > where yet another approach to achieve this came up.
>
> Great!
>
> > Have a parameter
> > like enable_failover at the slot level (this will be persistent
> > information). Users can set it during the create/alter subscription or
> > via pg_create_logical_replication_slot(). Also, on physical standby,
> > there will be a parameter like enable_syncslot. All the physical
> > standbys that have set enable_syncslot will receive all the logical
> > slots that are marked as enable_failover. To me, whether to sync a
> > particular slot is a slot-level property, so defining it in this new
> > way seems reasonable.
>
> Yeah, as this is a slot-level property, I agree that this seems reasonable.
>
> Also that sounds more natural to me with this approach. The primary
> is really the one that "drives" which slots can be synced. I like it.
>
> One could also set enable_failover while creating a logical slot on a physical
> standby (so that cascading standbys could also have "extra slot" to sync as
> compare to "level 1" standbys).
>
> >
> > I think this will simplify the scheme a bit but still, the list of
> > physical standby's for which logical slots wait during decoding needs
> > to be maintained as we thought.
>
> Right.
>
> > But, how about with the above two
> > parameters (enable_failover and enable_syncslot), we have
> > standby_slot_names defined on the primary. That avoids the need to
> > store the list of standby_slot_names in logical slots and simplifies
> > the implementation quite a bit, right?
>
> Agree.
>
> > Now, one can think if we have a
> > parameter like 'standby_slot_names' then why do we need
> > enable_syncslot on physical standby but that will be required to
> > invoke sync worker which will pull logical slot's information?
>
> yes and enable_sync slot on the standby could also be used to "pause"
> the sync on standbys (by disabling the parameter) if one would want to
> (without the need to modify anything on the primary).
>
> > The
> > advantage of having standby_slot_names defined on primary is that we
> > can selectively wait on the subset of physical standbys where we are
> > syncing the slots.
>
> Yeah and this flexibility/filtering looks somehow mandatory to me.
>
> > I think this will be something similar to
> > 'synchronous_standby_names' in the sense that the physical standbys
> > mentioned in standby_slot_names will behave as synchronous copies with
> > respect to slots and after failover user can switch to one of these
> > physical standby and others can start following new master/publisher.
> >
> > Thoughts?
>
> I like the idea and I think that's the one that seems the more reasonable
> to me. I'd vote for this idea with:
>
> - standby_slot_names on the primary (could also be set on standbys in case of
> cascading context)
> - enable_failover at logical slot creation + API to enable/disable it at wish
> - enable_syncslot on the standbys
>

Thank you, Amit and Bertrand, for the feedback on the new design.

PFA the v23 patch set, which attempts to implement the newly proposed
design for handling sync candidates:
   a) The synchronize_slot_names GUC is removed. Instead, a persistent
'enable_failover' property is added at the slot level. The user can set
it with the create-subscription command, e.g.:
   create subscription mysub connection '....' publication mypub
   WITH (enable_failover = true);
   b) A new GUC, enable_syncslot, is added on standbys to enable or
disable slot-sync on that standby.
   c) standby_slot_names is maintained on the primary.
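
To make the interplay of (a), (b) and (c) easier to picture, here is a
rough toy model in Python. It is only a sketch, not code from the patch:
the class and function names, the integer LSNs, and the rule that only
failover-enabled slots wait for the standbys listed in
standby_slot_names are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class LogicalSlot:
    name: str
    enable_failover: bool = False  # persistent slot-level property, as in (a)


@dataclass
class Standby:
    physical_slot: str     # name of this standby's physical slot on the primary
    enable_syncslot: bool  # GUC on the standby, as in (b)
    flushed_lsn: int = 0   # WAL position confirmed by the standby (toy integer LSN)


def slots_to_sync(standby, logical_slots):
    """Names of the slots a standby would pull: every failover-enabled
    slot, but only if the standby has opted in via enable_syncslot."""
    if not standby.enable_syncslot:
        return []
    return [s.name for s in logical_slots if s.enable_failover]


def max_decodable_lsn(slot, wal_end, standby_slot_names, standbys):
    """Upper bound up to which a logical slot may send changes.

    Assumption of this sketch: a failover-enabled slot must not get ahead
    of any standby listed in standby_slot_names (c); other slots are not
    held back at all.
    """
    if not slot.enable_failover:
        return wal_end
    by_name = {sb.physical_slot: sb for sb in standbys}
    listed = [by_name[n].flushed_lsn for n in standby_slot_names if n in by_name]
    return min([wal_end] + listed)


if __name__ == "__main__":
    slots = [LogicalSlot("mysub", enable_failover=True), LogicalSlot("other")]
    standbys = [Standby("sb1_slot", True, 90), Standby("sb2_slot", False, 100)]
    print(slots_to_sync(standbys[0], slots))   # standby that opted in
    print(slots_to_sync(standbys[1], slots))   # standby with sync disabled
    print(max_decodable_lsn(slots[0], 120, ["sb1_slot"], standbys))
    print(max_decodable_lsn(slots[1], 120, ["sb1_slot"], standbys))
```

In this model the primary decides which slots are sync candidates
(enable_failover), each standby decides whether to sync at all
(enable_syncslot), and standby_slot_names only controls which standbys
the logical slots wait for.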

Patch 002 also addresses Peter's comments dated Oct 6 and Oct 10.

Thank you, Ajin, for implementing the 'create subscription' command
changes to support the 'enable_failover' syntax.

This patch does not yet implement the following; these will be done in
the next version:
--Support for setting/altering enable_failover using
alter-subscription and pg_create_logical_replication_slot
--Changes needed to support slot synchronization on cascading standbys
--Display of the "enable_failover" property in pg_replication_slots. I
think it makes sense to do this.

thanks
Shveta
