Re: Synchronizing slots from primary to standby - Mailing list pgsql-hackers
From | Drouvot, Bertrand |
---|---|
Subject | Re: Synchronizing slots from primary to standby |
Date | |
Msg-id | 84a63890-b2fe-4826-8c77-aa6ad9bcd460@gmail.com |
In response to | Re: Synchronizing slots from primary to standby (shveta malik <shveta.malik@gmail.com>) |
Responses | Re: Synchronizing slots from primary to standby |
List | pgsql-hackers |
Hi,

On 10/4/23 6:26 AM, shveta malik wrote:
> On Wed, Oct 4, 2023 at 5:36 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>>
>> On Tue, Oct 3, 2023 at 9:27 PM shveta malik <shveta.malik@gmail.com> wrote:
>>>
>>> On Tue, Oct 3, 2023 at 7:56 PM Drouvot, Bertrand
>>> <bertranddrouvot.pg@gmail.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> On 10/3/23 12:54 PM, Amit Kapila wrote:
>>>>> On Mon, Oct 2, 2023 at 11:39 AM Drouvot, Bertrand
>>>>> <bertranddrouvot.pg@gmail.com> wrote:
>>>>>>
>>>>>> On 9/29/23 1:33 PM, Amit Kapila wrote:
>>>>>>> On Thu, Sep 28, 2023 at 6:31 PM Drouvot, Bertrand
>>>>>>> <bertranddrouvot.pg@gmail.com> wrote:
>>>>>>>>
>>>>>>>
>>>>>>>> - probably open corner cases like: what if a standby is down? would that mean
>>>>>>>> that synchronize_slot_names not being send to the primary would allow the decoding
>>>>>>>> on the primary to go ahead?
>>>>>>>>
>>>>>>>
>>>>>>> Good question. BTW, irrespective of whether we have
>>>>>>> 'standby_slot_names' parameters or not, how should we behave if
>>>>>>> standby is down? Say, if 'synchronize_slot_names' is only specified on
>>>>>>> standby then in such a situation primary won't be even aware that some
>>>>>>> of the logical walsenders need to wait.
>>>>>>
>>>>>> Exactly, that's why I was thinking keeping standby_slot_names to address
>>>>>> this scenario. In such a case one could simply decide to keep or remove
>>>>>> the associated physical replication slot from standby_slot_names. Keep would
>>>>>> mean "wait" and removing would mean allow to decode on the primary.
>>>>>>
>>>>>>> OTOH, one can say that users
>>>>>>> should configure 'synchronize_slot_names' on both primary and standby
>>>>>>> but note that this value could be different for different standby's,
>>>>>>> so we can't configure it on primary.
>>>>>>>
>>>>>>
>>>>>> Yeah, I think that's a good use case for standby_slot_names, what do you think?
>>>>>>
>>>>>
>>>>> But, even if we keep 'standby_slot_names' for this purpose, the
>>>>> primary doesn't know the value of 'synchronize_slot_names' once the
>>>>> standby is down and/or the primary is restarted. So, how will we know
>>>>> which logical WAL senders needs to wait for 'standby_slot_names'?
>>>>>
>>>>
>>>> Yeah right, I also think we'd need:
>>>>
>>>> - synchronize_slot_names on both primary and standby
>>>>
>>>> But now we would need to take care of different standby having different values (
>>>> as you said up-thread)....
>>>>
>>>> Thinking out loud: What about a single GUC on the primary (not standby_slot_names nor
>>>> synchronize_slot_names) but say logical_slots_wait_for_standby that could be a list of say
>>>> "logical_slot_name:physical_slot".
>>>>
>>>> I think this GUC would help us define each walsender behavior (should the standby(s)
>>>> be up or down):
>>>>
>>>
>>> It may help in defining the walsender's behaviour better for sure. But
>>> the problem I see once we start defining sync-slot-names on primary
>>> (in any form whether as independent GUC or as above mapping GUC) is
>>> that it needs to be then in sync with standbys, as each standby for
>>> sure needs to maintain its own sync-slot-names GUC to make it aware of
>>> what all it needs to sync.
>>
>> Yes, I also think so. Also, defining such a GUC where user wants to
>> sync all the slots which would normally be the case would be a
>> nightmare for the users.
>>
>>>
>>> This brings us to the original question of
>>> how do we actually keep these configurations in sync between primary
>>> and standby if we plan to maintain it on both?
>>>
>>>
>>>> - don't wait if its associated logical_slot is not listed in this GUC
>>>> - or wait based on its associated "list" of mapped physical slots (would probably
>>>> have to deal with the min restart_lsn for all the corresponding mapped ones).
>>>>
>>>> I don't think we can avoid having to define at least one GUC on the primary (at least to
>>>> handle the case of standby(s) being down).
>>>>
>>
>> How about an alternate scheme where we define sync_slot_names on
>> standby but then store the physical_slot_name in the corresponding
>> logical slot (ReplicationSlotPersistentData) to be synced? So, the
>> standby will send the list of 'sync_slot_names' and the primary will
>> add the physical standby's slot_name in each of the corresponding
>> sync_slot. Now, if we do this then even after restart, we should be
>> able to know for which physical slot each logical slot needs to wait.
>> We can even provide an SQL API to reset the value of
>> standby_slot_names in logical slots as a way to unblock decoding in
>> case of emergency (for example, when the corresponding physical standby
>> never comes up).
>>
>
>
> Looks like a better approach to me. It solves most of the pain points like:
> 1) Avoids the need of multiple GUCs
> 2) Primary and standby need not to worry to be in sync if we maintain
> sync-slot-names GUC on both
> 3) User still gets the flexibility to remove a standby from wait-list
> of primary's logical-walsenders' using reset SQL API.
>

Fully agree.

> Now some initial thoughts:
> 1) Since each logical slot could be needed to be synched by multiple
> physical-standbys, so in ReplicationSlotPersistentData, we need to
> hold a list of standby's name. So this brings us to question as in how
> much shall we allocate initially in shared-memory? Shall it be for
> max_replication_slots (worst case scenario) in each
> ReplicationSlotPersistentData to hold physical-standby names?
>

Yeah, and even if we do the opposite, meaning add the 'to-sync' logical
replication slot in the ReplicationSlotPersistentData of the physical
slot(s), the questions still remain (as a physical standby could want to
sync multiple slots).

> 2) If standby sends '*', then we need to update each logical-slot with
> that standby-name.
> Or do we have better way to deal with '*'? Need to
> think more on this.
>
>
> JFYI, on the similar line, currently in ReplicationSlotPersistentData,
> we are maintaining a flag for slot-sync feature which is:
>
> bool synced; /* Is this a slot created by a
> sync-slot worker? */
>
> This flag currently holds significance only on physical-standby. This
> has been added to distinguish between a slot created by user for
> logical decoding purpose and the ones being synced from primary.

BTW, what about having this "user visible" through pg_replication_slots?

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
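
For illustration only, the "logical_slot_name:physical_slot" mapping idea
quoted above (don't wait if the logical slot is not listed, otherwise wait on
the min restart_lsn of its mapped physical slots) could be sketched roughly as
below. This is not code from any patch in this thread: the Lsn typedef, the
PhysicalSlot struct and both function names are hypothetical stand-ins, and
the GUC value is assumed to be a plain comma-separated list without
whitespace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t Lsn;   /* hypothetical stand-in for PostgreSQL's XLogRecPtr */

/* Hypothetical view of a physical slot: just its name and restart_lsn. */
typedef struct PhysicalSlot
{
    const char *name;
    Lsn         restart_lsn;
} PhysicalSlot;

/* Look up a physical slot by (length-delimited) name; NULL if unknown. */
static const PhysicalSlot *
find_physical_slot(const PhysicalSlot *slots, size_t nslots,
                   const char *name, size_t name_len)
{
    for (size_t i = 0; i < nslots; i++)
    {
        if (strlen(slots[i].name) == name_len &&
            strncmp(slots[i].name, name, name_len) == 0)
            return &slots[i];
    }
    return NULL;
}

/*
 * Scan a comma-separated list of "logical:physical" entries (the assumed
 * format of the proposed GUC).  Returns false when logical_slot has no
 * usable mapping, meaning "don't wait".  Otherwise returns true and stores
 * in *wait_lsn the minimum restart_lsn over all mapped physical slots.
 * Entries naming an unknown physical slot are skipped in this sketch; a
 * real design would have to decide how to treat them.
 */
bool
logical_slot_wait_lsn(const char *guc, const char *logical_slot,
                      const PhysicalSlot *slots, size_t nslots,
                      Lsn *wait_lsn)
{
    bool        found = false;
    size_t      want;
    const char *p = guc;

    assert(guc != NULL && logical_slot != NULL && wait_lsn != NULL);
    want = strlen(logical_slot);

    while (*p != '\0')
    {
        size_t      entry_len = strcspn(p, ",");   /* one list entry */
        const char *colon = memchr(p, ':', entry_len);

        if (colon != NULL && (size_t) (colon - p) == want &&
            strncmp(p, logical_slot, want) == 0)
        {
            const char *phys = colon + 1;
            size_t      phys_len = entry_len - (size_t) (phys - p);
            const PhysicalSlot *s =
                find_physical_slot(slots, nslots, phys, phys_len);

            /* Keep the smallest restart_lsn seen so far. */
            if (s != NULL && (!found || s->restart_lsn < *wait_lsn))
            {
                *wait_lsn = s->restart_lsn;
                found = true;
            }
        }

        p += entry_len;
        if (*p == ',')
            p++;
    }
    return found;
}
```

With a value such as "l1:standby1,l1:standby2,l2:standby1", a walsender for l1
would wait on the lower restart_lsn of standby1 and standby2, while a slot
that is not listed at all would not wait.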