Re: Improve pg_sync_replication_slots() to wait for primary to advance - Mailing list pgsql-hackers
From | shveta malik |
---|---|
Subject | Re: Improve pg_sync_replication_slots() to wait for primary to advance |
Date | |
Msg-id | CAJpy0uDNnKwvZgoLo-EY=ye7Tw7C=eUDSr6oJoGvHM32N7TKxg@mail.gmail.com Whole thread Raw |
In response to | Re: Improve pg_sync_replication_slots() to wait for primary to advance (Ashutosh Sharma <ashu.coek88@gmail.com>) |
List | pgsql-hackers |
On Tue, Sep 16, 2025 at 5:12 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote: > > > On Tue, Sep 16, 2025 at 11:53 AM shveta malik <shveta.malik@gmail.com> wrote: > > > > On Mon, Sep 15, 2025 at 6:17 PM Ajin Cherian <itsajin@gmail.com> wrote: > > > > > > On Wed, Sep 10, 2025 at 2:45 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > > > > > On Tue, Sep 9, 2025 at 5:37 PM Ajin Cherian <itsajin@gmail.com> wrote: > > > > > > > > > > Attached v11 patch addressing the above comments. > > > > > > > > > > > > > Please find a few comments: > > > > > > > > 1) > > > > > > > > + Retry is done after 2 > > > > + * sec wait. Exits early if promotion is triggered or certain critical > > > > > > > > We can say: Retry is done after SLOTSYNC_API_NAPTIME_MS wait. > > > > > > > > > > Changed. > > > > > > > 2) > > > > + /* > > > > + * Fetch remote slot info for the given slot_names. If slot_names is NIL, > > > > + * fetch all failover-enabled slots. Note that we reuse slot_names from > > > > + * the previous iteration; re-fetching all failover slots each time could > > > > + * cause an endless loop. > > > > + */ > > > > > > > > a) > > > > the previous iteration --> the first iteration. > > > > > > > > b) Also we can mention the reason why we take names from first > > > > iteration instead of going for pending ones alone, something like: > > > > > > > > Instead of reprocessing only the pending slots in each iteration, it's > > > > better to process all the slots received in the first iteration. > > > > This ensures that by the time we're done, all slots reflect the latest values. > > > > > > > > 3) > > > > + remote_slots = fetch_remote_slots(wrconn, slot_names); > > > > + > > > > + > > > > + /* Attempt to synchronize slots */ > > > > + synchronize_slots(wrconn, remote_slots, &slot_persistence_pending); > > > > > > > > One extra blank line can be removed > > > > > > > > > > Fixed. > > > > > > > 4) > > > > > > > > + /* Clean up slot_names if allocated in TopMemoryContext */ > > > > + if (slot_names) > > > > + list_free_deep(slot_names); > > > > > > > > Can we please move it before 'ReplicationSlotCleanup'. > > > > > > > > > > Fixed. > > > > > > > 5) > > > > In case of error in subsequent iteration, slot_names allocated from > > > > TopMemoryContext will be left unfreed? > > > > > > > > > > I've changed the logic so that even on error, slot_names are freed. > > > > I see that you have freed 'slot_names' in config-changed and promotion > > case but the ERROR can come from other flows as well. The idea was to > > somehow to free it (if possible) in slotsync_failure_callback() by > > passing it as an argument, like we do for 'wrconn'. > > > > Are you suggesting introducing a structure (for example, SlotSyncContext as shown below) that encapsulates both wrconnand slot_names, and then passing a pointer to this structure as the Datum argument to the slotsync_failure_callbackcleanup function, so that the callback can handle freeing wrconn and slot_names and maybe some othermembers within the structure that allocate memory? > Yes, as I do not see any other simpler way to take care of this memory-free in all ERROR scenarios. > /* > * Extended structure that can hold both connection and slot_names info > */ > typedef struct SlotSyncContext > { > > WalReceiverConn *wrconn; /* Must be first for compatibility */ > List *slot_names; /* Pointer to slot_names list */ > bool extended; /* Flag to indicate extended context */ > > } SlotSyncContext; > > SyncReplicationSlots(WalReceiverConn *wrconn) > { > > SlotSyncContext sync_ctx; > ... > ... > /* Initialize extended context */ > sync_ctx.wrconn = wrconn; > sync_ctx.slot_names_ptr = &slot_names; > sync_ctx.extended = true; > > > PG_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(&sync_ctx)); > Yes, like this. thanks Shveta
pgsql-hackers by date: