Re: Improve pg_sync_replication_slots() to wait for primary to advance - Mailing list pgsql-hackers
From | shveta malik |
---|---|
Subject | Re: Improve pg_sync_replication_slots() to wait for primary to advance |
Date | |
Msg-id | CAJpy0uASzojKbzinpNu29xuYGsSRnSo=22CLhXaSt_43TVoBhQ@mail.gmail.com Whole thread Raw |
In response to | Re: Improve pg_sync_replication_slots() to wait for primary to advance (Ajin Cherian <itsajin@gmail.com>) |
Responses |
Re: Improve pg_sync_replication_slots() to wait for primary to advance
|
List | pgsql-hackers |
On Tue, Aug 26, 2025 at 9:58 AM Ajin Cherian <itsajin@gmail.com> wrote: > > On Fri, Aug 22, 2025 at 3:44 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > On Wed, Aug 20, 2025 at 10:53 AM Ajin Cherian <itsajin@gmail.com> wrote: > > > > > > > > > I've removed them. > > > Attaching patch v8 addressing the above comments. > > > > > > > Thanks for the patch. Please find a few comments: > > > > 1) > > When the API is in progress, and meanwhile in another session we turn > > off hot_standby_feedback, the API session terminates abnormally. > > > > postgres=# SELECT pg_sync_replication_slots(); > > server closed the connection unexpectedly > > > > It seems slotsync_reread_config() is not adjusted for API. It does > > proc_exit assuming it is a worker process. > > > > I've removed the API calling ProcessSlotSyncInterrupts() as I don't > think the API needs to specifically handle shutdown requests, it works > without calling this. And the other config checks, I don't think the > API needs to handle, I think we should leave it to the user. > > > 2) > > slotsync_worker_onexit(): > > > > SlotSyncCtx->slot_persistence_pending = false; > > > > /* > > * If syncing_slots is true, it indicates that the process errored out > > * without resetting the flag. So, we need to clean up shared memory and > > * reset the flag here. > > */ > > if (syncing_slots) > > { > > SlotSyncCtx->syncing = false; > > syncing_slots = false; > > } > > > > Shall we reset slot_persistence_pending inside 'if (syncing_slots)'? > > slot_persistence_pending can not be true without syncing_slots being > > true. > > > > 3) > > reset_syncing_flag(): > > > > SpinLockAcquire(&SlotSyncCtx->mutex); > > SlotSyncCtx->syncing = false; > > + SlotSyncCtx->slot_persistence_pending = false; > > SpinLockRelease(&SlotSyncCtx->mutex); > > > > Here we are changing slot_persistence_pending under mutex, while at > > other places, it is not protected by mutex. Is it intentional here? > > > > 4) > > On rethinking, we maintain anything in shared memory if it has to be > > shared between a few processes. 'slot_persistence_pending' OTOH is > > required to be set and accessed by only one process at a time. Shall > > we move it out of SlotSyncCtxStruct and keep it static similar to > > 'syncing_slots'? Rest of the setting, resetting flow remains the same. > > What do you think? > > > > Yes, I agree. I have modified it accordingly. > > > > > 5) > > /* Build the query based on whether we're fetching all or refreshing > > specific slots */ > > > > Perhaps we can shift this comment to where we actually append > > target_slot_list. Better to have it something like: > > 'If target_slot_list is provided, construct the query only to fetch given slots' > > > > Changed. > > Attaching patch v9 addressing the above comments. > Thank You for the patches. Please find a few comments. 1) Change not needed: * without waiting for SLOTSYNC_RESTART_INTERVAL_SEC. + * */ 2) Regarding the naptime in API, I was thinking why not to use wait_for_slot_activity() directly? If there are no slots activity, it will keep on doubling the naptime starting from 2sec till max of 30sec. Thoughts? We can rename MIN_SLOTSYNC_WORKER_NAPTIME_MS and MAX_SLOTSYNC_WORKER_NAPTIME_MS to MIN_SLOTSYNC_NAPTIME_MS and MAX_SLOTSYNC_NAPTIME_MS in such a case. 3) + * target_slot_list - List of RemoteSlot structures to refresh, or NIL to + * fetch all failover slots Can we please change it to: List of failover logical slots to fetch from primary, or NIL to fetch all failover logical slots 4) In the worker, before each call to synchronize_slots(), we are starting a new transaction. It aligns with the previous implementation where StartTransaction was inside synchronize_slots(). But in API, we are doing StartTransaction once outside of the loop instead of doing before each synchronize_slots(), is it intentional? It may keep the transaction open for a long duration for the case where slots are not getting persisted soon. 5) With ProcessSlotSyncInterrupts() being removed from API, can you please check the behaviour of API on smart-shutdown and rest of the shutdown modes? It should behave like other APIs. And what happens if I change primary_conninfo to some non-existing server when the API is running. Does it error out or keep retrying? thanks Shveta
pgsql-hackers by date: