Re: Improve pg_sync_replication_slots() to wait for primary to advance - Mailing list pgsql-hackers
From | shveta malik |
---|---|
Subject | Re: Improve pg_sync_replication_slots() to wait for primary to advance |
Date | |
Msg-id | CAJpy0uAxf0PS1atgTYuk=HU91adS17uUC_VJC6DXDmEFt9ziAg@mail.gmail.com Whole thread Raw |
In response to | Re: Improve pg_sync_replication_slots() to wait for primary to advance (Dilip Kumar <dilipbalaut@gmail.com>) |
Responses |
Re: Improve pg_sync_replication_slots() to wait for primary to advance
|
List | pgsql-hackers |
On Fri, Jul 18, 2025 at 10:52 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Fri, Jul 18, 2025 at 10:45 AM shveta malik <shveta.malik@gmail.com> wrote: > > > > On Fri, Jul 18, 2025 at 10:14 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > On Thu, Jul 17, 2025 at 9:34 AM shveta malik <shveta.malik@gmail.com> wrote: > > > > > > > > On Wed, Jul 16, 2025 at 3:47 PM Ajin Cherian <itsajin@gmail.com> wrote: > > > > > > > > > > > I am not able to apply the patch to the latest head or even to a week > > > > > > back version. Can you please check and rebase? > > > > > > > > > > > > thanks > > > > > > Shveta > > > > > > > > > > Rebased. > > > > > > > > > > > > > Thanks. Please find a few comments: > > > > > > > > > > > > 1) > > > > /* Any slot with NULL in these fields should not have made it this far */ > > > > > > > > It is good to get rid of the case where we had checks for NULL > > > > confirmed_lsn and catalog_xmin (i.e. when slot was in RS_EPHEMERAL > > > > state), as that has already been checked by synchronize_slots() and > > > > such a slot will not even reach wait_for_primary_slot_catchup(). But a > > > > slot can still be invalidated on primary anytime, and thus during this > > > > wait, we should check for primary's invalidation as we were doing in > > > > v1. > > > > > > > > 2) > > > > + * If in SQL API synchronization, and we've been promoted, then no point > > > > > > > > extra space before promoted. > > > > > > > > 3) > > > > > > > > + if (!AmLogicalSlotSyncWorkerProcess() && PromoteIsTriggered()) > > > > > > > > We don't need 'AmLogicalSlotSyncWorkerProcess' as that is already > > > > checked at the beginning of this function. > > > > > > > > 4) > > > > + ereport(WARNING, > > > > + errmsg("aborting sync for slot \"%s\"", > > > > + remote_slot->name), > > > > + errdetail("Promotion occurred before this slot was fully" > > > > + " synchronized.")); > > > > + pfree(cmd.data); > > > > + > > > > + return false; > > > > > > > > a) Please add an error-code. > > > > > > > > b) Shall we change msg to > > > > > > > > errmsg("aborting sync for slot \"%s\"", > > > > remote_slot->name), > > > > errhint("%s cannot be executed once promotion is > > > > triggered.", > > > > > > > > "pg_sync_replication_slots()"))); > > > > > > > > > > > > > > > > 5) > > > > Instead of using PromoteIsTriggered, shall we rely on > > > > 'SlotSyncCtx->stopSignaled' as we do when we start this API. > > > > > > > > 6) > > > > In logicaldecoding.sgml, we can get rid of "Additionally, enabling > > > > sync_replication_slots on the standby is required" to make it same as > > > > what we had prior to the patch I pointed earlier. > > > > > > > > Or better we can refine it to below. Thoughts? > > > > > > > > The logical replication slots on the primary can be enabled for > > > > synchronization to the hot standby by using the failover parameter of > > > > pg_create_logical_replication_slot, or by using the failover option of > > > > CREATE SUBSCRIPTION during slot creation. After that, synchronization > > > > can be performed either manually by calling pg_sync_replication_slots > > > > on the standby, or automatically by enabling sync_replication_slots on > > > > the standby. When sync_replication_slots is enabled, the failover > > > > slots are periodically synchronized by the slot sync worker. For the > > > > synchronization to work, ..... > > > > > > I am wondering if we should provide an optional parameter to > > > pg_sync_replication_slots(), to control whether to wait for the slot > > > to be synced or just return with ERROR as it is doing now, default can > > > be wait. > > > > > > > Do you mean specifically in case of promotion or in general, as in do > > not wait for primary to catch-up (anytime) and exit and drop the > > temporary slot while exiting? > > I am specifically pointing to the exposed function > pg_sync_replication_slots() which was earlier non-blocking and was > giving an error if the primary slot for not catch-up, so we have > improved the functionality in this thread by making it wait for > primary to catch-up instead of just throwing an error. But my > question was since this is a user facing function so shall we keep the > old behavior intact with some optional parameters so that if the user > chooses not to wait they have options to do that? > Okay. I see your point. Yes, it was non-blocking earlier but it was not giving ERROR, it was just dumping in logilfe that primary is behind and thus slot-sync could not be done. If we continue using the non-blocking mode, there’s a risk that the API may never successfully sync the slots. This is because it eventually drops the temporary slot on exit, and when it tries to create a new one later on subsequent call, it’s likely that the new slot will again be ahead of the primary. This may happen if we have continuous ongoing writes on the primary and the logical slot is not being consumed at the same pace. My preference would be to avoid including such an option as it is confusing. With such an option in place, users may think that slot-sync is completed while that may not be the case. But if it's necessary for backward compatibility, it should be okay to provide it as a non-default option as you suggested. Would like to know what others think of this. thanks Shveta
pgsql-hackers by date: