RE: Improve pg_sync_replication_slots() to wait for primary to advance - Mailing list pgsql-hackers

From Zhijie Hou (Fujitsu)
Subject RE: Improve pg_sync_replication_slots() to wait for primary to advance
Date
Msg-id TY4PR01MB16907630F7DE9263795978897947CA@TY4PR01MB16907.jpnprd01.prod.outlook.com
Whole thread Raw
In response to Re: Improve pg_sync_replication_slots() to wait for primary to advance  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Improve pg_sync_replication_slots() to wait for primary to advance
List pgsql-hackers
On Tuesday, February 17, 2026 12:16 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Tue, Feb 17, 2026 at 9:13 AM shveta malik <shveta.malik@gmail.com>
> wrote:
> >
> > On Mon, Feb 16, 2026 at 4:35 PM Amit Kapila <amit.kapila16@gmail.com>
> wrote:
> > >
> > > On Fri, Feb 13, 2026 at 7:54 AM Zhijie Hou (Fujitsu)
> > > <houzj.fnst@fujitsu.com> wrote:
> > > >
> > > > Thanks for pushing! Here are the remaining patches.
> > > >
> > >
> > > One thing that bothers me about the remaining patch is that it could
> > > lead to infinite re-tires in the worst case. For example, in first
> > > try, slot-1 is not synced say due to physical replication delays in
> > > flushing WALs up to the confirmed_flush_lsn of that slot, then in
> > > next (re-)try, the same thing happened for slot-2, then in next
> > > (re-)try,
> > > slot-3 appears to invalidated on standby but it is valid on primary,
> > > and so on. What do you think?
> >
> > Yes, that is a possibility we cannot rule out. This can also happen
> > during the first invocation of the API (even without the new changes)
> > when we attempt to create new slots, they may remain in a temporary
> > state indefinitely. However, that risk is limited to the initial sync,
> > until the slots are persisted, which is somewhat expected behavior.
> >
> 
> Right.
> 
> > With the current changes though, the possibility of an indefinite wait
> > exists during every run. So the question becomes: what would be more
> > desirable for users -- for the API to finish with the risk that a few
> > slots are not synced, or for the API to wait longer to ensure that all
> > slots are properly synced?
> >
> > I think that if the primary use case of this API is when a user plans
> > to run it before a scheduled failover, then it would be better for the
> > API to wait and ensure everything is properly synced.
> >
> 
> I don't think we can guarantee that all slots are synced as per latest primary
> state in one invocation because some newly created slots can anyway be
> missed. So why take the risk of infinite waits in the API? I think it may be
> better to extend the usage of this API (probably with more parameters) based
> on more user feedback.
> 

I agree that we could wait for more user feedback before extending the
retry logic to persisted slots.

> > > Independent of whether we consider the entire patch, the following
> > > bit in the patch in useful as we retry to sync the slots via API.
> > > @@ -218,7 +219,7 @@ update_local_synced_slot(RemoteSlot
> > > *remote_slot, Oid remote_dbid)
> > >   * Can get here only if GUC 'synchronized_standby_slots' on the
> > >   * primary server was not configured correctly.
> > >   */
> > > - ereport(AmLogicalSlotSyncWorkerProcess() ? LOG : ERROR,
> > > + ereport(LOG,
> > >   errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> > >   errmsg("skipping slot synchronization because the received slot sync"
> > >      " LSN %X/%08X for slot \"%s\" is ahead of the standby position
> > > %X/%08X",
> > >
> >
> > yes. I agree.
> >
> 
> Let's wait for Hou-San's opinion on this one.

+1 for changing this.

Here is the patch set to convert elevel to LOG so that the function cyclically
retry until the standby catches up and the slot is successfully persisted.

Best Regards,
Hou zj

Attachment

pgsql-hackers by date:

Previous
From: Tatsuo Ishii
Date:
Subject: Re: Row pattern recognition
Next
From: Daniil Davydov
Date:
Subject: Re: POC: Parallel processing of indexes in autovacuum