Re: Improve pg_sync_replication_slots() to wait for primary to advance - Mailing list pgsql-hackers

From Ashutosh Sharma
Subject Re: Improve pg_sync_replication_slots() to wait for primary to advance
Date
Msg-id CAE9k0Pny3Ypo3qwWe1EmV5kxah9+aA8jUXz1GUwkc56TS-W53g@mail.gmail.com
Whole thread Raw
In response to Re: Improve pg_sync_replication_slots() to wait for primary to advance  (Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>)
List pgsql-hackers
On Fri, Sep 5, 2025 at 6:52 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> On Wed, Sep 3, 2025 at 11:58 AM Ajin Cherian <itsajin@gmail.com> wrote:
> >
> >
> > On Fri, Aug 29, 2025 at 6:50 PM Ashutosh Bapat
> > <ashutosh.bapat.oss@gmail.com> wrote:
> > >
> > > On Fri, Aug 29, 2025 at 11:42 AM Ajin Cherian <itsajin@gmail.com> wrote:
> >
> > > +/*
> > > + * Flag used by pg_sync_replication_slots()
> > > + * to do retries if the slot did not persist while syncing.
> > > + */
> > > +static bool slot_persistence_pending = false;
> > >
> > > I don't think we need to keep a global variable for this. The variable
> > > is used only inside SyncReplicationSlots() and the call depth is not
> > > more than a few calls. From synchronize_slots(), before which the
> > > variable is reset and after which the variable is checked, to
> > > update_and_persist_local_synced_slot() which sets the variable, all
> > > the functions return bool. All of them can be made to return an
> > > integer status instead indicating the result of the operation. If we
> > > do so we could check the return value of synchronize_slots() to decide
> > > whether to retry or not, isntead of maintaining a global variable
> > > which has a much wider scope than required. It's difficult to keep it
> > > updated over the time.
> > >
> >
> > The problem is that all those calls synchronize_slots() and
> > update_and_persist_local_synced_slot() are shared with the slotsync
> > worker logic and API. Hence, changing this will affect slotsync_worker
> > logic as well. While the API needs to spefically retry only if the
> > initial sync fails, the slotsync worker will always be retrying. I
> > feel using a global variable is a more convenient way of doing this.
>
> AFAICS, it's a matter of expanding the scope of what's returned by
> those functions. The worker may not want to use the whole expanded
> scope but the API will use it. That shouldn't change the functionality
> of the worker, but it will help avoid the global variable - which have
> much wider scope and their maintenance can be prone to bugs.
>
> >
> > > @@ -1276,7 +1331,7 @@ wait_for_slot_activity(bool some_slot_updated)
> > >
> > > The function is too cute to be useful. The code should be part of
> > > ReplSlotSyncWorkerMain() just like other worker's main functions.
> > >
> >
> > But this wouldn't be part of this feature.
> >
> > > void
> > > SyncReplicationSlots(WalReceiverConn *wrconn)
> > > {
> > > PG_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(wrconn));
> > > {
> > >
> > > Shouldn't this function call CheckForInterrupts() somewhere in the
> > > loop since it could be potentially an infinite loop?
> >
> > I've tested this and I see that interrupts are being handled by
> > sending SIGQUIT and SIGINT to the backend process.
>
> Can you please point me to the code (the call to
> CHECK_FOR_INTERRUPTS()) which processes these interrupts while
> pg_sync_replication_slots() is executing, especially when the function
> is waiting while syncing a slot.
>

I noticed that the function libpqrcv_processTuples, which is invoked
by fetch_remote_slots, includes a CHECK_FOR_INTERRUPTS call. This is
currently helping in processing interrupts while we are in an infinite
loop within SyncReplicationSlots(). I’m just pointing this out based
on my observation while reviewing the changes in this patch. Ajin,
please correct me if I’m mistaken. If not, can we always rely on this
particular check for interrupts.

--
With Regards,
Ashutosh Sharma.



pgsql-hackers by date:

Previous
From: "Greg Burd"
Date:
Subject: Re: [PATCH] Let's get rid of the freelist and the buffer_strategy_lock
Next
From: Robert Haas
Date:
Subject: Re: [PATCH] Let's get rid of the freelist and the buffer_strategy_lock