On Fri, Apr 12, 2024 at 7:57 AM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Sat, Apr 6, 2024 at 12:25 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > On Fri, Apr 5, 2024 at 10:31 AM shveta malik <shveta.malik@gmail.com> wrote:
> > >
> > > Please find v2. Changes are:
> >
> > Thanks for the patch. Here are some comments.
>
> Thanks for reviewing.
> >
> > 1. Can we have a clear saying in the shmem variable who's syncing at
> > the moment? Is it a slot sync worker or a backend via SQL function?
> > Perhaps turn "bool syncing;" to "SlotSyncSource sync_source;"
> >
> > typedef enum SlotSyncSource
> > {
> > SLOT_SYNC_NONE,
> > SLOT_SYNC_WORKER,
> > SLOT_SYNC_BACKEND,
> > } SlotSyncSource;
> >
> > Then, the check in ShutDownSlotSync can be:
> >
> > + /*
> > + * Return if neither the slot sync worker is running nor the function
> > + * pg_sync_replication_slots() is executing.
> > + */
> > + if ((SlotSyncCtx->pid == InvalidPid) &&
> > SlotSyncCtx->sync_source != SLOT_SYNC_BACKEND)
> > {
> >
I don't know if this will be help, especially after fixing the race
condition I mentioned. But otherwise, also, at this stage it doesn't
seem helpful to add the source of sync explicitly.
> > 2.
> > SyncReplicationSlots(WalReceiverConn *wrconn)
> > {
> > + /*
> > + * Startup process signaled the slot sync to stop, so if meanwhile user
> > + * has invoked slot sync SQL function, simply return.
> > + */
> > + SpinLockAcquire(&SlotSyncCtx->mutex);
> > + if (SlotSyncCtx->stopSignaled)
> > + {
> > + ereport(LOG,
> > + errmsg("skipping slot synchronization as slot sync
> > shutdown is signaled during promotion"));
> > +
> >
> > Unless I'm missing something, I think this can't detect if the backend
> > via SQL function is already half-way through syncing in
> > synchronize_one_slot. So, better move this check to (or also have it
> > there) slot sync loop that calls synchronize_one_slot. To avoid
> > spinlock acquisitions, we can perhaps do this check in when we acquire
> > the spinlock for synced flag.
>
> If the sync via SQL function is already half-way, then promotion
> should wait for it to finish. I don't think it is a good idea to move
> the check to synchronize_one_slot(). The sync-call should either not
> start (if it noticed the promotion) or finish the sync and then let
> promotion proceed. But I would like to know others' opinion on this.
>
Agreed.
--
With Regards,
Amit Kapila.