Re: Allow logical failover slots to wait on synchronous replication - Mailing list pgsql-hackers

From shveta malik
Subject Re: Allow logical failover slots to wait on synchronous replication
Date
Msg-id CAJpy0uAexFn9=W2dTssxrHmAHu2V08Jq6BfFpjDxKUpuMdYKDw@mail.gmail.com
Whole thread Raw
In response to Re: Allow logical failover slots to wait on synchronous replication  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Allow logical failover slots to wait on synchronous replication
Re: Allow logical failover slots to wait on synchronous replication
List pgsql-hackers
On Mon, Sep 16, 2024 at 11:13 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Sep 13, 2024 at 3:13 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > On Thu, Sep 12, 2024 at 3:04 PM shveta malik <shveta.malik@gmail.com> wrote:
> > >
> > > On Wed, Sep 11, 2024 at 2:40 AM John H <johnhyvr@gmail.com> wrote:
> > > >
> > > > Hi Shveta,
> > > >
> > > > On Sun, Sep 8, 2024 at 11:16 PM shveta malik <shveta.malik@gmail.com> wrote:
> > > >
> > > > >
> > > > > I was trying to have a look at the patch again, it doesn't apply on
> > > > > the head, needs rebase.
> > > > >
> > > >
> > > > Rebased with the latest changes.
> > > >
> > > > > Regarding 'mode = SyncRepWaitMode', FWIW, SyncRepWaitForLSN() also
> > > > > does in a similar way. It gets mode in local var initially and uses it
> > > > > later. See [1]. So isn't there a chance too that 'SyncRepWaitMode' can
> > > > > change in between.
> > > > >
> > > > > [1]:
> > > > > mode = SyncRepWaitMode;
> > > > > .....
> > > > > ....
> > > > >         if (!WalSndCtl->sync_standbys_defined ||
> > > > >                 lsn <= WalSndCtl->lsn[mode])
> > > > >         {
> > > > >                 LWLockRelease(SyncRepLock);
> > > > >                 return;
> > > > >         }
> > > >
> > > > You are right, thanks for the correction. I tried reproducing with GDB
> > > > where SyncRepWaitMode
> > > > changes due to pg_ctl reload but was unable to do so. It seems like
> > > > SIGHUP only sets ConfigReloadPending = true,
> > > > which gets processed in the next loop in WalSndLoop() and that's
> > > > probably where I was getting confused.
> > >
> > > yes, SIGHUP will be processed in the caller of
> > > StandbySlotsHaveCaughtup() (see ProcessConfigFile() in
> > > WaitForStandbyConfirmation()). So we can use 'SyncRepWaitMode'
> > > directly as it is not going to change in StandbySlotsHaveCaughtup()
> > > even if user triggers the change. And thus it was okay to use it even
> > > in the local variable too similar to how SyncRepWaitForLSN() does it.
> > >
> > > > In the latest patch, I've added:
> > > >
> > > > Assert(SyncRepWaitMode >= 0);
> > > >
> > > > which should be true since we call SyncRepConfigured() at the
> > > > beginning of StandbySlotsHaveCaughtup(),
> > > > and used SyncRepWaitMode directly.
> > >
> > > Yes, it should be okay I think. As SyncRepRequested() in the beginning
> > > makes sure synchronous_commit > SYNCHRONOUS_COMMIT_LOCAL_FLUSH and
> > > thus SyncRepWaitMode should be mapped to either of
> > > WAIT_WRITE/FLUSH/APPLY etc. Will review further.
> > >
> >
> > I was wondering if we need somethign similar to SyncRepWaitForLSN() here:
> >
> >         /* Cap the level for anything other than commit to remote flush only. */
> >         if (commit)
> >                 mode = SyncRepWaitMode;
> >         else
> >                 mode = Min(SyncRepWaitMode, SYNC_REP_WAIT_FLUSH);
> >
> > The header comment says:
> >  * 'lsn' represents the LSN to wait for.  'commit' indicates whether this LSN
> >  * represents a commit record.  If it doesn't, then we wait only for the WAL
> >  * to be flushed if synchronous_commit is set to the higher level of
> >  * remote_apply, because only commit records provide apply feedback.
> >
> > If we don't do something similar, then aren't there chances that we
> > keep on waiting on the wrong lsn[mode] for the case when mode =
> > SYNC_REP_WAIT_APPLY while sync-rep-wait infrastructure is updating
> > different mode's lsn. Is my understanding correct?
> >
>
> I think here we always need the lsn values corresponding to
> SYNC_REP_WAIT_FLUSH as we want to ensure that the WAL has to be
> flushed on physical standby before sending it to the logical
> subscriber. See ProcessStandbyReplyMessage() where we always use
> flushPtr to advance the physical_slot via
> PhysicalConfirmReceivedLocation().

I agree. So even if the mode is SYNC_REP_WAIT_WRITE (lower one) or
SYNC_REP_WAIT_APPLY (higher one), we need to wait for
lsn[SYNC_REP_WAIT_FLUSH].

> Another question aside from the above point, what if someone has
> specified logical subscribers in synchronous_standby_names? In the
> case of synchronized_standby_slots, we won't proceed with such slots.
>

Yes, it is a possibility. I have missed this point earlier. Now I
tried a case where I give a mix of logical subscriber and physical
standby in 'synchronous_standby_names' on pgHead, it even takes that
'mix' configuration and starts waiting accordingly.

synchronous_standby_names = 'FIRST 2(logicalsub_1, phy_standby_1,
phy_standby_2)';

thanks
Shveta



pgsql-hackers by date:

Previous
From: Marlene Reiterer
Date:
Subject: Re: Doc: Move standalone backup section, mention -X argument
Next
From: Tomas Vondra
Date:
Subject: Re: A starter task