Re: Allow async standbys wait for sync replication - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: Allow async standbys wait for sync replication
Date
Msg-id 20220301.163431.1826638724406024793.horikyota.ntt@gmail.com
In response to Re: Allow async standbys wait for sync replication (was: Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers)  (Nathan Bossart <nathandbossart@gmail.com>)
Responses Re: Allow async standbys wait for sync replication  (Nathan Bossart <nathandbossart@gmail.com>)
List pgsql-hackers
(Now I understand what "async" means here.)

At Mon, 28 Feb 2022 22:05:28 -0800, Nathan Bossart <nathandbossart@gmail.com> wrote in 
> On Tue, Mar 01, 2022 at 11:10:09AM +0530, Bharath Rupireddy wrote:
> > On Tue, Mar 1, 2022 at 12:27 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
> >> My feedback is specifically about this behavior.  I don't think we should
> >> spin in XLogSend*() waiting for an LSN to be synchronously replicated.  I
> >> think we should just choose the SendRqstPtr based on what is currently
> >> synchronously replicated.
> > 
> > Do you mean something like the following?
> > 
> > /* Main loop of walsender process that streams the WAL over Copy messages. */
> > static void
> > WalSndLoop(WalSndSendDataCallback send_data)
> > {
> >     /*
> >      * Loop until we reach the end of this timeline or the client requests to
> >      * stop streaming.
> >      */
> >     for (;;)
> >     {
> >         if (am_async_walsender && there_are_sync_standbys)
> >         {
> >             XLogRecPtr SendRqstLSN;
> >             XLogRecPtr SyncFlushLSN;
> > 
> >             SendRqstLSN = GetFlushRecPtr(NULL);
> >             LWLockAcquire(SyncRepLock, LW_SHARED);
> >             SyncFlushLSN = walsndctl->lsn[SYNC_REP_WAIT_FLUSH];
> >             LWLockRelease(SyncRepLock);
> > 
> >             if (SendRqstLSN > SyncFlushLSN)
> >                 continue;
> >         }

The current trend is toward saving energy. We don't add a "wait for
some fixed time, exit if the condition holds, otherwise repeat" loop
for this kind of purpose when there is no guarantee that the loop
exits quickly.  Concretely, we ought to rely on condition variables
for that.
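
To illustrate the shape I mean, here is a minimal standalone sketch.
It uses plain pthreads rather than PostgreSQL's own condition variable
facility, and every name in it (SyncState, wait_for_sync_flush,
advance_sync_flush) is made up for the example. The waiter sleeps
until the synchronously flushed LSN reaches its target, and whoever
advances that LSN wakes the waiters, so nobody polls on a timer.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;	/* stand-in for the real type */

/* Hypothetical shared state: the LSN flushed by the sync standbys. */
typedef struct SyncState
{
	pthread_mutex_t lock;
	pthread_cond_t	advanced;
	XLogRecPtr		sync_flush_lsn;
} SyncState;

#define SYNC_STATE_INIT \
	{ PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0 }

/*
 * Sleep until sync_flush_lsn >= target, then return the value seen.
 * The while loop re-checks the condition after every wakeup, which
 * also covers spurious wakeups.
 */
static XLogRecPtr
wait_for_sync_flush(SyncState *st, XLogRecPtr target)
{
	XLogRecPtr	reached;

	assert(st != NULL);
	pthread_mutex_lock(&st->lock);
	while (st->sync_flush_lsn < target)
		pthread_cond_wait(&st->advanced, &st->lock);
	reached = st->sync_flush_lsn;
	pthread_mutex_unlock(&st->lock);
	return reached;
}

/* Advance the LSN (never backwards) and wake every waiter. */
static void
advance_sync_flush(SyncState *st, XLogRecPtr lsn)
{
	assert(st != NULL);
	pthread_mutex_lock(&st->lock);
	if (lsn > st->sync_flush_lsn)
	{
		st->sync_flush_lsn = lsn;
		pthread_cond_broadcast(&st->advanced);
	}
	pthread_mutex_unlock(&st->lock);
}
```

In the server the equivalent would presumably be built on the existing
ConditionVariable routines instead of pthreads; the point is only that
the waiter blocks until it is signaled.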

> Not quite.  Instead of "continue", I would set SendRqstLSN to SyncFlushLSN
> so that the WAL sender only sends up to the current synchronously

I'm not sure, but doesn't that make the walsender falsely believe it
has caught up to the bleeding edge of WAL?
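
A toy model of the concern, not the real walsender code: it assumes
the sender declares itself caught up whenever the request pointer is
not ahead of what it has already sent, and all names here (ToySender,
toy_send) are invented for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;	/* stand-in for the real type */

/* Hypothetical, heavily simplified sender state. */
typedef struct ToySender
{
	XLogRecPtr	sent;			/* how far we have sent */
	bool		caught_up;		/* do we think we reached the end? */
} ToySender;

/*
 * One send cycle with the request pointer clamped to the
 * synchronously flushed LSN, as proposed upthread.
 */
static void
toy_send(ToySender *s, XLogRecPtr flush_lsn, XLogRecPtr sync_flush_lsn)
{
	XLogRecPtr	send_rqst;

	assert(s != NULL);
	send_rqst = (flush_lsn < sync_flush_lsn) ? flush_lsn : sync_flush_lsn;

	if (send_rqst <= s->sent)
	{
		/* Nothing "new" to send, so we claim to be caught up. */
		s->caught_up = true;
		return;
	}

	s->sent = send_rqst;
	s->caught_up = false;
}
```

With a local flush LSN of 300 and a sync flush LSN of 100, the second
call marks the sender as caught up at 100 even though 200 more bytes of
WAL are waiting locally.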

> replicated LSN.  TBH there are probably other things that need to be
> considered (e.g., how do we ensure that the WAL sender sends the rest once
> it is replicated?), but I still think we should avoid spinning in the WAL
> sender waiting for WAL to be replicated.

It seems to me that it would be something similar to
SyncRepReleaseWaiters().  Or it might be possible to consolidate this
feature into that function, though I'm not sure.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


