Re: Allow logical failover slots to wait on synchronous replication - Mailing list pgsql-hackers

From Bertrand Drouvot
Subject Re: Allow logical failover slots to wait on synchronous replication
Date
Msg-id Zo60gFD/ZFbfIxby@ip-10-97-1-34.eu-west-3.compute.internal
Whole thread Raw
In response to Re: Allow logical failover slots to wait on synchronous replication  (John H <johnhyvr@gmail.com>)
Responses Re: Allow logical failover slots to wait on synchronous replication
List pgsql-hackers
Hi,

On Mon, Jul 08, 2024 at 12:08:58PM -0700, John H wrote:
> I took a deeper look at this with GDB and I think it's necessary to
> cache the value of "mode".
> We first check:
> 
> if (mode == SYNC_REP_NO_WAIT)
> return true;
> 
> However after this check it's possible to receive a SIGHUP changing
> SyncRepWaitMode
> to SYNC_REP_NO_WAIT (e.g. synchronous_commit = 'on' -> 'off'), leading
> to lsn[-1].

What about adding "SyncRepWaitMode" as a third StandbySlotsHaveCaughtup()
parameter then? (so that the function will used whatever value was passed during
the call).

> > 2 ====
> >
> > +               static XLogRecPtr       lsn[NUM_SYNC_REP_WAIT_MODE] = {InvalidXLogRecPtr};
> >
> > I did some testing and saw that the lsn[] values were not always set to
> > InvalidXLogRecPtr right after. It looks like that, in that case, we should
> > avoid setting the lsn[] values at compile time. Then, what about?
> >
> > 1. remove the "static".
> >
> > or
> >
> > 2. keep the static but set the lsn[] values after its declaration.
> 
> I'd prefer to keep the static because it reduces unnecessary
> contention on SyncRepLock if logical client has fallen behind.
> I'll add a change with your second suggestion.

Got it, you want lsn[] to be initialized only one time and that each call to
StandbySlotsHaveCaughtup() relies on the values that were previously stored in 
lsn[] and then return if lsn[mode] >= wait_for_lsn.

Then I think that:

1 ===

That's worth additional comments in the code.

2 ===

+               if (!initialized)
+               {
+                       for (i = 0; i < NUM_SYNC_REP_WAIT_MODE; i++)
+                       {
+                               lsn[i] = InvalidXLogRecPtr;
+                       }
+               }

Looks like setting initialized to true is missing once done.

Also,

3 ===

+               /* Cache values to reduce contention on lock */
+               for (i = 0; i < NUM_SYNC_REP_WAIT_MODE; i++)
+               {
+                       lsn[i] = walsndctl->lsn[i];
+               }

NUM_SYNC_REP_WAIT_MODE is small but as the goal is the keep the lock time as
short as possible I wonder if it wouldn't be better to use memcpy() here instead
of this for loop.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



pgsql-hackers by date:

Previous
From: Nathan Bossart
Date:
Subject: Re: Add a GUC check hook to ensure summarize_wal cannot be enabled when wal_level is minimal
Next
From: Nazir Bilal Yavuz
Date:
Subject: Re: Streaming I/O, vectored I/O (WIP)