Thread: Allow logical failover slots to wait on synchronous replication

Hi hackers,

Building on bf279ddd1c, this patch introduces a GUC
'standby_slot_names_from_syncrep' which allows logical failover slots
to wait for changes to have been synchronously replicated before sending
the decoded changes to logical subscribers.

The existing 'standby_slot_names' isn't great for users who are running
clusters with quorum-based synchronous replicas. For instance, if
the user has synchronous_standby_names = 'ANY 3 (A,B,C,D,E)', it's
tedious to keep reconfiguring standby_slot_names to name whichever 3
sync replicas are currently the most caught up whenever a different
set of replicas starts lagging. If both GUCs are set,
'standby_slot_names' takes precedence.
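
For example, with the patch a quorum cluster would only need:

    synchronous_standby_names = 'ANY 3 (A,B,C,D,E)'
    standby_slot_names_from_syncrep = 'true'

    # If this were also set, it would take precedence over the new GUC:
    # standby_slot_names = 'slot_a,slot_b,slot_c,slot_d,slot_e'

(the physical slot names in the commented line are just placeholders).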

I did some very brief pgbench runs to compare the latency. Client instance
was running pgbench and 10 logical clients while the Postgres box hosted
the writer and 5 synchronous replicas.
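
(The flags for the runs below were essentially

    pgbench -M simple -c 8 -j 8 -T 1800

against a scale factor 200 database, matching the parameters shown in
the pgbench output.)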

There's a hit to TPS, which I think is due to increased contention on
SyncRepLock, and it scales with the number of logical walsenders. I'm
guessing we could avoid this by introducing another set of
lsn[NUM_SYNC_REP_WAIT_MODE] and having the logical walsenders check
and wait on that instead, but I wasn't sure if that's the right approach.
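
Very roughly, something like this (a sketch of the idea only, not the
patch; WalSndCtl->lsn[] and SyncRepLock exist today, while the
logical_lsn[] field, LogicalSyncRepLock, and the function name are all
made up):

    /*
     * Sketch: a second lsn[] array in WalSndCtlData, updated by the
     * syncrep machinery alongside the existing one, which logical
     * walsenders read under their own lock so they no longer contend
     * on SyncRepLock with committing backends.
     */
    static bool
    LogicalWalSndCaughtUpToSyncRep(XLogRecPtr wait_for_lsn, int mode)
    {
        XLogRecPtr      flushed_lsn;

        LWLockAcquire(LogicalSyncRepLock, LW_SHARED);
        flushed_lsn = WalSndCtl->logical_lsn[mode];
        LWLockRelease(LogicalSyncRepLock);

        return flushed_lsn >= wait_for_lsn;
    }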

pgbench numbers:

// Empty standby_slot_names_from_syncrep
query mode: simple
number of clients: 8
number of threads: 8
maximum number of tries: 1
duration: 1800 s
number of transactions actually processed: 1720173
number of failed transactions: 0 (0.000%)
latency average = 8.371 ms
initial connection time = 7.963 ms
tps = 955.651025 (without initial connection time)

// standby_slot_names_from_syncrep = 'true'
scaling factor: 200
query mode: simple
number of clients: 8
number of threads: 8
maximum number of tries: 1
duration: 1800 s
number of transactions actually processed: 1630105
number of failed transactions: 0 (0.000%)
latency average = 8.834 ms
initial connection time = 7.670 ms
tps = 905.610230 (without initial connection time)

Thanks,

--
John Hsu - Amazon Web Services


Re: Allow logical failover slots to wait on synchronous replication

From: Nathan Bossart
On Mon, Jun 10, 2024 at 03:51:05PM -0700, John H wrote:
> The existing 'standby_slot_names' isn't great for users who are running
> clusters with quorum-based synchronous replicas. For instance, if
> the user has synchronous_standby_names = 'ANY 3 (A,B,C,D,E)', it's
> tedious to keep reconfiguring standby_slot_names to name whichever 3
> sync replicas are currently the most caught up whenever a different
> set of replicas starts lagging. If both GUCs are set,
> 'standby_slot_names' takes precedence.

Hm.  IIUC you'd essentially need to set standby_slot_names to "A,B,C,D,E"
to get the desired behavior today.  That might ordinarily be okay, but it
could cause logical replication to be held back unnecessarily if one of the
replicas falls behind for whatever reason.  A way to tie standby_slot_names
to synchronous replication instead does seem like it would be useful in
this case.

-- 
nathan



Re: Allow logical failover slots to wait on synchronous replication

From: Bertrand Drouvot
Hi,

On Mon, Jun 10, 2024 at 09:25:10PM -0500, Nathan Bossart wrote:
> On Mon, Jun 10, 2024 at 03:51:05PM -0700, John H wrote:
> > The existing 'standby_slot_names' isn't great for users who are running
> > clusters with quorum-based synchronous replicas. For instance, if
> > the user has synchronous_standby_names = 'ANY 3 (A,B,C,D,E)', it's
> > tedious to keep reconfiguring standby_slot_names to name whichever 3
> > sync replicas are currently the most caught up whenever a different
> > set of replicas starts lagging. If both GUCs are set,
> > 'standby_slot_names' takes precedence.
> 
> Hm.  IIUC you'd essentially need to set standby_slot_names to "A,B,C,D,E"
> to get the desired behavior today.  That might ordinarily be okay, but it
> could cause logical replication to be held back unnecessarily if one of the
> replicas falls behind for whatever reason.  A way to tie standby_slot_names
> to synchronous replication instead does seem like it would be useful in
> this case.

FWIW, I have the same understanding and also think your proposal would be
useful in this case.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Allow logical failover slots to wait on synchronous replication

From: Bertrand Drouvot
Hi,

On Tue, Jun 11, 2024 at 10:00:46AM +0000, Bertrand Drouvot wrote:
> Hi,
> 
> On Mon, Jun 10, 2024 at 09:25:10PM -0500, Nathan Bossart wrote:
> > On Mon, Jun 10, 2024 at 03:51:05PM -0700, John H wrote:
> > > The existing 'standby_slot_names' isn't great for users who are running
> > > clusters with quorum-based synchronous replicas. For instance, if
> > > the user has synchronous_standby_names = 'ANY 3 (A,B,C,D,E)', it's
> > > tedious to keep reconfiguring standby_slot_names to name whichever 3
> > > sync replicas are currently the most caught up whenever a different
> > > set of replicas starts lagging. If both GUCs are set,
> > > 'standby_slot_names' takes precedence.
> > 
> > Hm.  IIUC you'd essentially need to set standby_slot_names to "A,B,C,D,E"
> > to get the desired behavior today.  That might ordinarily be okay, but it
> > could cause logical replication to be held back unnecessarily if one of the
> > replicas falls behind for whatever reason.  A way to tie standby_slot_names
> > to synchronous replication instead does seem like it would be useful in
> > this case.
> 
> FWIW, I have the same understanding and also think your proposal would be
> useful in this case.

A few random comments about v1:

1 ====

+               int mode = SyncRepWaitMode;

It's set to SyncRepWaitMode and then never changed. Is it worth getting rid of "mode"?

2 ====

+               static XLogRecPtr       lsn[NUM_SYNC_REP_WAIT_MODE] = {InvalidXLogRecPtr};

I did some testing and saw that the lsn[] values were not always set to
InvalidXLogRecPtr right after this declaration: a static array is only
initialized once per process, so the values persist across calls. It
looks like we should avoid relying on the compile-time initialization
here. Then, what about one of the following?

1. remove the "static". 

or

2. keep the static but set the lsn[] values after its declaration.
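
For 2., a minimal sketch of what I mean:

    static XLogRecPtr       lsn[NUM_SYNC_REP_WAIT_MODE];

    /* set at runtime instead of relying on the one-time static
     * initialization */
    for (int i = 0; i < NUM_SYNC_REP_WAIT_MODE; i++)
        lsn[i] = InvalidXLogRecPtr;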

3 ====

-       /*
-        * Return false if not all the standbys have caught up to the specified
-        * WAL location.
-        */
-       if (caught_up_slot_num != standby_slot_names_config->nslotnames)
-               return false;
+               if (!XLogRecPtrIsInvalid(lsn[mode]) && lsn[mode] >= wait_for_lsn)
+                       return true;

The lsn[] values are (or should have been, see 2 above) just initialized
to InvalidXLogRecPtr, so XLogRecPtrIsInvalid(lsn[mode]) will always
return true here and the condition can never be satisfied at this point.
I think this check is not needed then.

4 ====

> > > I did some very brief pgbench runs to compare the latency. Client instance
> > > was running pgbench and 10 logical clients while the Postgres box hosted
> > > the writer and 5 synchronous replicas.

> > > There's a hit to TPS

Out of curiosity, did you compare with standby_slot_names_from_syncrep set to off
and standby_slot_names not empty?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Allow logical failover slots to wait on synchronous replication

From: Amit Kapila

On Tue, Jun 11, 2024 at 4:21 AM John H <johnhyvr@gmail.com> wrote:
>
> Building on bf279ddd1c, this patch introduces a GUC
> 'standby_slot_names_from_syncrep' which allows logical failover slots
> to wait for changes to have been synchronously replicated before sending
> the decoded changes to logical subscribers.
>
> The existing 'standby_slot_names' isn't great for users who are running
> clusters with quorum-based synchronous replicas. For instance, if
> the user has synchronous_standby_names = 'ANY 3 (A,B,C,D,E)', it's
> tedious to keep reconfiguring standby_slot_names to name whichever 3
> sync replicas are currently the most caught up whenever a different
> set of replicas starts lagging. If both GUCs are set,
> 'standby_slot_names' takes precedence.
>
> I did some very brief pgbench runs to compare the latency. Client instance
> was running pgbench and 10 logical clients while the Postgres box hosted
> the writer and 5 synchronous replicas.
>
> There's a hit to TPS, which I think is due to increased contention on
> SyncRepLock, and it scales with the number of logical walsenders. I'm
> guessing we could avoid this by introducing another set of
> lsn[NUM_SYNC_REP_WAIT_MODE] and having the logical walsenders check
> and wait on that instead, but I wasn't sure if that's the right approach.
>
> pgbench numbers:
>
> // Empty standby_slot_names_from_syncrep
> query mode: simple
..
> latency average = 8.371 ms
> initial connection time = 7.963 ms
> tps = 955.651025 (without initial connection time)
>
> // standby_slot_names_from_syncrep = 'true'
> scaling factor: 200
...
> latency average = 8.834 ms
> initial connection time = 7.670 ms
> tps = 905.610230 (without initial connection time)
>

These numbers indicate that when 'standby_slot_names_from_syncrep' is
set, TPS drops compared to when it is not set. It would be better to see
data comparing 'standby_slot_names_from_syncrep' with the existing
parameter 'standby_slot_names'.

I see the value in your idea but was wondering if we can do something
without introducing a new GUC for this. Can we make it the default
behavior that logical slots marked with the failover option wait for
'synchronous_standby_names', as per your patch's idea, unless
'standby_slot_names' is specified? I don't know if there is any value in
setting the 'failover' option for a slot without specifying
'standby_slot_names', so I was wondering if we can additionally tie it
to 'synchronous_standby_names'. Any better ideas?
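
Roughly, the precedence I have in mind (pseudocode only, names
illustrative; slot->data.failover is the existing flag):

    /* when a logical walsender decides whether to hold back changes */
    if (standby_slot_names is non-empty)
        wait for the named physical slots;        /* existing behavior */
    else if (slot->data.failover && synchronous_standby_names is set)
        wait for synchronous replication;         /* proposed default */
    else
        send the decoded changes immediately;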

--
With Regards,
Amit Kapila.