Re: Allow logical failover slots to wait on synchronous replication - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Allow logical failover slots to wait on synchronous replication
Date
Msg-id CAA4eK1+5H4wUCg3afO=Jc_mHCCwGCadH0SxxXR=SQAPCBGcRuA@mail.gmail.com
In response to Allow logical failover slots to wait on synchronous replication  (John H <johnhyvr@gmail.com>)
Responses Re: Allow logical failover slots to wait on synchronous replication
List pgsql-hackers
On Tue, Jun 11, 2024 at 4:21 AM John H <johnhyvr@gmail.com> wrote:
>
> Building on bf279ddd1c, this patch introduces a GUC
> 'standby_slot_names_from_syncrep' which allows logical failover slots
> to wait for changes to have been synchronously replicated before sending
> the decoded changes to logical subscribers.
>
> The existing 'standby_slot_names' isn't great for users who are running
> clusters with quorum-based synchronous replicas. For instance, if
> the user has  synchronous_standby_names = 'ANY 3 (A,B,C,D,E)' it's a
> bit tedious to have to reconfigure the standby_slot_names to set it to
> the most updated 3 sync replicas whenever different sync replicas start
> lagging. In the event that both GUCs are set, 'standby_slot_names' takes
> precedence.
>
> I did some very brief pgbench runs to compare the latency. Client instance
> was running pgbench and 10 logical clients while the Postgres box hosted
> the writer and 5 synchronous replicas.
>
> There's a hit to TPS, which I'm thinking is due to more contention on the
> SyncRepLock, and that scales with the number of logical walsenders. I'm
> guessing we can avoid this if we introduce another set of
> lsn[NUM_SYNC_REP_WAIT_MODE] and have the logical walsenders check
> and wait on that instead but I wasn't sure if that's the right approach.
>
> pgbench numbers:
>
> // Empty standby_slot_names_from_syncrep
> query mode: simple
..
> latency average = 8.371 ms
> initial connection time = 7.963 ms
> tps = 955.651025 (without initial connection time)
>
> // standby_slot_names_from_syncrep = 'true'
> scaling factor: 200
...
> latency average = 8.834 ms
> initial connection time = 7.670 ms
> tps = 905.610230 (without initial connection time)
>

These readings indicate that TPS drops when
'standby_slot_names_from_syncrep' is set, compared to when it is not
set. It would be better to see data comparing
'standby_slot_names_from_syncrep' with the existing parameter
'standby_slot_names'.
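
For scale, the relative change between the two quoted runs can be
computed directly from the figures above. This is only a small
illustrative calculation in Python; the helper name relative_change is
hypothetical, and the inputs are the latency and TPS values quoted in
the original mail.

```python
def relative_change(baseline: float, value: float) -> float:
    """Percentage change of `value` relative to `baseline`."""
    return (value - baseline) / baseline * 100.0


# Figures quoted above: GUC empty (baseline) vs. GUC set to 'true'.
tps_change = relative_change(955.651025, 905.610230)
latency_change = relative_change(8.371, 8.834)

print(f"TPS change: {tps_change:.2f}%")
print(f"Latency change: {latency_change:.2f}%")
```

This only quantifies the two runs shown; it says nothing about how
'standby_slot_names' itself would compare.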

I see the value in your idea, but I was wondering whether we can do
something without introducing a new GUC for this. Can we make it the
default behavior that logical slots marked with the failover option
wait for 'synchronous_standby_names', as per your patch's idea, unless
'standby_slot_names' is specified? I don't know if there is any value
in setting the 'failover' option for a slot without specifying
'standby_slot_names', so I was wondering if we can additionally tie it
to 'synchronous_standby_names'. Any better ideas?
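
The precedence being suggested can be sketched as follows. This is an
illustrative model in Python, not PostgreSQL code: the function name
wait_source_for_slot and the returned labels are hypothetical, and an
empty string is assumed to stand for an unset GUC.

```python
def wait_source_for_slot(failover: bool,
                         standby_slot_names: str,
                         synchronous_standby_names: str) -> str:
    """Return which setting a logical slot would wait on before
    decoded changes are sent to subscribers (hypothetical model)."""
    if not failover:
        # Slots without the failover option do not wait at all.
        return "none"
    if standby_slot_names.strip():
        # An explicit standby_slot_names setting takes precedence.
        return "standby_slot_names"
    if synchronous_standby_names.strip():
        # Proposed default: fall back to the synchronous standbys.
        return "synchronous_standby_names"
    # Neither setting is configured, so there is nothing to wait for.
    return "none"
```

For example, with standby_slot_names unset and
synchronous_standby_names = 'ANY 3 (A,B,C,D,E)', a failover slot would
wait on the synchronous standbys under this model.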

--
With Regards,
Amit Kapila.


