Re: Allow logical failover slots to wait on synchronous replication - Mailing list pgsql-hackers

From John H
Subject Re: Allow logical failover slots to wait on synchronous replication
Date
Msg-id CA+-JvFvvVn7ab0fWCh+Bqj0khSYsnbvxiZsktj8k+OOB-V_uZw@mail.gmail.com
Whole thread Raw
In response to Re: Allow logical failover slots to wait on synchronous replication  (Bertrand Drouvot <bertranddrouvot.pg@gmail.com>)
Responses Re: Allow logical failover slots to wait on synchronous replication
Re: Allow logical failover slots to wait on synchronous replication
List pgsql-hackers
Hi,

Thanks Bertrand for taking a look at the patch.

On Mon, Jun 17, 2024 at 8:19 AM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:

>
> +               int mode = SyncRepWaitMode;
>
> It's set to SyncRepWaitMode and then never change. Worth to get rid of "mode"?
>

I took a deeper look at this with GDB and I think it's necessary to
cache the value of "mode".
We first check:

if (mode == SYNC_REP_NO_WAIT)
return true;

However after this check it's possible to receive a SIGHUP changing
SyncRepWaitMode
to SYNC_REP_NO_WAIT (e.g. synchronous_commit = 'on' -> 'off'), leading
to lsn[-1].

> 2 ====
>
> +               static XLogRecPtr       lsn[NUM_SYNC_REP_WAIT_MODE] = {InvalidXLogRecPtr};
>
> I did some testing and saw that the lsn[] values were not always set to
> InvalidXLogRecPtr right after. It looks like that, in that case, we should
> avoid setting the lsn[] values at compile time. Then, what about?
>
> 1. remove the "static".
>
> or
>
> 2. keep the static but set the lsn[] values after its declaration.

I'd prefer to keep the static because it reduces unnecessary
contention on SyncRepLock if logical client has fallen behind.
I'll add a change with your second suggestion.



> 3 ====
>
> -       /*
> -        * Return false if not all the standbys have caught up to the specified
> -        * WAL location.
> -        */
> -       if (caught_up_slot_num != standby_slot_names_config->nslotnames)
> -               return false;
> +               if (!XLogRecPtrIsInvalid(lsn[mode]) && lsn[mode] >= wait_for_lsn)
> +                       return true;
>
> lsn[] values are/(should have been, see 2 above) just been initialized to
> InvalidXLogRecPtr so that XLogRecPtrIsInvalid(lsn[mode]) will always return
> true. I think this check is not needed then.

Removed.

> Out of curiosity, did you compare with standby_slot_names_from_syncrep set to off
> and standby_slot_names not empty?

I didn't think 'standby_slot_names' would impact TPS as much since
it's not grabbing the SyncRepLock but here's a quick test.
Writer with 5 synchronous replicas, 10 pg_recvlogical clients and
pgbench all running from the same server.

Command: pgbench -c 4 -j 4 -T 600 -U "ec2-user" -d postgres -r -P 5

Result with: standby_slot_names =
'replica_1,replica_2,replica_3,replica_4,replica_5'

latency average = 5.600 ms
latency stddev = 2.854 ms
initial connection time = 5.503 ms
tps = 714.148263 (without initial connection time)

Result with: standby_slot_names_from_syncrep = 'true',
synchronous_standby_names = 'ANY 3 (A,B,C,D,E)'

latency average = 5.740 ms
latency stddev = 2.543 ms
initial connection time = 4.093 ms
tps = 696.776249 (without initial connection time)

Result with nothing set:

latency average = 5.090 ms
latency stddev = 3.467 ms
initial connection time = 4.989 ms
tps = 785.665963 (without initial connection time)

Again I think it's possible to improve the synchronous numbers if we
cache but I'll try that out in a bit.



--
John Hsu - Amazon Web Services



pgsql-hackers by date:

Previous
From: Melanie Plageman
Date:
Subject: Re: Vacuum ERRORs out considering freezing dead tuples from before OldestXmin
Next
From: John H
Date:
Subject: Re: Allow logical failover slots to wait on synchronous replication