Re: Allow logical failover slots to wait on synchronous replication - Mailing list pgsql-hackers
From | John H |
---|---|
Subject | Re: Allow logical failover slots to wait on synchronous replication |
Date | |
Msg-id | CA+-JvFvvVn7ab0fWCh+Bqj0khSYsnbvxiZsktj8k+OOB-V_uZw@mail.gmail.com Whole thread Raw |
In response to | Re: Allow logical failover slots to wait on synchronous replication (Bertrand Drouvot <bertranddrouvot.pg@gmail.com>) |
Responses |
Re: Allow logical failover slots to wait on synchronous replication
Re: Allow logical failover slots to wait on synchronous replication |
List | pgsql-hackers |
Hi, Thanks Bertrand for taking a look at the patch. On Mon, Jun 17, 2024 at 8:19 AM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > + int mode = SyncRepWaitMode; > > It's set to SyncRepWaitMode and then never change. Worth to get rid of "mode"? > I took a deeper look at this with GDB and I think it's necessary to cache the value of "mode". We first check: if (mode == SYNC_REP_NO_WAIT) return true; However after this check it's possible to receive a SIGHUP changing SyncRepWaitMode to SYNC_REP_NO_WAIT (e.g. synchronous_commit = 'on' -> 'off'), leading to lsn[-1]. > 2 ==== > > + static XLogRecPtr lsn[NUM_SYNC_REP_WAIT_MODE] = {InvalidXLogRecPtr}; > > I did some testing and saw that the lsn[] values were not always set to > InvalidXLogRecPtr right after. It looks like that, in that case, we should > avoid setting the lsn[] values at compile time. Then, what about? > > 1. remove the "static". > > or > > 2. keep the static but set the lsn[] values after its declaration. I'd prefer to keep the static because it reduces unnecessary contention on SyncRepLock if logical client has fallen behind. I'll add a change with your second suggestion. > 3 ==== > > - /* > - * Return false if not all the standbys have caught up to the specified > - * WAL location. > - */ > - if (caught_up_slot_num != standby_slot_names_config->nslotnames) > - return false; > + if (!XLogRecPtrIsInvalid(lsn[mode]) && lsn[mode] >= wait_for_lsn) > + return true; > > lsn[] values are/(should have been, see 2 above) just been initialized to > InvalidXLogRecPtr so that XLogRecPtrIsInvalid(lsn[mode]) will always return > true. I think this check is not needed then. Removed. > Out of curiosity, did you compare with standby_slot_names_from_syncrep set to off > and standby_slot_names not empty? I didn't think 'standby_slot_names' would impact TPS as much since it's not grabbing the SyncRepLock but here's a quick test. Writer with 5 synchronous replicas, 10 pg_recvlogical clients and pgbench all running from the same server. Command: pgbench -c 4 -j 4 -T 600 -U "ec2-user" -d postgres -r -P 5 Result with: standby_slot_names = 'replica_1,replica_2,replica_3,replica_4,replica_5' latency average = 5.600 ms latency stddev = 2.854 ms initial connection time = 5.503 ms tps = 714.148263 (without initial connection time) Result with: standby_slot_names_from_syncrep = 'true', synchronous_standby_names = 'ANY 3 (A,B,C,D,E)' latency average = 5.740 ms latency stddev = 2.543 ms initial connection time = 4.093 ms tps = 696.776249 (without initial connection time) Result with nothing set: latency average = 5.090 ms latency stddev = 3.467 ms initial connection time = 4.989 ms tps = 785.665963 (without initial connection time) Again I think it's possible to improve the synchronous numbers if we cache but I'll try that out in a bit. -- John Hsu - Amazon Web Services
pgsql-hackers by date: