Thread: Allow logical failover slots to wait on synchronous replication
Hi hackers,

Building on bf279ddd1c, this patch introduces a GUC 'standby_slot_names_from_syncrep' which allows logical failover slots to wait for changes to have been synchronously replicated before sending the decoded changes to logical subscribers.

The existing 'standby_slot_names' isn't great for users who are running clusters with quorum-based synchronous replicas. For instance, if the user has synchronous_standby_names = 'ANY 3 (A,B,C,D,E)', it's a bit tedious to have to reconfigure standby_slot_names to track the three most up-to-date sync replicas whenever different sync replicas start lagging. In the event that both GUCs are set, 'standby_slot_names' takes precedence.

I did some very brief pgbench runs to compare the latency. The client instance was running pgbench and 10 logical clients, while the Postgres box hosted the writer and 5 synchronous replicas.

There's a hit to TPS, which I'm thinking is due to more contention on SyncRepLock, and that scales with the number of logical walsenders. I'm guessing we can avoid this if we introduce another set of lsn[NUM_SYNC_REP_WAIT_MODE] and have the logical walsenders check and wait on that instead, but I wasn't sure if that's the right approach.

pgbench numbers:

// Empty standby_slot_names_from_syncrep
query mode: simple
number of clients: 8
number of threads: 8
maximum number of tries: 1
duration: 1800 s
number of transactions actually processed: 1720173
number of failed transactions: 0 (0.000%)
latency average = 8.371 ms
initial connection time = 7.963 ms
tps = 955.651025 (without initial connection time)

// standby_slot_names_from_syncrep = 'true'
scaling factor: 200
query mode: simple
number of clients: 8
number of threads: 8
maximum number of tries: 1
duration: 1800 s
number of transactions actually processed: 1630105
number of failed transactions: 0 (0.000%)
latency average = 8.834 ms
initial connection time = 7.670 ms
tps = 905.610230 (without initial connection time)

Thanks,

-- 
John Hsu - Amazon Web Services
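To make the idea concrete, here is a minimal, self-contained sketch of the check the patch is describing. This is not the actual PostgreSQL code: `XLogRecPtr`, `walsndctl_lsn`, and `StandbySyncRepConfirmed` are mocked/hypothetical stand-ins for the real shared-memory state and internals. The point it illustrates is that a logical walsender would compare the quorum-confirmed syncrep LSN for the active wait mode against the LSN it wants to decode past, instead of walking a fixed list of physical slots.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for PostgreSQL internals (not the real types). */
typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)
#define XLogRecPtrIsInvalid(r) ((r) == InvalidXLogRecPtr)
#define NUM_SYNC_REP_WAIT_MODE 4

/*
 * Mock of the per-wait-mode LSNs that the syncrep machinery has confirmed
 * as reached by the required quorum of synchronous standbys.
 */
static XLogRecPtr walsndctl_lsn[NUM_SYNC_REP_WAIT_MODE];

/*
 * Sketch of the proposed gate: instead of counting caught-up slots from
 * standby_slot_names, ask whether the quorum-confirmed LSN for the given
 * wait mode has passed the LSN the logical walsender wants to send.
 */
static bool
StandbySyncRepConfirmed(int mode, XLogRecPtr wait_for_lsn)
{
    /* In the real patch this read would happen under SyncRepLock. */
    XLogRecPtr confirmed = walsndctl_lsn[mode];

    return !XLogRecPtrIsInvalid(confirmed) && confirmed >= wait_for_lsn;
}
```

The sketch also makes the contention concern visible: every logical walsender reading this shared state has to take SyncRepLock, which is where the TPS hit above is suspected to come from.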
On Mon, Jun 10, 2024 at 03:51:05PM -0700, John H wrote:
> The existing 'standby_slot_names' isn't great for users who are running
> clusters with quorum-based synchronous replicas. For instance, if
> the user has synchronous_standby_names = 'ANY 3 (A,B,C,D,E)' it's a
> bit tedious to have to reconfigure the standby_slot_names to set it to
> the most updated 3 sync replicas whenever different sync replicas start
> lagging. In the event that both GUCs are set, 'standby_slot_names' takes
> precedence.

Hm.  IIUC you'd essentially need to set standby_slot_names to "A,B,C,D,E"
to get the desired behavior today.  That might ordinarily be okay, but it
could cause logical replication to be held back unnecessarily if one of
the replicas falls behind for whatever reason.  A way to tie
standby_slot_names to synchronous replication instead does seem like it
would be useful in this case.

-- 
nathan
Hi,

On Mon, Jun 10, 2024 at 09:25:10PM -0500, Nathan Bossart wrote:
> On Mon, Jun 10, 2024 at 03:51:05PM -0700, John H wrote:
> > The existing 'standby_slot_names' isn't great for users who are running
> > clusters with quorum-based synchronous replicas. For instance, if
> > the user has synchronous_standby_names = 'ANY 3 (A,B,C,D,E)' it's a
> > bit tedious to have to reconfigure the standby_slot_names to set it to
> > the most updated 3 sync replicas whenever different sync replicas start
> > lagging. In the event that both GUCs are set, 'standby_slot_names' takes
> > precedence.
>
> Hm.  IIUC you'd essentially need to set standby_slot_names to "A,B,C,D,E"
> to get the desired behavior today.  That might ordinarily be okay, but it
> could cause logical replication to be held back unnecessarily if one of
> the replicas falls behind for whatever reason.  A way to tie
> standby_slot_names to synchronous replication instead does seem like it
> would be useful in this case.

FWIW, I have the same understanding and also think your proposal would be
useful in this case.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Hi,

On Tue, Jun 11, 2024 at 10:00:46AM +0000, Bertrand Drouvot wrote:
> On Mon, Jun 10, 2024 at 09:25:10PM -0500, Nathan Bossart wrote:
> > On Mon, Jun 10, 2024 at 03:51:05PM -0700, John H wrote:
> > > The existing 'standby_slot_names' isn't great for users who are running
> > > clusters with quorum-based synchronous replicas. For instance, if
> > > the user has synchronous_standby_names = 'ANY 3 (A,B,C,D,E)' it's a
> > > bit tedious to have to reconfigure the standby_slot_names to set it to
> > > the most updated 3 sync replicas whenever different sync replicas start
> > > lagging. In the event that both GUCs are set, 'standby_slot_names' takes
> > > precedence.
> >
> > Hm.  IIUC you'd essentially need to set standby_slot_names to "A,B,C,D,E"
> > to get the desired behavior today.  That might ordinarily be okay, but it
> > could cause logical replication to be held back unnecessarily if one of
> > the replicas falls behind for whatever reason.  A way to tie
> > standby_slot_names to synchronous replication instead does seem like it
> > would be useful in this case.
>
> FWIW, I have the same understanding and also think your proposal would be
> useful in this case.

A few random comments about v1:

1 ====

+       int                     mode = SyncRepWaitMode;

It's set to SyncRepWaitMode and then never changed. Worth getting rid of
"mode"?

2 ====

+       static XLogRecPtr       lsn[NUM_SYNC_REP_WAIT_MODE] = {InvalidXLogRecPtr};

I did some testing and saw that the lsn[] values were not always set to
InvalidXLogRecPtr right after. It looks like, in that case, we should
avoid setting the lsn[] values at compile time. Then, what about:

1. remove the "static".

or

2. keep the static but set the lsn[] values after its declaration.

3 ====

-       /*
-        * Return false if not all the standbys have caught up to the specified
-        * WAL location.
-        */
-       if (caught_up_slot_num != standby_slot_names_config->nslotnames)
-               return false;
+       if (!XLogRecPtrIsInvalid(lsn[mode]) && lsn[mode] >= wait_for_lsn)
+               return true;

lsn[] values are/(should have been, see 2 above) just been initialized to
InvalidXLogRecPtr, so XLogRecPtrIsInvalid(lsn[mode]) will always return
true. I think this check is not needed then.

4 ====

> > > I did some very brief pgbench runs to compare the latency. Client instance
> > > was running pgbench and 10 logical clients while the Postgres box hosted
> > > the writer and 5 synchronous replicas.
> > > There's a hit to TPS

Out of curiosity, did you compare with standby_slot_names_from_syncrep set
to off and standby_slot_names not empty?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
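The hazard behind point 2 above can be shown with a tiny standalone C program. This is a mock, not the patch's code: in C, a `static` array initializer runs only once, so on every call after the first the array holds whatever the previous call left in it rather than InvalidXLogRecPtr.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the real PostgreSQL definitions. */
typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)
#define NUM_SYNC_REP_WAIT_MODE 4

/*
 * With "static", the = {InvalidXLogRecPtr} initializer runs exactly once,
 * at program start. Later calls see the value stored by the previous call,
 * which is why the review suggests dropping "static" or resetting the
 * array explicitly after its declaration.
 */
static XLogRecPtr
stale_static_read(int mode, XLogRecPtr update)
{
    static XLogRecPtr lsn[NUM_SYNC_REP_WAIT_MODE] = {InvalidXLogRecPtr};
    XLogRecPtr previous = lsn[mode];

    lsn[mode] = update;
    return previous;
}
```

The first call for a given mode returns InvalidXLogRecPtr; the second returns whatever the first call stored, demonstrating that the compile-time initializer did not run again.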
On Tue, Jun 11, 2024 at 4:21 AM John H <johnhyvr@gmail.com> wrote:
>
> Building on bf279ddd1c, this patch introduces a GUC
> 'standby_slot_names_from_syncrep' which allows logical failover slots
> to wait for changes to have been synchronously replicated before sending
> the decoded changes to logical subscribers.
>
> The existing 'standby_slot_names' isn't great for users who are running
> clusters with quorum-based synchronous replicas. For instance, if
> the user has synchronous_standby_names = 'ANY 3 (A,B,C,D,E)' it's a
> bit tedious to have to reconfigure the standby_slot_names to set it to
> the most updated 3 sync replicas whenever different sync replicas start
> lagging. In the event that both GUCs are set, 'standby_slot_names' takes
> precedence.
>
> I did some very brief pgbench runs to compare the latency. Client instance
> was running pgbench and 10 logical clients while the Postgres box hosted
> the writer and 5 synchronous replicas.
>
> There's a hit to TPS, which I'm thinking is due to more contention on the
> SyncRepLock, and that scales with the number of logical walsenders. I'm
> guessing we can avoid this if we introduce another set of
> lsn[NUM_SYNC_REP_WAIT_MODE] and have the logical walsenders check
> and wait on that instead but I wasn't sure if that's the right approach.
>
> pgbench numbers:
>
> // Empty standby_slot_names_from_syncrep
> query mode: simple
..
> latency average = 8.371 ms
> initial connection time = 7.963 ms
> tps = 955.651025 (without initial connection time)
>
> // standby_slot_names_from_syncrep = 'true'
> scaling factor: 200
...
> latency average = 8.834 ms
> initial connection time = 7.670 ms
> tps = 905.610230 (without initial connection time)
>

These readings indicate that TPS is reduced when
'standby_slot_names_from_syncrep' is set, compared to when it is not set.
It would be better to see data comparing 'standby_slot_names_from_syncrep'
against the existing parameter 'standby_slot_names'.
I see the value in your idea but was wondering if we can do something
without introducing a new GUC for this. Can we make it the default behavior
that logical slots marked with the failover option wait for
'synchronous_standby_names', as per your patch's idea, unless
'standby_slot_names' is specified? I don't know if there is any value in
setting the 'failover' option for a slot without specifying
'standby_slot_names', so I was wondering if we can additionally tie it to
'synchronous_standby_names'. Any better ideas?

-- 
With Regards,
Amit Kapila.
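The fallback precedence being suggested could be sketched roughly as below. This is a hypothetical illustration, not proposed patch code: `FailoverWaitKind` and `failover_wait_kind` are invented names, and the real decision would of course consult the parsed GUC state rather than raw strings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical enum for which mechanism a failover slot would wait on. */
typedef enum
{
    WAIT_NONE,           /* failover slot, but nothing is configured        */
    WAIT_STANDBY_SLOTS,  /* wait on standby_slot_names (existing behavior)  */
    WAIT_SYNCREP         /* fall back to synchronous_standby_names          */
} FailoverWaitKind;

/*
 * Sketch of the suggested default: standby_slot_names, when set, keeps its
 * current meaning and takes precedence; otherwise a slot marked with the
 * failover option waits on synchronous replication, with no new GUC.
 */
static FailoverWaitKind
failover_wait_kind(bool slot_failover,
                   const char *standby_slot_names,
                   const char *synchronous_standby_names)
{
    if (!slot_failover)
        return WAIT_NONE;
    if (standby_slot_names != NULL && standby_slot_names[0] != '\0')
        return WAIT_STANDBY_SLOTS;
    if (synchronous_standby_names != NULL &&
        synchronous_standby_names[0] != '\0')
        return WAIT_SYNCREP;
    return WAIT_NONE;
}
```

Under this sketch, a user with synchronous_standby_names = 'ANY 3 (A,B,C,D,E)' and an empty standby_slot_names would get syncrep-gated logical failover slots automatically.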