Thread: Allow logical failover slots to wait on synchronous replication
Hi hackers,

Building on bf279ddd1c, this patch introduces a GUC 'standby_slot_names_from_syncrep' which allows logical failover slots to wait for changes to have been synchronously replicated before sending the decoded changes to logical subscribers.

The existing 'standby_slot_names' isn't great for users who are running clusters with quorum-based synchronous replicas. For instance, if the user has synchronous_standby_names = 'ANY 3 (A,B,C,D,E)', it's tedious to have to reconfigure standby_slot_names to the three most up-to-date sync replicas whenever different sync replicas start lagging. In the event that both GUCs are set, 'standby_slot_names' takes precedence.

I did some very brief pgbench runs to compare the latency. The client instance ran pgbench and 10 logical clients while the Postgres box hosted the writer and 5 synchronous replicas.

There's a hit to TPS, which I think is due to more contention on SyncRepLock, and that scales with the number of logical walsenders. I'm guessing we can avoid this if we introduce another set of lsn[NUM_SYNC_REP_WAIT_MODE] and have the logical walsenders check and wait on that instead, but I wasn't sure if that's the right approach.

pgbench numbers:

// Empty standby_slot_names_from_syncrep
query mode: simple
number of clients: 8
number of threads: 8
maximum number of tries: 1
duration: 1800 s
number of transactions actually processed: 1720173
number of failed transactions: 0 (0.000%)
latency average = 8.371 ms
initial connection time = 7.963 ms
tps = 955.651025 (without initial connection time)

// standby_slot_names_from_syncrep = 'true'
scaling factor: 200
query mode: simple
number of clients: 8
number of threads: 8
maximum number of tries: 1
duration: 1800 s
number of transactions actually processed: 1630105
number of failed transactions: 0 (0.000%)
latency average = 8.834 ms
initial connection time = 7.670 ms
tps = 905.610230 (without initial connection time)

Thanks,

--
John Hsu - Amazon Web Services
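To make the proposed behavior concrete, a minimal, self-contained sketch of the check being added, with hypothetical stand-in names (the actual patch reads the walsender shared-memory LSN array under SyncRepLock):

#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;   /* stand-in for PostgreSQL's XLogRecPtr */

/* stand-in for reading WalSndCtl->lsn[wait_mode] under SyncRepLock */
extern XLogRecPtr syncrep_confirmed_lsn(int wait_mode);

/*
 * With the proposed GUC enabled, a logical walsender may stream a decoded
 * commit only once the synchronous-replication quorum has confirmed it,
 * instead of waiting on each slot listed in standby_slot_names.
 */
static bool
syncrep_has_caught_up(XLogRecPtr wait_for_lsn, int wait_mode)
{
    return syncrep_confirmed_lsn(wait_mode) >= wait_for_lsn;
}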
On Mon, Jun 10, 2024 at 03:51:05PM -0700, John H wrote:
> The existing 'standby_slot_names' isn't great for users who are running
> clusters with quorum-based synchronous replicas. For instance, if
> the user has synchronous_standby_names = 'ANY 3 (A,B,C,D,E)', it's
> tedious to have to reconfigure standby_slot_names to the three most
> up-to-date sync replicas whenever different sync replicas start
> lagging. In the event that both GUCs are set, 'standby_slot_names'
> takes precedence.

Hm. IIUC you'd essentially need to set standby_slot_names to "A,B,C,D,E" to get the desired behavior today. That might ordinarily be okay, but it could cause logical replication to be held back unnecessarily if one of the replicas falls behind for whatever reason. A way to tie standby_slot_names to synchronous replication instead does seem like it would be useful in this case.

--
nathan
Hi,

On Mon, Jun 10, 2024 at 09:25:10PM -0500, Nathan Bossart wrote:
> On Mon, Jun 10, 2024 at 03:51:05PM -0700, John H wrote:
> > The existing 'standby_slot_names' isn't great for users who are running
> > clusters with quorum-based synchronous replicas. [...]
>
> Hm. IIUC you'd essentially need to set standby_slot_names to "A,B,C,D,E"
> to get the desired behavior today. That might ordinarily be okay, but it
> could cause logical replication to be held back unnecessarily if one of
> the replicas falls behind for whatever reason. A way to tie
> standby_slot_names to synchronous replication instead does seem like it
> would be useful in this case.

FWIW, I have the same understanding and also think your proposal would be useful in this case.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Hi,

On Tue, Jun 11, 2024 at 10:00:46AM +0000, Bertrand Drouvot wrote:
> On Mon, Jun 10, 2024 at 09:25:10PM -0500, Nathan Bossart wrote:
> > Hm. IIUC you'd essentially need to set standby_slot_names to "A,B,C,D,E"
> > to get the desired behavior today. [...] A way to tie
> > standby_slot_names to synchronous replication instead does seem like it
> > would be useful in this case.
>
> FWIW, I have the same understanding and also think your proposal would be
> useful in this case.

A few random comments about v1:

1 ====

+ int mode = SyncRepWaitMode;

It's set to SyncRepWaitMode and then never changed. Worth getting rid of "mode"?

2 ====

+ static XLogRecPtr lsn[NUM_SYNC_REP_WAIT_MODE] = {InvalidXLogRecPtr};

I did some testing and saw that the lsn[] values were not always set to InvalidXLogRecPtr right after. It looks like, in that case, we should avoid setting the lsn[] values at compile time. Then, what about?

1. remove the "static".

or

2. keep the static but set the lsn[] values after its declaration.

3 ====

- /*
-  * Return false if not all the standbys have caught up to the specified
-  * WAL location.
-  */
- if (caught_up_slot_num != standby_slot_names_config->nslotnames)
-     return false;
+ if (!XLogRecPtrIsInvalid(lsn[mode]) && lsn[mode] >= wait_for_lsn)
+     return true;

lsn[] values are/(should have been, see 2 above) just been initialized to InvalidXLogRecPtr, so XLogRecPtrIsInvalid(lsn[mode]) will always return true. I think this check is not needed then.

4 ====

> > > I did some very brief pgbench runs to compare the latency. Client instance
> > > was running pgbench and 10 logical clients while the Postgres box hosted
> > > the writer and 5 synchronous replicas.
> > > There's a hit to TPS

Out of curiosity, did you compare with standby_slot_names_from_syncrep set to off and standby_slot_names not empty?

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On Tue, Jun 11, 2024 at 4:21 AM John H <johnhyvr@gmail.com> wrote:
>
> Building on bf279ddd1c, this patch introduces a GUC
> 'standby_slot_names_from_syncrep' which allows logical failover slots
> to wait for changes to have been synchronously replicated before sending
> the decoded changes to logical subscribers.
>
> The existing 'standby_slot_names' isn't great for users who are running
> clusters with quorum-based synchronous replicas. For instance, if
> the user has synchronous_standby_names = 'ANY 3 (A,B,C,D,E)', it's
> tedious to have to reconfigure standby_slot_names to the three most
> up-to-date sync replicas whenever different sync replicas start
> lagging. In the event that both GUCs are set, 'standby_slot_names'
> takes precedence.
>
> I did some very brief pgbench runs to compare the latency. The client
> instance ran pgbench and 10 logical clients while the Postgres box
> hosted the writer and 5 synchronous replicas.
>
> There's a hit to TPS, which I think is due to more contention on
> SyncRepLock, and that scales with the number of logical walsenders. I'm
> guessing we can avoid this if we introduce another set of
> lsn[NUM_SYNC_REP_WAIT_MODE] and have the logical walsenders check
> and wait on that instead, but I wasn't sure if that's the right
> approach.
>
> pgbench numbers:
>
> // Empty standby_slot_names_from_syncrep
> query mode: simple
> ..
> latency average = 8.371 ms
> initial connection time = 7.963 ms
> tps = 955.651025 (without initial connection time)
>
> // standby_slot_names_from_syncrep = 'true'
> scaling factor: 200
> ...
> latency average = 8.834 ms
> initial connection time = 7.670 ms
> tps = 905.610230 (without initial connection time)

The reading indicates that when you set 'standby_slot_names_from_syncrep', the TPS is reduced compared to when it is not set. It would be better to see data comparing 'standby_slot_names_from_syncrep' and the existing parameter 'standby_slot_names'.

I see the value in your idea but was wondering if we can do something without introducing a new GUC for this. Can we make it a default behavior that logical slots marked with a failover option will wait for 'synchronous_standby_names' as per your patch's idea unless 'standby_slot_names' is specified? I don't know if there is any value in setting the 'failover' option for a slot without specifying 'standby_slot_names', so was wondering if we can additionally tie it to 'synchronous_standby_names'. Any better ideas?

--
With Regards,
Amit Kapila.
Hi,

Thanks Bertrand for taking a look at the patch.

On Mon, Jun 17, 2024 at 8:19 AM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote:
>
> + int mode = SyncRepWaitMode;
>
> It's set to SyncRepWaitMode and then never changed. Worth getting rid of
> "mode"?

I took a deeper look at this with GDB and I think it's necessary to cache the value of "mode". We first check:

if (mode == SYNC_REP_NO_WAIT)
    return true;

However, after this check it's possible to receive a SIGHUP changing SyncRepWaitMode to SYNC_REP_NO_WAIT (e.g. synchronous_commit = 'on' -> 'off'), leading to lsn[-1].

> 2 ====
>
> + static XLogRecPtr lsn[NUM_SYNC_REP_WAIT_MODE] = {InvalidXLogRecPtr};
>
> I did some testing and saw that the lsn[] values were not always set to
> InvalidXLogRecPtr right after. It looks like, in that case, we should
> avoid setting the lsn[] values at compile time. Then, what about?
>
> 1. remove the "static".
>
> or
>
> 2. keep the static but set the lsn[] values after its declaration.

I'd prefer to keep the static because it reduces unnecessary contention on SyncRepLock if the logical client has fallen behind. I'll add a change with your second suggestion.

> 3 ====
>
> - /*
> -  * Return false if not all the standbys have caught up to the specified
> -  * WAL location.
> -  */
> - if (caught_up_slot_num != standby_slot_names_config->nslotnames)
> -     return false;
> + if (!XLogRecPtrIsInvalid(lsn[mode]) && lsn[mode] >= wait_for_lsn)
> +     return true;
>
> lsn[] values are/(should have been, see 2 above) just been initialized to
> InvalidXLogRecPtr, so XLogRecPtrIsInvalid(lsn[mode]) will always return
> true. I think this check is not needed then.

Removed.

> Out of curiosity, did you compare with standby_slot_names_from_syncrep
> set to off and standby_slot_names not empty?

I didn't think 'standby_slot_names' would impact TPS as much since it's not grabbing the SyncRepLock, but here's a quick test. Writer with 5 synchronous replicas, 10 pg_recvlogical clients and pgbench all running from the same server.

Command: pgbench -c 4 -j 4 -T 600 -U "ec2-user" -d postgres -r -P 5

Result with: standby_slot_names = 'replica_1,replica_2,replica_3,replica_4,replica_5'

latency average = 5.600 ms
latency stddev = 2.854 ms
initial connection time = 5.503 ms
tps = 714.148263 (without initial connection time)

Result with: standby_slot_names_from_syncrep = 'true', synchronous_standby_names = 'ANY 3 (A,B,C,D,E)'

latency average = 5.740 ms
latency stddev = 2.543 ms
initial connection time = 4.093 ms
tps = 696.776249 (without initial connection time)

Result with nothing set:

latency average = 5.090 ms
latency stddev = 3.467 ms
initial connection time = 4.989 ms
tps = 785.665963 (without initial connection time)

Again, I think it's possible to improve the synchronous numbers if we cache, but I'll try that out in a bit.

--
John Hsu - Amazon Web Services
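To spell the concern out, a condensed sketch of the pattern being defended (a hypothetical simplification of the patch; SYNC_REP_NO_WAIT is -1 in the real tree, so indexing with a re-read value could access lsn[-1]):

int mode = SyncRepWaitMode;     /* snapshot the mode once */

if (mode == SYNC_REP_NO_WAIT)
    return true;

/*
 * If SyncRepWaitMode could change between the check above and the array
 * access below (the concern described here), re-reading it instead of
 * using the snapshot would turn the index into -1 after a SIGHUP flips
 * synchronous_commit to 'off'. The local copy keeps the index valid.
 */
if (lsn[mode] >= wait_for_lsn)
    return true;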
Hi Amit,

Thanks for taking a look.

On Tue, Jun 18, 2024 at 10:34 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> The reading indicates that when you set 'standby_slot_names_from_syncrep',
> the TPS is reduced compared to when it is not set. It would be better
> to see data comparing 'standby_slot_names_from_syncrep' and the
> existing parameter 'standby_slot_names'.

I added new benchmark numbers in the reply to Bertrand, but I'll include them in this thread for posterity.

Writer with 5 synchronous replicas, 10 pg_recvlogical clients and pgbench all running from the same server.

Command: pgbench -c 4 -j 4 -T 600 -U "ec2-user" -d postgres -r -P 5

Result with: standby_slot_names = 'replica_1,replica_2,replica_3,replica_4,replica_5'

latency average = 5.600 ms
latency stddev = 2.854 ms
initial connection time = 5.503 ms
tps = 714.148263 (without initial connection time)

Result with: standby_slot_names_from_syncrep = 'true', synchronous_standby_names = 'ANY 3 (A,B,C,D,E)'

latency average = 5.740 ms
latency stddev = 2.543 ms
initial connection time = 4.093 ms
tps = 696.776249 (without initial connection time)

Result with nothing set:

latency average = 5.090 ms
latency stddev = 3.467 ms
initial connection time = 4.989 ms
tps = 785.665963 (without initial connection time)

> Can we make it a default
> behavior that logical slots marked with a failover option will wait
> for 'synchronous_standby_names' as per your patch's idea unless
> 'standby_slot_names' is specified? I don't know if there is any value
> in setting the 'failover' option for a slot without specifying
> 'standby_slot_names', so was wondering if we can additionally tie it
> to 'synchronous_standby_names'. Any better ideas?

No, I think that works pretty cleanly actually. Reserving some special keyword isn't great, which is the only other thing I can think of. I've updated the patch and tests to reflect that.

Attached the patch that addresses these changes.

--
John Hsu - Amazon Web Services
Hi,

On Mon, Jul 08, 2024 at 12:08:58PM -0700, John H wrote:
> I took a deeper look at this with GDB and I think it's necessary to
> cache the value of "mode". We first check:
>
> if (mode == SYNC_REP_NO_WAIT)
>     return true;
>
> However, after this check it's possible to receive a SIGHUP changing
> SyncRepWaitMode to SYNC_REP_NO_WAIT (e.g. synchronous_commit = 'on' ->
> 'off'), leading to lsn[-1].

What about adding "SyncRepWaitMode" as a third StandbySlotsHaveCaughtup() parameter then? (So that the function will use whatever value was passed during the call.)

> I'd prefer to keep the static because it reduces unnecessary
> contention on SyncRepLock if the logical client has fallen behind.
> I'll add a change with your second suggestion.

Got it, you want lsn[] to be initialized only one time, so that each call to StandbySlotsHaveCaughtup() relies on the values that were previously stored in lsn[] and then returns if lsn[mode] >= wait_for_lsn. Then I think that:

1 ===

That's worth additional comments in the code.

2 ===

+ if (!initialized)
+ {
+     for (i = 0; i < NUM_SYNC_REP_WAIT_MODE; i++)
+     {
+         lsn[i] = InvalidXLogRecPtr;
+     }
+ }

Looks like setting initialized to true is missing once done.

Also,

3 ===

+ /* Cache values to reduce contention on lock */
+ for (i = 0; i < NUM_SYNC_REP_WAIT_MODE; i++)
+ {
+     lsn[i] = walsndctl->lsn[i];
+ }

NUM_SYNC_REP_WAIT_MODE is small, but as the goal is to keep the lock time as short as possible, I wonder if it wouldn't be better to use memcpy() here instead of this for loop.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
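As a sketch, point 2 might be fixed like this (assuming the same static declarations as the patch):

if (!initialized)
{
    for (i = 0; i < NUM_SYNC_REP_WAIT_MODE; i++)
        lsn[i] = InvalidXLogRecPtr;
    initialized = true;     /* the assignment missing from the patch */
}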
On Tue, Jul 9, 2024 at 12:42 AM John H <johnhyvr@gmail.com> wrote:
>
> > Can we make it a default
> > behavior that logical slots marked with a failover option will wait
> > for 'synchronous_standby_names' as per your patch's idea unless
> > 'standby_slot_names' is specified? [...]
>
> No, I think that works pretty cleanly actually. Reserving some special
> keyword isn't great, which is the only other thing I can think of. I've
> updated the patch and tests to reflect that.
>
> Attached the patch that addresses these changes.

Thank you for the patch. I like the overall idea; it is a useful one and will make the user experience better. Please find a few comments.

1)
I agree with the idea that instead of introducing a new GUC, we can make failover logical slots wait on 'synchronous_standby_names', but this will leave the user no option to unlink failover-enabled logical subscribers from the wait on synchronous standbys. We provide the user a way to switch off automatic slot-sync by disabling 'sync_replication_slots', and we also provide a way to manually sync the slots using the function 'pg_sync_replication_slots()', and nowhere do we make it mandatory to set 'synchronized_standby_slots'; in fact, in the docs it is a recommended setting and not a mandatory one. The user can always create failover slots, switch off automatic slot sync, and disable the wait on standbys by not specifying 'synchronized_standby_slots', and do the slot-sync and consistency checks manually when needed. I feel, for worst-case scenarios, we should provide the user an option to delink failover-enabled logical subscriptions from 'synchronous_standby_names'.

We can have 'synchronized_standby_slots' (standby_slot_names) accept one such option, as in 'SAME_AS_SYNCREP_STANDBYS' or say 'DEFAULT'. So when 'synchronous_standby_names' is a comma-separated list, we pick those slots; if it is empty, then no wait on standbys; and if its value is 'DEFAULT' as configured by the user, then go with 'synchronous_standby_names'. Thoughts?

2)
When 'synchronized_standby_slots' is configured but a standby named in it is down, blocking logical replication, then we get a WARNING in the subscriber's log file:

WARNING: replication slot "standby_2" specified in parameter synchronized_standby_slots does not have active_pid.
DETAIL: Logical replication is waiting on the standby associated with "standby_2".
HINT: Consider starting standby associated with "standby_2" or amend parameter synchronized_standby_slots.

But OTOH, when 'synchronous_standby_names' is configured instead of 'synchronized_standby_slots' and any of the standbys listed is down, blocking logical replication, we do not get any sort of warning. It is inconsistent behavior. Also, the user might be left clueless on why subscribers are not getting changes.

thanks
Shveta
On Thu, Jul 18, 2024 at 9:25 AM shveta malik <shveta.malik@gmail.com> wrote:
>
> We can have 'synchronized_standby_slots' (standby_slot_names) accept
> one such option, as in 'SAME_AS_SYNCREP_STANDBYS' or say 'DEFAULT'. So
> when 'synchronous_standby_names' is a comma-separated list, we pick
> those slots; if it is empty, then no wait on standbys; and if its value
> is 'DEFAULT' as configured by the user, then go with
> 'synchronous_standby_names'. Thoughts?

One correction here ('synchronous_standby_names' --> 'synchronized_standby_slots'). Corrected version:

So when 'synchronized_standby_slots' is a comma-separated list, we pick those slots; if it is empty, then no wait on standbys; and if its value is 'DEFAULT' as configured by the user, then go with 'synchronous_standby_names'. Thoughts?

thanks
Shveta
Hi Shveta,

Thanks for taking a look at the patch.

> > will leave the user no option to unlink failover-enabled logical
> > subscribers from the wait on synchronous standbys.

That's a good point. I'm a bit biased in that I don't think there's a great reason why someone would want to replicate logical changes out of the synchronous cluster without them having been synchronously replicated, but yes, this would be different behavior compared to strictly the slot one.

> So when 'synchronized_standby_slots' is a comma-separated list, we pick
> those slots; if it is empty, then no wait on standbys; and if its value
> is 'DEFAULT' as configured by the user, then go with
> 'synchronous_standby_names'. Thoughts?

I think I'd prefer having a separate GUC if the alternative is to reserve special keywords in 'synchronized_standby_slots', but I'm not sure I feel strongly about that.

> > 2)
> > When 'synchronized_standby_slots' is configured but a standby named in
> > it is down, blocking logical replication, then we get a WARNING in
> > the subscriber's log file:
> > [...]
> > But OTOH, when 'synchronous_standby_names' is configured instead of
> > 'synchronized_standby_slots' and any of the standbys listed is down,
> > blocking logical replication, we do not get any sort of warning. It is
> > inconsistent behavior. Also, the user might be left clueless on why
> > subscribers are not getting changes.

Ah, that's a gap. Let me add some logging/warning in a similar fashion, although I think I'd have the warning be relatively generic (e.g. changes are blocked because they're not synchronously committed).

Thanks,

--
John Hsu - Amazon Web Services
Hi Bertrand,

> 1 ===
> ...
> That's worth additional comments in the code.

There's this comment already about caching the value; not sure if you prefer something more?

/* Cache values to reduce contention on lock */

> 2 ===
> ...
> Looks like setting initialized to true is missing once done.

Thanks, will update.

> 3 ===
> ...
> NUM_SYNC_REP_WAIT_MODE is small, but as the goal is to keep the lock
> time as short as possible, I wonder if it wouldn't be better to use
> memcpy() here instead of this for loop.

It results in a "Wdiscarded-qualifiers" warning, which is safe given we take the lock, but adds noise? What do you think?

"slot.c:2756:46: warning: passing argument 2 of ‘memcpy’ discards ‘volatile’ qualifier from pointer target type [-Wdiscarded-qualifiers]"

Thanks,

--
John Hsu - Amazon Web Services
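For reference, a sketch of the memcpy() variant with the qualifier cast that silences this warning; casting away volatile is tolerable here only because SyncRepLock serializes access to walsndctl->lsn during the copy:

LWLockAcquire(SyncRepLock, LW_SHARED);
/* the cast silences -Wdiscarded-qualifiers; the lock makes it safe */
memcpy(lsn, (XLogRecPtr *) walsndctl->lsn, sizeof(lsn));
LWLockRelease(SyncRepLock);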
On Fri, Jul 19, 2024 at 2:52 AM John H <johnhyvr@gmail.com> wrote:
>
> That's a good point. I'm a bit biased in that I don't think there's a
> great reason why someone would want to replicate logical changes out of
> the synchronous cluster without them having been synchronously
> replicated, but yes, this would be different behavior compared to
> strictly the slot one.
>
> I think I'd prefer having a separate GUC if the alternative is to
> reserve special keywords in 'synchronized_standby_slots', but I'm not
> sure I feel strongly about that.

My only concern is that earlier we provided a way to set the failover property of slots even without a mandatory wait on physical standbys, but now we will be changing this behaviour. Okay, let's see what others say. If we plan to go this way, we can change the docs to clearly mention this.

> Ah, that's a gap. Let me add some logging/warning in a similar fashion,
> although I think I'd have the warning be relatively generic (e.g.
> changes are blocked because they're not synchronously committed).

Okay, sounds good.

thanks
Shveta
On Mon, Jul 22, 2024 at 9:12 AM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Fri, Jul 19, 2024 at 2:52 AM John H <johnhyvr@gmail.com> wrote:
> >
> > I think I'd prefer having a separate GUC if the alternative is to
> > reserve special keywords in 'synchronized_standby_slots', but I'm not
> > sure I feel strongly about that.
>
> My only concern is that earlier we provided a way to set the failover
> property of slots even without a mandatory wait on physical standbys,
> but now we will be changing this behaviour.

Adding a new GUC, as John suggests, to address this concern is one way to go, but we should think some more before adding a new GUC. The second option, adding a special value for the GUC synchronized_standby_slots as you propose, has the downside of creating a dependency between two GUCs (synchronized_standby_slots and synchronous_standby_names), which can also make the code complex. Yet another possibility is to have a slot-level parameter (something like failover_wait_for) which can be used to decide the GUC preference for failover-enabled slots.

As this is a new feature and we haven't got much feedback from users, like John, I am also not very sure how much merit we have in retaining the old behavior where failover slots don't need to wait for any of the standbys. But anyway, we have at least some escape route for the case where logical subscribers keep on waiting for some physical standby that is down to come back, and one may want to use that in some situations, so there is clearly some value in retaining the old behavior.

--
With Regards,
Amit Kapila.
On Tue, Jul 9, 2024 at 12:39 AM John H <johnhyvr@gmail.com> wrote:
>
> > Out of curiosity, did you compare with standby_slot_names_from_syncrep
> > set to off and standby_slot_names not empty?
>
> I didn't think 'standby_slot_names' would impact TPS as much since
> it's not grabbing the SyncRepLock, but here's a quick test.
> Writer with 5 synchronous replicas, 10 pg_recvlogical clients and
> pgbench all running from the same server.
>
> Command: pgbench -c 4 -j 4 -T 600 -U "ec2-user" -d postgres -r -P 5
>
> Result with: standby_slot_names =
> 'replica_1,replica_2,replica_3,replica_4,replica_5'
>
> latency average = 5.600 ms
> latency stddev = 2.854 ms
> initial connection time = 5.503 ms
> tps = 714.148263 (without initial connection time)
>
> Result with: standby_slot_names_from_syncrep = 'true',
> synchronous_standby_names = 'ANY 3 (A,B,C,D,E)'
>
> latency average = 5.740 ms
> latency stddev = 2.543 ms
> initial connection time = 4.093 ms
> tps = 696.776249 (without initial connection time)
>
> Result with nothing set:
>
> latency average = 5.090 ms
> latency stddev = 3.467 ms
> initial connection time = 4.989 ms
> tps = 785.665963 (without initial connection time)
>
> Again, I think it's possible to improve the synchronous numbers if we
> cache, but I'll try that out in a bit.

Okay, so the tests done till now conclude that we won't get the benefit by using 'standby_slot_names_from_syncrep'. Now, if we increase the number of standbys in both lists and still keep ANY 3 in synchronous_standby_names, then the results may vary. We should try to find out if there is a performance benefit with the use of synchronous_standby_names in the normal configurations like the one you used in the above tests, to prove the value of this patch.

--
With Regards,
Amit Kapila.
On Tue, Jul 23, 2024 at 10:35 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> Okay, so the tests done till now conclude that we won't get the
> benefit by using 'standby_slot_names_from_syncrep'. Now, if we
> increase the number of standbys in both lists and still keep ANY 3 in
> synchronous_standby_names, then the results may vary. We should try to
> find out if there is a performance benefit with the use of
> synchronous_standby_names in the normal configurations like the one
> you used in the above tests, to prove the value of this patch.

I didn't fully understand the parameters mentioned above, specifically what 'latency stddev' and 'latency average' represent. But shouldn't we see the benefit/value of this patch by having a setup where a particular standby is slow in sending the response back to the primary (could be due to network lag or other reasons) and then measuring the latency in receiving changes on failover-enabled logical subscribers? We can perform this test with both of the below settings and, say, make D and E slow in sending responses:
1) synchronous_standby_names = 'ANY 3 (A,B,C,D,E)'
2) standby_slot_names = A_slot, B_slot, C_slot, D_slot, E_slot.

thanks
Shveta
On Fri, Jul 26, 2024 at 3:28 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> I didn't fully understand the parameters mentioned above, specifically
> what 'latency stddev' and 'latency average' represent. But shouldn't
> we see the benefit/value of this patch by having a setup where a
> particular standby is slow in sending the response back to the primary
> (could be due to network lag or other reasons) and then measuring the
> latency in receiving changes on failover-enabled logical subscribers?
> We can perform this test with both of the below settings and, say,
> make D and E slow in sending responses:
> 1) synchronous_standby_names = 'ANY 3 (A,B,C,D,E)'
> 2) standby_slot_names = A_slot, B_slot, C_slot, D_slot, E_slot.

Yes, I also expect the patch should perform better in such a scenario, but it is better to test it. Also, irrespective of that, we should investigate why the reported case is slower for synchronous_standby_names and see if we can improve it.

BTW, for 2), I think you wanted to say synchronized_standby_slots, not standby_slot_names. We have recently changed the GUC name.

--
With Regards,
Amit Kapila.
On Fri, Jul 26, 2024 at 5:11 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> Yes, I also expect the patch should perform better in such a scenario,
> but it is better to test it. Also, irrespective of that, we should
> investigate why the reported case is slower for
> synchronous_standby_names and see if we can improve it.

+1

> BTW, for 2), I think you wanted to say synchronized_standby_slots,
> not standby_slot_names. We have recently changed the GUC name.

Yes, sorry, synchronized_standby_slots it is.

thanks
Shveta
Hi John,

On Thu, Jul 18, 2024 at 02:22:08PM -0700, John H wrote:
> Hi Bertrand,
>
> There's this comment already about caching the value; not sure
> if you prefer something more?
>
> /* Cache values to reduce contention on lock */

Yeah, at the same place as the static lsn[] declaration, something like:

static XLogRecPtr lsn[NUM_SYNC_REP_WAIT_MODE]; /* cached LSNs */

but that may just be a matter of taste.

> It results in a "Wdiscarded-qualifiers" warning, which is safe given we
> take the lock, but adds noise? What do you think?
>
> "slot.c:2756:46: warning: passing argument 2 of ‘memcpy’ discards
> ‘volatile’ qualifier from pointer target type
> [-Wdiscarded-qualifiers]"

Right, we may want to cast it then, but given that the for loop is "small" I think it's also fine to keep the for loop.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Hi Shveta,

On Sun, Jul 21, 2024 at 8:42 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> > Ah, that's a gap. Let me add some logging/warning in a similar fashion,
> > although I think I'd have the warning be relatively generic (e.g.
> > changes are blocked because they're not synchronously committed).
>
> Okay, sounds good.

I took a look at adding warnings similar to the ones the existing 'synchronized_standby_slots' feature has, and I don't think it's particularly feasible.

The checks/warnings for 'synchronized_standby_slots' are intended to protect against misconfiguration. They consist of slot validation (valid slot_name, not a logical slot, slot has not been invalidated) and whether or not the slot is active.

I don't think there's a "misconfiguration" equivalent for waiting on synchronous_commit. With the current proposal, once you have (synchronous_commit enabled && failover_slots), logical decoding is dependent on whether or not the writes have been replicated to a synchronous replica. If there is no data being replicated out of the logical slot, it is because, from the perspective of the database, no writes have been committed yet. I don't think it would make sense to add logging/warnings as to why a transaction is still not committed (e.g. which potential replica is the one lagging). There isn't a nice way to determine why synchronous commit is waiting without being particularly invasive, and even then it could be particularly noisy (e.g. provide all the application_names that we are potentially waiting on).

Thanks,

--
John Hsu - Amazon Web Services
Hi Bertrand,

On Sun, Jul 28, 2024 at 10:00 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote:
> Yeah, at the same place as the static lsn[] declaration, something like:
>
> static XLogRecPtr lsn[NUM_SYNC_REP_WAIT_MODE]; /* cached LSNs */
>
> but that may just be a matter of taste.

I've updated the patch to reflect that.

> Right, we may want to cast it then, but given that the for loop is
> "small" I think it's also fine to keep the for loop.

Ah, I see what you mean. I've updated these changes and attached the patch to the other thread.

Thanks,

--
John Hsu - Amazon Web Services
On Tue, Aug 27, 2024 at 12:58 AM John H <johnhyvr@gmail.com> wrote:
>
> For instance, in Shveta's suggestion of
>
> > > > We can perform this test with both of the below settings and say make
> > > > D and E slow in sending responses:
> > > > 1) synchronous_standby_names = 'ANY 3 (A,B,C,D,E)'
> > > > 2) standby_slot_names = A_slot, B_slot, C_slot, D_slot, E_slot.
>
> if the server associated with E_slot is just down or undergoing
> some sort of maintenance, then all logical consumers would start lagging
> until the server is back up. I could also mimic a network lag of 20
> seconds and it's guaranteed that this patch will perform better.

I wanted a simple test where in the first case you use synchronous_standby_names = 'ANY 3 (A,B,C,D,E)' and in the second case use standby_slot_names = A_slot, B_slot, C_slot, D_slot, E_slot. You can try some variations of it as well. The idea is that even if the performance is less for the synchronous_standby_names configuration, we should be able to document it. This will help users to decide what is best for them.

> I re-ran the benchmarks with a longer run time of 3 hours, and tested
> a new shared cache for walsenders to check the value before obtaining
> the SyncRepLock.
>
> I also saw I was being throttled on storage in my previous benchmarks,
> so I moved to a new setup.
> I benchmarked a new test case with an additional shared cache between
> all the walsenders to reduce potential contention on SyncRepLock, and
> have attached said patch.
>
> Database: Writer on its own disk, 5 RRs on the other disk together
> Client: 10 logical clients, pgbench running from here as well
>
> 'pgbench -c 32 -j 4 -T 10800 -U "ec2-user" -d postgres -r -P 1'
>
> # Test failover_slots with synchronized_standby_slots = 'rr_1, rr_2,
> rr_3, rr_4, rr_5'
> latency average = 10.683 ms
> latency stddev = 11.851 ms
> initial connection time = 145.876 ms
> tps = 2994.595673 (without initial connection time)
>
> # Test failover_slots waiting on sync_rep no new shared cache
> latency average = 10.684 ms
> latency stddev = 12.247 ms
> initial connection time = 142.561 ms
> tps = 2994.160136 (without initial connection time)
>
> # Test failover slots with additional shared cache
> latency average = 10.674 ms
> latency stddev = 11.917 ms
> initial connection time = 142.486 ms
> tps = 2997.315874 (without initial connection time)
>
> The tps improvement between no cache and shared_cache seems marginal,
> but we do see a slight improvement in stddev, which makes sense from a
> contention perspective.

What is the difference between "Test failover_slots with synchronized_standby_slots = 'rr_1, rr_2, rr_3, rr_4, rr_5'" and "Test failover_slots waiting on sync_rep no new shared cache"? I want to know what configuration you used for synchronous_standby_names in the latter case.

> I think the cache would demonstrate a lot more improvement if we had,
> say, 1000 logical slots and all of them were trying to obtain
> SyncRepLock for updating their values.
>
> I've attached the patch but don't feel particularly strongly about the
> new shared LSN values.

I am also not sure, especially as the test results didn't show much improvement and the code also becomes a bit complicated.

BTW, in the 0003 version, in the below code:

+ /* Cache values to reduce contention */
+ LWLockAcquire(SyncRepLock, LW_SHARED);
+ memcpy((XLogRecPtr *) walsndctl->cached_lsn, (XLogRecPtr *) walsndctl->lsn, sizeof(lsn));
+ LWLockRelease(SyncRepLock);

Which mode's LSN is being copied? I am not sure if I understood this part of the code. In the 0002 version, in the following code [1], you are referring to the LSN mode which is enabled for the logical walsender irrespective of the mode used by the physical walsender. It is possible that they are always the same, but that is not evident from the code or comments in the patch.

[1]:
+ /* Cache values to reduce contention on lock */
+ for (i = 0; i < NUM_SYNC_REP_WAIT_MODE; i++)
+ {
+     lsn[i] = walsndctl->lsn[i];
+ }
- ss_oldest_flush_lsn = min_restart_lsn;
+ LWLockRelease(SyncRepLock);
- return true;
+ if (lsn[mode] >= wait_for_lsn)
+     return true;

--
With Regards,
Amit Kapila.
On Tue, Aug 27, 2024 at 12:56 AM John H <johnhyvr@gmail.com> wrote:
>
> I don't think there's a "misconfiguration" equivalent for waiting on
> synchronous_commit. [...] There isn't a nice way to determine why
> synchronous commit is waiting without being particularly invasive, and
> even then it could be particularly noisy (e.g. provide all the
> application_names that we are potentially waiting on).

Okay. Thanks for the details. I see your point. I will review to see if anything comes to my mind for a simpler way to do this.

thanks
Shveta
On Thu, Aug 29, 2024 at 2:31 AM John H <johnhyvr@gmail.com> wrote:
>
> Hi Amit,
>
> On Mon, Aug 26, 2024 at 11:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > I wanted a simple test where in the first case you use
> > synchronous_standby_names = 'ANY 3 (A,B,C,D,E)' and in the second case
> > use standby_slot_names = A_slot, B_slot, C_slot, D_slot, E_slot. [...]
> > What is the difference between "Test failover_slots with
> > synchronized_standby_slots = 'rr_1, rr_2, rr_3, rr_4, rr_5'" and "Test
> > failover_slots waiting on sync_rep no new shared cache"? I want to know
> > what configuration you used for synchronous_standby_names in the latter
> > case.
>
> Sorry for the confusion due to the bad naming of the test cases; let me
> rephrase. All three tests had synchronous_standby_names = 'ANY 3
> (A,B,C,D,E)' set with synchronous_commit = 'on', and failover_slots =
> 'on' for the 10 logical slots.
>
> # Test failover_slots with synchronized_standby_slots = 'rr_1, rr_2,
> rr_3, rr_4, rr_5'
> This is the test you wanted where the logical clients are waiting on
> all 5 slots to acknowledge the change since
> 'synchronized_standby_slots' takes priority when set.
>
> # Test failover_slots sync rep no cache
> This test has 'synchronized_standby_slots' commented out, and without
> relying on the new cache introduced in 0003. Logical clients will wait
> on synchronous_standby_names in this case.
>
> # Test failover slots with additional shared cache
> This test also has 'synchronized_standby_slots' commented out, and
> logical clients will wait on the LSNs reported from
> synchronous_standby_names, but it relies on a new cache to reduce
> contention on SyncRepLock.
>
> > The idea is that even if the
> > performance is less for the synchronous_standby_names configuration, we
> > should be able to document it. This will help users to decide what is
> > best for them.
>
> Makes sense.
>
> > I am also not sure, especially as the test results didn't show much
> > improvement and the code also becomes a bit complicated. BTW, in the
> > 0003 version, in the below code:
>
> That's fair, I've updated it to be more in line with 0002.
>
> > + /* Cache values to reduce contention */
> > + LWLockAcquire(SyncRepLock, LW_SHARED);
> > + memcpy((XLogRecPtr *) walsndctl->cached_lsn, (XLogRecPtr *) walsndctl->lsn, sizeof(lsn));
> > + LWLockRelease(SyncRepLock);
> >
> > Which mode's LSN is being copied? I am not sure if I understood this
> > part of the code.
>
> All of the mode LSNs are being copied in case SyncRepWaitMode changes in
> the next iteration. I've removed that part but kept:
>
> + memcpy(lsn, (XLogRecPtr *) walsndctl->lsn, sizeof(lsn));
>
> as suggested by Bertrand to avoid the for loop updating values
> one-by-one.
>
> Here's what's logged after the memcpy:
>
> 2024-08-28 19:41:13.798 UTC [1160413] LOG: lsn[0] after memcpy is: 279/752C7FF0
> 2024-08-28 19:41:13.798 UTC [1160413] LOG: lsn[1] after memcpy is: 279/752C7F20
> 2024-08-28 19:41:13.798 UTC [1160413] LOG: lsn[2] after memcpy is: 279/752C7F20
>
> > In the 0002 version, in the following code [1], you are referring to
> > the LSN mode which is enabled for the logical walsender irrespective of
> > the mode used by the physical walsender. It is possible that they are
> > always the same, but that is not evident from the code or comments in
> > the patch.
>
> They are almost always the same; I tried to indicate that with the
> following comment in the patch, but I could make it more explicit?
>
> /* Initialize value in case SIGHUP changing to SYNC_REP_NO_WAIT */
>
> At the beginning we set
>
> int mode = SyncRepWaitMode;
>
> At this time, the logical walsender mode it's checking against is the
> same as what the physical walsenders are using. It's possible that this
> mode is no longer the same when we execute the following check:
>
> if (lsn[mode] >= wait_for_lsn)
>
> because of a SIGHUP to synchronous_commit that changes SyncRepWaitMode
> to some other value.
>
> We cache the value instead of
>
> if (lsn[SyncRepWaitMode] >= wait_for_lsn)
>
> because SYNC_REP_NO_WAIT is -1. If SyncRepWaitMode is set to this, it
> leads to out-of-bounds access.
>
> I've attached a new patch that removes the shared cache introduced in
> 0003.

Thanks for the patch. A few comments and queries:

1)
+ static XLogRecPtr lsn[NUM_SYNC_REP_WAIT_MODE];

We shall name it as 'lsns' as there are multiple.

2)
+ for (i = 0; i < NUM_SYNC_REP_WAIT_MODE; i++)
+ {
+     lsn[i] = InvalidXLogRecPtr;
+ }

Can we do it like below, similar to what you have done at another place:
memset(lsn, InvalidXLogRecPtr, sizeof(lsn));

3)
+ if (!initialized)
+ {
+     for (i = 0; i < NUM_SYNC_REP_WAIT_MODE; i++)
+     {
+         lsn[i] = InvalidXLogRecPtr;
+     }
+ }

I do not see 'initialized' set to TRUE anywhere. Can you please elaborate the intent here?

4)
+ int mode = SyncRepWaitMode;
+ int i;
+
+ if (!initialized)
+ {
+     for (i = 0; i < NUM_SYNC_REP_WAIT_MODE; i++)
+     {
+         lsn[i] = InvalidXLogRecPtr;
+     }
+ }
+ if (mode == SYNC_REP_NO_WAIT)
+     return true;

I do not understand this code well. As Amit also pointed out, 'mode' may change. When we initialize 'mode', let's say SyncRepWaitMode is SYNC_REP_NO_WAIT, but by the time we check 'if (mode == SYNC_REP_NO_WAIT)', SyncRepWaitMode has changed to, say, SYNC_REP_WAIT_FLUSH; if so, then we will wrongly return true from here. Is that a possibility? ProcessConfigFile() is in the caller, and thus we may end up using the wrong mode.

thanks
Shveta
Hi Shveta,

Thanks for reviewing it so quickly.

On Thu, Aug 29, 2024 at 2:35 AM shveta malik <shveta.malik@gmail.com> wrote:
>
> Thanks for the patch. A few comments and queries:
>
> 1)
> + static XLogRecPtr lsn[NUM_SYNC_REP_WAIT_MODE];
>
> We shall name it as 'lsns' as there are multiple.

This follows the same naming convention in walsender_private.h.

> 2)
>
> + for (i = 0; i < NUM_SYNC_REP_WAIT_MODE; i++)
> + {
> +     lsn[i] = InvalidXLogRecPtr;
> + }
>
> Can we do it like below, similar to what you have done at another place:
> memset(lsn, InvalidXLogRecPtr, sizeof(lsn));

I don't think memset works in this case? Well, I think it *technically* works, but I'm not sure it's something worth optimizing. If I understand correctly, memset takes a char for the value and not an XLogRecPtr (uint64).

So it would have to be something like: memset(lsn, 0, sizeof(lsn))

InvalidXLogRecPtr is defined as 0, so I think it works, but there's an implicit dependency here for correctness.

> 3)
> [...]
> I do not see 'initialized' set to TRUE anywhere. Can you please
> elaborate the intent here?

You're right, I thought I fixed this. Will update.

> 4)
> [...]
> I do not understand this code well. As Amit also pointed out, 'mode'
> may change. [...] ProcessConfigFile() is in the caller, and thus we may
> end up using the wrong mode.

Yes, it's possible for mode to change. In my comment to Amit in the other thread, I think we have to store mode and base our execution of this logic on it, ignoring SyncRepWaitMode changing due to ProcessConfigFile/SIGHUP for one iteration.

We can store the value of mode later, so something like:

if (SyncRepWaitMode == SYNC_REP_NO_WAIT)
    return true;
mode = SyncRepWaitMode;
if (lsn[mode] >= wait_for_lsn)
    return true;

But it's the same issue: when you check lsn[mode], SyncRepWaitMode could have changed to something else, so you always have to initialize the value and will always have this discrepancy.

I'm skeptical that end users are changing SyncRepWaitMode in their database clusters, as it has implications for their durability, and I would assume they run with the same durability guarantees.

Thanks,

--
John Hsu - Amazon Web Services
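A small sketch of the subtlety John describes: memset() fills bytes, not array elements, so the two initializations below coincide only because InvalidXLogRecPtr happens to be defined as 0:

/* explicit and value-independent */
for (i = 0; i < NUM_SYNC_REP_WAIT_MODE; i++)
    lsn[i] = InvalidXLogRecPtr;

/* byte-wise fill; equivalent today, but silently depends on
 * InvalidXLogRecPtr == 0 */
memset(lsn, 0, sizeof(lsn));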
On Fri, Aug 30, 2024 at 12:56 AM John H <johnhyvr@gmail.com> wrote:
>
> But it's the same issue: when you check lsn[mode], SyncRepWaitMode
> could have changed to something else, so you always have to initialize
> the value and will always have this discrepancy.
>
> I'm skeptical that end users are changing SyncRepWaitMode in their
> database clusters, as it has implications for their durability, and I
> would assume they run with the same durability guarantees.
>

I was trying to have a look at the patch again; it doesn't apply on HEAD and needs a rebase.

Regarding 'mode = SyncRepWaitMode': FWIW, SyncRepWaitForLSN() also does it in a similar way. It gets the mode into a local variable initially and uses it later. See [1]. So isn't there a chance that 'SyncRepWaitMode' can change in between there too?

[1]:
mode = SyncRepWaitMode;
.....
....
if (!WalSndCtl->sync_standbys_defined ||
    lsn <= WalSndCtl->lsn[mode])
{
    LWLockRelease(SyncRepLock);
    return;
}

thanks Shveta
On Wed, Sep 11, 2024 at 2:40 AM John H <johnhyvr@gmail.com> wrote:
>
> Hi Shveta,
>
> On Sun, Sep 8, 2024 at 11:16 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > I was trying to have a look at the patch again; it doesn't apply on
> > HEAD and needs a rebase.
>
> Rebased with the latest changes.
>
> > Regarding 'mode = SyncRepWaitMode': FWIW, SyncRepWaitForLSN() also
> > does it in a similar way. It gets the mode into a local variable
> > initially and uses it later. See [1]. So isn't there a chance that
> > 'SyncRepWaitMode' can change in between there too?
> >
> > [1]:
> > mode = SyncRepWaitMode;
> > .....
> > ....
> > if (!WalSndCtl->sync_standbys_defined ||
> >     lsn <= WalSndCtl->lsn[mode])
> > {
> >     LWLockRelease(SyncRepLock);
> >     return;
> > }
>
> You are right, thanks for the correction. I tried reproducing with GDB
> where SyncRepWaitMode changes due to pg_ctl reload but was unable to do
> so. It seems like SIGHUP only sets ConfigReloadPending = true, which
> gets processed in the next loop in WalSndLoop(), and that's probably
> where I was getting confused.

Yes, SIGHUP will be processed in the caller of StandbySlotsHaveCaughtup() (see ProcessConfigFile() in WaitForStandbyConfirmation()). So we can use 'SyncRepWaitMode' directly, as it is not going to change in StandbySlotsHaveCaughtup() even if the user triggers a change. And thus it was okay to capture it in a local variable too, similar to how SyncRepWaitForLSN() does it.

> In the latest patch, I've added:
>
> Assert(SyncRepWaitMode >= 0);
>
> which should be true since we call SyncRepConfigured() at the
> beginning of StandbySlotsHaveCaughtup(),
> and used SyncRepWaitMode directly.

Yes, I think it should be okay. As SyncRepRequested() in the beginning makes sure synchronous_commit > SYNCHRONOUS_COMMIT_LOCAL_FLUSH, SyncRepWaitMode should be mapped to one of WAIT_WRITE/FLUSH/APPLY. Will review further.

thanks Shveta
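Schematically, the control flow that makes this safe looks like the following. This is a simplified sketch of the loop shape only; the real code has more steps, and the StandbySlotsHaveCaughtup() arguments are abbreviated here:

    /*
     * Sketch of WaitForStandbyConfirmation()'s loop: the config reload
     * happens between calls, never inside StandbySlotsHaveCaughtup(),
     * so SyncRepWaitMode is stable for the duration of each call.
     */
    for (;;)
    {
        if (ConfigReloadPending)
        {
            ConfigReloadPending = false;
            ProcessConfigFile(PGC_SIGHUP);  /* SyncRepWaitMode may change here */
        }

        /* ...but it cannot change during this call: */
        if (StandbySlotsHaveCaughtup(wait_for_lsn, WARNING))
            break;

        /* wait on the latch / sleep, then retry */
    }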
On Thu, Sep 12, 2024 at 3:04 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> Yes, I think it should be okay. As SyncRepRequested() in the beginning
> makes sure synchronous_commit > SYNCHRONOUS_COMMIT_LOCAL_FLUSH,
> SyncRepWaitMode should be mapped to one of WAIT_WRITE/FLUSH/APPLY.
> Will review further.
>

I was wondering if we need something similar to SyncRepWaitForLSN() here:

/* Cap the level for anything other than commit to remote flush only. */
if (commit)
    mode = SyncRepWaitMode;
else
    mode = Min(SyncRepWaitMode, SYNC_REP_WAIT_FLUSH);

The header comment says:

 * 'lsn' represents the LSN to wait for.  'commit' indicates whether this LSN
 * represents a commit record.  If it doesn't, then we wait only for the WAL
 * to be flushed if synchronous_commit is set to the higher level of
 * remote_apply, because only commit records provide apply feedback.

If we don't do something similar, then aren't there chances that we keep on waiting on the wrong lsn[mode] for the case when mode = SYNC_REP_WAIT_APPLY while the sync-rep-wait infrastructure is updating a different mode's lsn? Is my understanding correct?

thanks Shveta
On Fri, Sep 13, 2024 at 3:13 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> If we don't do something similar, then aren't there chances that we
> keep on waiting on the wrong lsn[mode] for the case when mode =
> SYNC_REP_WAIT_APPLY while the sync-rep-wait infrastructure is updating
> a different mode's lsn? Is my understanding correct?
>

I think here we always need the lsn values corresponding to SYNC_REP_WAIT_FLUSH, as we want to ensure that the WAL has been flushed on the physical standby before sending it to the logical subscriber. See ProcessStandbyReplyMessage(), where we always use flushPtr to advance the physical_slot via PhysicalConfirmReceivedLocation().
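Under that reasoning, the logical walsender's check would be pinned to the flush LSN regardless of the configured wait mode, roughly as below. This is a sketch reusing the lsn[] cache discussed upthread, not the patch itself:

    /*
     * Gate logical decoding on WAL flushed on the sync standbys,
     * independent of synchronous_commit's configured wait mode.
     */
    if (lsn[SYNC_REP_WAIT_FLUSH] >= wait_for_lsn)
        return true;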
Another question aside from the above point, what if someone has specified logical subscribers in synchronous_standby_names? In the case of synchronized_standby_slots, we won't proceed with such slots. -- With Regards, Amit Kapila.
On Mon, Sep 16, 2024 at 11:13 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Sep 13, 2024 at 3:13 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > I was wondering if we need something similar to SyncRepWaitForLSN() here:
> >
> > /* Cap the level for anything other than commit to remote flush only. */
> > if (commit)
> >     mode = SyncRepWaitMode;
> > else
> >     mode = Min(SyncRepWaitMode, SYNC_REP_WAIT_FLUSH);
> >
> > If we don't do something similar, then aren't there chances that we
> > keep on waiting on the wrong lsn[mode] for the case when mode =
> > SYNC_REP_WAIT_APPLY while the sync-rep-wait infrastructure is updating
> > a different mode's lsn? Is my understanding correct?
> I think here we always need the lsn values corresponding to
> SYNC_REP_WAIT_FLUSH, as we want to ensure that the WAL has been
> flushed on the physical standby before sending it to the logical
> subscriber. See ProcessStandbyReplyMessage(), where we always use
> flushPtr to advance the physical_slot via
> PhysicalConfirmReceivedLocation().

I agree. So even if the mode is SYNC_REP_WAIT_WRITE (the lower one) or SYNC_REP_WAIT_APPLY (the higher one), we need to wait for lsn[SYNC_REP_WAIT_FLUSH].

> Another question aside from the above point, what if someone has
> specified logical subscribers in synchronous_standby_names? In the
> case of synchronized_standby_slots, we won't proceed with such slots.
>

Yes, it is a possibility. I had missed this point earlier. I now tried a case where I gave a mix of a logical subscriber and physical standbys in 'synchronous_standby_names' on pgHead; it even accepts that 'mixed' configuration and starts waiting accordingly.

synchronous_standby_names = 'FIRST 2(logicalsub_1, phy_standby_1, phy_standby_2)';

thanks Shveta
On Mon, Sep 16, 2024 at 2:55 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Mon, Sep 16, 2024 at 11:13 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > Another question aside from the above point, what if someone has
> > specified logical subscribers in synchronous_standby_names? In the
> > case of synchronized_standby_slots, we won't proceed with such slots.
> >
>
> Yes, it is a possibility. I had missed this point earlier. I now tried
> a case where I gave a mix of a logical subscriber and physical standbys
> in 'synchronous_standby_names' on pgHead; it even accepts that 'mixed'
> configuration and starts waiting accordingly.
>
> synchronous_standby_names = 'FIRST 2(logicalsub_1, phy_standby_1,
> phy_standby_2)';
>

This should not happen, as we don't support syncing failover slots on logical subscribers. The other point to consider here is that the user may not have set 'sync_replication_slots' on all the physical standbys mentioned in 'synchronous_standby_names', and in that case, it doesn't make sense to wait for the WAL to get flushed on those standbys. What do you think?

-- With Regards, Amit Kapila.
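For example, a hypothetical cluster illustrating the mismatch (the standby names and settings below are made up for illustration):

    # primary
    synchronous_standby_names = 'ANY 2 (standby_a, standby_b, standby_c)'

    # standby_a, standby_b: participate in failover-slot synchronization
    sync_replication_slots = on
    hot_standby_feedback = on

    # standby_c: a plain synchronous replica with no slot sync; yet under
    # the proposed behavior, logical walsenders could still end up waiting
    # for WAL to be flushed there.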
On Mon, Sep 16, 2024 at 4:04 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> This should not happen, as we don't support syncing failover slots on
> logical subscribers.

+1

> The other point to consider here is that the user
> may not have set 'sync_replication_slots' on all the physical standbys
> mentioned in 'synchronous_standby_names', and in that case, it doesn't
> make sense to wait for the WAL to get flushed on those standbys. What do
> you think?
>

Yes, it is a possibility. But then it is a possibility in the case of 'synchronized_standby_slots' as well. A user may configure one of the standbys in 'synchronized_standby_slots' while not configuring the slot-sync GUCs on that standby (hot_standby_feedback, sync_replication_slots, etc.). In such a case, logical replication is dependent upon the concerned physical standby even though the latter is not syncing failover slots. But there is no reliable way to detect this on the publisher side in order to stop the 'wait' for the concerned physical standby. We tried in the past, but it was not that simple, as the sync-related GUCs may change at any time on the physical standby, so a consistent feedback mechanism would be needed to detect this.

IMO, we can explain the recommendations and risks for 'synchronous_standby_names' in the docs, similar to what we do for 'sync_replication_slots'. Or do you have something else in mind?

thanks Shveta
On Tue, Sep 17, 2024 at 9:08 AM shveta malik <shveta.malik@gmail.com> wrote:
>
> Yes, it is a possibility. But then it is a possibility in the case of
> 'synchronized_standby_slots' as well. A user may configure one of the
> standbys in 'synchronized_standby_slots' while not configuring the
> slot-sync GUCs on that standby (hot_standby_feedback,
> sync_replication_slots, etc.). In such a case, logical replication is
> dependent upon the concerned physical standby even though the latter is
> not syncing failover slots.
>

The difference is that the purpose of 'synchronized_standby_slots' is to mention slot names for which the user expects logical walsenders to wait before sending the logical changes to subscribers. OTOH, 'synchronous_standby_names' has a different purpose as well. It is not clear to me if users would be interested in syncing failover slots to all the standbys mentioned in 'synchronous_standby_names'.

-- With Regards, Amit Kapila.
On Thu, Sep 19, 2024 at 12:02 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> The difference is that the purpose of 'synchronized_standby_slots' is
> to mention slot names for which the user expects logical walsenders to
> wait before sending the logical changes to subscribers. OTOH,
> 'synchronous_standby_names' has a different purpose as well. It is not
> clear to me if users would be interested in syncing failover slots
> to all the standbys mentioned in 'synchronous_standby_names'.
>

Okay, I see your point. But I am not sure what the solution could be here, except documenting it. Let me think more.

thanks Shveta
Hi,

On Mon, Sep 16, 2024 at 2:25 AM shveta malik <shveta.malik@gmail.com> wrote:
>
> > > If we don't do something similar, then aren't there chances that we
> > > keep on waiting on the wrong lsn[mode] for the case when mode =
> > > SYNC_REP_WAIT_APPLY while the sync-rep-wait infrastructure is updating
> > > a different mode's lsn? Is my understanding correct?

Let me take a deeper look at this; I think you're right, though.

> I agree. So even if the mode is SYNC_REP_WAIT_WRITE (the lower one) or
> SYNC_REP_WAIT_APPLY (the higher one), we need to wait for
> lsn[SYNC_REP_WAIT_FLUSH].

I'm not sure I agree with that. I think the synchronous_commit mode should be a good enough proxy for what the user wants from a durability perspective for their application.

For an application writing to the database, if it has decided that SYNC_REP_WAIT_WRITE is the point at which a commit is treated as durable, why do we need to be concerned with overriding that to SYNC_REP_WAIT_FLUSH?

Similarly, if a user has the mode set to SYNC_REP_WAIT_APPLY, it's even more confusing to me that there can be scenarios where neither the application nor subsequent reads would see the data as committed, but a logical consumer would. The database should be treated as the source of truth, and I don't think logical consumers should ever be ahead of what the database is treating as committed.

Thanks,

-- John Hsu - Amazon Web Services
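The mapping this argument implies would be roughly the following. This is a sketch only; it mirrors how synchronous_commit levels map onto syncrep wait modes in PostgreSQL, and 'lsn' is the per-mode cache discussed upthread:

    /*
     * Gate logical decoding on the same durability level the user chose
     * for commits, rather than always upgrading it to flush.
     */
    int mode;

    switch (synchronous_commit)
    {
        case SYNCHRONOUS_COMMIT_REMOTE_WRITE:
            mode = SYNC_REP_WAIT_WRITE;
            break;
        case SYNCHRONOUS_COMMIT_REMOTE_FLUSH:
            mode = SYNC_REP_WAIT_FLUSH;
            break;
        case SYNCHRONOUS_COMMIT_REMOTE_APPLY:
            mode = SYNC_REP_WAIT_APPLY;
            break;
        default:
            mode = SYNC_REP_NO_WAIT;
            break;
    }

    if (mode == SYNC_REP_NO_WAIT || lsn[mode] >= wait_for_lsn)
        return true;    /* caught up per the user's own durability bar */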
Hi,

On Fri, Sep 20, 2024 at 2:44 AM shveta malik <shveta.malik@gmail.com> wrote:
>
> Okay, I see your point. But I am not sure what the solution could be
> here, except documenting it. Let me think more.
>

That's a great find. I didn't consider mixed physical and logical replicas in synchronous_standby_names. I wonder if there are users running synchronous_standby_names with a mix of logical and physical replicas, and what the use case would be.

I'm not sure there's anything straightforward we could do in general; for slot syncing, if synchronous_standby_names refers to the application_names of logical replicas, the feature can't be supported.

-- John Hsu - Amazon Web Services
On Sat, Sep 21, 2024 at 6:34 AM John H <johnhyvr@gmail.com> wrote:
>
> That's a great find. I didn't consider mixed physical and logical
> replicas in synchronous_standby_names. I wonder if there are users
> running synchronous_standby_names with a mix of logical and physical
> replicas, and what the use case would be.
>

I am also not aware of actual use cases that mix physical and logical synchronous standbys, but as we provide that functionality, we can't ignore it. BTW, I am also not sure if users would like the slots to be synced on all the standbys mentioned in synchronous_standby_names. And even if they are, it is better to have an explicit way of letting users specify it.

One possible approach is to extend the syntax of "synchronized_standby_slots" similar to "synchronous_standby_names", so that users can directly specify slots in the same manner and avoid waiting for more standbys than required.

-- With Regards, Amit Kapila.
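Purely as a hypothetical illustration of that proposed extension (this syntax is not implemented anywhere; the slot names are made up):

    # today: a flat list; logical walsenders wait for every named slot
    synchronized_standby_slots = 'slot_a, slot_b, slot_c'

    # hypothetical extension mirroring synchronous_standby_names,
    # waiting for any two of the three slots to confirm:
    synchronized_standby_slots = 'ANY 2 (slot_a, slot_b, slot_c)'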