Thread: Allow logical failover slots to wait on synchronous replication

Hi hackers,

Building on bf279ddd1c, this patch introduces a GUC
'standby_slot_names_from_syncrep' which allows logical failover slots
to wait for changes to have been synchronously replicated before sending
the decoded changes to logical subscribers.

The existing 'standby_slot_names' isn't great for users who are running
clusters with quorum-based synchronous replicas. For instance, if
the user has  synchronous_standby_names = 'ANY 3 (A,B,C,D,E)' it's a
bit tedious to have to reconfigure the standby_slot_names to set it to
the most updated 3 sync replicas whenever different sync replicas start
lagging. In the event that both GUCs are set, 'standby_slot_names' takes
precedence.

I did some very brief pgbench runs to compare the latency. Client instance
was running pgbench and 10 logical clients while the Postgres box hosted
the writer and 5 synchronous replicas.

There's a hit to TPS, which I'm thinking is due to more contention on the
SyncRepLock, and that scales with the number of logical walsenders. I'm
guessing we can avoid this if we introduce another set of
lsn[NUM_SYNC_REP_WAIT_MODE] and have the logical walsenders check
and wait on that instead but I wasn't sure if that's the right approach.
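To make the caching idea concrete, here is a minimal, self-contained sketch (stand-in types and invented names like `shared_lsn`/`synced_past`; the SyncRepLock acquire/release is elided as comments — this is not the actual walsender code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)
#define NUM_SYNC_REP_WAIT_MODE 3

/* Stand-in for walsndctl->lsn[], normally protected by SyncRepLock. */
XLogRecPtr shared_lsn[NUM_SYNC_REP_WAIT_MODE];

/*
 * Hypothetical fast path for a logical walsender: consult a
 * process-local copy of the synced LSNs first, and only "take the
 * lock" to refresh the copy when the cached value is not far enough
 * along.  A caught-up walsender then generates no lock traffic.
 */
bool
synced_past(int mode, XLogRecPtr wait_for_lsn)
{
    static XLogRecPtr cached[NUM_SYNC_REP_WAIT_MODE]; /* zeroed == invalid */

    if (cached[mode] != InvalidXLogRecPtr && cached[mode] >= wait_for_lsn)
        return true;                    /* no lock needed at all */

    /* LWLockAcquire(SyncRepLock, LW_SHARED) would go here */
    memcpy(cached, shared_lsn, sizeof(cached));
    /* LWLockRelease(SyncRepLock) */

    return cached[mode] != InvalidXLogRecPtr && cached[mode] >= wait_for_lsn;
}
```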

pgbench numbers:

// Empty standby_slot_names_from_syncrep
query mode: simple
number of clients: 8
number of threads: 8
maximum number of tries: 1
duration: 1800 s
number of transactions actually processed: 1720173
number of failed transactions: 0 (0.000%)
latency average = 8.371 ms
initial connection time = 7.963 ms
tps = 955.651025 (without initial connection time)

// standby_slot_names_from_syncrep = 'true'
scaling factor: 200
query mode: simple
number of clients: 8
number of threads: 8
maximum number of tries: 1
duration: 1800 s
number of transactions actually processed: 1630105
number of failed transactions: 0 (0.000%)
latency average = 8.834 ms
initial connection time = 7.670 ms
tps = 905.610230 (without initial connection time)

Thanks,

--
John Hsu - Amazon Web Services


Re: Allow logical failover slots to wait on synchronous replication

From
Nathan Bossart
Date:
On Mon, Jun 10, 2024 at 03:51:05PM -0700, John H wrote:
> The existing 'standby_slot_names' isn't great for users who are running
> clusters with quorum-based synchronous replicas. For instance, if
> the user has  synchronous_standby_names = 'ANY 3 (A,B,C,D,E)' it's a
> bit tedious to have to reconfigure the standby_slot_names to set it to
> the most updated 3 sync replicas whenever different sync replicas start
> lagging. In the event that both GUCs are set, 'standby_slot_names' takes
> precedence.

Hm.  IIUC you'd essentially need to set standby_slot_names to "A,B,C,D,E"
to get the desired behavior today.  That might ordinarily be okay, but it
could cause logical replication to be held back unnecessarily if one of the
replicas falls behind for whatever reason.  A way to tie standby_slot_names
to synchronous replication instead does seem like it would be useful in
this case.

-- 
nathan



Re: Allow logical failover slots to wait on synchronous replication

From
Bertrand Drouvot
Date:
Hi,

On Mon, Jun 10, 2024 at 09:25:10PM -0500, Nathan Bossart wrote:
> On Mon, Jun 10, 2024 at 03:51:05PM -0700, John H wrote:
> > The existing 'standby_slot_names' isn't great for users who are running
> > clusters with quorum-based synchronous replicas. For instance, if
> > the user has  synchronous_standby_names = 'ANY 3 (A,B,C,D,E)' it's a
> > bit tedious to have to reconfigure the standby_slot_names to set it to
> > the most updated 3 sync replicas whenever different sync replicas start
> > lagging. In the event that both GUCs are set, 'standby_slot_names' takes
> > precedence.
> 
> Hm.  IIUC you'd essentially need to set standby_slot_names to "A,B,C,D,E"
> to get the desired behavior today.  That might ordinarily be okay, but it
> could cause logical replication to be held back unnecessarily if one of the
> replicas falls behind for whatever reason.  A way to tie standby_slot_names
> to synchronous replication instead does seem like it would be useful in
> this case.

FWIW, I have the same understanding and also think your proposal would be
useful in this case.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Allow logical failover slots to wait on synchronous replication

From
Bertrand Drouvot
Date:
Hi,

On Tue, Jun 11, 2024 at 10:00:46AM +0000, Bertrand Drouvot wrote:
> Hi,
> 
> On Mon, Jun 10, 2024 at 09:25:10PM -0500, Nathan Bossart wrote:
> > On Mon, Jun 10, 2024 at 03:51:05PM -0700, John H wrote:
> > > The existing 'standby_slot_names' isn't great for users who are running
> > > clusters with quorum-based synchronous replicas. For instance, if
> > > the user has  synchronous_standby_names = 'ANY 3 (A,B,C,D,E)' it's a
> > > bit tedious to have to reconfigure the standby_slot_names to set it to
> > > the most updated 3 sync replicas whenever different sync replicas start
> > > lagging. In the event that both GUCs are set, 'standby_slot_names' takes
> > > precedence.
> > 
> > Hm.  IIUC you'd essentially need to set standby_slot_names to "A,B,C,D,E"
> > to get the desired behavior today.  That might ordinarily be okay, but it
> > could cause logical replication to be held back unnecessarily if one of the
> > replicas falls behind for whatever reason.  A way to tie standby_slot_names
> > to synchronous replication instead does seem like it would be useful in
> > this case.
> 
> FWIW, I have the same understanding and also think your proposal would be
> useful in this case.

A few random comments about v1:

1 ====

+               int mode = SyncRepWaitMode;

It's set to SyncRepWaitMode and then never changed. Worth getting rid of "mode"?

2 ====

+               static XLogRecPtr       lsn[NUM_SYNC_REP_WAIT_MODE] = {InvalidXLogRecPtr};

I did some testing and saw that the lsn[] values were not always set to
InvalidXLogRecPtr right after. It looks like that, in that case, we should
avoid setting the lsn[] values at compile time. Then, what about?

1. remove the "static". 

or

2. keep the static but set the lsn[] values after its declaration.

3 ====

-       /*
-        * Return false if not all the standbys have caught up to the specified
-        * WAL location.
-        */
-       if (caught_up_slot_num != standby_slot_names_config->nslotnames)
-               return false;
+               if (!XLogRecPtrIsInvalid(lsn[mode]) && lsn[mode] >= wait_for_lsn)
+                       return true;

lsn[] values are/(should have been, see 2 above) just been initialized to
InvalidXLogRecPtr so that XLogRecPtrIsInvalid(lsn[mode]) will always return
true. I think this check is not needed then.

4 ====

> > > I did some very brief pgbench runs to compare the latency. Client instance
> > > was running pgbench and 10 logical clients while the Postgres box hosted
> > > the writer and 5 synchronous replicas.

> > > There's a hit to TPS

Out of curiosity, did you compare with standby_slot_names_from_syncrep set to off
and standby_slot_names not empty?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Tue, Jun 11, 2024 at 4:21 AM John H <johnhyvr@gmail.com> wrote:
>
> Building on bf279ddd1c, this patch introduces a GUC
> 'standby_slot_names_from_syncrep' which allows logical failover slots
> to wait for changes to have been synchronously replicated before sending
> the decoded changes to logical subscribers.
>
> The existing 'standby_slot_names' isn't great for users who are running
> clusters with quorum-based synchronous replicas. For instance, if
> the user has  synchronous_standby_names = 'ANY 3 (A,B,C,D,E)' it's a
> bit tedious to have to reconfigure the standby_slot_names to set it to
> the most updated 3 sync replicas whenever different sync replicas start
> lagging. In the event that both GUCs are set, 'standby_slot_names' takes
> precedence.
>
> I did some very brief pgbench runs to compare the latency. Client instance
> was running pgbench and 10 logical clients while the Postgres box hosted
> the writer and 5 synchronous replicas.
>
> There's a hit to TPS, which I'm thinking is due to more contention on the
> SyncRepLock, and that scales with the number of logical walsenders. I'm
> guessing we can avoid this if we introduce another set of
> lsn[NUM_SYNC_REP_WAIT_MODE] and have the logical walsenders check
> and wait on that instead but I wasn't sure if that's the right approach.
>
> pgbench numbers:
>
> // Empty standby_slot_names_from_syncrep
> query mode: simple
..
> latency average = 8.371 ms
> initial connection time = 7.963 ms
> tps = 955.651025 (without initial connection time)
>
> // standby_slot_names_from_syncrep = 'true'
> scaling factor: 200
...
> latency average = 8.834 ms
> initial connection time = 7.670 ms
> tps = 905.610230 (without initial connection time)
>

The readings indicate that when you set 'standby_slot_names_from_syncrep',
TPS drops compared to when it is not set. It would be better
to see data comparing 'standby_slot_names_from_syncrep' with the
existing parameter 'standby_slot_names'.

I see the value in your idea but was wondering if we can do something
without introducing a new GUC for this. Can we make it a default
behavior that logical slots marked with a failover option will wait
for 'synchronous_standby_names' as per your patch's idea unless
'standby_slot_names' is specified? I don't know if there is any value
in setting the 'failover' option for a slot without specifying
'standby_slot_names', so was wondering if we can additionally tie it
to 'synchronous_standby_names'. Any better ideas?

--
With Regards,
Amit Kapila.



Hi,

Thanks Bertrand for taking a look at the patch.

On Mon, Jun 17, 2024 at 8:19 AM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:

>
> +               int mode = SyncRepWaitMode;
>
> It's set to SyncRepWaitMode and then never change. Worth to get rid of "mode"?
>

I took a deeper look at this with GDB and I think it's necessary to
cache the value of "mode".
We first check:

if (mode == SYNC_REP_NO_WAIT)
return true;

However after this check it's possible to receive a SIGHUP changing
SyncRepWaitMode
to SYNC_REP_NO_WAIT (e.g. synchronous_commit = 'on' -> 'off'), leading
to lsn[-1].
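The out-of-bounds scenario is easy to demonstrate with a stripped-down model (stand-in globals and an invented function name; the constants mirror syncrep.h, where SYNC_REP_NO_WAIT is -1):

```c
#include <assert.h>

/* Values mirror syncrep.h. */
#define SYNC_REP_NO_WAIT        (-1)
#define SYNC_REP_WAIT_FLUSH     1
#define NUM_SYNC_REP_WAIT_MODE  3

volatile int SyncRepWaitMode = SYNC_REP_WAIT_FLUSH;
long lsn[NUM_SYNC_REP_WAIT_MODE] = {10, 20, 30};

/* Returns the LSN to wait on, or -1 meaning "no wait needed". */
long
wait_lsn_for_mode(void)
{
    int mode = SyncRepWaitMode;     /* read the global exactly once */

    if (mode == SYNC_REP_NO_WAIT)
        return -1;

    /*
     * A SIGHUP could flip SyncRepWaitMode to SYNC_REP_NO_WAIT right
     * here (synchronous_commit 'on' -> 'off').  Because we index with
     * the locally cached "mode", we can never evaluate lsn[-1].
     */
    return lsn[mode];
}
```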

> 2 ====
>
> +               static XLogRecPtr       lsn[NUM_SYNC_REP_WAIT_MODE] = {InvalidXLogRecPtr};
>
> I did some testing and saw that the lsn[] values were not always set to
> InvalidXLogRecPtr right after. It looks like that, in that case, we should
> avoid setting the lsn[] values at compile time. Then, what about?
>
> 1. remove the "static".
>
> or
>
> 2. keep the static but set the lsn[] values after its declaration.

I'd prefer to keep the static because it reduces unnecessary
contention on SyncRepLock if the logical client has fallen behind.
I'll add a change with your second suggestion.



> 3 ====
>
> -       /*
> -        * Return false if not all the standbys have caught up to the specified
> -        * WAL location.
> -        */
> -       if (caught_up_slot_num != standby_slot_names_config->nslotnames)
> -               return false;
> +               if (!XLogRecPtrIsInvalid(lsn[mode]) && lsn[mode] >= wait_for_lsn)
> +                       return true;
>
> lsn[] values are/(should have been, see 2 above) just been initialized to
> InvalidXLogRecPtr so that XLogRecPtrIsInvalid(lsn[mode]) will always return
> true. I think this check is not needed then.

Removed.

> Out of curiosity, did you compare with standby_slot_names_from_syncrep set to off
> and standby_slot_names not empty?

I didn't think 'standby_slot_names' would impact TPS as much since
it's not grabbing the SyncRepLock but here's a quick test.
Writer with 5 synchronous replicas, 10 pg_recvlogical clients and
pgbench all running from the same server.

Command: pgbench -c 4 -j 4 -T 600 -U "ec2-user" -d postgres -r -P 5

Result with: standby_slot_names =
'replica_1,replica_2,replica_3,replica_4,replica_5'

latency average = 5.600 ms
latency stddev = 2.854 ms
initial connection time = 5.503 ms
tps = 714.148263 (without initial connection time)

Result with: standby_slot_names_from_syncrep = 'true',
synchronous_standby_names = 'ANY 3 (A,B,C,D,E)'

latency average = 5.740 ms
latency stddev = 2.543 ms
initial connection time = 4.093 ms
tps = 696.776249 (without initial connection time)

Result with nothing set:

latency average = 5.090 ms
latency stddev = 3.467 ms
initial connection time = 4.989 ms
tps = 785.665963 (without initial connection time)

Again I think it's possible to improve the synchronous numbers if we
cache but I'll try that out in a bit.



--
John Hsu - Amazon Web Services



Hi Amit,

Thanks for taking a look.

On Tue, Jun 18, 2024 at 10:34 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>

>
> The reading indicates when you set 'standby_slot_names_from_syncrep',
> the TPS reduces as compared to when it is not set. It would be better
> to see the data comparing 'standby_slot_names_from_syncrep' and the
> existing parameter 'standby_slot_names'.

I added new benchmark numbers in the reply to Bertrand, but I'll
include in this thread for posterity.

Writer with 5 synchronous replicas, 10 pg_recvlogical clients and
pgbench all running from the same server.
Command: pgbench -c 4 -j 4 -T 600 -U "ec2-user" -d postgres -r -P 5

Result with: standby_slot_names =
'replica_1,replica_2,replica_3,replica_4,replica_5'

latency average = 5.600 ms
latency stddev = 2.854 ms
initial connection time = 5.503 ms
tps = 714.148263 (without initial connection time)

Result with: standby_slot_names_from_syncrep = 'true',
synchronous_standby_names = 'ANY 3 (A,B,C,D,E)'

latency average = 5.740 ms
latency stddev = 2.543 ms
initial connection time = 4.093 ms
tps = 696.776249 (without initial connection time)

Result with nothing set:

latency average = 5.090 ms
latency stddev = 3.467 ms
initial connection time = 4.989 ms
tps = 785.665963 (without initial connection time)

> Can we make it a default
> behavior that logical slots marked with a failover option will wait
> for 'synchronous_standby_names' as per your patch's idea unless
> 'standby_slot_names' is specified? I don't know if there is any value
> in setting the 'failover' option for a slot without specifying
> 'standby_slot_names', so was wondering if we can additionally tie it
> to 'synchronous_standby_names'. Any better ideas?
>

No, I think that works pretty cleanly actually. Reserving some special
keyword isn't great, and that's the only other thing I can think of.
I've updated the patch and tests to reflect that.

Attached the patch that addresses these changes.



--
John Hsu - Amazon Web Services


Re: Allow logical failover slots to wait on synchronous replication

From
Bertrand Drouvot
Date:
Hi,

On Mon, Jul 08, 2024 at 12:08:58PM -0700, John H wrote:
> I took a deeper look at this with GDB and I think it's necessary to
> cache the value of "mode".
> We first check:
> 
> if (mode == SYNC_REP_NO_WAIT)
> return true;
> 
> However after this check it's possible to receive a SIGHUP changing
> SyncRepWaitMode
> to SYNC_REP_NO_WAIT (e.g. synchronous_commit = 'on' -> 'off'), leading
> to lsn[-1].

What about adding "SyncRepWaitMode" as a third StandbySlotsHaveCaughtup()
parameter then? (so that the function will used whatever value was passed during
the call).

> > 2 ====
> >
> > +               static XLogRecPtr       lsn[NUM_SYNC_REP_WAIT_MODE] = {InvalidXLogRecPtr};
> >
> > I did some testing and saw that the lsn[] values were not always set to
> > InvalidXLogRecPtr right after. It looks like that, in that case, we should
> > avoid setting the lsn[] values at compile time. Then, what about?
> >
> > 1. remove the "static".
> >
> > or
> >
> > 2. keep the static but set the lsn[] values after its declaration.
> 
> I'd prefer to keep the static because it reduces unnecessary
> contention on SyncRepLock if logical client has fallen behind.
> I'll add a change with your second suggestion.

Got it, you want lsn[] to be initialized only once, with each call to
StandbySlotsHaveCaughtup() relying on the values previously stored in
lsn[] and returning if lsn[mode] >= wait_for_lsn.

Then I think that:

1 ===

That's worth additional comments in the code.

2 ===

+               if (!initialized)
+               {
+                       for (i = 0; i < NUM_SYNC_REP_WAIT_MODE; i++)
+                       {
+                               lsn[i] = InvalidXLogRecPtr;
+                       }
+               }

Looks like setting initialized to true is missing once done.

Also,

3 ===

+               /* Cache values to reduce contention on lock */
+               for (i = 0; i < NUM_SYNC_REP_WAIT_MODE; i++)
+               {
+                       lsn[i] = walsndctl->lsn[i];
+               }

NUM_SYNC_REP_WAIT_MODE is small, but as the goal is to keep the lock time as
short as possible, I wonder if it wouldn't be better to use memcpy() here instead
of this for loop.
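Putting 2 === and 3 === together, the shape could look like this sketch (invented names such as `caught_up` and `walsndctl_lsn`; the SyncRepLock acquire/release is elided, and `refresh_count` exists only to observe how often the lock would be taken):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)
#define NUM_SYNC_REP_WAIT_MODE 3

/* Stand-in for walsndctl->lsn[]. */
XLogRecPtr walsndctl_lsn[NUM_SYNC_REP_WAIT_MODE];

int refresh_count = 0;          /* how many times we "took the lock" */

bool
caught_up(int mode, XLogRecPtr wait_for_lsn)
{
    static bool initialized = false;
    static XLogRecPtr lsn[NUM_SYNC_REP_WAIT_MODE];

    if (!initialized)
    {
        for (int i = 0; i < NUM_SYNC_REP_WAIT_MODE; i++)
            lsn[i] = InvalidXLogRecPtr;
        initialized = true;     /* the assignment point 2 === found missing */
    }

    if (lsn[mode] != InvalidXLogRecPtr && lsn[mode] >= wait_for_lsn)
        return true;            /* served from the static cache, no lock */

    /* under SyncRepLock: one memcpy rather than a per-element loop (3 ===) */
    memcpy(lsn, walsndctl_lsn, sizeof(lsn));
    refresh_count++;

    return lsn[mode] != InvalidXLogRecPtr && lsn[mode] >= wait_for_lsn;
}
```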

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Tue, Jul 9, 2024 at 12:42 AM John H <johnhyvr@gmail.com> wrote:
>
>
> > Can we make it a default
> > behavior that logical slots marked with a failover option will wait
> > for 'synchronous_standby_names' as per your patch's idea unless
> > 'standby_slot_names' is specified? I don't know if there is any value
> > in setting the 'failover' option for a slot without specifying
> > 'standby_slot_names', so was wondering if we can additionally tie it
> > to 'synchronous_standby_names'. Any better ideas?
> >
>
> No, I think that works pretty cleanly actually. Reserving some special
> keyword isn't great
> which is the only other thing I can think of. I've updated the patch
> and tests to reflect that.
>
> Attached the patch that addresses these changes.

Thank you for the patch. I like the overall idea; it is a useful one
and will make the user experience better. Please find a few comments.

1)
I agree with the idea that instead of introducing new GUC, we can make
failover logical slots wait on 'synchronous_standby_names', but this
will leave user no option to unlink failover-enabled logical
subscribers from the wait on synchronous standbys. We provide user a
way to switch off automatic slot-sync by disabling
'sync_replication_slots' and we also provide user a way to manually
sync the slots using function 'pg_sync_replication_slots()' and
nowhere we make it mandatory to set 'synchronized_standby_slots'; in
fact in docs, it is a recommended setting and not a mandatory one.
User can always create failover slots, switch off automatic slot sync
and disable wait on standbys by not specifying
'synchronized_standby_slots' and do the slot-sync and consistency
checks manually when needed. I feel, for worst case scenarios, we
should provide user an option to delink failover-enabled logical
subscriptions from  'synchronous_standby_names'.

We can have 'synchronized_standby_slots' (standby_slot_names) to
accept one such option as in 'SAME_AS_SYNCREP_STANDBYS' or  say
'DEFAULT'. So when 'synchronous_standby_names' is comma separated
list, we pick those slots; if it is empty, then no wait on standbys,
and if its value is 'DEFAULT' as configured by user, then go with
'synchronous_standby_names'. Thoughts?
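As a hypothetical sketch of that resolution order (names invented for illustration, not patch code):

```c
#include <assert.h>
#include <string.h>

typedef enum
{
    WAIT_NONE,          /* empty GUC: no wait on standbys */
    WAIT_SLOT_LIST,     /* wait on the slots named in the GUC */
    WAIT_SYNCREP        /* 'DEFAULT': follow synchronous_standby_names */
} WaitSource;

/*
 * Decide what failover-enabled logical slots should wait on, given the
 * proposed special value for synchronized_standby_slots.
 */
WaitSource
resolve_wait_source(const char *synchronized_standby_slots)
{
    if (synchronized_standby_slots == NULL ||
        synchronized_standby_slots[0] == '\0')
        return WAIT_NONE;
    if (strcmp(synchronized_standby_slots, "DEFAULT") == 0)
        return WAIT_SYNCREP;
    return WAIT_SLOT_LIST;      /* ordinary comma-separated slot list */
}
```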


2)
When  'synchronized_standby_slots' is configured but standby named in
it is down blocking logical replication, then we get a WARNING in
subscriber's log file:

WARNING:  replication slot "standby_2" specified in parameter
synchronized_standby_slots does not have active_pid.
DETAIL:  Logical replication is waiting on the standby associated with
"standby_2".
HINT:  Consider starting standby associated with "standby_2" or amend
parameter synchronized_standby_slots.

But OTOH, when  'synchronous_standby_names' is configured instead of
'synchronized_standby_slots' and any of the standbys listed is down
blocking logical replication, we do not get any sort of warning. It is
inconsistent behavior. Also user might be left clueless on why
subscribers are not getting changes.

thanks
Shveta



On Thu, Jul 18, 2024 at 9:25 AM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Tue, Jul 9, 2024 at 12:42 AM John H <johnhyvr@gmail.com> wrote:
> >
> >
> > > Can we make it a default
> > > behavior that logical slots marked with a failover option will wait
> > > for 'synchronous_standby_names' as per your patch's idea unless
> > > 'standby_slot_names' is specified? I don't know if there is any value
> > > in setting the 'failover' option for a slot without specifying
> > > 'standby_slot_names', so was wondering if we can additionally tie it
> > > to 'synchronous_standby_names'. Any better ideas?
> > >
> >
> > No, I think that works pretty cleanly actually. Reserving some special
> > keyword isn't great
> > which is the only other thing I can think of. I've updated the patch
> > and tests to reflect that.
> >
> > Attached the patch that addresses these changes.
>
> Thank You for the patch. I like the overall idea, it is a useful one
> and will make user experience better. Please find few comments.
>
> 1)
> I agree with the idea that instead of introducing new GUC, we can make
> failover logical slots wait on 'synchronous_standby_names', but this
> will leave user no option to unlink failover-enabled logical
> subscribers from the wait on synchronous standbys. We provide user a
> way to switch off automatic slot-sync by disabling
> 'sync_replication_slots' and we also provide user a way to manually
> sync the slots using function 'pg_sync_replication_slots()' and
> nowhere we make it mandatory to set 'synchronized_standby_slots'; in
> fact in docs, it is a recommended setting and not a mandatory one.
> User can always create failover slots, switch off automatic slot sync
> and disable wait on standbys by not specifying
> 'synchronized_standby_slots' and do the slot-sync and consistency
> checks manually when needed. I feel, for worst case scenarios, we
> should provide user an option to delink failover-enabled logical
> subscriptions from  'synchronous_standby_names'.
>
> We can have 'synchronized_standby_slots' (standby_slot_names) to
> accept one such option as in 'SAME_AS_SYNCREP_STANDBYS' or  say
> 'DEFAULT'. So when 'synchronous_standby_names' is comma separated
> list, we pick those slots; if it is empty, then no wait on standbys,
> and if its value is 'DEFAULT' as configured by the user, then go with
> 'synchronous_standby_names'. Thoughts?

One correction here
('synchronous_standby_names' --> 'synchronized_standby_slots'). Corrected
version:

So when 'synchronized_standby_slots' is comma separated list, we pick
those slots; if it is empty, then no wait on standbys, and if its
value is 'DEFAULT' as configured by user, then go with
'synchronous_standby_names'. Thoughts?

>
> 2)
> When  'synchronized_standby_slots' is configured but standby named in
> it is down blocking logical replication, then we get a WARNING in
> subscriber's log file:
>
> WARNING:  replication slot "standby_2" specified in parameter
> synchronized_standby_slots does not have active_pid.
> DETAIL:  Logical replication is waiting on the standby associated with
> "standby_2".
> HINT:  Consider starting standby associated with "standby_2" or amend
> parameter synchronized_standby_slots.
>
> But OTOH, when  'synchronous_standby_names' is configured instead of
> 'synchronized_standby_slots' and any of the standbys listed is down
> blocking logical replication, we do not get any sort of warning. It is
> inconsistent behavior. Also user might be left clueless on why
> subscribers are not getting changes.
>
> thanks
> Shveta



Hi Shveta,

Thanks for taking a look at the patch.

> > will leave user no option to unlink failover-enabled logical
> > subscribers from the wait on synchronous standbys.

That's a good point. I'm a bit biased in that I don't think there's a
great reason why someone would want to replicate logical changes out
of the synchronous cluster without them having been synchronously
replicated, but yes, this would be different behavior compared to
strictly the slot one.

> ...
> So when 'synchronized_standby_slots' is comma separated list, we pick
> those slots; if it is empty, then no wait on standbys, and if its
> value is 'DEFAULT' as configured by user, then go with
> 'synchronous_standby_names'. Thoughts?

I think I'd prefer having a separate GUC if the alternative is to reserve
special keywords in 'synchronized_standby_slots' but I'm not sure if I
feel strongly about that.


> > 2)
> > When  'synchronized_standby_slots' is configured but standby named in
> > it is down blocking logical replication, then we get a WARNING in
> > subscriber's log file:
> >
> > WARNING:  replication slot "standby_2" specified in parameter
> > synchronized_standby_slots does not have active_pid.
> > DETAIL:  Logical replication is waiting on the standby associated with
> > "standby_2".
> > HINT:  Consider starting standby associated with "standby_2" or amend
> > parameter synchronized_standby_slots.
> >
> > But OTOH, when  'synchronous_standby_names' is configured instead of
> > 'synchronized_standby_slots' and any of the standbys listed is down
> > blocking logical replication, we do not get any sort of warning. It is
> > inconsistent behavior. Also user might be left clueless on why
> > subscribers are not getting changes.

Ah, that's a gap. Let me add some logging/warnings in a similar fashion,
although I think I'd have the warning be relatively generic (e.g.
changes are blocked because they're not synchronously committed).

Thanks,

--
John Hsu - Amazon Web Services



Hi Bertrand,

> 1 ===
> ...
> That's worth additional comments in the code.

There's already this comment about caching the value; not sure
if you'd prefer something more?

/* Cache values to reduce contention on lock */

> 2 ===
> ...
> Looks like setting initialized to true is missing once done.

Thanks, will update.

> 3 ===
> ...
> NUM_SYNC_REP_WAIT_MODE is small but as the goal is the keep the lock time as
> short as possible I wonder if it wouldn't be better to use memcpy() here instead
> of this for loop.
>

It results in a "-Wdiscarded-qualifiers" warning, which is safe given
we take the lock, but adds noise. What do you think?

"slot.c:2756:46: warning: passing argument 2 of ‘memcpy’ discards
‘volatile’ qualifier from pointer target type
[-Wdiscarded-qualifiers]"
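One conventional way to silence that warning, assuming the copy really is serialized by SyncRepLock, is an explicit cast (whether that is acceptable style here is of course up for discussion; `walsndctl_stub` and `snapshot_lsns` are stand-ins):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t XLogRecPtr;
#define NUM_SYNC_REP_WAIT_MODE 3

/* Stand-in for WalSndCtlData, whose lsn[] member is volatile-qualified. */
struct
{
    volatile XLogRecPtr lsn[NUM_SYNC_REP_WAIT_MODE];
} walsndctl_stub = {.lsn = {11, 22, 33}};

void
snapshot_lsns(XLogRecPtr *dst)
{
    /*
     * Casting away volatile avoids -Wdiscarded-qualifiers.  It is only
     * safe because the caller would hold SyncRepLock, so the source
     * array cannot change mid-copy.
     */
    memcpy(dst, (XLogRecPtr *) walsndctl_stub.lsn,
           NUM_SYNC_REP_WAIT_MODE * sizeof(XLogRecPtr));
}
```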

Thanks,


--
John Hsu - Amazon Web Services



On Fri, Jul 19, 2024 at 2:52 AM John H <johnhyvr@gmail.com> wrote:
>
> Hi Shveta,
>
> Thanks for taking a look at the patch.
>
> > > will leave user no option to unlink failover-enabled logical
> > > subscribers from the wait on synchronous standbys.
>
> That's a good point. I'm a bit biased in that I don't think there's a
> great reason why someone would
> want to replicate logical changes out of the synchronous cluster
> without it having been synchronously replicated
>  but yes this would be different behavior compared to strictly the slot one.
>
> > ...
> > So when 'synchronized_standby_slots' is comma separated list, we pick
> > those slots; if it is empty, then no wait on standbys, and if its
> > value is 'DEFAULT' as configured by user, then go with
> > 'synchronous_standby_names'. Thoughts?
>
> I think I'd prefer having a separate GUC if the alternative is to reserve
> special keywords in 'synchronized_standby_slots' but I'm not sure if I
> feel strongly about that.

My only concern is that earlier we provided a way to set the failover
property of slots without mandatorily waiting on physical standbys,
but now we will be changing this behaviour. Okay, we can see what
other comments say. If we plan to go this way, we can change the docs
to clearly mention this.


> > > 2)
> > > When  'synchronized_standby_slots' is configured but standby named in
> > > it is down blocking logical replication, then we get a WARNING in
> > > subscriber's log file:
> > >
> > > WARNING:  replication slot "standby_2" specified in parameter
> > > synchronized_standby_slots does not have active_pid.
> > > DETAIL:  Logical replication is waiting on the standby associated with
> > > "standby_2".
> > > HINT:  Consider starting standby associated with "standby_2" or amend
> > > parameter synchronized_standby_slots.
> > >
> > > But OTOH, when  'synchronous_standby_names' is configured instead of
> > > 'synchronized_standby_slots' and any of the standbys listed is down
> > > blocking logical replication, we do not get any sort of warning. It is
> > > inconsistent behavior. Also user might be left clueless on why
> > > subscribers are not getting changes.
>
> Ah that's a gap. Let me add some logging/warning in a similar fashion.
> Although I think I'd have the warning be relatively generic (e.g.
> changes are blocked because
> they're not synchronously committed)
>

okay, sounds good.

thanks
Shveta



On Mon, Jul 22, 2024 at 9:12 AM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Fri, Jul 19, 2024 at 2:52 AM John H <johnhyvr@gmail.com> wrote:
> >
> > Hi Shveta,
> >
> > Thanks for taking a look at the patch.
> >
> > > > will leave user no option to unlink failover-enabled logical
> > > > subscribers from the wait on synchronous standbys.
> >
> > That's a good point. I'm a bit biased in that I don't think there's a
> > great reason why someone would
> > want to replicate logical changes out of the synchronous cluster
> > without it having been synchronously replicated
> >  but yes this would be different behavior compared to strictly the slot one.
> >
> > > ...
> > > So when 'synchronized_standby_slots' is comma separated list, we pick
> > > those slots; if it is empty, then no wait on standbys, and if its
> > > value is 'DEFAULT' as configured by user, then go with
> > > 'synchronous_standby_names'. Thoughts?
> >
> > I think I'd prefer having a separate GUC if the alternative is to reserve
> > special keywords in 'synchronized_standby_slots' but I'm not sure if I
> > feel strongly about that.
>
> My only concern is, earlier we provided a way to set the failover
> property of slots even without mandatorily wait on physical standbys.
> But now we will be changing this behaviour.
>

Adding a new GUC, as John suggests, to address this concern is one way
to go, but we should think some more before adding a new GUC. Second,
adding a special value for the GUC synchronized_standby_slots, as you
are proposing, also has a downside: it creates a dependency between
two GUCs (synchronized_standby_slots and synchronous_standby_names),
which can also make the code complex.

Yet another possibility is to have a slot-level parameter (something
like failover_wait_for) which can be used to decide the GUC preference
for failover-enabled slots.

As this is a new feature and we haven't got much feedback from users,
like John, I am also not very sure how much merit there is in
retaining the old behavior where failover slots don't need to wait for
any of the standbys. But anyway, we have at least some escape route
where logical subscribers keep waiting for some physical standby
that is down to come back, and one may want to use that in some
situations, so there is clearly some value in retaining the old behavior.

--
With Regards,
Amit Kapila.



On Tue, Jul 9, 2024 at 12:39 AM John H <johnhyvr@gmail.com> wrote:
>
> > Out of curiosity, did you compare with standby_slot_names_from_syncrep set to off
> > and standby_slot_names not empty?
>
> I didn't think 'standby_slot_names' would impact TPS as much since
> it's not grabbing the SyncRepLock but here's a quick test.
> Writer with 5 synchronous replicas, 10 pg_recvlogical clients and
> pgbench all running from the same server.
>
> Command: pgbench -c 4 -j 4 -T 600 -U "ec2-user" -d postgres -r -P 5
>
> Result with: standby_slot_names =
> 'replica_1,replica_2,replica_3,replica_4,replica_5'
>
> latency average = 5.600 ms
> latency stddev = 2.854 ms
> initial connection time = 5.503 ms
> tps = 714.148263 (without initial connection time)
>
> Result with: standby_slot_names_from_syncrep = 'true',
> synchronous_standby_names = 'ANY 3 (A,B,C,D,E)'
>
> latency average = 5.740 ms
> latency stddev = 2.543 ms
> initial connection time = 4.093 ms
> tps = 696.776249 (without initial connection time)
>
> Result with nothing set:
>
> latency average = 5.090 ms
> latency stddev = 3.467 ms
> initial connection time = 4.989 ms
> tps = 785.665963 (without initial connection time)
>
> Again I think it's possible to improve the synchronous numbers if we
> cache but I'll try that out in a bit.
>

Okay, so the tests done so far suggest that we won't see a benefit from
using 'standby_slot_names_from_syncrep'. Now, if we increase the number
of standbys in both lists while still keeping ANY 3 in
synchronous_standby_names, the results may vary. We should try to find
out whether there is a performance benefit to using
synchronous_standby_names in normal configurations like the one you
used in the above tests, to prove the value of this patch.

--
With Regards,
Amit Kapila.



On Tue, Jul 23, 2024 at 10:35 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Jul 9, 2024 at 12:39 AM John H <johnhyvr@gmail.com> wrote:
> >
> > > Out of curiosity, did you compare with standby_slot_names_from_syncrep set to off
> > > and standby_slot_names not empty?
> >
> > I didn't think 'standby_slot_names' would impact TPS as much since
> > it's not grabbing the SyncRepLock but here's a quick test.
> > Writer with 5 synchronous replicas, 10 pg_recvlogical clients and
> > pgbench all running from the same server.
> >
> > Command: pgbench -c 4 -j 4 -T 600 -U "ec2-user" -d postgres -r -P 5
> >
> > Result with: standby_slot_names =
> > 'replica_1,replica_2,replica_3,replica_4,replica_5'
> >
> > latency average = 5.600 ms
> > latency stddev = 2.854 ms
> > initial connection time = 5.503 ms
> > tps = 714.148263 (without initial connection time)
> >
> > Result with: standby_slot_names_from_syncrep = 'true',
> > synchronous_standby_names = 'ANY 3 (A,B,C,D,E)'
> >
> > latency average = 5.740 ms
> > latency stddev = 2.543 ms
> > initial connection time = 4.093 ms
> > tps = 696.776249 (without initial connection time)
> >
> > Result with nothing set:
> >
> > latency average = 5.090 ms
> > latency stddev = 3.467 ms
> > initial connection time = 4.989 ms
> > tps = 785.665963 (without initial connection time)
> >
> > Again I think it's possible to improve the synchronous numbers if we
> > cache but I'll try that out in a bit.
> >
>
> Okay, so the tests done till now conclude that we won't get the
> benefit by using 'standby_slot_names_from_syncrep'. Now, if we
> increase the number of standby's in both lists and still keep ANY 3 in
> synchronous_standby_names then the results may vary. We should try to
> find out if there is a performance benefit with the use of
> synchronous_standby_names in the normal configurations like the one
> you used in the above tests to prove the value of this patch.
>

I didn't fully understand the parameters mentioned above, specifically
what 'latency stddev' and 'latency average' represent. But shouldn't
we see the benefit/value of this patch by having a setup where a
particular standby is slow in sending the response back to primary
(could be due to network lag or other reasons) and then measuring the
latency in receiving changes on failover-enabled logical subscribers?
We can perform this test with both of the below settings and say make
D and E slow in sending responses:
1) synchronous_standby_names = 'ANY 3 (A,B,C,D,E)'
2) standby_slot_names = A_slot, B_slot, C_slot, D_slot, E_slot.

thanks
Shveta



On Fri, Jul 26, 2024 at 3:28 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Tue, Jul 23, 2024 at 10:35 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Tue, Jul 9, 2024 at 12:39 AM John H <johnhyvr@gmail.com> wrote:
> > >
> > > > Out of curiosity, did you compare with standby_slot_names_from_syncrep set to off
> > > > and standby_slot_names not empty?
> > >
> > > I didn't think 'standby_slot_names' would impact TPS as much since
> > > it's not grabbing the SyncRepLock but here's a quick test.
> > > Writer with 5 synchronous replicas, 10 pg_recvlogical clients and
> > > pgbench all running from the same server.
> > >
> > > Command: pgbench -c 4 -j 4 -T 600 -U "ec2-user" -d postgres -r -P 5
> > >
> > > Result with: standby_slot_names =
> > > 'replica_1,replica_2,replica_3,replica_4,replica_5'
> > >
> > > latency average = 5.600 ms
> > > latency stddev = 2.854 ms
> > > initial connection time = 5.503 ms
> > > tps = 714.148263 (without initial connection time)
> > >
> > > Result with: standby_slot_names_from_syncrep = 'true',
> > > synchronous_standby_names = 'ANY 3 (A,B,C,D,E)'
> > >
> > > latency average = 5.740 ms
> > > latency stddev = 2.543 ms
> > > initial connection time = 4.093 ms
> > > tps = 696.776249 (without initial connection time)
> > >
> > > Result with nothing set:
> > >
> > > latency average = 5.090 ms
> > > latency stddev = 3.467 ms
> > > initial connection time = 4.989 ms
> > > tps = 785.665963 (without initial connection time)
> > >
> > > Again I think it's possible to improve the synchronous numbers if we
> > > cache but I'll try that out in a bit.
> > >
> >
> > Okay, so the tests done till now conclude that we won't get the
> > benefit by using 'standby_slot_names_from_syncrep'. Now, if we
> > increase the number of standby's in both lists and still keep ANY 3 in
> > synchronous_standby_names then the results may vary. We should try to
> > find out if there is a performance benefit with the use of
> > synchronous_standby_names in the normal configurations like the one
> > you used in the above tests to prove the value of this patch.
> >
>
> I didn't fully understand the parameters mentioned above, specifically
> what 'latency stddev' and 'latency average' represent.. But shouldn't
> we see the benefit/value of this patch by having a setup where a
> particular standby is slow in sending the response back to primary
> (could be due to network lag or other reasons) and then measuring the
> latency in receiving changes on failover-enabled logical subscribers?
> We can perform this test with both of the below settings and say make
> D and E slow in sending responses:
> 1) synchronous_standby_names = 'ANY 3 (A,B,C,D,E)'
> 2) standby_slot_names = A_slot, B_slot, C_slot, D_slot, E_slot.
>

Yes, I also expect the patch to perform better in such a scenario, but
it is better to test it. Also, irrespective of that, we should
investigate why the reported case is slower with
synchronous_standby_names and see if we can improve it.

BTW, for 2), I think you wanted to say synchronized_standby_slots,
not standby_slot_names. We have recently changed the GUC name.


--
With Regards,
Amit Kapila.



On Fri, Jul 26, 2024 at 5:11 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Jul 26, 2024 at 3:28 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > On Tue, Jul 23, 2024 at 10:35 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Tue, Jul 9, 2024 at 12:39 AM John H <johnhyvr@gmail.com> wrote:
> > > >
> > > > > Out of curiosity, did you compare with standby_slot_names_from_syncrep set to off
> > > > > and standby_slot_names not empty?
> > > >
> > > > I didn't think 'standby_slot_names' would impact TPS as much since
> > > > it's not grabbing the SyncRepLock but here's a quick test.
> > > > Writer with 5 synchronous replicas, 10 pg_recvlogical clients and
> > > > pgbench all running from the same server.
> > > >
> > > > Command: pgbench -c 4 -j 4 -T 600 -U "ec2-user" -d postgres -r -P 5
> > > >
> > > > Result with: standby_slot_names =
> > > > 'replica_1,replica_2,replica_3,replica_4,replica_5'
> > > >
> > > > latency average = 5.600 ms
> > > > latency stddev = 2.854 ms
> > > > initial connection time = 5.503 ms
> > > > tps = 714.148263 (without initial connection time)
> > > >
> > > > Result with: standby_slot_names_from_syncrep = 'true',
> > > > synchronous_standby_names = 'ANY 3 (A,B,C,D,E)'
> > > >
> > > > latency average = 5.740 ms
> > > > latency stddev = 2.543 ms
> > > > initial connection time = 4.093 ms
> > > > tps = 696.776249 (without initial connection time)
> > > >
> > > > Result with nothing set:
> > > >
> > > > latency average = 5.090 ms
> > > > latency stddev = 3.467 ms
> > > > initial connection time = 4.989 ms
> > > > tps = 785.665963 (without initial connection time)
> > > >
> > > > Again I think it's possible to improve the synchronous numbers if we
> > > > cache but I'll try that out in a bit.
> > > >
> > >
> > > Okay, so the tests done till now conclude that we won't get the
> > > benefit by using 'standby_slot_names_from_syncrep'. Now, if we
> > > increase the number of standby's in both lists and still keep ANY 3 in
> > > synchronous_standby_names then the results may vary. We should try to
> > > find out if there is a performance benefit with the use of
> > > synchronous_standby_names in the normal configurations like the one
> > > you used in the above tests to prove the value of this patch.
> > >
> >
> > I didn't fully understand the parameters mentioned above, specifically
> > what 'latency stddev' and 'latency average' represent.. But shouldn't
> > we see the benefit/value of this patch by having a setup where a
> > particular standby is slow in sending the response back to primary
> > (could be due to network lag or other reasons) and then measuring the
> > latency in receiving changes on failover-enabled logical subscribers?
> > We can perform this test with both of the below settings and say make
> > D and E slow in sending responses:
> > 1) synchronous_standby_names = 'ANY 3 (A,B,C,D,E)'
> > 2) standby_slot_names = A_slot, B_slot, C_slot, D_slot, E_slot.
> >
>
> Yes, I also expect the patch should perform better in such a scenario
> but it is better to test it. Also, irrespective of that, we should
> investigate why the reported case is slower for
> synchronous_standby_names and see if we can improve it.

+1

> BTW, you for 2), I think you wanted to say synchronized_standby_slots,
> not standby_slot_names. We have recently changed the GUC name.

yes, sorry, synchronized_standby_slots it is.

thanks
Shveta



Re: Allow logical failover slots to wait on synchronous replication

From: Bertrand Drouvot
Hi John,

On Thu, Jul 18, 2024 at 02:22:08PM -0700, John H wrote:
> Hi Bertrand,
> 
> > 1 ===
> > ...
> > That's worth additional comments in the code.
> 
> There's already a comment about caching the value; not sure
> if you'd prefer something more?
> 
> /* Cache values to reduce contention on lock */

Yeah, at the same place as the static lsn[] declaration, something like:

static XLogRecPtr       lsn[NUM_SYNC_REP_WAIT_MODE]; /* cached LSNs */

but that may just be a matter of taste.

> > 3 ===
> > ...
> > NUM_SYNC_REP_WAIT_MODE is small but as the goal is the keep the lock time as
> > short as possible I wonder if it wouldn't be better to use memcpy() here instead
> > of this for loop.
> >
> 
> It results in a "-Wdiscarded-qualifiers" warning, which is safe given
> we take the lock, but adds noise?
> What do you think?
> 
> "slot.c:2756:46: warning: passing argument 2 of ‘memcpy’ discards
> ‘volatile’ qualifier from pointer target type
> [-Wdiscarded-qualifiers]"

Right, we may want to cast it then, but given that the for loop is
"small", I think it's also fine to keep the for loop.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Hi Shveta,

On Sun, Jul 21, 2024 at 8:42 PM shveta malik <shveta.malik@gmail.com> wrote:

> > Ah that's a gap. Let me add some logging/warning in a similar fashion.
> > Although I think I'd have the warning be relatively generic (e.g.
> > changes are blocked because
> > they're not synchronously committed)
> >
>
> okay, sounds good.
>
> thanks
> Shveta

I took a look at adding warnings similar to those the existing
'synchronized_standby_slots' feature has, and I don't think it's
particularly feasible.

The checks/warnings for 'synchronized_standby_slots' are intended to
protect against misconfiguration. They consist of slot validation
(valid slot_name, not a logical slot, slot has not been invalidated)
and whether or not the slot is active.

I don't think there's a "misconfiguration" equivalent for waiting on
synchronous_commit. With the current proposal, once you have
(synchronous_commit enabled && failover_slots), logical decoding is
dependent on whether or not the writes have been replicated to a
synchronous replica. If no data is being replicated out of the logical
slot, it is because, from the perspective of the database, no writes
have been committed yet. I don't think it would make sense to add
logging/warnings as to why a transaction is still not committed (e.g.
which potential replica is the one lagging). There isn't a nice way to
determine why synchronous commit is waiting without being particularly
invasive, and even then it could be particularly noisy (e.g. providing
all the application_names that we are potentially waiting on).

Thanks,


--
John Hsu - Amazon Web Services



Hi Bertrand,

On Sun, Jul 28, 2024 at 10:00 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:

> Yeah, at the same place as the static lsn[] declaration, something like:
>
> static XLogRecPtr       lsn[NUM_SYNC_REP_WAIT_MODE]; /* cached LSNs */
>
> but that may just be a matter of taste.
>

I've updated the patch to reflect that.

>
> Right, we may want to cast it then but given that the for loop is "small" I think
> that's also fine to keep the for loop.
>

Ah I see what you mean. I've updated these changes and attached the
patch to the other thread.


Thanks,
--
John Hsu - Amazon Web Services



Re: Allow logical failover slots to wait on synchronous replication

From: Amit Kapila
On Tue, Aug 27, 2024 at 12:58 AM John H <johnhyvr@gmail.com> wrote:
>
> For instance, in Shveta's suggestion of
>
> > > > We can perform this test with both of the below settings and say make
> > > > D and E slow in sending responses:
> > > > 1) synchronous_standby_names = 'ANY 3 (A,B,C,D,E)'
> > > > 2) standby_slot_names = A_slot, B_slot, C_slot, D_slot, E_slot.
>
> if the server associated with E_slot is just down or undergoing
> some sort of maintenance, then all logical consumers would start lagging until
> the server is back up. I could also mimic a network lag of 20 seconds
> and it's guaranteed
> that this patch will perform better.
>

I wanted a simple test where in the first case you use
synchronous_standby_names = 'ANY 3 (A,B,C,D,E)' and in the second case
use standby_slot_names = A_slot, B_slot, C_slot, D_slot, E_slot. You
can try some variations of it as well. The idea is that even if the
performance is lower for the synchronous_standby_names configuration,
we should be able to document it. This will help users decide what is
best for them.

> I re-ran the benchmarks with a longer run time of 3 hours, and testing
> a new shared cache
> for walsenders to check the value before obtaining the SyncRepLock.
>
> I also saw I was being throttled on storage in my previous benchmarks
> so I moved to a new setup.
> I benchmarked a new test case with an additional shared cache between
> all the walsenders to
> reduce potential contention on SyncRepLock, and have attached said patch.
>
> Database: Writer on it's own disk, 5 RRs on the other disk together
> Client: 10 logical clients, pgbench running from here as well
>
> 'pgbench -c 32 -j 4 -T 10800 -U "ec2-user" -d postgres -r -P 1'
>
> # Test failover_slots with synchronized_standby_slots = 'rr_1, rr_2,
> rr_3, rr_4, rr_5'
> latency average = 10.683 ms
> latency stddev = 11.851 ms
> initial connection time = 145.876 ms
> tps = 2994.595673 (without initial connection time)
>
> # Test failover_slots waiting on sync_rep no new shared cache
> latency average = 10.684 ms
> latency stddev = 12.247 ms
> initial connection time = 142.561 ms
> tps = 2994.160136 (without initial connection time)
> statement latencies in milliseconds and failures:
>
> # Test failover slots with additional shared cache
> latency average = 10.674 ms
> latency stddev = 11.917 ms
> initial connection time = 142.486 ms
> tps = 2997.315874 (without initial connection time)
>
> The tps improvement between no cache and shared_cache seems marginal, but we do
> see the slight improvement  in stddev which makes sense from a
> contention perspective.
>

What is the difference between "Test failover_slots with
synchronized_standby_slots = 'rr_1, rr_2, rr_3, rr_4, rr_5'" and "Test
failover_slots waiting on sync_rep no new shared cache"? I want to know
what configuration you used for synchronous_standby_names in the latter
case.

> I think the cache would demonstrate a lot more improvement if we had
> say 1000 logical slots
> and all of them are trying to obtain SyncRepLock for updating its values.
>
> I've attached the patch but don't feel particularly strongly about the
> new shared LSN values.
>

I am also not sure, especially as the test results didn't show much
improvement and the code also becomes a bit complicated. BTW, in the
0003 version, in the below code:
+ /* Cache values to reduce contention */
+ LWLockAcquire(SyncRepLock, LW_SHARED);
+ memcpy((XLogRecPtr *) walsndctl->cached_lsn, (XLogRecPtr *)
walsndctl->lsn, sizeof(lsn));
+ LWLockRelease(SyncRepLock);

Which mode's LSN is being copied? I am not sure I understood this
part of the code.

In the 0002 version, in the following code [1], you refer to the LSN
mode which is enabled for the logical walsender, irrespective of the
mode used by the physical walsenders. It is possible that they are
always the same, but that is not evident from the code or comments in
the patch.
[1] :
+ /* Cache values to reduce contention on lock */
+ for (i = 0; i < NUM_SYNC_REP_WAIT_MODE; i++)
+ {
+ lsn[i] = walsndctl->lsn[i];
+ }

- ss_oldest_flush_lsn = min_restart_lsn;
+ LWLockRelease(SyncRepLock);

- return true;
+ if (lsn[mode] >= wait_for_lsn)
+ return true;

--
With Regards,
Amit Kapila.



Re: Allow logical failover slots to wait on synchronous replication

From: shveta malik
On Tue, Aug 27, 2024 at 12:56 AM John H <johnhyvr@gmail.com> wrote:
>
> Hi Shveta,
>
> On Sun, Jul 21, 2024 at 8:42 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> > > Ah that's a gap. Let me add some logging/warning in a similar fashion.
> > > Although I think I'd have the warning be relatively generic (e.g.
> > > changes are blocked because
> > > they're not synchronously committed)
> > >
> >
> > okay, sounds good.
> >
> > thanks
> > Shveta
>
> I took a look at having similar warnings the existing
> 'synchronized_standby_slots' feature has
> and I don't think it's particularly feasible.
>
> The checks/warnings for 'synchronized_standby_slots' are intended to
> protect against misconfiguration.
> They consist of slot validation (valid slot_name, not logical slot,
> slot has not been invalidated), and
> whether or not the slot is active.
>
> I don't think there's a "misconfiguration" equivalent for waiting on
> synchronous_commit.
> With the current proposal, once you have (synchronous_commit enabled
> && failover_slots), logical
> decoding is dependent on whether or not the writes have been
> replicated to a synchronous replica.
> If there is no data being replicated out of the logical slot, it is
> because from the perspective of the
> database no writes have been committed yet. I don't think it would
> make sense to add logging/warning as to
> why a transaction is still not committed (e.g. which potential replica
> is the one lagging). There isn't a
> nice way to determine why synchronous commit is waiting without being
> particularly invasive, and even then
> it could be particularly noisy (e.g. provide all the application_names
> that we are potentially waiting on).
>

Okay. Thanks for the details. I see your point. I will review to see
if anything comes to my mind for a simpler way to do this.

thanks
Shveta



Re: Allow logical failover slots to wait on synchronous replication

From: shveta malik
On Thu, Aug 29, 2024 at 2:31 AM John H <johnhyvr@gmail.com> wrote:
>
> Hi Amit,
>
> On Mon, Aug 26, 2024 at 11:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > I wanted a simple test where in the first case you use
> > synchronous_standby_names = 'ANY 3 (A,B,C,D,E)' and in the second case
> > use standby_slot_names = A_slot, B_slot, C_slot, D_slot, E_slot. You
> > can try some variations of it as well. The idea is that even if the
> > performance is less for synchronous_standby_names configuration, we
> > should be able to document it.  This will help users to decide what is
> > ...
> > What is the difference between "Test failover_slots with
> > synchronized_standby_slots = 'rr_1, rr_2, rr_3, rr_4, rr_5'" and
> > "Test failover_slots waiting on sync_rep no new shared cache"? I want
> > to know what configuration you used for synchronous_standby_names in
> > the latter case.
>
> Sorry for the confusion due to the bad-naming of the test cases, let
> me rephrase.
>  All three tests had synchronous_standby_names = 'ANY 3 (A,B,C,D,E)'
> set with synchronous_commit = 'on', and failover_slots = 'on'
> for the 10 logical slots.
>
> # Test failover_slots with synchronized_standby_slots = 'rr_1, rr_2,
> rr_3, rr_4, rr_5'
> This is the test you wanted where the logical clients are waiting on
> all 5 slots to acknowledge the change since
> 'synchronized_standby_slots' takes priority when set.
>
> # Test failover_slots sync rep no cache
> This test has 'synchronized_standby_slots' commented out, and without
> relying on the new cache introduced in 0003.
> Logical clients will wait on synchronous_standby_names in this case.
>
> # Test failover slots with additional shared cache
> This test also has  'synchronized_standby_slots' commented out, and
> logical clients will wait on the LSNs
> reported from synchronous_standby_names but it relies on a new cache
> to reduce contention on SyncRepLock.
>
> > The idea is that even if the
> > performance is less for synchronous_standby_names configuration, we
> > should be able to document it.  This will help users to decide what is
> > best for them.
>
> Makes sense.
>
> > I am also not sure especially as the test results didn't shown much
> > improvement and the code also becomes bit complicated. BTW, in the
> > 0003 version in the below code:
>
> That's fair, I've updated to be more in line with 0002.
>
> > + /* Cache values to reduce contention */
> > + LWLockAcquire(SyncRepLock, LW_SHARED);
> > + memcpy((XLogRecPtr *) walsndctl->cached_lsn, (XLogRecPtr *)
> > walsndctl->lsn, sizeof(lsn));
> > + LWLockRelease(SyncRepLock);
> >
> > Which mode lsn is being copied? I am not sure if I understood this
> > part of the code.
>
> All of the mode LSNs are being copied in case SyncRepWaitMode changes in
> the next iteration. I've removed that part but kept:
>
> > + memcpy(lsn, (XLogRecPtr *) walsndctl->lsn, sizeof(lsn));
>
> as suggested by Bertrand to avoid the for loop updating values one-by-one.
>
> Here's what's logged after the memcpy:
>
> 2024-08-28 19:41:13.798 UTC [1160413] LOG:  lsn[0] after memcpy is: 279/752C7FF0
> 2024-08-28 19:41:13.798 UTC [1160413] LOG:  lsn[1] after memcpy is: 279/752C7F20
> 2024-08-28 19:41:13.798 UTC [1160413] LOG:  lsn[2] after memcpy is: 279/752C7F20
>
> > In the 0002 version, in the following code [1], you are referring to
> > LSN mode which is enabled for logical walsender irrespective of the
> > mode used by the physical walsender. It is possible that they are
> > always the same but that is not evident from the code or comments in
> > the patch.
>
> They are almost always the same, I tried to indicate that with the
> following comment in the patch, but I could make it more explicit?
> > /* Initialize value in case SIGHUP changing to SYNC_REP_NO_WAIT */
>
> At the beginning we set
>
> > int mode = SyncRepWaitMode;
>
> At this time, the logical walsender mode it's checking against is the
> same as what the physical walsenders are using.
> It's possible that this mode is no longer the same when we execute the
> following check:
>
> > if (lsn[mode] >= wait_for_lsn)
>
> because of a SIGHUP to synchronous_commit that changes SyncRepWaitMode
> to some other value
>
> We cache the value instead of
> > if (lsn[SyncRepWaitMode] >= wait_for_lsn)
>
> because SYNC_REP_NO_WAIT is -1. If SyncRepWaitMode is set to this it
> leads to out of bounds access.
>
> I've attached a new patch that removes the shared cache introduced in 0003.
>

Thanks for the patch. A few comments and queries:

1)
+ static XLogRecPtr       lsn[NUM_SYNC_REP_WAIT_MODE];

We should name it 'lsns' as there are multiple.

2)

+ for (i = 0; i < NUM_SYNC_REP_WAIT_MODE; i++)
+ {
+ lsn[i] = InvalidXLogRecPtr;
+ }

Can we do it like below similar to what you have done at another place:
memset(lsn, InvalidXLogRecPtr, sizeof(lsn));

3)
+ if (!initialized)
+ {
+ for (i = 0; i < NUM_SYNC_REP_WAIT_MODE; i++)
+ {
+ lsn[i] = InvalidXLogRecPtr;
+ }
+ }

I do not see 'initialized' set to TRUE anywhere. Can you please
elaborate the intent here?

4)
+ int mode = SyncRepWaitMode;
+ int i;
+
+ if (!initialized)
+ {
+ for (i = 0; i < NUM_SYNC_REP_WAIT_MODE; i++)
+ {
+ lsn[i] = InvalidXLogRecPtr;
+ }
+ }
+ if (mode == SYNC_REP_NO_WAIT)
+ return true;

I do not understand this code well. As Amit also pointed out, 'mode'
may change. Let's say that when we initialize 'mode', SyncRepWaitMode
is SYNC_REP_NO_WAIT, but by the time we check 'if (mode ==
SYNC_REP_NO_WAIT)', SyncRepWaitMode has changed to, say,
SYNC_REP_WAIT_FLUSH; if so, then we will wrongly return true from here.
Is that a possibility? ProcessConfigFile() is in the caller, and thus
we may end up using the wrong mode.

thanks
Shveta



Hi Shveta,

Thanks for reviewing it so quickly.

On Thu, Aug 29, 2024 at 2:35 AM shveta malik <shveta.malik@gmail.com> wrote:
>
> Thanks for the patch. Few comments and queries:
>
> 1)
> + static XLogRecPtr       lsn[NUM_SYNC_REP_WAIT_MODE];
>
> We shall name it as 'lsns' as there are multiple.
>

This follows the same naming convention in walsender_private.h

> 2)
>
> + for (i = 0; i < NUM_SYNC_REP_WAIT_MODE; i++)
> + {
> + lsn[i] = InvalidXLogRecPtr;
> + }
>
> Can we do it like below similar to what you have done at another place:
> memset(lsn, InvalidXLogRecPtr, sizeof(lsn));
>

I don't think memset quite works in this case? Well, it *technically*
works, but I'm not sure it's something worth optimizing. If I
understand correctly, memset takes a char for the value, not an
XLogRecPtr (uint64).

So it would have to be something like: memset(lsn, 0, sizeof(lsn))

InvalidXLogRecPtr is defined as 0, so I think it works, but there's an
implicit dependency here for correctness.

> 3)
> + if (!initialized)
> + {
> + for (i = 0; i < NUM_SYNC_REP_WAIT_MODE; i++)
> + {
> + lsn[i] = InvalidXLogRecPtr;
> + }
> + }
>
> I do not see 'initialized' set to TRUE anywhere. Can you please
> elaborate the intent here?

You're right, I thought I had fixed this. Will update.

>
> 4)
> + int mode = SyncRepWaitMode;
> + int i;
> +
> + if (!initialized)
> + {
> + for (i = 0; i < NUM_SYNC_REP_WAIT_MODE; i++)
> + {
> + lsn[i] = InvalidXLogRecPtr;
> + }
> + }
> + if (mode == SYNC_REP_NO_WAIT)
> + return true;
>
> I do not understand this code well. As Amit also pointed out,  'mode'
> may change. When we initialize 'mode' lets say SyncRepWaitMode is
> SYNC_REP_NO_WAIT but by the time we check 'if (mode ==
> SYNC_REP_NO_WAIT)', SyncRepWaitMode has changed to say
> SYNC_REP_WAIT_FLUSH, if so, then we will wrongly return true from
> here. Is that a possibility? ProcessConfigFile() is in the caller, and
> thus we may end up using the wrong mode.
>

Yes, it's possible for the mode to change. As in my comment to Amit in
the other thread, I think we have to store the mode, base our execution
of this logic on it, and ignore SyncRepWaitMode changing due to
ProcessConfigFile/SIGHUP for one iteration.

We can store the value of mode later, so something like:

> if (SyncRepWaitMode == SYNC_REP_NO_WAIT)
>     return true;
> mode = SyncRepWaitMode
> if (lsn[mode] >= wait_for_lsn)
>   return true;

But it's the same issue: when you check lsn[mode], SyncRepWaitMode
could have changed to something else, so you always have to snapshot
the value and will always have this discrepancy.

I'm skeptical that end users are changing SyncRepWaitMode in their
database clusters, as it has implications for durability, and I would
assume they run with the same durability guarantees.

Thanks,
--
John Hsu - Amazon Web Services



Re: Allow logical failover slots to wait on synchronous replication

From: shveta malik
On Fri, Aug 30, 2024 at 12:56 AM John H <johnhyvr@gmail.com> wrote:
>
> Hi Shveta,
>
> Thanks for reviewing it so quickly.
>
> On Thu, Aug 29, 2024 at 2:35 AM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > Thanks for the patch. Few comments and queries:
> >
> > 1)
> > + static XLogRecPtr       lsn[NUM_SYNC_REP_WAIT_MODE];
> >
> > We shall name it as 'lsns' as there are multiple.
> >
>
> This follows the same naming convention in walsender_private.h
>
> > 2)
> >
> > + for (i = 0; i < NUM_SYNC_REP_WAIT_MODE; i++)
> > + {
> > + lsn[i] = InvalidXLogRecPtr;
> > + }
> >
> > Can we do it like below similar to what you have done at another place:
> > memset(lsn, InvalidXLogRecPtr, sizeof(lsn));
> >
>
> I don't think memset works in this case? Well, I think *technically* works but
> not sure if that's something worth optimizing.
> If I understand correctly, memset takes in a char for the value and
> not XLogRecPtr (uint64).
>
> So something like: memset(lsn, 0, sizeof(lsn))
>
> InvalidXLogRecPtr is defined as 0 so I think it works but there's an
> implicit dependency here
> for correctness.
>
> > 3)
> > + if (!initialized)
> > + {
> > + for (i = 0; i < NUM_SYNC_REP_WAIT_MODE; i++)
> > + {
> > + lsn[i] = InvalidXLogRecPtr;
> > + }
> > + }
> >
> > I do not see 'initialized' set to TRUE anywhere. Can you please
> > elaborate the intent here?
>
> You're right I thought I fixed this. WIll update.
>
> >
> > 4)
> > + int mode = SyncRepWaitMode;
> > + int i;
> > +
> > + if (!initialized)
> > + {
> > + for (i = 0; i < NUM_SYNC_REP_WAIT_MODE; i++)
> > + {
> > + lsn[i] = InvalidXLogRecPtr;
> > + }
> > + }
> > + if (mode == SYNC_REP_NO_WAIT)
> > + return true;
> >
> > I do not understand this code well. As Amit also pointed out,  'mode'
> > may change. When we initialize 'mode' lets say SyncRepWaitMode is
> > SYNC_REP_NO_WAIT but by the time we check 'if (mode ==
> > SYNC_REP_NO_WAIT)', SyncRepWaitMode has changed to say
> > SYNC_REP_WAIT_FLUSH, if so, then we will wrongly return true from
> > here. Is that a possibility? ProcessConfigFile() is in the caller, and
> > thus we may end up using the wrong mode.
> >
>
> Yes it's possible for mode to change. In my comment to Amit in the other thread,
> I think we have to store mode and base our execution of this logic and ignore
> SyncRepWaitMode changing due to ProcesConfigFile/SIGHUP for one iteration.
>
> We can store the value of mode later, so something like:
>
> > if (SyncRepWaitMode == SYNC_REP_NO_WAIT)
> >     return true;
> > mode = SyncRepWaitMode
> > if (lsn[mode] >= wait_for_lsn)
> >   return true;
>
> But it's the same issue which is when you check lsn[mode],
> SyncRepWaitMode could have changed to
> something else, so you always have to initialize the value and will
> always have this discrepancy.
>
> I'm skeptical end users are changing SyncRepWaitMode in their database
> clusters as
> it has implications for their durability and I would assume they run
> with the same durability guarantees.
>

I was trying to have a look at the patch again, it doesn't apply on
the head, needs rebase.

Regarding 'mode = SyncRepWaitMode', FWIW, SyncRepWaitForLSN() also
does it in a similar way. It gets the mode in a local var initially and uses
it later. See [1]. So isn't there a chance here too that 'SyncRepWaitMode'
can change in between?

[1]:
mode = SyncRepWaitMode;
.....
....
        if (!WalSndCtl->sync_standbys_defined ||
                lsn <= WalSndCtl->lsn[mode])
        {
                LWLockRelease(SyncRepLock);
                return;
        }



thanks
Shveta



Re: Allow logical failover slots to wait on synchronous replication
From: shveta malik

On Wed, Sep 11, 2024 at 2:40 AM John H <johnhyvr@gmail.com> wrote:
>
> Hi Shveta,
>
> On Sun, Sep 8, 2024 at 11:16 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> >
> > I was trying to have a look at the patch again, it doesn't apply on
> > the head, needs rebase.
> >
>
> Rebased with the latest changes.
>
> > Regarding 'mode = SyncRepWaitMode', FWIW, SyncRepWaitForLSN() also
> > does in a similar way. It gets mode in local var initially and uses it
> > later. See [1]. So isn't there a chance too that 'SyncRepWaitMode' can
> > change in between.
> >
> > [1]:
> > mode = SyncRepWaitMode;
> > .....
> > ....
> >         if (!WalSndCtl->sync_standbys_defined ||
> >                 lsn <= WalSndCtl->lsn[mode])
> >         {
> >                 LWLockRelease(SyncRepLock);
> >                 return;
> >         }
>
> You are right, thanks for the correction. I tried reproducing with GDB
> where SyncRepWaitMode
> changes due to pg_ctl reload but was unable to do so. It seems like
> SIGHUP only sets ConfigReloadPending = true,
> which gets processed in the next loop in WalSndLoop() and that's
> probably where I was getting confused.

Yes, SIGHUP will be processed in the caller of
StandbySlotsHaveCaughtup() (see ProcessConfigFile() in
WaitForStandbyConfirmation()). So we can use 'SyncRepWaitMode'
directly, as it is not going to change within StandbySlotsHaveCaughtup()
even if the user triggers the change. And thus it was also okay to keep
it in a local variable, similar to how SyncRepWaitForLSN() does it.

> In the latest patch, I've added:
>
> Assert(SyncRepWaitMode >= 0);
>
> which should be true since we call SyncRepConfigured() at the
> beginning of StandbySlotsHaveCaughtup(),
> and used SyncRepWaitMode directly.

Yes, it should be okay, I think. SyncRepRequested() at the beginning
makes sure synchronous_commit > SYNCHRONOUS_COMMIT_LOCAL_FLUSH, and
thus SyncRepWaitMode should be mapped to one of
WAIT_WRITE/FLUSH/APPLY. Will review further.

thanks
Shveta



Re: Allow logical failover slots to wait on synchronous replication
From: shveta malik

On Thu, Sep 12, 2024 at 3:04 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Wed, Sep 11, 2024 at 2:40 AM John H <johnhyvr@gmail.com> wrote:
> >
> > Hi Shveta,
> >
> > On Sun, Sep 8, 2024 at 11:16 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > >
> > > I was trying to have a look at the patch again, it doesn't apply on
> > > the head, needs rebase.
> > >
> >
> > Rebased with the latest changes.
> >
> > > Regarding 'mode = SyncRepWaitMode', FWIW, SyncRepWaitForLSN() also
> > > does in a similar way. It gets mode in local var initially and uses it
> > > later. See [1]. So isn't there a chance too that 'SyncRepWaitMode' can
> > > change in between.
> > >
> > > [1]:
> > > mode = SyncRepWaitMode;
> > > .....
> > > ....
> > >         if (!WalSndCtl->sync_standbys_defined ||
> > >                 lsn <= WalSndCtl->lsn[mode])
> > >         {
> > >                 LWLockRelease(SyncRepLock);
> > >                 return;
> > >         }
> >
> > You are right, thanks for the correction. I tried reproducing with GDB
> > where SyncRepWaitMode
> > changes due to pg_ctl reload but was unable to do so. It seems like
> > SIGHUP only sets ConfigReloadPending = true,
> > which gets processed in the next loop in WalSndLoop() and that's
> > probably where I was getting confused.
>
> yes, SIGHUP will be processed in the caller of
> StandbySlotsHaveCaughtup() (see ProcessConfigFile() in
> WaitForStandbyConfirmation()). So we can use 'SyncRepWaitMode'
> directly as it is not going to change in StandbySlotsHaveCaughtup()
> even if user triggers the change. And thus it was okay to use it even
> in the local variable too similar to how SyncRepWaitForLSN() does it.
>
> > In the latest patch, I've added:
> >
> > Assert(SyncRepWaitMode >= 0);
> >
> > which should be true since we call SyncRepConfigured() at the
> > beginning of StandbySlotsHaveCaughtup(),
> > and used SyncRepWaitMode directly.
>
> Yes, it should be okay I think. As SyncRepRequested() in the beginning
> makes sure synchronous_commit > SYNCHRONOUS_COMMIT_LOCAL_FLUSH and
> thus SyncRepWaitMode should be mapped to either of
> WAIT_WRITE/FLUSH/APPLY etc. Will review further.
>

I was wondering if we need something similar to SyncRepWaitForLSN() here:

        /* Cap the level for anything other than commit to remote flush only. */
        if (commit)
                mode = SyncRepWaitMode;
        else
                mode = Min(SyncRepWaitMode, SYNC_REP_WAIT_FLUSH);

The header comment says:
 * 'lsn' represents the LSN to wait for.  'commit' indicates whether this LSN
 * represents a commit record.  If it doesn't, then we wait only for the WAL
 * to be flushed if synchronous_commit is set to the higher level of
 * remote_apply, because only commit records provide apply feedback.

If we don't do something similar, then aren't there chances that we
keep on waiting on the wrong lsn[mode] for the case when mode =
SYNC_REP_WAIT_APPLY while the sync-rep-wait infrastructure is updating
a different mode's lsn? Is my understanding correct?

thanks
Shveta



Re: Allow logical failover slots to wait on synchronous replication
From: Amit Kapila

On Fri, Sep 13, 2024 at 3:13 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Thu, Sep 12, 2024 at 3:04 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > On Wed, Sep 11, 2024 at 2:40 AM John H <johnhyvr@gmail.com> wrote:
> > >
> > > Hi Shveta,
> > >
> > > On Sun, Sep 8, 2024 at 11:16 PM shveta malik <shveta.malik@gmail.com> wrote:
> > >
> > > >
> > > > I was trying to have a look at the patch again, it doesn't apply on
> > > > the head, needs rebase.
> > > >
> > >
> > > Rebased with the latest changes.
> > >
> > > > Regarding 'mode = SyncRepWaitMode', FWIW, SyncRepWaitForLSN() also
> > > > does in a similar way. It gets mode in local var initially and uses it
> > > > later. See [1]. So isn't there a chance too that 'SyncRepWaitMode' can
> > > > change in between.
> > > >
> > > > [1]:
> > > > mode = SyncRepWaitMode;
> > > > .....
> > > > ....
> > > >         if (!WalSndCtl->sync_standbys_defined ||
> > > >                 lsn <= WalSndCtl->lsn[mode])
> > > >         {
> > > >                 LWLockRelease(SyncRepLock);
> > > >                 return;
> > > >         }
> > >
> > > You are right, thanks for the correction. I tried reproducing with GDB
> > > where SyncRepWaitMode
> > > changes due to pg_ctl reload but was unable to do so. It seems like
> > > SIGHUP only sets ConfigReloadPending = true,
> > > which gets processed in the next loop in WalSndLoop() and that's
> > > probably where I was getting confused.
> >
> > yes, SIGHUP will be processed in the caller of
> > StandbySlotsHaveCaughtup() (see ProcessConfigFile() in
> > WaitForStandbyConfirmation()). So we can use 'SyncRepWaitMode'
> > directly as it is not going to change in StandbySlotsHaveCaughtup()
> > even if user triggers the change. And thus it was okay to use it even
> > in the local variable too similar to how SyncRepWaitForLSN() does it.
> >
> > > In the latest patch, I've added:
> > >
> > > Assert(SyncRepWaitMode >= 0);
> > >
> > > which should be true since we call SyncRepConfigured() at the
> > > beginning of StandbySlotsHaveCaughtup(),
> > > and used SyncRepWaitMode directly.
> >
> > Yes, it should be okay I think. As SyncRepRequested() in the beginning
> > makes sure synchronous_commit > SYNCHRONOUS_COMMIT_LOCAL_FLUSH and
> > thus SyncRepWaitMode should be mapped to either of
> > WAIT_WRITE/FLUSH/APPLY etc. Will review further.
> >
>
> I was wondering if we need something similar to SyncRepWaitForLSN() here:
>
>         /* Cap the level for anything other than commit to remote flush only. */
>         if (commit)
>                 mode = SyncRepWaitMode;
>         else
>                 mode = Min(SyncRepWaitMode, SYNC_REP_WAIT_FLUSH);
>
> The header comment says:
>  * 'lsn' represents the LSN to wait for.  'commit' indicates whether this LSN
>  * represents a commit record.  If it doesn't, then we wait only for the WAL
>  * to be flushed if synchronous_commit is set to the higher level of
>  * remote_apply, because only commit records provide apply feedback.
>
> If we don't do something similar, then aren't there chances that we
> keep on waiting on the wrong lsn[mode] for the case when mode =
> SYNC_REP_WAIT_APPLY while sync-rep-wait infrastructure is updating
> different mode's lsn. Is my understanding correct?
>

I think here we always need the lsn values corresponding to
SYNC_REP_WAIT_FLUSH as we want to ensure that the WAL has to be
flushed on physical standby before sending it to the logical
subscriber. See ProcessStandbyReplyMessage() where we always use
flushPtr to advance the physical_slot via
PhysicalConfirmReceivedLocation().

Another question aside from the above point, what if someone has
specified logical subscribers in synchronous_standby_names? In the
case of synchronized_standby_slots, we won't proceed with such slots.

--
With Regards,
Amit Kapila.



Re: Allow logical failover slots to wait on synchronous replication
From: shveta malik

On Mon, Sep 16, 2024 at 11:13 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Sep 13, 2024 at 3:13 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > On Thu, Sep 12, 2024 at 3:04 PM shveta malik <shveta.malik@gmail.com> wrote:
> > >
> > > On Wed, Sep 11, 2024 at 2:40 AM John H <johnhyvr@gmail.com> wrote:
> > > >
> > > > Hi Shveta,
> > > >
> > > > On Sun, Sep 8, 2024 at 11:16 PM shveta malik <shveta.malik@gmail.com> wrote:
> > > >
> > > > >
> > > > > I was trying to have a look at the patch again, it doesn't apply on
> > > > > the head, needs rebase.
> > > > >
> > > >
> > > > Rebased with the latest changes.
> > > >
> > > > > Regarding 'mode = SyncRepWaitMode', FWIW, SyncRepWaitForLSN() also
> > > > > does in a similar way. It gets mode in local var initially and uses it
> > > > > later. See [1]. So isn't there a chance too that 'SyncRepWaitMode' can
> > > > > change in between.
> > > > >
> > > > > [1]:
> > > > > mode = SyncRepWaitMode;
> > > > > .....
> > > > > ....
> > > > >         if (!WalSndCtl->sync_standbys_defined ||
> > > > >                 lsn <= WalSndCtl->lsn[mode])
> > > > >         {
> > > > >                 LWLockRelease(SyncRepLock);
> > > > >                 return;
> > > > >         }
> > > >
> > > > You are right, thanks for the correction. I tried reproducing with GDB
> > > > where SyncRepWaitMode
> > > > changes due to pg_ctl reload but was unable to do so. It seems like
> > > > SIGHUP only sets ConfigReloadPending = true,
> > > > which gets processed in the next loop in WalSndLoop() and that's
> > > > probably where I was getting confused.
> > >
> > > yes, SIGHUP will be processed in the caller of
> > > StandbySlotsHaveCaughtup() (see ProcessConfigFile() in
> > > WaitForStandbyConfirmation()). So we can use 'SyncRepWaitMode'
> > > directly as it is not going to change in StandbySlotsHaveCaughtup()
> > > even if user triggers the change. And thus it was okay to use it even
> > > in the local variable too similar to how SyncRepWaitForLSN() does it.
> > >
> > > > In the latest patch, I've added:
> > > >
> > > > Assert(SyncRepWaitMode >= 0);
> > > >
> > > > which should be true since we call SyncRepConfigured() at the
> > > > beginning of StandbySlotsHaveCaughtup(),
> > > > and used SyncRepWaitMode directly.
> > >
> > > Yes, it should be okay I think. As SyncRepRequested() in the beginning
> > > makes sure synchronous_commit > SYNCHRONOUS_COMMIT_LOCAL_FLUSH and
> > > thus SyncRepWaitMode should be mapped to either of
> > > WAIT_WRITE/FLUSH/APPLY etc. Will review further.
> > >
> >
> > I was wondering if we need something similar to SyncRepWaitForLSN() here:
> >
> >         /* Cap the level for anything other than commit to remote flush only. */
> >         if (commit)
> >                 mode = SyncRepWaitMode;
> >         else
> >                 mode = Min(SyncRepWaitMode, SYNC_REP_WAIT_FLUSH);
> >
> > The header comment says:
> >  * 'lsn' represents the LSN to wait for.  'commit' indicates whether this LSN
> >  * represents a commit record.  If it doesn't, then we wait only for the WAL
> >  * to be flushed if synchronous_commit is set to the higher level of
> >  * remote_apply, because only commit records provide apply feedback.
> >
> > If we don't do something similar, then aren't there chances that we
> > keep on waiting on the wrong lsn[mode] for the case when mode =
> > SYNC_REP_WAIT_APPLY while sync-rep-wait infrastructure is updating
> > different mode's lsn. Is my understanding correct?
> >
>
> I think here we always need the lsn values corresponding to
> SYNC_REP_WAIT_FLUSH as we want to ensure that the WAL has to be
> flushed on physical standby before sending it to the logical
> subscriber. See ProcessStandbyReplyMessage() where we always use
> flushPtr to advance the physical_slot via
> PhysicalConfirmReceivedLocation().

I agree. So even if the mode is SYNC_REP_WAIT_WRITE (lower one) or
SYNC_REP_WAIT_APPLY (higher one), we need to wait for
lsn[SYNC_REP_WAIT_FLUSH].

> Another question aside from the above point, what if someone has
> specified logical subscribers in synchronous_standby_names? In the
> case of synchronized_standby_slots, we won't proceed with such slots.
>

Yes, it is a possibility. I had missed this point earlier. I now
tried a case where I give a mix of a logical subscriber and physical
standbys in 'synchronous_standby_names' on pgHead; it even accepts that
'mix' configuration and starts waiting accordingly.

synchronous_standby_names = 'FIRST 2(logicalsub_1, phy_standby_1,
phy_standby_2)';

thanks
Shveta



Re: Allow logical failover slots to wait on synchronous replication
From: Amit Kapila

On Mon, Sep 16, 2024 at 2:55 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Mon, Sep 16, 2024 at 11:13 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
>
> > Another question aside from the above point, what if someone has
> > specified logical subscribers in synchronous_standby_names? In the
> > case of synchronized_standby_slots, we won't proceed with such slots.
> >
>
> Yes, it is a possibility. I have missed this point earlier. Now I
> tried a case where I give a mix of logical subscriber and physical
> standby in 'synchronous_standby_names' on pgHead, it even takes that
> 'mix' configuration and starts waiting accordingly.
>
> synchronous_standby_names = 'FIRST 2(logicalsub_1, phy_standby_1,
> phy_standby_2)';
>

This should not happen as we don't support syncing failover slots on
logical subscribers. The other point to consider here is that the user
may not have set 'sync_replication_slots' on all the physical standbys
mentioned in 'synchronous_standby_names' and in that case, it doesn't
make sense to wait for WAL to get flushed on those standbys. What do
you think?

--
With Regards,
Amit Kapila.



Re: Allow logical failover slots to wait on synchronous replication
From: shveta malik

On Mon, Sep 16, 2024 at 4:04 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Sep 16, 2024 at 2:55 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > On Mon, Sep 16, 2024 at 11:13 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> >
> > > Another question aside from the above point, what if someone has
> > > specified logical subscribers in synchronous_standby_names? In the
> > > case of synchronized_standby_slots, we won't proceed with such slots.
> > >
> >
> > Yes, it is a possibility. I have missed this point earlier. Now I
> > tried a case where I give a mix of logical subscriber and physical
> > standby in 'synchronous_standby_names' on pgHead, it even takes that
> > 'mix' configuration and starts waiting accordingly.
> >
> > synchronous_standby_names = 'FIRST 2(logicalsub_1, phy_standby_1,
> > phy_standby_2)';
> >
>
> This should not happen as we don't support syncing failover slots on
> logical subscribers.

+1

> The other point to consider here is that the user
> may not have set 'sync_replication_slots' on all the physical standbys
> mentioned in 'synchronous_standby_names' and in that case, it doesn't
> make sense to wait for WAL to get flushed on those standbys. What do
> you think?
>

Yes, it is a possibility. But then it is a possibility in the case of
'synchronized_standby_slots' as well. A user may configure one of the
standbys in 'synchronized_standby_slots' while not configuring the
slot-sync GUCs on that standby (hot_standby_feedback,
sync_replication_slots etc). In such a case, logical replication is
dependent upon the concerned physical standby even though the latter is
not syncing failover slots.
But there is no reliable way to detect this on the publisher side to
stop the 'wait' for the concerned physical standby. We tried in the
past, but it was not that simple, as the sync-related GUCs may change
anytime on the physical standby, and thus a consistent feedback
mechanism would be needed to detect this. IMO, we can explain the
recommendations and risks for 'synchronous_standby_names' in the docs,
similar to what we do for 'sync_replication_slots'. Or do you have
something else in mind?

thanks
Shveta



Re: Allow logical failover slots to wait on synchronous replication
From: Amit Kapila

On Tue, Sep 17, 2024 at 9:08 AM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Mon, Sep 16, 2024 at 4:04 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Mon, Sep 16, 2024 at 2:55 PM shveta malik <shveta.malik@gmail.com> wrote:
> > >
> > > On Mon, Sep 16, 2024 at 11:13 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > >
> > > > Another question aside from the above point, what if someone has
> > > > specified logical subscribers in synchronous_standby_names? In the
> > > > case of synchronized_standby_slots, we won't proceed with such slots.
> > > >
> > >
> > > Yes, it is a possibility. I have missed this point earlier. Now I
> > > tried a case where I give a mix of logical subscriber and physical
> > > standby in 'synchronous_standby_names' on pgHead, it even takes that
> > > 'mix' configuration and starts waiting accordingly.
> > >
> > > synchronous_standby_names = 'FIRST 2(logicalsub_1, phy_standby_1,
> > > phy_standby_2)';
> > >
> >
> > This should not happen as we don't support syncing failover slots on
> > logical subscribers.
>
> +1
>
> > The other point to consider here is that the user
> > may not have set 'sync_replication_slots' on all the physical standbys
> > mentioned in 'synchronous_standby_names' and in that case, it doesn't
> > make sense to wait for WAL to get flushed on those standbys. What do
> > you think?
> >
>
> Yes, it is a possibility. But then it is a possibility in case of
> 'synchronized_standby_slots' as well. User may always configure one of
> the standbys in  'synchronized_standby_slots' while may not configure
> slot-sync GUCs on that standby (hot_standby_feedback,
> sync_replication_slots etc). In such a case, logical replication is
> dependent upon the concerned physical standby even though latter is
> not syncing failover slots.
>

The difference is that the purpose of 'synchronized_standby_slots' is
to mention slot names for which the user expects logical walsenders to
wait before sending the logical changes to subscribers. OTOH,
'synchronous_standby_names' has a different purpose as well. It is not
clear to me if the users would be interested in syncing failover slots
to all the standbys mentioned in 'synchronous_standby_names'.

--
With Regards,
Amit Kapila.



Re: Allow logical failover slots to wait on synchronous replication
From: shveta malik

On Thu, Sep 19, 2024 at 12:02 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Sep 17, 2024 at 9:08 AM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > On Mon, Sep 16, 2024 at 4:04 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Mon, Sep 16, 2024 at 2:55 PM shveta malik <shveta.malik@gmail.com> wrote:
> > > >
> > > > On Mon, Sep 16, 2024 at 11:13 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > >
> > > >
> > > > > Another question aside from the above point, what if someone has
> > > > > specified logical subscribers in synchronous_standby_names? In the
> > > > > case of synchronized_standby_slots, we won't proceed with such slots.
> > > > >
> > > >
> > > > Yes, it is a possibility. I have missed this point earlier. Now I
> > > > tried a case where I give a mix of logical subscriber and physical
> > > > standby in 'synchronous_standby_names' on pgHead, it even takes that
> > > > 'mix' configuration and starts waiting accordingly.
> > > >
> > > > synchronous_standby_names = 'FIRST 2(logicalsub_1, phy_standby_1,
> > > > phy_standby_2)';
> > > >
> > >
> > > This should not happen as we don't support syncing failover slots on
> > > logical subscribers.
> >
> > +1
> >
> > > The other point to consider here is that the user
> > > may not have set 'sync_replication_slots' on all the physical standbys
> > > mentioned in 'synchronous_standby_names' and in that case, it doesn't
> > > make sense to wait for WAL to get flushed on those standbys. What do
> > > you think?
> > >
> >
> > Yes, it is a possibility. But then it is a possibility in case of
> > 'synchronized_standby_slots' as well. User may always configure one of
> > the standbys in  'synchronized_standby_slots' while may not configure
> > slot-sync GUCs on that standby (hot_standby_feedback,
> > sync_replication_slots etc). In such a case, logical replication is
> > dependent upon the concerned physical standby even though latter is
> > not syncing failover slots.
> >
>
> The difference is that the purpose of 'synchronized_standby_slots' is
> to mention slot names for which the user expects logical walsenders to
> wait before sending the logical changes to subscribers. OTOH,
> 'synchronous_standby_names' has a different purpose as well. It is not
> clear to me if the users would be interested in syncing failover slots
> to all the standbys mentioned in 'synchronous_standby_names'.
>

Okay, I see your point. But not sure what could be the solution here
except documenting. But let me think more.

thanks
Shveta



Re: Allow logical failover slots to wait on synchronous replication
From: John H

Hi,

On Mon, Sep 16, 2024 at 2:25 AM shveta malik <shveta.malik@gmail.com> wrote:

> > >
> > > If we don't do something similar, then aren't there chances that we
> > > keep on waiting on the wrong lsn[mode] for the case when mode =
> > > SYNC_REP_WAIT_APPLY while sync-rep-wait infrastructure is updating
> > > different mode's lsn. Is my understanding correct?
> > >

Let me take a deeper look at this, I think you're right though.

>
> I agree. So even if the mode is SYNC_REP_WAIT_WRITE (lower one) or
> SYNC_REP_WAIT_APPLY (higher one), we need to wait for
> lsn[SYNC_REP_WAIT_FLUSH].
>

I'm not sure I agree with that. I think the synchronous_commit mode should be
a good enough proxy for what the user wants from a durability perspective for
their application.

For an application writing to the database, if it has accepted
SYNC_REP_WAIT_WRITE as the point at which a commit is treated as durable,
why do we need to be concerned with overriding that to SYNC_REP_WAIT_FLUSH?

Similarly, if a user has the mode set to SYNC_REP_WAIT_APPLY, to me it's even
more confusing that there can be scenarios where neither the application nor
subsequent reads would see the data as committed, but a logical consumer
would be able to. The database should be treated as the source of truth, and
I don't think logical consumers should ever be ahead of what the database is
treating as committed.


Thanks,

--
John Hsu - Amazon Web Services



Re: Allow logical failover slots to wait on synchronous replication
From: John H

Hi,

On Fri, Sep 20, 2024 at 2:44 AM shveta malik <shveta.malik@gmail.com> wrote:
> > >
> >
> > The difference is that the purpose of 'synchronized_standby_slots' is
> > to mention slot names for which the user expects logical walsenders to
> > wait before sending the logical changes to subscribers. OTOH,
> > 'synchronous_standby_names' has a different purpose as well. It is not
> > clear to me if the users would be interested in syncing failover slots
> > to all the standbys mentioned in 'synchronous_standby_names'.
> >
>
> Okay, I see your point. But not sure what could be the solution here
> except documenting. But let me think more.
>

That's a great find. I didn't consider mixed physical and logical
replicas in synchronous_standby_names.
I wonder if there are users running synchronous_standby_names with a
mix of logical and
physical replicas and what the use case would be.

I'm not sure there's anything straightforward we could do for slot syncing
in general here; if synchronous_standby_names refers to application_names
of logical replicas, the feature can't be supported.

--
John Hsu - Amazon Web Services



Re: Allow logical failover slots to wait on synchronous replication
From: Amit Kapila

On Sat, Sep 21, 2024 at 6:34 AM John H <johnhyvr@gmail.com> wrote:
>
> On Fri, Sep 20, 2024 at 2:44 AM shveta malik <shveta.malik@gmail.com> wrote:
> > > >
> > >
> > > The difference is that the purpose of 'synchronized_standby_slots' is
> > > to mention slot names for which the user expects logical walsenders to
> > > wait before sending the logical changes to subscribers. OTOH,
> > > 'synchronous_standby_names' has a different purpose as well. It is not
> > > clear to me if the users would be interested in syncing failover slots
> > > to all the standbys mentioned in 'synchronous_standby_names'.
> > >
> >
> > Okay, I see your point. But not sure what could be the solution here
> > except documenting. But let me think more.
> >
>
> That's a great find. I didn't consider mixed physical and logical
> replicas in synchronous_standby_names.
> I wonder if there are users running synchronous_standby_names with a
> mix of logical and
> physical replicas and what the use case would be.
>

I am also not aware of the actual use cases of mixing physical and
logical synchronous standbys, but as we provide that functionality, we
can't ignore it. BTW, I am also not sure if users would like the slots
to be synced on all the standbys mentioned in
synchronous_standby_names. And even if they are, it is better to have
an explicit way of letting users specify it.

One possible approach is to extend the syntax of
"synchronized_standby_slots" similar to "synchronous_standby_names" so
that users can directly specify slots similarly and avoid waiting for
more than required standbys.
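
Purely for illustration, such an extended syntax might look like the following (hypothetical; no such syntax exists today, and the slot names are made up):

```
# Hypothetical quorum syntax for synchronized_standby_slots, mirroring
# synchronous_standby_names: wait until any 2 of the 3 slots confirm.
synchronized_standby_slots = 'ANY 2 (slot_a, slot_b, slot_c)'
```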

--
With Regards,
Amit Kapila.