Thread: Re: Doc: fix the note related to the GUC "synchronized_standby_slots"

On Mon, Aug 26, 2024 at 1:30 PM <Masahiro.Ikeda@nttdata.com> wrote:
>
> When I read the following documentation related to "synchronized_standby_slots", I misunderstood it to mean that data loss
> would not occur in the case of synchronous physical replication. However, this is incorrect (see reproduce.txt).
>

I think you see such a behavior because you have disabled
'synchronized_standby_slots' in your script (# disable
"synchronized_standby_slots"). You need to enable that to avoid data
loss. Considering that, I don't think your proposed text is an
improvement.

--
With Regards,
Amit Kapila.



Thanks for your responses.

> I think you see such a behavior because you have disabled 'synchronized_standby_slots'
> in your script (# disable "synchronized_standby_slots"). You need to enable that to
> avoid data loss. Considering that, I don't think your proposed text is an improvement.
Yes, I know.

As David said, "without losing data" confuses me, because there are at least three patterns in which users
may think data was lost (there may be other cases as well).

Pattern1. the data for which clients got a committed response from the old primary, but which the new primary doesn't have,
in the case of asynchronous replication
 -> we can avoid this with synchronous replication. This is not relevant to the failover feature.

Pattern2. the data which the new primary has, but the subscribers don't have
 -> we can avoid this with the failover feature.

Pattern3. the data which the subscribers have, but the new primary doesn't have
 -> we can avoid this with the 'synchronized_standby_slots' parameter.
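
To make the three patterns concrete, here is a minimal sketch that classifies a failover by comparing positions. It is only a toy model, not how PostgreSQL tracks this: LSNs are plain integers, and the names FailoverState, loss_patterns, and all field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FailoverState:
    """Toy snapshot of positions at the moment of failover (all names hypothetical)."""
    acked_lsn: int        # last commit the old primary acknowledged to clients
    new_primary_lsn: int  # last position flushed on the promoted standby
    subscriber_lsn: int   # last position received by the logical subscriber
    slot_synced: bool     # True if the logical slot is available on the new primary


def loss_patterns(s: FailoverState) -> set:
    """Return the pattern numbers (1, 2, 3) that apply to this snapshot."""
    found = set()
    # Pattern 1: clients saw a commit that the new primary never received.
    if s.acked_lsn > s.new_primary_lsn:
        found.add(1)
    # Pattern 2: the new primary is ahead of the subscriber, and without a
    # synced slot the subscriber cannot resume from where it stopped.
    if not s.slot_synced and s.new_primary_lsn > s.subscriber_lsn:
        found.add(2)
    # Pattern 3: the subscriber received changes the new primary does not have.
    if s.subscriber_lsn > s.new_primary_lsn:
        found.add(3)
    return found


# Synchronous physical replication, but the subscriber got ahead of the standby.
print(loss_patterns(FailoverState(acked_lsn=100, new_primary_lsn=100,
                                  subscriber_lsn=110, slot_synced=True)))
```

The last call models the case I misunderstood: no acknowledged commit is missing on the new primary, yet pattern 3 still applies.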

Currently, I understand the following documentation to say that
* the failover feature lets subscribers continue without losing pattern 2 data.
* pattern 1 data may be lost if you use asynchronous replication.
* pattern 3 is not mentioned at all, which is the point I misunderstood.

> They can continue subscribing to publications on the new primary server without losing data.
> Note that in the case of asynchronous replication, there remains a risk of data loss for transactions
> committed on the former primary server but have yet to be replicated to the new primary server

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION




On Tue, Aug 27, 2024 at 10:18 AM <Masahiro.Ikeda@nttdata.com> wrote:
>
> > I think you see such a behavior because you have disabled 'synchronized_standby_slots'
> > in your script (# disable "synchronized_standby_slots"). You need to enable that to
> > avoid data loss. Considering that, I don't think your proposed text is an improvement.
> Yes, I know.
>
> As David said, "without losing data" makes me confused because there are three patterns that users
> think the data was lost though there may be other cases.
>

So, will it be okay if we just remove ".. without losing data" from
the sentence? Will that avoid the confusion you have?

With Regards,
Amit Kapila.



> So, will it be okay if we just remove ".. without losing data" from the sentence? Will that
> avoid the confusion you have?
Yes. Additionally, it would be better to add a note about data consistency after failover, for example:

Note that data consistency after failover can vary depending on the configurations. If
"synchronized_standby_slots" is not configured, there may be data that only the subscribers hold,
even though the new primary does not. Additionally, in the case of asynchronous physical replication,
there remains a risk of data loss for transactions committed on the former primary server
but have yet to be replicated to the new primary server.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION



On Tue, Aug 27, 2024 at 3:05 PM <Masahiro.Ikeda@nttdata.com> wrote:
>
> > So, will it be okay if we just remove ".. without losing data" from the sentence? Will that
> > avoid the confusion you have?
> Yes. Additionally, it would be better to add notes about data consistency after failover for example
>
> Note that data consistency after failover can vary depending on the configurations. If
> "synchronized_standby_slots" is not configured, there may be data that only the subscribers hold,
> even though the new primary does not.
>

This part can be inferred from the description of
synchronized_standby_slots [1] (See: This guarantees that logical
replication failover slots do not consume changes until those changes
are received and flushed to corresponding physical standbys. If a
logical replication connection is meant to switch to a physical
standby after the standby is promoted, the physical replication slot
for the standby should be listed here.)
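
As a rough illustration of that guarantee, the sketch below models the gating as a minimum over the listed slots. This is an assumption-laden toy, not the server's implementation: max_sendable_lsn, its parameters, and the slot names are hypothetical, and LSNs are plain integers.

```python
def max_sendable_lsn(wal_flush_lsn, standby_flush_lsns, synchronized_standby_slots):
    """Highest position a failover-enabled logical slot may consume in this toy model.

    wal_flush_lsn: position flushed locally on the primary.
    standby_flush_lsns: mapping of physical slot name to the position that
        standby has confirmed as received and flushed.
    synchronized_standby_slots: slot names listed in the setting.
    """
    lsn = wal_flush_lsn
    for name in synchronized_standby_slots:
        # A listed slot with no confirmed position holds everything back
        # (modeled here as position 0).
        lsn = min(lsn, standby_flush_lsns.get(name, 0))
    return lsn


# With the slot listed, the subscriber cannot get ahead of the standby.
print(max_sendable_lsn(100, {"sb1_slot": 80}, ["sb1_slot"]))
# With the setting empty, nothing holds the logical slot back.
print(max_sendable_lsn(100, {"sb1_slot": 80}, []))
```

In this model, leaving the list empty is what allows subscribers to hold data that the promoted standby never received.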

>
> Additionally, in the case of asynchronous physical replication,
> there remains a risk of data loss for transactions committed on the former primary server
> but have yet to be replicated to the new primary server.
>

This has nothing to do with failover slots. This is a known behavior
of asynchronous replication, so adding here doesn't make much sense.

In general, adding more information unrelated to failover slots can
confuse users.

[1] - https://www.postgresql.org/docs/17/runtime-config-replication.html#GUC-SYNCHRONIZED-STANDBY-SLOTS

--
With Regards,
Amit Kapila.



> > > So, will it be okay if we just remove ".. without losing data" from
> > > the sentence? Will that avoid the confusion you have?
> > Yes. Additionally, it would be better to add notes about data
> > consistency after failover for example
> >
> > Note that data consistency after failover can vary depending on the
> > configurations. If "synchronized_standby_slots" is not configured,
> > there may be data that only the subscribers hold, even though the new primary does not.
> >
>
> This part can be inferred from the description of synchronized_standby_slots [1] (See:
> This guarantees that logical replication failover slots do not consume changes until those
> changes are received and flushed to corresponding physical standbys. If a logical
> replication connection is meant to switch to a physical standby after the standby is
> promoted, the physical replication slot for the standby should be listed here.)

OK, it's enough for me to just remove ".. without losing data".

> >
> > Additionally, in the case of asynchronous physical replication,
> > there remains a risk of data loss for transactions committed on the
> > former primary server but have yet to be replicated to the new primary server.
> >
>
> This has nothing to do with failover slots. This is a known behavior of asynchronous
> replication, so adding here doesn't make much sense.
>
> In general, adding more information unrelated to failover slots can confuse users.

OK, I agree to remove the sentence.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION



On Wed, Aug 28, 2024 at 3:02 PM <Masahiro.Ikeda@nttdata.com> wrote:
>
> >
> > The next line related to asynchronous replication is also not required. See attached.
>
> Thanks, I found another ".. without losing data".
>

I'll push this tomorrow unless there are any other suggestions on this patch.

--
With Regards,
Amit Kapila.