Re: Doc: fix the note related to the GUC "synchronized_standby_slots" - Mailing list pgsql-hackers

From David G. Johnston
Subject Re: Doc: fix the note related to the GUC "synchronized_standby_slots"
Date
Msg-id CAKFQuwa055ne9bqkLpWcC9rU+e+ss7hWjMPf_O0xSCAzph8XMQ@mail.gmail.com
In response to Re: Doc: fix the note related to the GUC "synchronized_standby_slots"  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Monday, August 26, 2024, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Aug 26, 2024 at 6:38 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:
>
> On Monday, August 26, 2024 5:37 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Mon, Aug 26, 2024 at 1:30 PM <Masahiro.Ikeda@nttdata.com> wrote:
> > >
> > > When I read the following documentation related to the
> > "synchronized_standby_slots", I misunderstood that data loss would not occur
> > in the case of synchronous physical replication. However, this is incorrect (see
> > reproduce.txt).
> > >
> > > > Note that in the case of asynchronous replication, there remains a risk of
> > data loss for transactions committed on the former primary server but have yet
> > to be replicated to the new primary server.
> > > https://www.postgresql.org/docs/17/logical-replication-failover.html
> > >
> > > Am I missing something?
> > >
> >
> > It seems part of the paragraph: "Note that in the case of asynchronous
> > replication, there remains a risk of data loss for transactions committed on the
> > former primary server but have yet to be replicated to the new primary server." is
> > a bit confusing. Would it make things clearer if we removed that part?
>
> I think the intention is to address a complaint[1] that the data inserted on
> the primary after the primary disconnects from the standby is still lost after
> failover. But on reflection, maybe it doesn't directly belong to the topic of
> the logical failover section, because it's a general fact about async replication.
> If we think it matters, maybe we can remove this part and slightly modify
> another part:
>
>    parameter ensures a seamless transition of those subscriptions after the
>    standby is promoted. They can continue subscribing to publications on the
> -   new primary server without losing data.
> +   new primary server without losing data that has already been replicated and
> +   flushed on the standby server.
>

Yeah, we can change that way but not sure if that satisfies the OP's
concern. I am waiting for his response.

I’d suggest getting rid of all mention of “without losing data” and instead emphasizing that subscribers can operate in a hot-standby publishing environment in an automated fashion by connecting through “failover”-enabled slots, assuming the publishing group prevents any change from propagating to any logical subscriber until all standbys in the group have been updated. Whether or not the primary-standby group is resilient to a failure during internal group synchronization is out of the hands of logical subscribers; they are only guaranteed to see a consistent, linear history of activity coming out of the publishing group.

Specifically, if the group synchronizes asynchronously, there is no guarantee that every transaction committed on the primary makes its way through to the logical subscriber if a slot failover happens. At the same time, though, the subscriber's view of the world will be consistent with the newly chosen primary.
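To make the guarantee above concrete, here is a toy Python model. It is not PostgreSQL code, and every name in it (`Node`, `logical_send_limit`, the transaction labels) is hypothetical. It assumes, following the intent of `synchronized_standby_slots`, that a failover-enabled logical slot may only send changes that every listed standby has already flushed, and it models an LSN simply as a count of committed transactions. With an asynchronous standby lagging by one transaction, that transaction is lost on failover, yet what the subscriber saw is still a prefix of the new primary's history.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical model of a server: its WAL is a list of committed transaction labels."""
    name: str
    wal: list = field(default_factory=list)

    @property
    def flush_lsn(self) -> int:
        # Assumption: an "LSN" is modeled as the number of flushed transactions.
        return len(self.wal)


def logical_send_limit(standbys: list) -> int:
    """Highest position a failover-enabled logical slot may send to subscribers.

    Assumption: mirrors the intent of synchronized_standby_slots, i.e. logical
    changes wait until every listed standby has flushed them.
    """
    return min(s.flush_lsn for s in standbys)


primary = Node("primary")
standby = Node("standby")

# Asynchronous physical replication: the primary commits t1..t3,
# but only t1 and t2 have reached the standby when the primary fails.
primary.wal.extend(["t1", "t2", "t3"])
standby.wal.extend(primary.wal[:2])

# The logical subscriber can only have received what the standby already flushed.
subscriber_seen = primary.wal[:logical_send_limit([standby])]

# The primary crashes and the standby is promoted.
new_primary = standby

# Committed on the old primary but missing on the new one.
lost = [x for x in primary.wal if x not in new_primary.wal]

print("subscriber saw:", subscriber_seen)
print("lost on failover:", lost)
# The subscriber's history is a prefix of the new primary's history.
print("consistent:", new_primary.wal[:len(subscriber_seen)] == subscriber_seen)
```

Running it reports `t3` as lost, while the final check prints `consistent: True`: the subscriber never saw a change that the new primary lacks.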

David J.
