Re: Backward movement of confirmed_flush resulting in data duplication. - Mailing list pgsql-hackers

From Dilip Kumar
Subject Re: Backward movement of confirmed_flush resulting in data duplication.
Msg-id CAFiTN-ucYui=Qf7tnDOg7ai9260cMYee0f0Fxj4ibh8pjq7uYg@mail.gmail.com
In response to Re: Backward movement of confirmed_flush resulting in data duplication.  (Dilip Kumar <dilipbalaut@gmail.com>)
List pgsql-hackers
On Wed, May 14, 2025 at 12:15 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, May 14, 2025 at 11:59 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Wed, May 14, 2025 at 9:16 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Tue, May 13, 2025 at 4:22 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > >
> > > > On Tue, May 13, 2025 at 3:48 PM shveta malik <shveta.malik@gmail.com> wrote:
> > > > >
> > > > > Hi All,
> > > > >
> > > > > This is a spin-off thread from earlier discussions at [1] and [2].
> > > > >
> > > > > While analyzing the slot-sync BF failure as stated in [1], it was
> > > > > observed that there are chances that confirmed_flush_lsn may move
> > > > > backward depending on the feedback messages received from the
> > > > > downstream system. It was suspected that the backward movement of
> > > > > confirmed_flush_lsn may result in data duplication issues. Earlier we
> > > > > were able to successfully reproduce the issue with two_phase enabled
> > > > > subscriptions (see [2]). On further analysis, it seems possible
> > > > > that data duplication issues may happen without two-phase as well.
> > > >
> > > > Thanks for the detailed explanation. Before we focus on patching the
> > > > symptoms, I’d like to explore whether the issue can be addressed on
> > > > the subscriber side. Specifically, have we analyzed if there’s a way
> > > > to prevent the subscriber from moving the LSN backward in the first
> > > > place? That might lead to a cleaner and more robust solution overall.
> > > >
> > >
> > > The subscriber doesn't move the LSN backwards, it only shares the
> > > information with the publisher, which is the latest value of remote
> > > LSN tracked by the origin. Now, as explained in email [1], the
> > > subscriber doesn't persistently store/advance the LSN for changes it
> > > has nothing to do for, such as DDLs or other non-published DMLs.
> > > However, subscribers need to send confirmation of such LSNs for
> > > synchronous replication. This is commented in the code as well, see
> > > comments in CreateDecodingContext (It might seem like we should error
> > > out in this case, but it's pretty common for a client to acknowledge a
> > > LSN it doesn't have to do anything for ...). As mentioned in email[1],
> > > persisting the LSN information that the subscriber doesn't have to do
> > > anything with could be a noticeable performance overhead.
> >
> > Thanks for your response.
> >
> > What I meant wasn’t that the subscriber is moving the confirmed LSN
> > backward, nor was I suggesting we fix it by persisting the LSN on the
> > subscriber side. My point was: if the subscriber sends an LSN older
> > than one it has already sent, does that indicate a bug on the
> > subscriber side? And if so, should the logic be fixed there?
> >
> > I understand this might not be feasible, and it may not even be a bug
> > on the subscriber side; it could be an intentional part of the design.
> >
>
> Right, that is how the subscriber/publisher communication is currently designed.
>
> > But my question was whether we’ve already considered and ruled out
> > that possibility.
> >
>
> That is what I explained in my previous response. Basically, to
> achieve what you are saying, we would need to persist the remote LSN
> values by advancing the origin even in cases where the subscriber
> doesn't need to apply the changes, such as DDLs.

Understood. Yes, it makes sense to fix it the way Shveta has.
Sorry for the noise.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


