Re: Backward movement of confirmed_flush resulting in data duplication. - Mailing list pgsql-hackers
From | Dilip Kumar |
---|---|
Subject | Re: Backward movement of confirmed_flush resulting in data duplication. |
Date | |
Msg-id | CAFiTN-ucYui=Qf7tnDOg7ai9260cMYee0f0Fxj4ibh8pjq7uYg@mail.gmail.com |
In response to | Re: Backward movement of confirmed_flush resulting in data duplication. (Dilip Kumar <dilipbalaut@gmail.com>) |
List | pgsql-hackers |
On Wed, May 14, 2025 at 12:15 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, May 14, 2025 at 11:59 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Wed, May 14, 2025 at 9:16 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Tue, May 13, 2025 at 4:22 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > >
> > > > On Tue, May 13, 2025 at 3:48 PM shveta malik <shveta.malik@gmail.com> wrote:
> > > > >
> > > > > Hi All,
> > > > >
> > > > > It is a spin-off thread from earlier discussions at [1] and [2].
> > > > >
> > > > > While analyzing the slot-sync BF failure as stated in [1], it was
> > > > > observed that there are chances that confirmed_flush_lsn may move
> > > > > backward depending on the feedback messages received from the
> > > > > downstream system. It was suspected that the backward movement of
> > > > > confirmed_flush_lsn may result in data duplication issues. Earlier we
> > > > > were able to successfully reproduce the issue with two_phase enabled
> > > > > subscriptions (see[2]). Now on further analysing, it seems possible
> > > > > that data duplication issues may happen without two-phase as well.
> > > >
> > > > Thanks for the detailed explanation. Before we focus on patching the
> > > > symptoms, I’d like to explore whether the issue can be addressed on
> > > > the subscriber side. Specifically, have we analyzed if there’s a way
> > > > to prevent the subscriber from moving the LSN backward in the first
> > > > place? That might lead to a cleaner and more robust solution overall.
> > > >
> > >
> > > The subscriber doesn't move the LSN backwards, it only shares the
> > > information with the publisher, which is the latest value of remote
> > > LSN tracked by the origin. Now, as explained in email [1], the
> > > subscriber doesn't persistently store/advance the LSN, for which it
> > > doesn't have to do anything like DDLs, or any other non-published
> > > DMLs. However, subscribers need to send confirmation of such LSNs for
> > > synchronous replication. This is commented in the code as well, see
> > > comments in CreateDecodingContext (It might seem like we should error
> > > out in this case, but it's pretty common for a client to acknowledge a
> > > LSN it doesn't have to do anything for ...). As mentioned in email[1],
> > > persisting the LSN information that the subscriber doesn't have to do
> > > anything with could be a noticeable performance overhead.
> >
> > Thanks for your response.
> >
> > What I meant wasn’t that the subscriber is moving the confirmed LSN
> > backward, nor was I suggesting we fix it by persisting the LSN on the
> > subscriber side. My point was: the fact that the subscriber is sending
> > an LSN older than one it has already sent, does that indicate a bug on
> > the subscriber side? And if so, should the logic be fixed there?
> >
> > I understand this might not be feasible, and it may not even be a bug
> > on the subscriber side, it could be an intentional part of the design.
> >
>
> Right, it is how currently the subscriber/publisher communication is designed.
>
> > But my question was whether we’ve already considered and ruled out
> > that possibility.
> >
>
> That is what I explained in my previous response. Basically, to
> achieve what you are saying, we need to persist the remote LSN values
> by advancing the origin for cases, even when the subscriber doesn't
> need to apply such changes like DDLs.

Understood, yeah, it makes sense to fix the way Shveta has fixed.
Sorry for the noise.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
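
Illustrative note: the thread describes feedback from the subscriber that can carry an LSN older than one it acknowledged earlier, and the concern that letting confirmed_flush follow it backward may lead to duplicated data. The toy model below sketches one possible publisher-side guard, which ignores any reported LSN that is not newer than the stored value. It is an assumption made for illustration and is not claimed to be the patch discussed in the thread. `ToySlot`, `apply_feedback`, and `replay` are hypothetical names, and LSNs are modeled as plain integers rather than PostgreSQL's LSN type.

```python
from dataclasses import dataclass


@dataclass
class ToySlot:
    """Hypothetical stand-in for a replication slot; not PostgreSQL's struct."""
    confirmed_flush: int = 0  # LSN modeled as a plain integer


def apply_feedback(slot: ToySlot, reported_lsn: int) -> bool:
    """Advance confirmed_flush only when the reported LSN is newer.

    Returns True if the slot moved forward, False if the feedback was ignored.
    """
    if reported_lsn <= slot.confirmed_flush:
        # Older or equal acknowledgement: keep the stored position unchanged.
        return False
    slot.confirmed_flush = reported_lsn
    return True


def replay(feedback):
    """Apply a sequence of reported LSNs and return the final position."""
    slot = ToySlot()
    for lsn in feedback:
        apply_feedback(slot, lsn)
    return slot.confirmed_flush
```

With this guard, a feedback sequence such as 100, 250, 180 leaves the modeled position at 250, so a later restart in the model would not resume from the older 180.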