On Tue, Apr 29, 2025 at 1:17 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
>
> On Mon, 28 Apr 2025 at 10:28, vignesh C <vignesh21@gmail.com> wrote:
> >
> > With this approach, there is a risk of starting from the next WAL
> > record after the consistent point. For example, if the slot returns a
> > consistent point at 0/1715E10, after the fix we would begin replaying
> > from the next WAL record, such as 0/1715E40, which could potentially
> > lead to data loss.
> > As an alternative, we could set recovery_target_inclusive to false in
> > the setup_recovery function. This way, recovery would stop just before
> > the recovery target, allowing the publisher to start replicating
> > exactly from the consistent point.
> > Thoughts?
>
> This approach looks better to me.
> I have prepared the patch for the same.
>
We should find out in which cases, and why, consistent_lsn is the
start LSN of a commit record. We use the slot's confirmed_flush LSN
as consistent_lsn, which normally should be the end point of a
running_xacts record or the end LSN of a commit record (in case the
client sends an ack).
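
To make the difference concrete, here is a rough sketch (Python, not
PostgreSQL code) of how I understand the stop rules for an LSN
recovery target: with recovery_target_inclusive = false we stop before
the first record whose start is >= the target, and with true we stop
after the first record whose end is >= the target. The function names
and the record boundaries below are made up for illustration; only
0/1715E10 and 0/1715E40 come from this thread.

```python
def parse_lsn(text):
    """Convert an LSN written as 'hi/lo' (hex) into a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def format_lsn(value):
    """Convert a 64-bit integer back into 'hi/lo' hex notation."""
    return "%X/%X" % (value >> 32, value & 0xFFFFFFFF)


def last_replayed_end(records, target, inclusive):
    """Return the end LSN of the last record replayed, or None.

    records: list of (start, end) LSN strings in WAL order (hypothetical).
    This is a simplified model of the LSN-target stop rules, not the
    actual recovery code.
    """
    target_lsn = parse_lsn(target)
    last_end = None
    for start, end in records:
        # inclusive = false: stop just before a record starting at/after target
        if not inclusive and parse_lsn(start) >= target_lsn:
            break
        last_end = end
        # inclusive = true: stop just after a record ending at/after target
        if inclusive and parse_lsn(end) >= target_lsn:
            break
    return last_end


# Case 1: the target is both the end of one record and the start of the next.
contiguous = [("0/1715DD8", "0/1715E10"), ("0/1715E10", "0/1715E40")]

# Case 2 (hypothetical LSNs): the target is the start of a record, and the
# previous record ends strictly before it.
with_gap = [("0/1715FC0", "0/1716000"), ("0/1716018", "0/1716050")]

for name, recs, target in (("contiguous", contiguous, "0/1715E10"),
                           ("with gap", with_gap, "0/1716018")):
    for inclusive in (True, False):
        print(name, "inclusive =", inclusive, "->",
              last_replayed_end(recs, target, inclusive))
```

Under this model the two settings behave the same in the first case,
and differ only in the second, where inclusive = true also replays the
record that starts at the target.
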
If we decide to fix this in the way proposed here, then we also need
to investigate whether we still need the additional WAL record added
by commit 03b08c8f5f3e30c97e5908f3d3d76872dab8a9dc. The reason why
that additional WAL record was added is discussed in email [1].
[1] - https://www.postgresql.org/message-id/flat/2377319.1719766794%40sss.pgh.pa.us#bba9f5ee0efc73151cc521a6bd5182ed
--
With Regards,
Amit Kapila.