Re: BUG #18897: Logical replication conflict after using pg_createsubscriber under heavy load - Mailing list pgsql-bugs

From Amit Kapila
Subject Re: BUG #18897: Logical replication conflict after using pg_createsubscriber under heavy load
Msg-id CAA4eK1Knp03ZHJCKRNPq6_hbRR+vfu775jpv9_E64LmQGNodhw@mail.gmail.com
In response to Re: BUG #18897: Logical replication conflict after using pg_createsubscriber under heavy load  (Shlok Kyal <shlok.kyal.oss@gmail.com>)
List pgsql-bugs
On Tue, Apr 29, 2025 at 1:17 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
>
> On Mon, 28 Apr 2025 at 10:28, vignesh C <vignesh21@gmail.com> wrote:
> >
> > With this approach, there is a risk of starting from the next WAL
> > record after the consistent point. For example, if the slot returns a
> > consistent point at 0/1715E10, after the fix we would begin replaying
> > from the next WAL record, such as 0/1715E40, which could potentially
> > lead to data loss.
> > As an alternative, we could set recovery_target_inclusive to false in
> > the setup_recovery function. This way, recovery would stop just before
> > the recovery target, allowing the publisher to start replicating
> > exactly from the consistent point.
> > Thoughts?
>
> This approach looks better to me.
> I have prepared the patch for the same.
>

We should find out in which case, and why, the consistent_lsn is the
start LSN of a commit record. We use the slot's confirmed_flush LSN
as the consistent_lsn, which normally should be the end point of a
running_xacts record or the end LSN of a commit record (in case the
client sends an ack).
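
To make the start-versus-end distinction concrete, here is a rough
sketch (not taken from the PostgreSQL source). It parses LSNs in the
usual "hi/lo" hex notation and reports whether a given LSN matches the
start or the end of a record. The helper names parse_lsn and classify
are hypothetical, and treating 0/1715E10..0/1715E40 as the bounds of a
single commit record is only an assumption based on the example
upthread.

```python
def parse_lsn(text: str) -> int:
    """Convert an LSN such as '0/1715E10' into a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def classify(lsn: str, record_start: str, record_end: str) -> str:
    """Say whether lsn is the start or the end of the given record."""
    value = parse_lsn(lsn)
    if value == parse_lsn(record_start):
        return "start"
    if value == parse_lsn(record_end):
        return "end"
    return "other"


# Assumed record bounds, borrowed from the example quoted above.
COMMIT_START = "0/1715E10"
COMMIT_END = "0/1715E40"

# The reported consistent point coincides with the record's start.
where = classify("0/1715E10", COMMIT_START, COMMIT_END)
```

If the consistent_lsn comes back as "start" here, then an inclusive
recovery target would replay the commit record itself, which is the
case we need to explain.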

If we decide to fix it in the way proposed here, then we also need to
investigate whether we still need the additional WAL record added by
commit 03b08c8f5f3e30c97e5908f3d3d76872dab8a9dc. The reason why that
additional WAL record was added is discussed in email [1].

[1] - https://www.postgresql.org/message-id/flat/2377319.1719766794%40sss.pgh.pa.us#bba9f5ee0efc73151cc521a6bd5182ed

--
With Regards,
Amit Kapila.


