Re: Conflict detection for update_deleted in logical replication - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Conflict detection for update_deleted in logical replication
Date
Msg-id CAD21AoAKC5-dzJSgJ0hWNmBmme9+UvL+WpczyHoeA3u2hLj4Kw@mail.gmail.com
Whole thread Raw
In response to RE: Conflict detection for update_deleted in logical replication  ("Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>)
Responses RE: Conflict detection for update_deleted in logical replication
List pgsql-hackers
On Mon, Jan 6, 2025 at 3:22 AM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:
>
> On Friday, January 3, 2025 2:36 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> Hi,
>
> >
> > I have one comment on the 0001 patch:
>
> Thanks for the comments!
>
> >
> > +       /*
> > +        * The changes made by this and later transactions are still
> > non-removable
> > +        * to allow for the detection of update_deleted conflicts when
> > applying
> > +        * changes in this logical replication worker.
> > +        *
> > +        * Note that this info cannot directly protect dead tuples from being
> > +        * prematurely frozen or removed. The logical replication launcher
> > +        * asynchronously collects this info to determine whether to advance
> > the
> > +        * xmin value of the replication slot.
> > +        *
> > +        * Therefore, FullTransactionId that includes both the
> > transaction ID and
> > +        * its epoch is used here instead of a single Transaction ID. This is
> > +        * critical because without considering the epoch, the transaction ID
> > +        * alone may appear as if it is in the future due to transaction ID
> > +        * wraparound.
> > +        */
> > +       FullTransactionId oldest_nonremovable_xid;
> >
> > The last paragraph of the comment mentions that we need to use
> > FullTransactionId to properly compare XIDs even after the XID wraparound
> > happens. But once we set the oldest-nonremovable-xid it prevents XIDs from
> > being wraparound, no? I mean that workers'
> > oldest-nonremovable-xid values and slot's non-removal-xid (i.e., its
> > xmin) are never away from more than 2^31 XIDs.
>
> I think the issue is that the launcher may create the replication slot after
> the apply worker has already set the 'oldest_nonremovable_xid' because the
> launcher are doing that asynchronously. So, Before the slot is created, there's
> a window where transaction IDs might wrap around. If initially the apply worker
> has computed a candidate_xid (755) and the xid wraparound before the launcher
> creates the slot, causing the new current xid to be (740), then the old
> candidate_xid(755) looks like a xid in the future, and the launcher could
> advance the xmin to 755 which cause the dead tuples to be removed prematurely.
> (We are trying to reproduce this to ensure that it's a real issue and will
> share after finishing)

The slot's first xmin is calculated by
GetOldestSafeDecodingTransactionId(false). The initial computed
cancidate_xid could be newer than this xid?

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



pgsql-hackers by date:

Previous
From: "Andrey M. Borodin"
Date:
Subject: Re: Sort functions with specialized comparators
Next
From: "赵宇鹏(宇彭)"
Date:
Subject: Re: Temporary Views Cleanup Issue