
From Amit Kapila
Subject Re: Conflict detection for update_deleted in logical replication
Date
Msg-id CAA4eK1Li8XLJ5f-pYvPJ8pXxyA3G-QsyBLNzHY940amF7jm=3A@mail.gmail.com
In response to RE: Conflict detection for update_deleted in logical replication  ("Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>)
Responses Re: Conflict detection for update_deleted in logical replication
List pgsql-hackers
On Mon, Jan 6, 2025 at 4:52 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:
>
> On Friday, January 3, 2025 2:36 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> >
> > +       /*
> > +        * The changes made by this and later transactions are still
> > +        * non-removable to allow for the detection of update_deleted
> > +        * conflicts when applying changes in this logical replication
> > +        * worker.
> > +        *
> > +        * Note that this info cannot directly protect dead tuples from being
> > +        * prematurely frozen or removed. The logical replication launcher
> > +        * asynchronously collects this info to determine whether to advance
> > +        * the xmin value of the replication slot.
> > +        *
> > +        * Therefore, FullTransactionId that includes both the transaction ID
> > +        * and its epoch is used here instead of a single Transaction ID. This
> > +        * is critical because without considering the epoch, the transaction
> > +        * ID alone may appear as if it is in the future due to transaction ID
> > +        * wraparound.
> > +        */
> > +       FullTransactionId oldest_nonremovable_xid;
> >
> > The last paragraph of the comment mentions that we need to use
> > FullTransactionId to properly compare XIDs even after an XID wraparound
> > happens. But once we set the oldest-nonremovable-xid, doesn't it prevent
> > XIDs from wrapping around? I mean that workers' oldest-nonremovable-xid
> > values and the slot's non-removable xid (i.e., its xmin) are never more
> > than 2^31 XIDs apart.
>
> I think the issue is that the launcher may create the replication slot after
> the apply worker has already set the 'oldest_nonremovable_xid', because the
> launcher does that asynchronously. So, before the slot is created, there is
> a window in which transaction IDs might wrap around. If the apply worker has
> initially computed a candidate_xid (755) and the XID wraps around before the
> launcher creates the slot, causing the new current xid to be (740), then the
> old candidate_xid (755) looks like an xid in the future, and the launcher
> could advance the xmin to 755, which would cause dead tuples to be removed
> prematurely. (We are trying to reproduce this to ensure that it's a real
> issue and will share the result after finishing.)
>
> We thought of another approach, which is to create/drop this slot as soon as
> one enables/disables detect_update_deleted (e.g., create/drop the slot
> during DDL). But it seems complicated to control concurrent slot
> create/drop. For example, if one backend A enables detect_update_deleted, it
> will create a slot. But if another backend B is disabling
> detect_update_deleted at the same time, then the newly created slot may be
> dropped by backend B. I thought about checking the number of subscriptions
> that enable detect_update_deleted before dropping the slot in backend B, but
> the subscription changes made by backend A may not be visible yet (e.g., not
> committed yet).
>

This means that for a transaction whose changes are not yet visible,
we may have already created the slot, and backend B would end up
dropping it. Is it possible that, while changing this new option via
DDL, we take an AccessExclusiveLock on pg_subscription, as we do in
DropSubscription(), to ensure that concurrent transactions can't drop
the slot? Will that help in solving the above scenario?
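
As a rough illustration (not the patch's actual code), the option-toggling
path could serialize against concurrent create/drop of the shared slot in
the same way DropSubscription() already locks the catalog. The helper name
below is hypothetical; table_open()/table_close() and the lock mode are the
existing APIs:

#include "postgres.h"

#include "access/table.h"
#include "catalog/pg_subscription.h"

/*
 * Hypothetical sketch: lock pg_subscription with AccessExclusiveLock while
 * toggling the new option, so that a concurrent backend cannot drop the
 * conflict-detection slot we are about to create (or vice versa).
 */
static void
lock_subscription_catalog_for_option_change(void)
{
	Relation	rel;

	/* Blocks until no other backend holds a conflicting lock. */
	rel = table_open(SubscriptionRelationId, AccessExclusiveLock);

	/*
	 * The caller would update the subscription's catalog row and then
	 * create/drop the slot here, while the lock is held.
	 */

	/* Keep the lock until transaction commit. */
	table_close(rel, NoLock);
}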

The second idea could be that each worker first checks whether the slot
exists along with a subscription flag (the new option). Checking the
existence of the slot each time would be costly, so we would somehow
need to cache it. But if we do that, then we need to invent some cache
invalidation mechanism for the slot. I am not sure we can design a
race-free mechanism for that. I mean, we need to think of a solution
for the race conditions between the launcher and the apply workers to
ensure that, after dropping the slot, the launcher doesn't recreate it
(say, if some subscription enables this option) before all the workers
have cleared their existing values of oldest_nonremovable_xid.
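
Roughly, the per-worker check might look like the sketch below; the cached
flag is purely hypothetical, SearchNamedReplicationSlot() is the existing
way to look up a slot by name, and the slot name is an assumption. The race
lives between the launcher dropping/recreating the slot and every worker
noticing through whatever invalidation mechanism we would have to invent:

#include "postgres.h"

#include "replication/slot.h"

/* Hypothetical per-worker cache; would need some invalidation mechanism. */
static bool conflict_slot_cached = false;

static bool
conflict_slot_exists(void)
{
	if (!conflict_slot_cached)
	{
		/* need_lock = true: takes ReplicationSlotControlLock internally */
		conflict_slot_cached =
			(SearchNamedReplicationSlot("pg_conflict_detection", true) != NULL);
	}

	/*
	 * A stale 'true' here is exactly the race: the launcher may have
	 * dropped the slot while this worker still advertises an old
	 * oldest_nonremovable_xid.
	 */
	return conflict_slot_cached;
}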

The third idea to avoid the race condition could be that, in the
function InitializeLogRepWorker() after CommitTransactionCommand(), we
check whether the retain_dead_tuples flag is true for MySubscription,
and if so, we check whether the system slot exists. If it exists, then
go ahead; otherwise, wait until the slot is created. It would cost some
additional cycles during worker startup, but it is a one-time effort
and only when the flag is set. In addition to this, we anyway need to
create the slot in the launcher before launching the workers, and
after re-reading the subscription, the change in the retain_dead_tuples
flag (off->on) should cause the worker to restart.
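
A minimal sketch of that wait is below; the slot name is an assumption, and
the latch/timeout pattern just mirrors the usual worker wait loops:

#include "postgres.h"

#include "miscadmin.h"
#include "replication/slot.h"
#include "storage/latch.h"
#include "utils/wait_event.h"

/*
 * Sketch only: called from InitializeLogRepWorker() after
 * CommitTransactionCommand() when retain_dead_tuples is enabled, to wait
 * until the launcher has created the shared conflict-detection slot.
 */
static void
wait_for_conflict_detection_slot(void)
{
	while (SearchNamedReplicationSlot("pg_conflict_detection", true) == NULL)
	{
		int			rc;

		CHECK_FOR_INTERRUPTS();

		/* Recheck every second, or sooner if the latch is set. */
		rc = WaitLatch(MyLatch,
					   WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
					   1000L,
					   WAIT_EVENT_LOGICAL_APPLY_MAIN);

		if (rc & WL_LATCH_SET)
			ResetLatch(MyLatch);
	}
}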

Now, in the third idea, the issue can still arise if, after waiting
for the slot to be created, the user sets retain_dead_tuples to false
and back to true again immediately. The launcher may have noticed the
"retain_dead_tuples=false" operation and dropped the slot, while the
apply worker has not noticed it and still holds an old candidate_xid.
The XID may wrap around in this window before retain_dead_tuples is
set back to true. And the apply worker would not restart because, by
the time it calls maybe_reread_subscription(), retain_dead_tuples
would have been set back to true again. Again, to avoid this race
condition, the launcher can wait for each worker to reset its
oldest_nonremovable_xid before dropping the slot.
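
If we went that way, the launcher-side wait might look roughly like this
(written as if it lived in launcher.c, where LogicalRepCtx is visible; the
oldest_nonremovable_xid field is the one proposed in the patch, while the
helper itself is hypothetical):

#include "postgres.h"

#include "access/transam.h"
#include "replication/logicallauncher.h"
#include "replication/worker_internal.h"
#include "storage/lwlock.h"

/*
 * Sketch only: before dropping the conflict-detection slot, check that no
 * apply worker still advertises a valid oldest_nonremovable_xid.  The
 * launcher would loop (or wait on its latch) until this returns true.
 */
static bool
all_workers_cleared_nonremovable_xid(void)
{
	bool		all_clear = true;

	LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);

	for (int i = 0; i < max_logical_replication_workers; i++)
	{
		LogicalRepWorker *w = &LogicalRepCtx->workers[i];

		if (w->in_use &&
			FullTransactionIdIsValid(w->oldest_nonremovable_xid))
		{
			all_clear = false;
			break;
		}
	}

	LWLockRelease(LogicalRepWorkerLock);

	return all_clear;
}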

Even after doing the above, the third idea could still have another
race condition:
1. The launcher creates the replication slot and starts a worker with
retain_dead_tuples = true; the worker is waiting for the publisher
status and has not yet set oldest_nonremovable_xid.
2. The user sets the option retain_dead_tuples to false; the launcher
notices that and drops the replication slot.
3. The worker receives the status and sets oldest_nonremovable_xid to a
valid value (say 750).
4. An XID wraparound happens at this point, and say the new current XID
becomes 740.
5. The user sets retain_dead_tuples = true again.

After the above steps, the apply worker holds an old
oldest_nonremovable_xid (750) and will not restart if it does not call
maybe_reread_subscription() before step 5. So, such a case can again
lead to an incorrect slot->xmin value. We could probably find some way
to avoid this race condition as well, but I haven't thought more about
it, as this idea sounds a bit risky and bug-prone to me.
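
For context, this is roughly what the FullTransactionId-based approach
currently used in the patch buys: with the epoch included, a stale
advertised value such as 750 cannot be mistaken for a future XID once the
counter has wrapped to 740 (the function and variable names below are
illustrative, not from the patch):

#include "postgres.h"

#include "access/transam.h"

/*
 * Illustration: a bare 32-bit comparison would see the stale value 750 as
 * being "in the future" of the post-wraparound XID 740, whereas the 64-bit
 * FullTransactionId keeps the epoch and compares correctly.
 */
static bool
advertised_xid_is_older(FullTransactionId advertised, FullTransactionId current)
{
	/*
	 * e.g. advertised = (epoch 0, xid 750), current = (epoch 1, xid 740):
	 * FullTransactionIdPrecedes() reports the advertised value as older,
	 * while TransactionIdPrecedes(750, 740) would not.
	 */
	return FullTransactionIdPrecedes(advertised, current);
}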

Among the above ideas, the first idea of taking an AccessExclusiveLock
on pg_subscription sounds safest to me. I haven't evaluated the changes
required for the first approach, so I could be missing something that
makes it difficult to achieve, but I think it is worth investigating
unless we have better ideas or we think that the current approach used
in the patch (using FullTransactionId) is okay.

--
With Regards,
Amit Kapila.


