Re: Conflict detection for update_deleted in logical replication - Mailing list pgsql-hackers

From shveta malik
Subject Re: Conflict detection for update_deleted in logical replication
Msg-id CAJpy0uC5x_dD7THcRMJMHn8jje4rHtyy4=O-=W7Vysrw_de5Xw@mail.gmail.com
In response to RE: Conflict detection for update_deleted in logical replication  ("Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>)
List pgsql-hackers
On Thu, Apr 24, 2025 at 6:11 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

> > Few comments for patch004:
> > Config.sgml:
> > 1)
> > +       <para>
> > +        Maximum duration (in milliseconds) for which conflict
> > +        information can be retained for conflict detection by the apply worker.
> > +        The default value is <literal>0</literal>, indicating that conflict
> > +        information is retained until it is no longer needed for detection
> > +        purposes.
> > +       </para>
> >
> > IIUC, the above is not entirely accurate. Suppose the subscriber manages to
> > catch up and sets oldest_nonremovable_xid to 100, which is then updated in
> > slot. After this, the apply worker takes a nap and begins a new xid update cycle.
> > Now, let’s say the next candidate_xid is 200, but this time the subscriber fails
> > to keep up and exceeds max_conflict_retention_duration. As a result, it sets
> > oldest_nonremovable_xid to InvalidTransactionId, and the launcher skips
> > updating the slot’s xmin.
>
> If the time exceeds the max_conflict_retention_duration, the launcher would
> invalidate the slot, instead of skipping updating it. So the conflict info (e.g.,
> dead tuples) would not be retained anymore.
>

The launcher will not invalidate the slot until all subscriptions have
stopped conflict-info retention. So the dead tuples for a particular
oldest_xmin of a particular apply worker could be retained for much
longer than the configured duration. If other apply workers are
actively working (catching up with the primary), they should keep
advancing the xmin of the shared slot. But suppose the xmin of the
shared slot stays the same for, say, 15min+15min+15min across 3 apply
workers (assuming they mark themselves with stop_conflict_retention one
after another and the slot's xmin has not been advanced). Then the
first apply worker that marked itself with stop_conflict_retention
still has access to the oldest_xmin's data for 45 mins instead of
15 mins (where max_conflict_retention_duration=15 mins). Please let me
know if my understanding is wrong.
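
To make the arithmetic concrete, here is a rough sketch in Python of the
scenario as I understand it. It is only a toy model, not code from the
patch: ApplyWorker, stop_at_min and slot_invalidation_minute are made-up
names, and it assumes that the shared slot's xmin never advances and that
the launcher invalidates the slot only once every apply worker has set
stop_conflict_retention.

```python
from dataclasses import dataclass

# Assumed value for illustration: max_conflict_retention_duration = 15 mins.
MAX_CONFLICT_RETENTION_MIN = 15


@dataclass
class ApplyWorker:
    """Hypothetical stand-in for an apply worker (not a PostgreSQL struct)."""
    name: str
    stop_at_min: int  # minute at which it sets stop_conflict_retention


def slot_invalidation_minute(workers):
    """Minute at which the shared slot is invalidated in this toy model.

    Assumption: the launcher invalidates the slot only after ALL workers
    have stopped retention, so the last worker to stop decides the time.
    """
    return max(w.stop_at_min for w in workers)


# Three workers stop retention one after another, 15 mins apart,
# while the slot's xmin stays unchanged the whole time.
workers = [
    ApplyWorker(f"worker{i + 1}", MAX_CONFLICT_RETENTION_MIN * (i + 1))
    for i in range(3)
]

retained_for = slot_invalidation_minute(workers)
print(f"dead tuples for the old xmin retained for {retained_for} mins "
      f"(configured limit: {MAX_CONFLICT_RETENTION_MIN} mins)")
```

Under these assumptions the data behind the first worker's oldest_xmin
stays around for 3 x 15 = 45 mins, which is the case I am asking about.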

> > However, the previous xmin value (100) is still there
> > in the slot, causing its data to be retained beyond the
> > max_conflict_retention_duration. The xid 200 which actually honors
> > max_conflict_retention_duration was never marked for retention. If my
> > understanding is correct, then the documentation doesn’t fully capture this
> > scenario.
>
> As mentioned above, the strategy here is to invalidate the slot.

Please consider the case with multiple subscribers. Sorry if I failed
to mention in my previous email that this was a multi-subscriber case.

thanks
Shveta


