RE: Conflict detection for update_deleted in logical replication - Mailing list pgsql-hackers

From Zhijie Hou (Fujitsu)
Subject RE: Conflict detection for update_deleted in logical replication
Date
Msg-id TYAPR01MB572428A7A4A875C069567B8E94852@TYAPR01MB5724.jpnprd01.prod.outlook.com
Whole thread Raw
In response to Re: Conflict detection for update_deleted in logical replication  (shveta malik <shveta.malik@gmail.com>)
Responses Re: Conflict detection for update_deleted in logical replication
Re: Conflict detection for update_deleted in logical replication
List pgsql-hackers
On Thu, Apr 17, 2025 at 12:19 PM shveta malik wrote:

> 
> On Wed, Apr 16, 2025 at 10:30 AM shveta malik <shveta.malik@gmail.com>
> wrote:
> >
> > On Wed, Mar 26, 2025 at 4:17 PM Zhijie Hou (Fujitsu)
> > <houzj.fnst@fujitsu.com> wrote:
> > >
> > > Here's a rebased version of the patch series.
> > >
> >

> 
> Thanks Hou-San for the patches. I am going through this long thread and
> patches. One doubt I have is whenever there is a chance of conflict-slot update
> (either xmin or possibility of its invalidation), apply-worker gives a wake-up call
> to the launcher (ApplyLauncherWakeup). Shouldn't that suffice to wake-up
> launcher irrespective of its nap-time? Do we actually need to introduce
> MIN/MAX_NAPTIME_PER_SLOT_UPDATE in the launcher and the logic
> around it?

Thanks for reviewing. After rethinking, I agree that the wakeup is
sufficient, so I removed the nap-time logic in this version.

> Few comments for patch004:
> Config.sgml:
> 1)
> +       <para>
> +        Maximum duration (in milliseconds) for which conflict
> +        information can be retained for conflict detection by the apply worker.
> +        The default value is <literal>0</literal>, indicating that conflict
> +        information is retained until it is no longer needed for detection
> +        purposes.
> +       </para>
> 
> IIUC, the above is not entirely accurate. Suppose the subscriber manages to
> catch up and sets oldest_nonremovable_xid to 100, which is then updated in
> slot. After this, the apply worker takes a nap and begins a new xid update cycle.
> Now, let’s say the next candidate_xid is 200, but this time the subscriber fails
> to keep up and exceeds max_conflict_retention_duration. As a result, it sets
> oldest_nonremovable_xid to InvalidTransactionId, and the launcher skips
> updating the slot’s xmin. 

If the time exceeds the max_conflict_retention_duration, the launcher would
Invalidate the slot, instead of skipping updating it. So the conflict info(e.g.,
dead tuples) would not be retained anymore.

> However, the previous xmin value (100) is still there
> in the slot, causing its data to be retained beyond the
> max_conflict_retention_duration. The xid 200 which actually honors
> max_conflict_retention_duration was never marked for retention. If my
> understanding is correct, then the documentation doesn’t fully capture this
> scenario.

As mentioned above, the strategy here is to invalidate the slot.

> 
> 2)
> +        The replication slot
> +        <quote><literal>pg_conflict_detection</literal></quote> that
> used to
> +        retain conflict information will be invalidated if all apply workers
> +        associated with the subscription, where
> 
> Subscription --> subscriptions
> 
> 3)
> Name stop_conflict_retention in MyLogicalRepWorker is confusing. Shall it be
> stop_conflict_info_retention?

Changed.

Here is V30 patch set includes the following changes:

* Addressed above comments.
* Added the retention timeout check in wait_for_local_flush(), as suggested by Nisha[1].
* Improved upgrade codes and added a test for upgrade of retain_conflict_info option,
  as suggested by Kuroda-san[2].

[1] https://www.postgresql.org/message-id/CABdArM4Ft8q3dZv4Bqw%3DrbS5_LFMXDJMRr3vC8a_KMCX1qatpg%40mail.gmail.com
[2]
https://www.postgresql.org/message-id/OSCPR01MB14966269726272F2F2B2BD3B0F5B22%40OSCPR01MB14966.jpnprd01.prod.outlook.com

Best Regards,
Hou zj

Attachment

pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: pg_upgrade-breaking release
Next
From: torikoshia
Date:
Subject: Re: RFC: Logging plan of the running query