RE: Conflict detection for update_deleted in logical replication - Mailing list pgsql-hackers

From Zhijie Hou (Fujitsu)
Subject RE: Conflict detection for update_deleted in logical replication
Msg-id OS0PR01MB5716D54A53EB329C96C847369434A@OS0PR01MB5716.jpnprd01.prod.outlook.com
In response to Re: Conflict detection for update_deleted in logical replication  (shveta malik <shveta.malik@gmail.com>)
List pgsql-hackers
On Thursday, August 14, 2025 11:46 AM shveta malik <shveta.malik@gmail.com> wrote:
> 
> On Wed, Aug 13, 2025 at 4:15 PM shveta malik <shveta.malik@gmail.com>
> wrote:
> >
> > On Wed, Aug 13, 2025 at 10:41 AM Zhijie Hou (Fujitsu)
> > <houzj.fnst@fujitsu.com> wrote:
> > >
> > >
> > > Here is the V61 patch set which addressed above comments and the
> > > comment by Nisha[2].
> > >
> >
> > Thank You for the patch. I tested the patch, please find a few comments:
> >
> > 1)
> > Now when it stops-retention and later resumes it due to the fact that
> > max_duration is meanwhile altered to 0, I get log:
> >
> > LOG:  logical replication worker for subscription "sub1" resumes
> > retaining the information for detecting conflicts
> > DETAIL:  The time spent applying changes up to LSN 0/17DD728 is now
> > within the maximum limit of 0 ms.
> >
> > I did not get which lsn it is pointing to? Is it some dangling lsn
> > from when it was retaining info?  Also the msg looks odd, when it says
> > 'is now within the maximum limit of 0 ms.'
> >
> > 2)
> > While stopping the message is:
> > LOG:  logical replication worker for subscription "sub1" will stop
> > retaining conflict information
> > DETAIL:  The time spent advancing the non-removable transaction ID has
> > exceeded the maximum limit of 1000 ms.
> >
> > And while resuming:
> > logical replication worker for subscription "sub1" resumes retaining
> > the information for detecting conflicts
> > ----------
> >
> > We can make both similar. Both can have 'retaining the information for
> > detecting conflicts' instead of 'conflict information' in first one.
> >
> > 3)
> > I believe the tenses should also be updated. When stopping, we can say:
> >
> > Logical replication worker for subscription "sub1" has stopped...
> >
> > This is appropriate because it has already stopped by pre-setting
> > oldest_nonremovable_xid to Invalid.
> >
> > When resuming, we can say:
> > Logical replication worker for subscription "sub1" will resume...
> >
> > This is because it will begin resuming from the next cycle onward,
> > specifically after the launcher sets its oldest_xid.
> >
> > 4)
> > For the DETAIL part of resume and stop messages, how about these:
> >
> > The retention duration for information used in conflict detection has
> > exceeded the limit of xx.
> > The retention duration for information used in conflict detection is
> > now within the acceptable limit of xx.
> > The retention duration for information used in conflict detection is
> > now indefinite.
> >

Thanks for the comments. I have adjusted the log messages
according to the suggestions.
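
For illustration only, here is a minimal sketch of how the wording suggested
in comments 2) to 4) maps onto the three cases (stopped, resumed within the
limit, resumed with no limit). It is not the patch code: the function name and
parameters are hypothetical, the strings are the reviewer's suggested wording
rather than necessarily the final text in V62, and it assumes that a maximum
duration of 0 means "retain indefinitely".

```python
from typing import Tuple

# Shared phrasing so that the stop and resume messages stay consistent
# (comment 2 asks for the same wording in both).
SUBJECT = "the information for detecting conflicts"
DETAIL_PREFIX = "The retention duration for information used in conflict detection"


def retention_log_message(subname: str, stopping: bool,
                          max_duration_ms: int) -> Tuple[str, str]:
    """Return a (LOG, DETAIL) pair for a retention state change.

    Hypothetical helper; assumes max_duration_ms == 0 means no limit.
    """
    worker = f'logical replication worker for subscription "{subname}"'
    if stopping:
        # Past tense: retention has already stopped at this point (comment 3).
        log = f"{worker} has stopped retaining {SUBJECT}"
        detail = f"{DETAIL_PREFIX} has exceeded the limit of {max_duration_ms} ms."
    else:
        # Future tense: retention resumes from the next cycle (comment 3).
        log = f"{worker} will resume retaining {SUBJECT}"
        if max_duration_ms == 0:
            # Avoids the odd "within the maximum limit of 0 ms" (comment 1).
            detail = f"{DETAIL_PREFIX} is now indefinite."
        else:
            detail = (f"{DETAIL_PREFIX} is now within the acceptable "
                      f"limit of {max_duration_ms} ms.")
    return log, detail
```

With this shape, the case from comment 1) (resuming after the limit is altered
to 0) reports "is now indefinite" instead of a 0 ms limit.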


> 
> 5)
> Say there 2-3 subs, all have stopped-retention and the slot is set to have invalid
> xmin; now if I  create a new sub, it will start with stopped-flag set to true due to
> the fact that slot has invalid xmin to begin with. But then immediately, it will
> dump a resume message. It looks odd, as at first, it has not even stopped, as it
> is a new sub.
> Is there anything we can do to improve this situation?

I changed the logic to recover the slot immediately when starting a new worker
that has retain_dead_tuples enabled.

Here is the V62 patch set, which addresses the above comments and [1].

[1] https://www.postgresql.org/message-id/CAJpy0uBW8G2RNY%3DJjxzr_ootQ2MTxPQG98hz%3D-wdJzn86yapVg%40mail.gmail.com

Best Regards,
Hou zj

