RE: Conflict detection for update_deleted in logical replication - Mailing list pgsql-hackers

From Zhijie Hou (Fujitsu)
Subject RE: Conflict detection for update_deleted in logical replication
Date
Msg-id OS0PR01MB5716688F74F6121B8CE797119450A@OS0PR01MB5716.jpnprd01.prod.outlook.com
In response to Re: Conflict detection for update_deleted in logical replication  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: Conflict detection for update_deleted in logical replication
List pgsql-hackers
On Friday, July 18, 2025 1:25 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> 
> On Fri, Jul 11, 2025 at 3:58 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Thu, Jul 10, 2025 at 6:46 PM Masahiko Sawada
> <sawada.mshk@gmail.com> wrote:
> > >
> > > On Wed, Jul 9, 2025 at 9:09 PM Amit Kapila <amit.kapila16@gmail.com>
> wrote:
> > >
> > > >
> > > > > I think that even with retain_conflict_info = off, there is
> > > > > probably a point at which the subscriber can no longer keep up
> > > > > with the publisher. For example, if with retain_conflict_info =
> > > > > off we can withstand 100 clients running at the same time, then
> > > > > the fact that this performance degradation occurred with 15
> > > > > clients explains that performance degradation is much more
> > > > > likely to occur because of retain_conflict_info = on.
> > > > >
> > > > > Test cases 3 and 4 are typical cases where this feature is used
> > > > > > since the conflicts actually happen on the subscriber, so I
> > > > > think it's important to look at the performance in these cases.
> > > > > The worst case scenario for this feature is that when this
> > > > > feature is turned on, the subscriber cannot keep up even with a
> > > > > > small load, and with max_conflict_retention_duration we enter a
> > > > > loop of slot invalidation and re-creating, which means that
> > > > > conflict cannot be detected reliably.
> > > > >
> > > >
> > > > As per the above observations, it is less of a regression of this
> > > > feature but more of a lack of parallel apply or some kind of
> > > > pre-fetch for apply, as is recently proposed [1]. I feel there are
> > > > use cases, as explained above, for which this feature would work
> > > > without any downside, but due to a lack of some sort of parallel
> > > > apply, we may not be able to use it without any downside for cases
> > > > > where the contention is only on a smaller set of tables. We have
> > > > > not tried it, but in cases where contention is on a smaller set
> > > > > of tables, we may also see less regression if users distribute
> > > > > the workload among different pub-sub pairs by using row filters.
> > > > > We can try that as well.
> > >
> > > > While I understand that there are some possible solutions we have
> > > > today to reduce the contention, I'm not sure these are really
> > > > practical solutions, as they increase the operational costs instead.
> > >
> >
> > I assume by operational costs you mean defining the replication
> > definitions such that workload is distributed among multiple apply
> > workers via subscriptions either by row_filters, or by defining
> > separate pub-sub pairs of a set of tables, right? If so, I agree with
> > you but I can't think of a better alternative. Even without this
> > feature, we know that in such cases the replication lag could be
> > large, as is evident in the recent thread [1] and some offlist feedback
> > from people using native logical replication. As per a POC in the
> > thread [1], by parallelizing apply or by using some prefetch, we could
> > reduce the lag, but we need to wait for that work to mature to see the
> > actual effect of it.
> 
> I don't have a better alternative either.
> 
> I agree that this feature will work without any problem when logical replication
> is properly configured. It's a good point that update-delete conflicts can be
> detected reliably without additional performance overhead in scenarios with
> minimal replication lag.
> However, this approach requires users to pay particular attention to
> replication performance and potential delays. My primary concern is that, given
> the current logical replication performance limitations, most users who want to
> use this feature will likely need such dedicated care for replication lag.
> Nevertheless, most features involve certain trade-offs. Given that this is an
> opt-in feature and future performance improvements will reduce these
> challenges for users, it would be reasonable to have this feature at this stage.
> 
> >
> > The path I see with this work is to clearly document the cases
> > (configuration) where this feature could be used without much downside
> > and keep the default value of subscription option to enable this as
> > false (which is already the case with the patch).
> 
> +1

Thanks for the discussion. Here is the V49 patch which includes the suggested
doc change in 0002. I will rebase the remaining patches once the first one is
pushed.

Thanks to Shveta for preparing the doc change.

Best Regards,
Hou zj

Attachment
