RE: Conflict detection for update_deleted in logical replication - Mailing list pgsql-hackers
From: Zhijie Hou (Fujitsu)
Subject: RE: Conflict detection for update_deleted in logical replication
Date:
Msg-id: OS0PR01MB5716688F74F6121B8CE797119450A@OS0PR01MB5716.jpnprd01.prod.outlook.com
In response to: Re: Conflict detection for update_deleted in logical replication (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses: Re: Conflict detection for update_deleted in logical replication
List: pgsql-hackers
On Friday, July 18, 2025 1:25 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Fri, Jul 11, 2025 at 3:58 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Thu, Jul 10, 2025 at 6:46 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > On Wed, Jul 9, 2025 at 9:09 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > > I think that even with retain_conflict_info = off, there is
> > > > > probably a point at which the subscriber can no longer keep up
> > > > > with the publisher. For example, if with retain_conflict_info =
> > > > > off we can withstand 100 clients running at the same time, then
> > > > > the fact that this performance degradation occurred with 15
> > > > > clients shows that performance degradation is much more likely
> > > > > to occur because of retain_conflict_info = on.
> > > > >
> > > > > Test cases 3 and 4 are typical cases where this feature is used,
> > > > > since the conflicts actually happen on the subscriber, so I
> > > > > think it's important to look at the performance in these cases.
> > > > > The worst-case scenario for this feature is that when it is
> > > > > turned on, the subscriber cannot keep up even with a small
> > > > > load, and with max_conflict_retention_duration we enter a loop
> > > > > of slot invalidation and re-creation, which means that
> > > > > conflicts cannot be detected reliably.
> > > > >
> > > >
> > > > As per the above observations, it is less of a regression of this
> > > > feature but more of a lack of parallel apply or some kind of
> > > > pre-fetch for apply, as was recently proposed [1]. I feel there
> > > > are use cases, as explained above, for which this feature would
> > > > work without any downside, but due to the lack of some sort of
> > > > parallel apply, we may not be able to use it without any downside
> > > > for cases where the contention is only on a smaller set of tables.
> > > > We have not tried it, but in cases where contention is on a
> > > > smaller set of tables, if users distribute the workload among
> > > > different pub-sub pairs by using row filters, we may also see less
> > > > regression there. We can try that as well.
> > >
> > > While I understand that there are some possible solutions we have
> > > today to reduce the contention, I'm not really sure these are
> > > practical solutions, as they increase the operational costs instead.
> > >
> >
> > I assume by operational costs you mean defining the replication setup
> > such that the workload is distributed among multiple apply workers via
> > subscriptions, either by row filters or by defining separate pub-sub
> > pairs for a set of tables, right? If so, I agree with you, but I can't
> > think of a better alternative. Even without this feature, we know the
> > replication lag could be large in such cases, as is evident in a
> > recent thread [1] and some offlist feedback from people using native
> > logical replication. As per a POC in the thread [1], by parallelizing
> > apply or using some prefetch, we could reduce the lag, but we need to
> > wait for that work to mature to see its actual effect.
>
> I don't have a better alternative either.
>
> I agree that this feature will work without any problem when logical
> replication is properly configured. It's a good point that update-delete
> conflicts can be detected reliably without additional performance
> overhead in scenarios with minimal replication lag.
>
> However, this approach requires users to pay particular attention to
> replication performance and potential delays. My primary concern is
> that, given the current logical replication performance limitations,
> most users who want to use this feature will likely need such dedicated
> care for replication lag. Nevertheless, most features involve certain
> trade-offs. Given that this is an opt-in feature and future performance
> improvements will reduce these challenges for users, it would be
> reasonable to have this feature at this stage.
>
> > The path I see with this work is to clearly document the cases
> > (configuration) where this feature could be used without much
> > downside and keep the default value of the subscription option that
> > enables this as false (which is already the case with the patch).
>
> +1

Thanks for the discussion. Here is the V49 patch, which includes the
suggested doc change in 0002. I will rebase the remaining patches once the
first one is pushed. Thanks to Shveta for preparing the doc change.

Best Regards,
Hou zj
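As an illustration of the workload-splitting idea discussed above, here is a
minimal sketch using the row-filter support that already exists for
publications (PostgreSQL 15 and later); the table, column, and connection
strings are hypothetical. The point is only that each subscription gets its
own apply worker, so contention on a hot table can be spread across pub-sub
pairs.

    -- Publisher: split a hot table across two publications with row filters
    -- (table and column names are made up for illustration).
    CREATE PUBLICATION pub_orders_east FOR TABLE orders WHERE (region = 'east');
    CREATE PUBLICATION pub_orders_west FOR TABLE orders WHERE (region = 'west');

    -- Subscriber: one subscription per publication, so each publication is
    -- applied by its own worker and contention is spread across workers.
    CREATE SUBSCRIPTION sub_orders_east
        CONNECTION 'host=publisher dbname=postgres'
        PUBLICATION pub_orders_east;
    CREATE SUBSCRIPTION sub_orders_west
        CONNECTION 'host=publisher dbname=postgres'
        PUBLICATION pub_orders_west;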
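For the opt-in behaviour itself, the following is a sketch of how a
subscription would enable it under the patch series discussed in this thread;
the retain_conflict_info option and the max_conflict_retention_duration
setting are the names used in the discussion and may change before the
feature is committed.

    -- Sketch only: names follow the patch discussed in this thread.
    CREATE SUBSCRIPTION sub_with_conflict_info
        CONNECTION 'host=publisher dbname=postgres'
        PUBLICATION pub_all
        WITH (retain_conflict_info = on);  -- default is off (opt-in)

    -- The thread also discusses max_conflict_retention_duration, which bounds
    -- how long conflict information is retained; if the subscriber falls too
    -- far behind, the retention slot is invalidated rather than holding back
    -- dead-tuple removal indefinitely.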