Re: Conflict detection for update_deleted in logical replication - Mailing list pgsql-hackers
From: Amit Kapila
Subject: Re: Conflict detection for update_deleted in logical replication
Date:
Msg-id: CAA4eK1KDsEPCcE019cAvuzrzJt9FT4FfQuOUmJcrCoDzarQbtQ@mail.gmail.com
In response to: RE: Conflict detection for update_deleted in logical replication ("Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>)
List: pgsql-hackers
On Sat, Feb 1, 2025 at 2:54 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Thu, Jan 30, 2025 at 10:39 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Fri, Jan 31, 2025 at 4:10 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > I have one question about the 0004 patch; it implemented
> > > max_conflict_retention_duration as a subscription parameter. But the
> > > launcher invalidates the pg_conflict_detection slot only if all
> > > subscriptions with retain_conflict_info stopped retaining dead tuples
> > > due to the max_conflict_retention_duration parameter. Therefore, even
> > > if users set the parameter to a low value to avoid table bloat, it
> > > would not make sense if other subscriptions set it to a larger value.
> > > Is my understanding correct?
> >
> > Yes, your understanding is correct. I think this could be helpful
> > during resolution because the worker for which the duration has been
> > exceeded cannot detect conflicts reliably but the others can. So, this
> > info can be useful while performing resolutions. Do you have an
> > opinion/suggestion on this matter?
>
> I imagined a scenario where two apply workers are running and have
> different max_conflict_retention_duration values (say '5 min' and
> '15 min'). Suppose both workers are roughly equally behind the
> publisher(s); when both workers cannot advance their xmin values for
> 5 min or longer, one worker stops retaining dead tuples. However, the
> pg_conflict_detection slot is not invalidated yet since the other
> worker is still using it, so both workers would continue getting
> slower. The subscriber would end up retaining dead tuples until both
> workers have been behind for 15 min or longer, before invalidating
> the slot. In this case, stopping dead-tuple retention on the first
> worker would neither help advance the slot's xmin nor improve the
> other worker's performance.

Won't the same be true for the 'retain_conflict_info' option as well?
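The invalidation rule in the scenario above can be modeled with a small sketch (hypothetical helper names; in the actual patch this logic lives in the launcher, which invalidates the pg_conflict_detection slot only when every subscription with retain_conflict_info has exceeded its own limit):

```python
from datetime import timedelta

def slot_invalidated(lag: timedelta, max_durations: list[timedelta]) -> bool:
    # Hypothetical model: the slot is invalidated only when ALL apply
    # workers have been lagging longer than their own
    # max_conflict_retention_duration.
    return all(lag > d for d in max_durations)

# Two subscriptions with different limits, as in the example above.
limits = [timedelta(minutes=5), timedelta(minutes=15)]

# After 10 minutes of lag, the 5-minute worker has already stopped
# retaining dead tuples, but the slot survives because the 15-minute
# worker still relies on it, so dead tuples keep accumulating.
print(slot_invalidated(timedelta(minutes=10), limits))  # False
print(slot_invalidated(timedelta(minutes=20), limits))  # True
```

This illustrates Sawada-san's point: the lower setting stops that worker's retention without advancing the slot's xmin, because invalidation waits for the largest configured duration.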
I mean even if only one worker is retaining dead tuples, the performance
of the others will also be impacted.

> I was not sure of the point of making the
> max_conflict_retention_duration a per-subscription parameter.

The idea is to keep it at the same level as the other related parameter,
'retain_conflict_info'. It could be useful for cases where the publishers
are two different nodes (NP1 and NP2) and we have separate subscriptions
for each node. It is possible that a user won't expect conflicts on the
tables from one of the nodes, say NP1; she could then choose to enable
'retain_conflict_info' and 'max_conflict_retention_duration' only for the
subscription pointing to publisher NP2. Now, say the publisher node that
can generate conflicts (NP2) has fewer writes, so the corresponding apply
worker can easily catch up and is almost always in sync with the
publisher, while the other node, which has no conflicts, has a large
number of writes. In such cases, providing the new options at the
subscription level is helpful. If we provided them at the global level,
the performance or dead-tuple control would be no better than with the
current patch, but it would not allow for cases like the above. Second, I
want to avoid adding two new GUCs. OTOH, the implementation could be
slightly simpler if we provided these options as GUCs, though I am not
completely sure of that. Having said that, I am open to changing this to
a non-subscription level. Do you think it would be better to provide one
or both of these parameters as GUCs, or do you have something else in
mind?

--
With Regards,
Amit Kapila.
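The per-subscription scenario above can be sketched as follows (a hypothetical model with invented names; retain_conflict_info is an option from the patch under discussion, and the real xmin computation happens in the launcher): only subscriptions that enable retention contribute to holding back the conflict-detection slot.

```python
# Hypothetical model of Amit's NP1/NP2 example: sub_np1 (busy, no
# conflicts expected) disables retention, sub_np2 enables it.
subs = [
    {"name": "sub_np1", "retain_conflict_info": False, "oldest_xmin": 1000},
    {"name": "sub_np2", "retain_conflict_info": True,  "oldest_xmin": 5000},
]

# Only retaining subscriptions hold back the slot's xmin; the slot is
# not needed at all if none of them retain conflict info.
retaining = [s["oldest_xmin"] for s in subs if s["retain_conflict_info"]]
slot_xmin = min(retaining) if retaining else None

print(slot_xmin)  # 5000: only sub_np2 holds back dead-tuple removal
```

With a single global setting, the heavily written NP1 subscription would also participate in retention, which is exactly the cost the per-subscription design avoids.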