Re: Conflict Detection and Resolution - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Conflict Detection and Resolution
Date
Msg-id 0cb26238-204a-4375-9647-e8b49dc821d6@enterprisedb.com
Whole thread Raw
In response to Re: Conflict Detection and Resolution  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Conflict Detection and Resolution
List pgsql-hackers

On 6/3/24 09:30, Amit Kapila wrote:
> On Sat, May 25, 2024 at 2:39 AM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>>
>> On 5/23/24 08:36, shveta malik wrote:
>>>
>>> Conflict Resolution
>>> ----------------
>>> a) latest_timestamp_wins:    The change with later commit timestamp wins.
>>> b) earliest_timestamp_wins:   The change with earlier commit timestamp wins.
>>> c) apply:   Always apply the remote change.
>>> d) skip:    Remote change is skipped.
>>> e) error:   Error out on conflict. Replication is stopped, manual
>>> action is needed.
>>>
>>
>> Why not to have some support for user-defined conflict resolution
>> methods, allowing to do more complex stuff (e.g. merging the rows in
>> some way, perhaps even with datatype-specific behavior)?
>>
>>> The change will be converted to 'UPDATE' and applied if the decision
>>> is in favor of applying remote change.
>>>
>>> It is important to have commit timestamp info available on subscriber
>>> when latest_timestamp_wins or earliest_timestamp_wins method is chosen
>>> as resolution method.  Thus ‘track_commit_timestamp’ must be enabled
>>> on subscriber, in absence of which, configuring the said
>>> timestamp-based resolution methods will result in error.
>>>
>>> Note: If the user has chosen the latest or earliest_timestamp_wins,
>>> and the remote and local timestamps are the same, then it will go by
>>> system identifier. The change with a higher system identifier will
>>> win. This will ensure that the same change is picked on all the nodes.
>>
>> How is this going to deal with the fact that commit LSN and timestamps
>> may not correlate perfectly? That is, commits may happen with LSN1 <
>> LSN2 but with T1 > T2.
>>
> 
> One of the possible scenarios discussed at pgconf.dev with Tomas for
> this was as follows:
> 
> Say there are two publisher nodes PN1, PN2, and subscriber node SN3.
> The logical replication is configured such that a subscription on SN3
> has publications from both PN1 and PN2. For example, SN3 (sub) -> PN1,
> PN2 (p1, p2)
> 
> Now, on PN1, we have the following operations that update the same row:
> 
> T1
> Update-1 on table t1 at LSN1 (1000) on time (200)
> 
> T2
> Update-2 on table t1 at LSN2 (2000) on time (100)
> 
> Then in parallel, we have the following operation on node PN2 that
> updates the same row as Update-1, and Update-2 on node PN1.
> 
> T3
> Update-3 on table t1 at LSN(1500) on time (150)
> 
> By theory, we can have a different state on subscribers depending on
> the order of updates arriving at SN3 which shouldn't happen. Say, the
> order in which they reach SN3 is: Update-1, Update-2, Update-3 then
> the final row we have is by Update-3 considering we have configured
> last_update_wins as a conflict resolution method. Now, consider the
> other order:  Update-1, Update-3, Update-2, in this case, the final
> row will be by Update-2 because when we try to apply Update-3, it will
> generate a conflict and as per the resolution method
> (last_update_wins) we need to retain Update-1.
> 
> On further thinking, the operations on node-1 PN-1 as defined above
> seem impossible because one of the Updates needs to wait for the other
> to write a commit record. So the commits may happen with LSN1 < LSN2
> but with T1 > T2 but they can't be on the same row due to locks. So,
> the order of apply should still be consistent. Am, I missing
> something?
> 

Sorry, I should have read your message before responding a couple
minutes ago. I think you're right this exact example can't happen, due
to the dependency between transactions.

But as I wrote, I'm not quite convinced this means there are not other
issues with this way of resolving conflicts. It's more likely a more
complex scenario is required.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: Assert in heapgettup_pagemode() fails due to underlying buffer change
Next
From: Tomas Vondra
Date:
Subject: Re: Compress ReorderBuffer spill files using LZ4