Re: Conflict Detection and Resolution - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Conflict Detection and Resolution
Date
Msg-id CAA4eK1KGOJ8+TSHFcTx87K8VVLpA_WRV2DkmO=Mbe4ocUtPz1w@mail.gmail.com
Whole thread Raw
In response to Re: Conflict Detection and Resolution  (Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>)
Responses Re: Conflict Detection and Resolution
List pgsql-hackers
On Tue, Jun 11, 2024 at 3:12 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> On Sat, Jun 8, 2024 at 3:52 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>>
>> On Fri, Jun 7, 2024 at 5:39 PM Ashutosh Bapat
>> <ashutosh.bapat.oss@gmail.com> wrote:
>> >
>> > On Thu, Jun 6, 2024 at 5:16 PM Nisha Moond <nisha.moond412@gmail.com> wrote:
>> >>
>> >> >
>> >>
>> >> Here are more use cases of the "earliest_timestamp_wins" resolution method:
>> >> 1) Applications where the record of first occurrence of an event is
>> >> important. For example, sensor based applications like earthquake
>> >> detection systems, capturing the first seismic wave's time is crucial.
>> >> 2) Scheduling systems, like appointment booking, prioritize the
>> >> earliest request when handling concurrent ones.
>> >> 3) In contexts where maintaining chronological order is important -
>> >>   a) Social media platforms display comments ensuring that the
>> >> earliest ones are visible first.
>> >>   b) Finance transaction processing systems rely on timestamps to
>> >> prioritize the processing of transactions, ensuring that the earliest
>> >> transaction is handled first
>> >
>> >
>> > Thanks for sharing examples. However, these scenarios would be handled by the application and not during
replication.What we are discussing here is the timestamp when a row was updated/inserted/deleted (or rather when the
transactionthat updated row committed/became visible) and not a DML on column which is of type timestamp. Some
implementationsuse a hidden timestamp column but that's different from a user column which captures timestamp of (say)
anevent. The conflict resolution will be based on the timestamp when that column's value was recorded in the database
whichmay be different from the value of the column itself. 
>> >
>>
>> It depends on how these operations are performed. For example, the
>> appointment booking system could be prioritized via a transaction
>> updating a row with columns emp_name, emp_id, reserved, time_slot.
>> Now, if two employees at different geographical locations try to book
>> the calendar, the earlier transaction will win.
>
>
> I doubt that it would be that simple. The application will have to intervene and tell one of the employees that their
reservationhas failed. It looks natural that the first one to reserve the room should get the reservation, but
implementingthat is more complex than resolving a conflict in the database. In fact, mostly it will be handled outside
database.
>

Sure, the application needs some handling but I have tried to explain
with a simple way that comes to my mind and how it can be realized
with db involved. This is a known conflict detection method but note
that I am not insisting to have "earliest_timestamp_wins". Even, if we
want this we can have a separate discussion on this and add it later.

>>
>>
>> > If we use the transaction commit timestamp as basis for resolution, a transaction where multiple rows conflict may
endup with different rows affected by that transaction being resolved differently. Say three transactions T1, T2 and T3
onseparate origins with timestamps t1, t2, and t3 respectively changed rows r1, r2 and r2, r3 and r1, r4 respectively.
Changesto r1 and r2 will conflict. Let's say T2 and T3 are applied first and then T1 is applied. If t2 < t1 < t3, r1
willend up with version of T3 and r2 will end up with version of T1 after applying all the three transactions. 
>> >
>>
>> Are you telling the results based on latest_timestamp_wins? If so,
>> then it is correct. OTOH, if the user has configured
>> "earliest_timestamp_wins" resolution method, then we should end up
>> with a version of r1 from T1 because t1 < t3. Also, due to the same
>> reason, we should have version r2 from T2.
>>
>> >
>>  Would that introduce an inconsistency between r1 and r2?
>> >
>>
>> As per my understanding, this shouldn't be an inconsistency. Won't it
>> be true even when the transactions are performed on a single node with
>> the same timing?
>>
>
> The inconsistency will arise irrespective of conflict resolution method. On a single system effects of whichever
transactionruns last will be visible entirely. But in the example above the node where T1, T2, and T3 (from
*different*)origins) are applied, we might end up with a situation where some changes from T1 are applied whereas some
changesfrom T3 are applied. 
>

I still think it will lead to the same result if all three T1, T2, T3
happen on the same node in the same order as you mentioned. Say, we
have a pre-existing table with rows r1, r2, r3, r4. Now, if we use the
order of transactions to be applied on the same node based on t2 < t1
< t3. First T2 will be applied, so for now, r1 is a pre-existing
version and r2 is from T2. Next, when T1 is performed, both r1 and r2
are from T1. Lastly, when T3 is applied, r1 will be from T3 and r2
will be from T1. This is what you mentioned will happen after conflict
resolution in the above example.

--
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: "David G. Johnston"
Date:
Subject: Re: Document NULL
Next
From: "Li, Yong"
Date:
Subject: Re: Separate HEAP WAL replay logic into its own file