Re: Conflict Detection and Resolution - Mailing list pgsql-hackers

From Jonathan S. Katz
Subject Re: Conflict Detection and Resolution
Msg-id 1eb9242f-dcb6-45c3-871c-98ec324e03ef@postgresql.org
In response to Re: Conflict Detection and Resolution  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Conflict Detection and Resolution
List pgsql-hackers
On 6/13/24 7:28 AM, Amit Kapila wrote:

> You are right that users would wish to detect the conflicts, and
> probably the extra effort would only be in the 'update_differ' case,
> where we need to consult the commit_ts module, which we will only do
> when 'track_commit_timestamp' is true. BTW, I think for inserts with a
> primary/unique key violation, we should catch the ERROR and log it. If
> we want to log the conflicts in a separate table, do we want to do
> that in the catch block after getting the PK violation, or do an extra
> scan before the 'INSERT' to find the conflict? I think logging would add
> extra cost, especially if we want to LOG it in some table as you are
> suggesting below, which may need some option.
> 
>>> Therefore, additional conflict detection for these cases is currently
>>> omitted to minimize potential overhead. However, the pre-detection for
>>> conflict in these error cases is still essential to support automatic
>>> conflict resolution in the future.
>>
>> I feel that we should log all types of conflict in a uniform way. For
>> example, with detect_conflict being true, the update_differ conflict
>> is reported as "conflict %s detected on relation "%s"", whereas
>> concurrent inserts with the same key are reported as "duplicate key
>> value violates unique constraint "%s"", which could confuse users.
>> Ideally, I think we should log such conflict details (table
>> name, column name, conflict type, etc.) somewhere (e.g. a table or
>> the server logs) so that users can resolve them manually.
>>
> 
> It is worth considering whether there is value in providing a
> pg_conflicts_history kind of table, which would hold details of the
> conflicts that occurred and which we could later extend to record
> resolutions. I feel we can LOG the conflicts by default anyway. Whether
> updating a separate table with conflicts should happen by default or
> behind a knob is a point to consider.

+1 for logging conflicts uniformly, but I would +100 exposing the log 
in a way that's easy for the user to query (whether it's a system view 
or a stat table). Arguably, I'd say that would be the most important 
feature to come out of this effort.
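As a rough illustration of what "uniform" could mean, here is a minimal Python sketch (not PostgreSQL code) of one record shape shared by every conflict type and rendered in a single log format. The class and field names (ConflictRecord, relation, conflict_type, key_columns, key_values) and the conflict type "insert_exists" are hypothetical; only "update_differ" comes from this thread.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


# Hypothetical record shape: the same fields for every conflict type, so an
# update_differ conflict and a duplicate-key insert look alike to the user.
@dataclass(frozen=True)
class ConflictRecord:
    relation: str                 # e.g. "public.accounts"
    conflict_type: str            # "update_differ" (from the thread) or a hypothetical "insert_exists"
    key_columns: tuple[str, ...]  # columns identifying the conflicting row
    key_values: tuple[str, ...]   # their values, as text
    detected_at: datetime

    def log_line(self) -> str:
        # One message format for all conflict types, instead of mixing
        # 'conflict %s detected on relation "%s"' with the
        # 'duplicate key value violates unique constraint' error text.
        key = ", ".join(f"{c}={v}" for c, v in zip(self.key_columns, self.key_values))
        return (f'conflict {self.conflict_type} detected on relation '
                f'"{self.relation}" ({key}) at {self.detected_at.isoformat()}')


if __name__ == "__main__":
    rec = ConflictRecord("public.accounts", "update_differ", ("id",), ("42",),
                         datetime(2024, 6, 13, 12, 0, tzinfo=timezone.utc))
    print(rec.log_line())
```

Because every conflict produces the same fields, the same records could equally be written to the server log or inserted as rows into a queryable view or table.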

Setting aside how conflicts are resolved, users want to know exactly which 
row had a conflict, and users coming from other database systems that have 
dealt with these issues are used to having tooling to review and analyze 
conflicts when they occur. This data is typically stored in a queryable 
table, with data retained for N days. When you add in automatic conflict 
resolution, users then want to have a record of how the conflict was 
resolved, in case they need to manually update it.
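To sketch the "queryable table with N days of retention, plus a record of the resolution" idea, here is a small in-memory Python stand-in for a pg_conflicts_history-style table. It is only an assumption about how such a store might behave; ConflictHistory, its methods, and the resolution label "apply_remote" are hypothetical names, not an existing PostgreSQL feature.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class ConflictHistoryEntry:
    relation: str
    conflict_type: str
    row_key: str                      # which row conflicted, e.g. "id=42"
    detected_at: datetime
    resolution: Optional[str] = None  # hypothetical label, set once the conflict is resolved


class ConflictHistory:
    """Hypothetical in-memory stand-in for a conflict history table."""

    def __init__(self, retention_days: int) -> None:
        self.retention = timedelta(days=retention_days)
        self.entries: list[ConflictHistoryEntry] = []

    def record(self, entry: ConflictHistoryEntry) -> None:
        self.entries.append(entry)

    def resolve(self, relation: str, row_key: str, how: str) -> int:
        # Note how each still-unresolved conflict on this row was resolved,
        # so a user can later review (and manually correct) the outcome.
        count = 0
        for e in self.entries:
            if e.relation == relation and e.row_key == row_key and e.resolution is None:
                e.resolution = how
                count += 1
        return count

    def prune(self, now: datetime) -> int:
        # Keep only the last N days of history; return how many entries were dropped.
        cutoff = now - self.retention
        before = len(self.entries)
        self.entries = [e for e in self.entries if e.detected_at >= cutoff]
        return before - len(self.entries)

    def for_row(self, relation: str, row_key: str) -> list[ConflictHistoryEntry]:
        # "Exactly which row had a conflict": all history for one row.
        return [e for e in self.entries if e.relation == relation and e.row_key == row_key]


if __name__ == "__main__":
    now = datetime(2024, 6, 20, tzinfo=timezone.utc)
    history = ConflictHistory(retention_days=7)
    history.record(ConflictHistoryEntry("public.accounts", "update_differ", "id=42",
                                        now - timedelta(days=1)))
    history.resolve("public.accounts", "id=42", "apply_remote")
    print(history.for_row("public.accounts", "id=42"))
```

Keeping the resolution on the same row as the detected conflict is one possible design; a separate resolutions table keyed by conflict id would work equally well.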

Having this data in a table also gives the user the opportunity to 
understand conflict stats (e.g. conflict rates) and potentially identify 
portions of the application and other parts of the system to optimize. 
It also makes it easier to import into downstream systems that may perform 
further analysis on conflict resolution, or raise an alarm if a conflict 
rate exceeds a certain threshold.
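For example, a downstream job reading such a table might compute per-relation conflict rates over a trailing window and flag relations above a threshold. The Python sketch below assumes each conflict is available as a (relation, timestamp) pair; the function names, the per-hour unit, and the threshold are all hypothetical choices for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta


def conflict_rates(events: list[tuple[str, datetime]], now: datetime,
                   window: timedelta) -> dict[str, float]:
    """Conflicts per hour for each relation over the trailing window."""
    hours = window.total_seconds() / 3600
    # Count only conflicts detected inside [now - window, now].
    recent = Counter(rel for rel, ts in events if now - window <= ts <= now)
    return {rel: n / hours for rel, n in recent.items()}


def relations_over_threshold(rates: dict[str, float], per_hour: float) -> list[str]:
    # Candidates for an alarm: relations whose conflict rate exceeds the threshold.
    return sorted(rel for rel, rate in rates.items() if rate > per_hour)


if __name__ == "__main__":
    now = datetime(2024, 6, 13, 12, 0)
    events = [("public.accounts", now - timedelta(minutes=m)) for m in range(0, 60, 10)]
    events.append(("public.orders", now - timedelta(minutes=30)))
    rates = conflict_rates(events, now, timedelta(hours=1))
    print(rates, relations_over_threshold(rates, per_hour=5.0))
```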

Thanks,

Jonathan


