Re: Conflict Detection and Resolution - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: Conflict Detection and Resolution |
Date | |
Msg-id | CAD21AoAa6JzqhXY02uNUPb-aTozu2RY9nMdD1=TUh+FpskkYtw@mail.gmail.com |
In response to | RE: Conflict Detection and Resolution ("Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>) |
Responses | Re: Conflict Detection and Resolution; RE: Conflict Detection and Resolution |
List | pgsql-hackers |
On Wed, Jun 5, 2024 at 3:32 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote:
>
> Hi,
>
> This time at PGconf.dev[1], we had some discussions regarding this
> project. The proposed approach is to split the work into two main
> components. The first part focuses on conflict detection, which aims to
> identify and report conflicts in logical replication. This feature will
> enable users to monitor the unexpected conflicts that may occur. The
> second part involves the actual conflict resolution. Here, we will provide
> built-in resolutions for each conflict and allow user to choose which
> resolution will be used for which conflict (as described in the initial
> email of this thread).

I agree with this direction that we focus on conflict detection (and
logging) first and then develop conflict resolution on top of that.

>
> Of course, we are open to alternative ideas and suggestions, and the
> strategy above can be changed based on ongoing discussions and feedback
> received.
>
> Here is the patch of the first part work, which adds a new parameter
> detect_conflict for CREATE and ALTER subscription commands. This new
> parameter will decide if subscription will go for conflict detection. By
> default, conflict detection will be off for a subscription.
>
> When conflict detection is enabled, additional logging is triggered in the
> following conflict scenarios:
>
> * updating a row that was previously modified by another origin.
> * The tuple to be updated is not found.
> * The tuple to be deleted is not found.
>
> While there exist other conflict types in logical replication, such as an
> incoming insert conflicting with an existing row due to a primary key or
> unique index, these cases already result in constraint violation errors.

What does detect_conflict being true actually mean to users? I
understand that detect_conflict being true could introduce some
overhead to detect conflicts.
But in terms of conflict detection, even
if detect_conflict is false, we detect some conflicts such as
concurrent inserts with the same key. Once we introduce the complete
conflict detection feature, I'm not sure there is a case where a user
wants to detect only some particular types of conflict.

> Therefore, additional conflict detection for these cases is currently
> omitted to minimize potential overhead. However, the pre-detection for
> conflict in these error cases is still essential to support automatic
> conflict resolution in the future.

I feel that we should log all types of conflict in a uniform way. For
example, with detect_conflict being true, the update_differ conflict
is reported as 'conflict %s detected on relation "%s"', whereas
concurrent inserts with the same key are reported as 'duplicate key
value violates unique constraint "%s"', which could confuse users.
Ideally, I think that we should log such conflict details (table
name, column name, conflict type, etc.) somewhere (e.g. a table or
the server logs) so that users can resolve them manually.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com