Re: Conflict detection for multiple_unique_conflicts in logical replication - Mailing list pgsql-hackers
From | Nisha Moond |
---|---|
Subject | Re: Conflict detection for multiple_unique_conflicts in logical replication |
Date | |
Msg-id | CABdArM42thrKrgvERxyw9hhc56tM4P+CmjKFpU07hR_9UMM8Zg@mail.gmail.com Whole thread Raw |
In response to | Re: Conflict detection for multiple_unique_conflicts in logical replication (Dilip Kumar <dilipbalaut@gmail.com>) |
Responses |
Re: Conflict detection for multiple_unique_conflicts in logical replication
|
List | pgsql-hackers |
On Tue, Mar 11, 2025 at 11:10 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Thu, Feb 20, 2025 at 5:01 PM Nisha Moond <nisha.moond412@gmail.com> wrote: > > > > Hi Hackers, > > (CCing people involved in related discussions) > > > > I am starting this thread to propose a new conflict detection type > > "multiple_unique_conflicts" that identifies when an incoming row > > during logical replication violates more than one UNIQUE constraint. > > If multiple constraints (such as the primary key/replica identity > > along with additional unique indexes) are violated for the same > > incoming tuple, the apply_worker will trigger a dedicated > > multiple_unique_conflicts conflict. > > > > ---- > > ISSUE: > > ---- > > Currently, when the apply_worker encounters an 'insert_exists' > > conflict (where the incoming row conflicts with an existing row based > > on the replica identity), it logs the conflict and errors out > > immediately. If the user tries to resolve this manually by deleting > > the conflicting row, the apply worker may fail again due to another > > unique constraint violation on a different column. This forces users > > to fix conflicts one by one, making resolution tedious and > > inefficient. > > > > Example: > > Schema: > > CREATE TABLE tab1 (col1 integer PRIMARY KEY, col2 integer UNIQUE, col3 > > integer UNIQUE); > > - col1 is Replica Identity. > > > > Data: > > - on pub: (1, 11, 111) > > - on sub: 3 additional local Inserts: (2, 22, 222); (3, 33, 333); (4, 44, 444) > > - Concurrently on pub, new insert: (2, 33, 444) > > > > When the new incoming tuple (2, 33, 444) is applied on the subscriber: > > - The apply worker first detects an 'insert_exists' conflict on col1 > > (primary key) and errors out. > > - If the user deletes the conflicting row (key col1=2) : (2, 22, > > 222), the apply worker fails again because col2=33 violates another > > unique constraint for tuple (3, 33, 333); > > - If the user deletes col2=33, the apply worker fails yet again due > > to tuple (4, 44, 444) because col3=444 is also unique. > > Conflicts on both col2 and col3 (which are independent of each other) > > are an example of a 'Multiple Unique Constraints' violation. > > > > In such cases, users are forced to resolve conflicts one by one, > > making the process slow and error-prone. > > > > --- > > SOLUTION: > > --- > > During an INSERT or UPDATE conflict check, instead of stopping at the > > first encountered conflict, the apply_worker will now check all unique > > indexes before reporting a conflict. If multiple unique key violations > > are found, it will report a 'multiple_unique_conflicts' conflict, > > listing all conflicting tuples in the logs. If only a single key > > conflict is detected, the existing 'insert_exists' conflict will be > > raised as it is now. > > I think it makes sense to report all the unique key conflicts for a > single row at once, rather than stopping after the first one. However, > I don't understand the need to create a new conflict type. Can't we > report multiple conflicts for the row using the existing conflict > type, if different columns in the incoming row conflict with different > existing rows? > The goal of introducing a new conflict type is to handle multiple-key conflicts separately. It will not only allow users to apply different resolution methods for single-key vs multi-key conflicts but will also help them identify such cases directly from stats instead of filtering through the LOG file. For example, with future resolution options, 'last_update_wins' will work well for single-key conflicts (insert_exists/update_exists) since only one local tuple will be replaced if the remote wins. However, for multi-key conflicts, this resolution may not be feasible. Even if supported, resolving the conflict could result in deleting multiple tuples, which users may want to control separately, like opting to skip or error out in case of multi-key conflicts. For reference, databases like EDB-PGD(BDR)[1] support a separate conflict type for multi-key cases. OTOH, Oracle[2] and DB2[3] do not have a separate conflict_type and always ERROR out during conflict resolution if multiple unique constraint violations happen. However, none of these databases use the single-key conflict_type and resolution for the multi-key conflict cases. We’d like to know your preference—should we always ERROR out like Oracle/DB2, or keep it as a separate conflict_type for better control during resolution? [1] https://www.enterprisedb.com/docs/pgd/4/bdr/conflicts/#insert-operations-that-violate-multiple-unique-constraints [2] https://docs.oracle.com/goldengate/c1230/gg-winux/GWUAD/configuring-conflict-detection-and-resolution.htm#GWUAD316 [3] https://www.ibm.com/docs/en/idr/11.4.0?topic=console-setting-conflict-detection-resolution -- Thanks, Nisha Moond
pgsql-hackers by date: