Re: Proposal: Conflict log history table for Logical Replication - Mailing list pgsql-hackers

From Dilip Kumar
Subject Re: Proposal: Conflict log history table for Logical Replication
Date
Msg-id CAFiTN-vq50N3QP9p3_SH+tJ8Pn=uRDb0X4qEcQZYcGW9AX88rQ@mail.gmail.com
Whole thread Raw
In response to Re: Proposal: Conflict log history table for Logical Replication  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Proposal: Conflict log history table for Logical Replication
List pgsql-hackers
On Tue, Dec 2, 2025 at 11:38 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Dec 1, 2025 at 10:11 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > The specific scenario we are discussing is when a single row from the
> > publisher attempts to apply an operation that causes a conflict across
> > multiple unique keys, with each of those unique key violations
> > conflicting with a different local row on the subscriber, is very
> > rare.  IMHO this low-frequency scenario does not justify
> > overcomplicating the design with an array field or a multi-level
> > table.
> >
>
> I did some analysis and search on the internet to answer your
> following two questions.
>
> > Consider the infrequency of the root causes:
> > - How often does a table have more than 3 to 4 unique keys?
>
> It is extremely common—in fact, it is considered the industry "best
> practice" for modern database design.
>
> One can find this pattern in almost every enterprise system (e.g.
> banking apps, CRMs). It relies on distinguishing between Technical
> Identity (for the database) and Business Identity (for the real
> world).
>
> 1. The Design Pattern: Surrogate vs. Natural Keys
> Primary Key (Surrogate Key): Usually a meaningless number (e.g.,
> 10452) or a UUID. It is used strictly for the database to join tables
> efficiently. It never changes.
> Unique Key (Natural Key): A real-world value (e.g., john@email.com or
> SSN-123). This is how humans or external systems identify the row. It
> can change (e.g., someone updates their email).
>
> 2. Common Real-World Use Cases
> A. User Management (The most classic example)
> Primary Key: user_id (Integer). Used for foreign keys in the ORDERS table.
> Unique Key 1: email (Varchar). Prevents two people from registering
> with the same email.
> Unique Key 2: username (Varchar). Ensures unique display names.
> Why? If a user changes their email address, you only update one field
> in one table. If you used email as the Primary Key, you would have to
> update millions of rows in the ORDERS table that reference that email.
>
> B. Inventory / E-Commerce
> Primary Key: product_id (Integer). Used internally by the code.
> Unique Key: SKU (Stock Keeping Unit) or Barcode (EAN/UPC).
> Why? Companies often re-organize their SKU formats. If the SKU was the
> Primary Key, a format change would require a massive database
> migration.
>
> C. Government / HR Systems
> Primary Key: employee_id (Integer).
> Unique Key: National_ID (SSN, Aadhaar, Passport Number).
> Why? Privacy and security. You do not want to expose a National ID in
> every URL or API call (e.g., api/employee/552 is safer than
> api/employee/SSN-123).
>
> > - How frequently would each of these keys conflict with a unique row
> > on the subscriber side?
> >
>
> It can occur with medium-to-high probability in following cases. (a)
> In Bi-Directional replication systems; for example, If two users
> create the same "User Profile" on two different servers at the same
> time, the row will conflict on every unique field (ID, Email, SSN)
> simultaneously. (b) The chances of bloat are high, on retrying to fix
> the error as mentioned by Shveta. Say, if Ops team fixes errors by
> just "trying again" without checking the full row, you will hit the ID
> error, fix it, then immediately hit the Email error. (c) The chances
> are medium during initial data-load; If a user is loading data from a
> legacy system with "dirty" data, rows often violate multiple rules
> (e.g., a duplicate user with both a reused ID and a reused Email).
>
> > If resolving this occasional, synthetic conflict requires inserting
> > two or three rows instead of a single one, this is an acceptable
> > trade-off considering how rare it can occur.
> >
>
> As per above analysis and the re-try point Shveta raises, I don't
> think we can ignore the possibility of data-bloat especially for this
> multiple_unique_key conflict. We can consider logging multiple local
> conflicting rows as JSON Array.

Okay, I will try to make multiple local rows as JSON Array in the next version.

--
Regards,
Dilip Kumar
Google



pgsql-hackers by date:

Previous
From: Dilip Kumar
Date:
Subject: Re: Parallel Apply
Next
From: Kirill Reshke
Date:
Subject: Re: Allow GUC settings in CREATE SUBSCRIPTION CONNECTION to take effect