Home > mailing lists

Re: Proposal: Conflict log history table for Logical Replication - Mailing list pgsql-hackers

From	Amit Kapila
Subject	Re: Proposal: Conflict log history table for Logical Replication
Date	December 2, 2025 09:08:23
Msg-id	CAA4eK1LbjV0bctib9wUnBpEkC+2rZFPnGuRtrKuc5AtUAzum+A@mail.gmail.com Whole thread Raw
In response to	Re: Proposal: Conflict log history table for Logical Replication (Dilip Kumar <dilipbalaut@gmail.com>)
Responses	Re: Proposal: Conflict log history table for Logical Replication
List	pgsql-hackers

Tree view

On Mon, Dec 1, 2025 at 10:11 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> The specific scenario we are discussing is when a single row from the
> publisher attempts to apply an operation that causes a conflict across
> multiple unique keys, with each of those unique key violations
> conflicting with a different local row on the subscriber, is very
> rare.  IMHO this low-frequency scenario does not justify
> overcomplicating the design with an array field or a multi-level
> table.
>

I did some analysis and search on the internet to answer your
following two questions.

> Consider the infrequency of the root causes:
> - How often does a table have more than 3 to 4 unique keys?

It is extremely common—in fact, it is considered the industry "best
practice" for modern database design.

One can find this pattern in almost every enterprise system (e.g.
banking apps, CRMs). It relies on distinguishing between Technical
Identity (for the database) and Business Identity (for the real
world).

1. The Design Pattern: Surrogate vs. Natural Keys
Primary Key (Surrogate Key): Usually a meaningless number (e.g.,
10452) or a UUID. It is used strictly for the database to join tables
efficiently. It never changes.
Unique Key (Natural Key): A real-world value (e.g., john@email.com or
SSN-123). This is how humans or external systems identify the row. It
can change (e.g., someone updates their email).

2. Common Real-World Use Cases
A. User Management (The most classic example)
Primary Key: user_id (Integer). Used for foreign keys in the ORDERS table.
Unique Key 1: email (Varchar). Prevents two people from registering
with the same email.
Unique Key 2: username (Varchar). Ensures unique display names.
Why? If a user changes their email address, you only update one field
in one table. If you used email as the Primary Key, you would have to
update millions of rows in the ORDERS table that reference that email.

B. Inventory / E-Commerce
Primary Key: product_id (Integer). Used internally by the code.
Unique Key: SKU (Stock Keeping Unit) or Barcode (EAN/UPC).
Why? Companies often re-organize their SKU formats. If the SKU was the
Primary Key, a format change would require a massive database
migration.

C. Government / HR Systems
Primary Key: employee_id (Integer).
Unique Key: National_ID (SSN, Aadhaar, Passport Number).
Why? Privacy and security. You do not want to expose a National ID in
every URL or API call (e.g., api/employee/552 is safer than
api/employee/SSN-123).

> - How frequently would each of these keys conflict with a unique row
> on the subscriber side?
>

It can occur with medium-to-high probability in following cases. (a)
In Bi-Directional replication systems; for example, If two users
create the same "User Profile" on two different servers at the same
time, the row will conflict on every unique field (ID, Email, SSN)
simultaneously. (b) The chances of bloat are high, on retrying to fix
the error as mentioned by Shveta. Say, if Ops team fixes errors by
just "trying again" without checking the full row, you will hit the ID
error, fix it, then immediately hit the Email error. (c) The chances
are medium during initial data-load; If a user is loading data from a
legacy system with "dirty" data, rows often violate multiple rules
(e.g., a duplicate user with both a reused ID and a reused Email).

> If resolving this occasional, synthetic conflict requires inserting
> two or three rows instead of a single one, this is an acceptable
> trade-off considering how rare it can occur.
>

As per above analysis and the re-try point Shveta raises, I don't
think we can ignore the possibility of data-bloat especially for this
multiple_unique_key conflict. We can consider logging multiple local
conflicting rows as JSON Array.

--
With Regards,
Amit Kapila.

pgsql-hackers by date:

From: Tom Lane
Date: 02 December 2025, 09:05:52
Subject: Re: Inline non-SQL SRFs using SupportRequestSimplify

From: Pavel Stehule
Date: 02 December 2025, 09:14:06
Subject: Re: Inline non-SQL SRFs using SupportRequestSimplify

Re: Proposal: Conflict log history table for Logical Replication - Mailing list pgsql-hackers

Previous

Next