Re: Proposal: Conflict log history table for Logical Replication - Mailing list pgsql-hackers
| From | Amit Kapila |
|---|---|
| Subject | Re: Proposal: Conflict log history table for Logical Replication |
| Date | |
| Msg-id | CAA4eK1LbjV0bctib9wUnBpEkC+2rZFPnGuRtrKuc5AtUAzum+A@mail.gmail.com Whole thread Raw |
| In response to | Re: Proposal: Conflict log history table for Logical Replication (Dilip Kumar <dilipbalaut@gmail.com>) |
| Responses |
Re: Proposal: Conflict log history table for Logical Replication
|
| List | pgsql-hackers |
On Mon, Dec 1, 2025 at 10:11 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > The specific scenario we are discussing is when a single row from the > publisher attempts to apply an operation that causes a conflict across > multiple unique keys, with each of those unique key violations > conflicting with a different local row on the subscriber, is very > rare. IMHO this low-frequency scenario does not justify > overcomplicating the design with an array field or a multi-level > table. > I did some analysis and search on the internet to answer your following two questions. > Consider the infrequency of the root causes: > - How often does a table have more than 3 to 4 unique keys? It is extremely common—in fact, it is considered the industry "best practice" for modern database design. One can find this pattern in almost every enterprise system (e.g. banking apps, CRMs). It relies on distinguishing between Technical Identity (for the database) and Business Identity (for the real world). 1. The Design Pattern: Surrogate vs. Natural Keys Primary Key (Surrogate Key): Usually a meaningless number (e.g., 10452) or a UUID. It is used strictly for the database to join tables efficiently. It never changes. Unique Key (Natural Key): A real-world value (e.g., john@email.com or SSN-123). This is how humans or external systems identify the row. It can change (e.g., someone updates their email). 2. Common Real-World Use Cases A. User Management (The most classic example) Primary Key: user_id (Integer). Used for foreign keys in the ORDERS table. Unique Key 1: email (Varchar). Prevents two people from registering with the same email. Unique Key 2: username (Varchar). Ensures unique display names. Why? If a user changes their email address, you only update one field in one table. If you used email as the Primary Key, you would have to update millions of rows in the ORDERS table that reference that email. B. Inventory / E-Commerce Primary Key: product_id (Integer). Used internally by the code. Unique Key: SKU (Stock Keeping Unit) or Barcode (EAN/UPC). Why? Companies often re-organize their SKU formats. If the SKU was the Primary Key, a format change would require a massive database migration. C. Government / HR Systems Primary Key: employee_id (Integer). Unique Key: National_ID (SSN, Aadhaar, Passport Number). Why? Privacy and security. You do not want to expose a National ID in every URL or API call (e.g., api/employee/552 is safer than api/employee/SSN-123). > - How frequently would each of these keys conflict with a unique row > on the subscriber side? > It can occur with medium-to-high probability in following cases. (a) In Bi-Directional replication systems; for example, If two users create the same "User Profile" on two different servers at the same time, the row will conflict on every unique field (ID, Email, SSN) simultaneously. (b) The chances of bloat are high, on retrying to fix the error as mentioned by Shveta. Say, if Ops team fixes errors by just "trying again" without checking the full row, you will hit the ID error, fix it, then immediately hit the Email error. (c) The chances are medium during initial data-load; If a user is loading data from a legacy system with "dirty" data, rows often violate multiple rules (e.g., a duplicate user with both a reused ID and a reused Email). > If resolving this occasional, synthetic conflict requires inserting > two or three rows instead of a single one, this is an acceptable > trade-off considering how rare it can occur. > As per above analysis and the re-try point Shveta raises, I don't think we can ignore the possibility of data-bloat especially for this multiple_unique_key conflict. We can consider logging multiple local conflicting rows as JSON Array. -- With Regards, Amit Kapila.
pgsql-hackers by date: