Re: Proposal: Conflict log history table for Logical Replication - Mailing list pgsql-hackers
| From | Dilip Kumar |
|---|---|
| Subject | Re: Proposal: Conflict log history table for Logical Replication |
| Date | |
| Msg-id | CAFiTN-vq50N3QP9p3_SH+tJ8Pn=uRDb0X4qEcQZYcGW9AX88rQ@mail.gmail.com Whole thread Raw |
| In response to | Re: Proposal: Conflict log history table for Logical Replication (Amit Kapila <amit.kapila16@gmail.com>) |
| Responses |
Re: Proposal: Conflict log history table for Logical Replication
|
| List | pgsql-hackers |
On Tue, Dec 2, 2025 at 11:38 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Dec 1, 2025 at 10:11 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > The specific scenario we are discussing is when a single row from the > > publisher attempts to apply an operation that causes a conflict across > > multiple unique keys, with each of those unique key violations > > conflicting with a different local row on the subscriber, is very > > rare. IMHO this low-frequency scenario does not justify > > overcomplicating the design with an array field or a multi-level > > table. > > > > I did some analysis and search on the internet to answer your > following two questions. > > > Consider the infrequency of the root causes: > > - How often does a table have more than 3 to 4 unique keys? > > It is extremely common—in fact, it is considered the industry "best > practice" for modern database design. > > One can find this pattern in almost every enterprise system (e.g. > banking apps, CRMs). It relies on distinguishing between Technical > Identity (for the database) and Business Identity (for the real > world). > > 1. The Design Pattern: Surrogate vs. Natural Keys > Primary Key (Surrogate Key): Usually a meaningless number (e.g., > 10452) or a UUID. It is used strictly for the database to join tables > efficiently. It never changes. > Unique Key (Natural Key): A real-world value (e.g., john@email.com or > SSN-123). This is how humans or external systems identify the row. It > can change (e.g., someone updates their email). > > 2. Common Real-World Use Cases > A. User Management (The most classic example) > Primary Key: user_id (Integer). Used for foreign keys in the ORDERS table. > Unique Key 1: email (Varchar). Prevents two people from registering > with the same email. > Unique Key 2: username (Varchar). Ensures unique display names. > Why? If a user changes their email address, you only update one field > in one table. If you used email as the Primary Key, you would have to > update millions of rows in the ORDERS table that reference that email. > > B. Inventory / E-Commerce > Primary Key: product_id (Integer). Used internally by the code. > Unique Key: SKU (Stock Keeping Unit) or Barcode (EAN/UPC). > Why? Companies often re-organize their SKU formats. If the SKU was the > Primary Key, a format change would require a massive database > migration. > > C. Government / HR Systems > Primary Key: employee_id (Integer). > Unique Key: National_ID (SSN, Aadhaar, Passport Number). > Why? Privacy and security. You do not want to expose a National ID in > every URL or API call (e.g., api/employee/552 is safer than > api/employee/SSN-123). > > > - How frequently would each of these keys conflict with a unique row > > on the subscriber side? > > > > It can occur with medium-to-high probability in following cases. (a) > In Bi-Directional replication systems; for example, If two users > create the same "User Profile" on two different servers at the same > time, the row will conflict on every unique field (ID, Email, SSN) > simultaneously. (b) The chances of bloat are high, on retrying to fix > the error as mentioned by Shveta. Say, if Ops team fixes errors by > just "trying again" without checking the full row, you will hit the ID > error, fix it, then immediately hit the Email error. (c) The chances > are medium during initial data-load; If a user is loading data from a > legacy system with "dirty" data, rows often violate multiple rules > (e.g., a duplicate user with both a reused ID and a reused Email). > > > If resolving this occasional, synthetic conflict requires inserting > > two or three rows instead of a single one, this is an acceptable > > trade-off considering how rare it can occur. > > > > As per above analysis and the re-try point Shveta raises, I don't > think we can ignore the possibility of data-bloat especially for this > multiple_unique_key conflict. We can consider logging multiple local > conflicting rows as JSON Array. Okay, I will try to make multiple local rows as JSON Array in the next version. -- Regards, Dilip Kumar Google
pgsql-hackers by date: