Re: Proposal: Conflict log history table for Logical Replication - Mailing list pgsql-hackers
From: shveta malik
Subject: Re: Proposal: Conflict log history table for Logical Replication
Date:
Msg-id: CAJpy0uB3y5Sd_LJyrhsyo48aP_WqTHC3oc0fz5m3obwGfc8JTQ@mail.gmail.com
In response to: Re: Proposal: Conflict log history table for Logical Replication (shveta malik <shveta.malik@gmail.com>)
List: pgsql-hackers
On Thu, Aug 7, 2025 at 12:25 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Tue, Aug 5, 2025 at 5:54 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > Currently we log conflicts to the server's log file and updates, this approach has limitations, 1) Difficult to query and analyze, parsing plain text log files for conflict details is inefficient. 2) Lack of structured data, key conflict attributes (table, operation, old/new data, LSN, etc.) are not readily available in a structured, queryable format. 3) Difficult for external monitoring tools or custom resolution scripts to consume conflict data directly.
> >
> > This proposal aims to address these limitations by introducing a conflict log history table, providing a structured, and queryable record of all logical replication conflicts. This should be a configurable option whether to log into the conflict log history table, server logs or both.
>
> +1 for the idea.
>
> > This proposal has two main design questions:
> > ===================================
> >
> > 1. How do we store conflicting tuples from different tables?
> > Using a JSON column to store the row data seems like the most flexible solution, as it can accommodate different table schemas.
>
> Yes, that is one option. I have not looked into details myself, but you can also explore 'anyarray' used in pg_statistics to store 'Column data values of the appropriate kind'.
>
> > 2. Should this be a system table or a user table?
> > a) System Table: Storing this in a system catalog is simple, but catalogs aren't designed for ever-growing data. While pg_large_object is an exception, this is not what we generally do IMHO.
> > b) User Table: This offers more flexibility. We could allow a user to specify the table name during CREATE SUBSCRIPTION. Then we choose to either create the table internally or let the user create the table with a predefined schema.
> >
> > A potential drawback is that a user might drop or alter the table. However, we could mitigate this risk by simply logging a WARNING if the table is configured but an insertion fails.
>
> I believe it makes more sense for this to be a catalog table rather than a user table. I wanted to check if we already have a large catalog table of this kind, and I think pg_statistic could be an example of a sizable catalog table. To get a rough idea of how size scales with data, I ran a quick experiment: I created 1000 tables, each with 2 JSON columns, 1 text column, and 2 integer columns. Then, I inserted 1000 rows into each table and ran ANALYZE to collect statistics. Here’s what I observed on a fresh database before and after:
>
> Before:
> pg_statistic row count: 412
> Table size: ~256 kB
>
> After:
> pg_statistic row count: 6,412
> Table size: ~5.3 MB
>
> Although it isn’t an exact comparison, this gives us some insight into how the statistics catalog table size grows with the number of rows. It doesn’t seem excessively large with 6k rows, given the fact that pg_statistic itself is a complex table having many 'anyarray'-type columns.
>
> That said, irrespective of what we decide, it would be ideal to offer users an option for automatic purging, perhaps via a retention period parameter like conflict_stats_retention_period (say default to 30 days), or a manual purge API such as purge_conflict_stats('older than date'). I wasn’t able to find any such purge mechanism for PostgreSQL stats tables, but Oracle does provide such purging options for some of their statistics tables (not related to conflicts), see [1], [2]. And to manage it better, it could be range partitioned on timestamp.
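To make the retention/partitioning idea a bit more concrete, here is a very rough sketch of the shape such a table could take (everything below, i.e. the names, the columns, the use of jsonb and the monthly partitions, is only an illustration and not a concrete schema proposal):

-- Illustrative sketch only: a possible shape for a conflict history
-- table, range partitioned on the conflict timestamp so that old data
-- can be purged cheaply by dropping whole partitions.
CREATE TABLE conflict_history (
    subid         oid         NOT NULL,  -- subscription in which the conflict occurred
    relid         oid         NOT NULL,  -- table on which the conflict occurred
    conflict_type text        NOT NULL,  -- e.g. insert_exists, update_missing, ...
    remote_lsn    pg_lsn,                -- LSN of the remote change
    local_tuple   jsonb,                 -- existing local row, if any
    remote_tuple  jsonb,                 -- incoming row that caused the conflict
    conflict_time timestamptz NOT NULL DEFAULT now()
) PARTITION BY RANGE (conflict_time);

-- e.g. one partition per month
CREATE TABLE conflict_history_2025_08 PARTITION OF conflict_history
    FOR VALUES FROM ('2025-08-01') TO ('2025-09-01');

-- A purge driven by something like conflict_stats_retention_period (or a
-- manual purge_conflict_stats() call) could then simply drop partitions
-- that are entirely older than the cutoff:
DROP TABLE conflict_history_2025_08;

Whether it ends up as a catalog or a user-visible table, dropping partitions would avoid the bloat and vacuum cost of deleting individual old rows.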
It seems BDR also has one such conflict-log table, which is a catalog table and is also partitioned on time. It has a default retention period of 30 days. See 'bdr.conflict_history' mentioned under 'catalogs' in [1].

[1]: https://www.enterprisedb.com/docs/pgd/latest/reference/tables-views-functions/#user-visible-catalogs-and-views

thanks
Shveta