Re: Proposal: Conflict log history table for Logical Replication - Mailing list pgsql-hackers
| From | shveta malik |
|---|---|
| Subject | Re: Proposal: Conflict log history table for Logical Replication |
| Date | |
| Msg-id | CAJpy0uBeU1dZgaqsSVKc=P=EVUKxRgVuHR8jDXFL-HLibbE-kQ@mail.gmail.com Whole thread Raw |
| In response to | Re: Proposal: Conflict log history table for Logical Replication (Dilip Kumar <dilipbalaut@gmail.com>) |
| List | pgsql-hackers |
On Thu, Nov 13, 2025 at 9:17 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Thu, Nov 13, 2025 at 2:39 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > > Few observations related to publication. > > > ------------------------------ > > Thanks Shveta, for testing and sharing your thoughts. IMHO for > conflict log tables it should be good enough if we restrict it when > ALL TABLE options are used, I don't think we need to put extra effort > to completely restrict it even if users want to explicitly list it > into the publication. > > > > > > > (In the below comments, clt/CLT implies Conflict Log Table) > > > > > > 1) > > > 'select pg_relation_is_publishable(clt)' returns true for conflict-log table. > > This function is used while publishing every single change and I don't > think we want to add a cost to check each subscription to identify > whether the table is listed as CLT. > > > > 2) > > > '\d+ clt' shows all-tables publication name. I feel we should not > > > show that for clt. > > I think we should fix this. > > > > 3) > > > I am able to create a publication for clt table, should it be allowed? > > I believe we should not do any specific handling to restrict this but > I am open for the opinions. > > > > create subscription sub1 connection '...' publication pub1 > > > WITH(conflict_log_table='clt'); > > > create publication pub3 for table clt; > > > > > > 4) > > > Is there a reason we have not made '!IsConflictHistoryRelid' check as > > > part of is_publishable_class() itself? If we do so, other code-logics > > > will also get clt as non-publishable always (and will solve a few of > > > the above issues I think). IIUC, there is no place where we want to > > > mark CLT as publishable or is there any? > > IMHO the main reason is performance. > > > > 5) Also, I feel we can add some documentation now to help others to > > > understand/review the patch better without going through the long > > > thread. > > Make sense, I will do that in the next version. > > > > > > > Few observations related to conflict-logging: > > > ------------------------------ > > > 1) > > > I found that for the conflicts which ultimately result in Error, we do > > > not insert any conflict-record in clt. > > > > > > a) > > > Example: insert_exists, update_Exists > > > create table tab1 (i int primary key, j int); > > > sub: insert into tab1 values(30,10); > > > pub: insert into tab1 values(30,10); > > > ERROR: conflict detected on relation "public.tab1": conflict=insert_exists > > > No record in clt. > > > > > > sub: > > > <some pre-data needed> > > > update tab1 set i=40 where i = 30; > > > pub: update tab1 set i=40 where i = 20; > > > ERROR: conflict detected on relation "public.tab1": conflict=update_exists > > > No record in clt. > > Yeah that interesting need to put thought on how to commit this record > when an outer transaction is aborted as we do not have autonomous > transactions which are generally used for this kind of logging. Right > But > we can explore more options like inserting into conflict log tables > outside the outer transaction. Yes, that seems the way to me. I could not find any such existing reference/usage in code though. > > > > b) > > > Another question related to this is, since these conflicts (which > > > results in error) keep on happening until user resolves these or skips > > > these or 'disable_on_error' is set. Then are we going to insert these > > > multiple times? We do count these in 'confl_insert_exists' and > > > 'confl_update_exists' everytime, so it makes sense to log those each > > > time in clt as well. Thoughts? > > I think it make sense to insert every time we see the conflict, but it > would be good to have opinion from others as well. > > > > 2) > > > Conflicts where row on sub is missing, local_ts incorrectly inserted. > > > It is '2000-01-01 05:30:00+05:30'. Should it be Null or something > > > indicating that it is not applicable for this conflict-type? > > > > > > Example: delete_missing, update_missing > > > pub: > > > insert into tab1 values(10,10); > > > insert into tab1 values(20,10); > > > sub: delete from tab1 where i=10; > > > pub: delete from tab1 where i=10; > > Sure I will test this. > > > > > 3) > > We also need to think how we are going to display the info in case of > > multiple_unique_conflicts as there could be multiple local and remote > > tuples conflicting for one single operation. Example: > > > > create table conf_tab (a int primary key, b int unique, c int unique); > > > > sub: insert into conf_tab values (2,2,2), (3,3,3), (4,4,4); > > > > pub: insert into conf_tab values (2,3,4); > > > > ERROR: conflict detected on relation "public.conf_tab": > > conflict=multiple_unique_conflicts > > DETAIL: Key already exists in unique index "conf_tab_pkey", modified > > locally in transaction 874 at 2025-11-12 14:35:13.452143+05:30. > > Key (a)=(2); existing local row (2, 2, 2); remote row (2, 3, 4). > > Key already exists in unique index "conf_tab_b_key", modified locally > > in transaction 874 at 2025-11-12 14:35:13.452143+05:30. > > Key (b)=(3); existing local row (3, 3, 3); remote row (2, 3, 4). > > Key already exists in unique index "conf_tab_c_key", modified locally > > in transaction 874 at 2025-11-12 14:35:13.452143+05:30. > > Key (c)=(4); existing local row (4, 4, 4); remote row (2, 3, 4). > > CONTEXT: processing remote data for replication origin "pg_16392" > > during message type "INSERT" for replication target relation > > "public.conf_tab" in transaction 781, finished at 0/017FDDA0 > > > > Currently in clt, we have singular terms such as 'key_tuple', > > 'local_tuple', 'remote_tuple'. Shall we have multiple rows inserted? > > But it does not look reasonable to have multiple rows inserted for a > > single conflict raised. I will think more about this. > > Currently I am inserting multiple records in the conflict history > table, the same as each tuple is logged, but couldn't find any better > way for this. Another option is to use an array of tuples instead of a > single tuple but not sure this might make things more complicated to > process by any external tool. It’s arguable and hard to say what the correct behaviour should be. I’m slightly leaning toward having a single row per conflict. IMO, overall the confl_* counters in pg_stat_subscription_stats should align with the number of entries in the conflict history table, which implies one row even for multiple_unique_conflicts. But I also understand that this approach could make things complicated for external tools. For now, we can proceed with logging multiple rows for a single multiple_unique_conflicts occurrence and wait to hear others’ opinions. thanks Shveta
pgsql-hackers by date: