RE: BUG #19360: Bug Report: Logical Replication initial sync fails with "conflict=update_origin_differs" PG12 toPG18 - Mailing list pgsql-bugs

From Zhijie Hou (Fujitsu)
Subject RE: BUG #19360: Bug Report: Logical Replication initial sync fails with "conflict=update_origin_differs" PG12 toPG18
Msg-id TYRPR01MB14195A04472A71EB78F35E42B945AA@TYRPR01MB14195.jpnprd01.prod.outlook.com
In response to RE: BUG #19360: Bug Report: Logical Replication initial sync fails with "conflict=update_origin_differs" PG12 toPG18  ("Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>)
List pgsql-bugs
On Friday, April 3, 2026 3:24 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote:
> On Saturday, January 10, 2026 8:57 AM Masahiko Sawada
> <sawada.mshk@gmail.com> wrote:
> >
> > On Thu, Jan 8, 2026 at 8:46 PM Dilip Kumar <dilipbalaut@gmail.com>
> wrote:
> > >
> > > On Fri, Jan 9, 2026 at 4:17 AM Masahiko Sawada
> > <sawada.mshk@gmail.com> wrote:
> > > > Can we somehow
> > > > share the apply worker's origin with tablesync workers so that
> > > > they can refer to the same origin ID? Or can we invent special
> > > > origin IDs (e.g., > 0x00FF) that are the same as the normal origin
> > > > ID except for being ignored by the conflict detection system?
> > >
> > > How will this distinguish the case where the initial sync was done from
> > > the publisher node we are getting the update from vs. the case where it
> > > was done from some other node?  Can we always ignore conflict checking
> > > for initially synced data, or do we just want to ignore it if the initial
> > > sync was done from the same node?
> >
> > I imagined the former idea; always ignore conflict checking, so we
> > don't need to distinguish them. IOW we treat the changes via the
> > initial tablesync as if they were made by a normal backend process
> > (which doesn't use a replication origin) while using the replication
> > tracking ability of the replication origin.
> 
> I think for changes made by a backend process without setting up an origin, the
> apply worker still treats them as conflicting changes when applying the remote
> changes, as that's necessary to detect local vs. remote updates.
> 
> I personally prefer to let the tablesync worker share the apply worker's origin
> ID while keeping a separate origin for progress tracking. Currently, the worker
> first calls replorigin_session_setup() and then stores the origin ID in
> replorigin_xact_state. The natural implementation is for the tablesync worker
> to still set up its own origin for tracking, but assign the apply worker's origin ID
> to the global state. This gives us per‑tablesync progress tracking while
> ensuring that changes from both workers appear to come from the same
> origin.
> 

After further analysis, I think the approach I mentioned earlier is unsafe. When
replaying the commit record during recovery, if only the main apply origin ID is
present, we cannot recover the progress status for each tablesync origin. The
idea of using a special origin ID for all tablesync origins suffers from the
same problem, i.e., progress cannot be recovered when replaying commit WAL
records.
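
To make the recovery problem concrete, here is a toy model, not PostgreSQL code. It assumes that recovery rebuilds per-origin progress purely from the origin ID and remote LSN carried by each commit record; every name in it (ToyCommitRecord, toy_replay, and so on) is hypothetical and exists only for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy model only; none of these names exist in PostgreSQL. */
typedef uint16_t ToyOriginId;
typedef uint64_t ToyLsn;

#define TOY_MAX_ORIGINS 8

/* What a commit record carries in this model: a single origin ID and
 * the remote LSN that the committing worker had reached. */
typedef struct ToyCommitRecord
{
    ToyOriginId origin_id;
    ToyLsn      remote_lsn;
} ToyCommitRecord;

/* Progress rebuilt during replay, indexed by origin ID. */
typedef struct ToyProgress
{
    ToyLsn remote_lsn[TOY_MAX_ORIGINS];
} ToyProgress;

/* Rebuild per-origin progress from the commit records alone. Replay can
 * only advance the origin that a record names; it has no other source
 * telling it which worker actually made the commit. */
static inline void
toy_replay(const ToyCommitRecord *records, int nrecords, ToyProgress *out)
{
    memset(out, 0, sizeof(*out));
    for (int i = 0; i < nrecords; i++)
    {
        ToyOriginId id = records[i].origin_id;

        assert(id < TOY_MAX_ORIGINS);
        if (records[i].remote_lsn > out->remote_lsn[id])
            out->remote_lsn[id] = records[i].remote_lsn;
    }
}
```

In this model, if a tablesync worker that owns origin 2 logs its commit under the apply worker's origin 1, replay advances origin 1 and leaves origin 2 at zero, so the tablesync progress is lost. Logging the commit under origin 2 keeps the two apart.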

I have been trying to find a way to fix this issue within the proposed
approaches, but I haven't been able to come up with a better solution for now.

One attempt was to continue WAL‑logging the tablesync's own origin ID, but only
store the main origin ID in the commit timestamp module. However, this also has
a problem during recovery: it cannot identify which main origin corresponds to a
given tablesync origin recorded in the commit WAL record. (One might think we
could store this top‑level relationship in the catalog, but since catalogs are
not accessible during recovery, that approach would not work.) Consequently, we
cannot restore the same origin ID in the commit timestamp module during recovery
as was present during normal commit.

The remaining idea, storing the origin ID in pg_subscription_rel and teaching
the apply worker to skip reporting origin_differs if the origin of the update
matches the one stored in pg_subscription_rel, seems worth considering if we
cannot find an easier solution. There was a concern about performance, but since
we could cache those tablesync origins in a local hash table and consult it
during conflict detection, the performance impact might not be significant.
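
A minimal standalone sketch of what such a cached check could look like follows. It is not a patch and uses no PostgreSQL APIs; all names (SketchOriginSet, sketch_report_origin_differs, ...) are hypothetical. It assumes 16-bit origin IDs and, to stay short, uses a fixed bitmap in place of the hash table mentioned above. It also simplifies by keeping one set of tablesync origins for the whole subscription rather than one origin per relation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names throughout; this is not PostgreSQL code. */
typedef uint16_t SketchOriginId;    /* assumption: origin IDs fit in 16 bits */

/* One bit per possible origin ID: 65536 bits = 8 kB per set. */
typedef struct SketchOriginSet
{
    uint8_t bits[65536 / 8];
} SketchOriginSet;

static inline void
sketch_set_clear(SketchOriginSet *set)
{
    assert(set != NULL);
    memset(set->bits, 0, sizeof(set->bits));
}

/* Remember a tablesync origin, e.g. when loading the cached catalog data. */
static inline void
sketch_set_add(SketchOriginSet *set, SketchOriginId id)
{
    assert(set != NULL);
    set->bits[id >> 3] |= (uint8_t) (1u << (id & 7));
}

static inline bool
sketch_set_contains(const SketchOriginSet *set, SketchOriginId id)
{
    return ((set->bits[id >> 3] >> (id & 7)) & 1u) != 0;
}

/* Decide whether an origin_differs conflict should be reported for a
 * tuple last modified under tuple_origin, when the apply worker itself
 * runs under apply_origin. */
static inline bool
sketch_report_origin_differs(const SketchOriginSet *tablesync_origins,
                             SketchOriginId tuple_origin,
                             SketchOriginId apply_origin)
{
    if (tuple_origin == apply_origin)
        return false;           /* same origin: no conflict */
    if (sketch_set_contains(tablesync_origins, tuple_origin))
        return false;           /* written by this subscription's tablesync */
    return true;                /* local change or a different origin */
}
```

With origin 7 registered as a tablesync origin and the apply worker running under origin 1, tuples last written under origin 1 or 7 are not reported, while tuples written under origin 9, or under origin 0 (used here to stand for a local change with no origin), still are.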

That said, I may have missed some points. I will continue to think about this
and try to update the patch later.

Best Regards,
Hou zj
