RE: BUG #19360: Bug Report: Logical Replication initial sync fails with "conflict=update_origin_differs" PG12 toPG18 - Mailing list pgsql-bugs
| From | Zhijie Hou (Fujitsu) |
|---|---|
| Subject | RE: BUG #19360: Bug Report: Logical Replication initial sync fails with "conflict=update_origin_differs" PG12 toPG18 |
| Date | |
| Msg-id | TYRPR01MB14195A04472A71EB78F35E42B945AA@TYRPR01MB14195.jpnprd01.prod.outlook.com |
| In response to | RE: BUG #19360: Bug Report: Logical Replication initial sync fails with "conflict=update_origin_differs" PG12 toPG18 ("Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>) |
| List | pgsql-bugs |
On Friday, April 3, 2026 3:24 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote:
> On Saturday, January 10, 2026 8:57 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Thu, Jan 8, 2026 at 8:46 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > On Fri, Jan 9, 2026 at 4:17 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > > Can we somehow share the apply worker's origin with tablesync
> > > > workers so that they can refer to the same origin ID? Or can we
> > > > invent special origin IDs (e.g., > 0x00FF) that are the same as
> > > > the normal origin ID except for being ignored by the conflict
> > > > detection system?
> > >
> > > How will this distinguish between the initial sync is done from the
> > > publisher node we are getting the update vs the initial sync is done
> > > from some other node? Can we always ignore conflict checking for
> > > initial synced data or do we just want to ignore if the initial
> > > sync is done from the same node?
> >
> > I imagined the former idea; always ignore conflict checking, so we
> > don't need to distinguish them. IOW we treat the changes via the
> > initial tablesync as if the changes made by the normal backend process
> > (who doesn't use replication origin) while using the replication
> > tracking ability of the replication origin.
>
> I think for changes made by backend process without setting up the origin, the
> apply worker still treat that as a conflict change when applying the remote
> changes as that's necessary to local vs. remote updates.
>
> I personally prefer to let the tablesync worker share the apply worker's origin
> ID while keeping a separate origin for progress tracking. Currently, the worker
> first calls replorigin_session_setup() and then stores the origin ID in
> replorigin_xact_state. The natural implementation is for the tablesync worker
> to still set up its own origin for tracking, but assign the apply worker's origin ID
> to the global state. This gives us per-tablesync progress tracking while
> ensuring that changes from both workers appear to come from the same
> origin.

After further analysis, I think the approach I mentioned earlier is unsafe.
When replaying the commit record during recovery, if only the main apply
origin ID is present, we cannot recover the progress status for each
tablesync origin. The idea of using a special origin ID for all tablesync
origins suffers from the same problem, e.g., progress cannot be recovered
when replaying commit WAL records.

I have been trying to find a way to fix this issue within the proposed
approaches, but I haven't been able to come up with a better solution for
now. One attempt was to continue WAL-logging the tablesync's own origin ID,
but only store the main origin ID in the commit timestamp module. However,
this also has a problem during recovery: it cannot identify which main
origin corresponds to a given tablesync origin recorded in the commit WAL
record. (One might think we could store this top-level relationship in the
catalog, but since catalogs are not accessible during recovery, that
approach would not work.) Consequently, we cannot restore the same origin
ID in the commit timestamp module during recovery as was present during
normal commit.

The remaining idea, storing the origin ID in pg_subscription_rel and
teaching the apply worker to skip reporting origin_differs if the origin of
the update matches the one stored in pg_subscription_rel, seems worth
considering if we cannot find an easier solution. There was a concern about
performance, but since we could cache those tablesync origins in a local
hash table and consult it during conflict detection, the performance impact
might not be significant.

That said, I may have missed some points. I will continue to think about
this and try to update the patch later.

Best Regards,
Hou zj
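As an illustration of the last idea in the message above (caching tablesync origins in a local hash table and consulting it during conflict detection), here is a minimal standalone C sketch. It is not PostgreSQL code and not the patch under discussion. `OriginId`, `TablesyncOriginCache`, `cache_add()` and `should_report_origin_differs()` are hypothetical names chosen for this example, the fixed-size open-addressing table stands in for whatever hash table a real implementation would use, and loading the IDs from pg_subscription_rel is assumed to happen elsewhere.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for a 16-bit replication origin ID; 0 means "no origin". */
typedef uint16_t OriginId;
#define INVALID_ORIGIN ((OriginId) 0)

/* Hypothetical per-apply-worker cache of this subscription's tablesync origins. */
#define CACHE_SLOTS 64                  /* power of two, so we can mask */
#define CACHE_MASK  (CACHE_SLOTS - 1)

typedef struct TablesyncOriginCache
{
    OriginId slots[CACHE_SLOTS];        /* INVALID_ORIGIN marks an empty slot */
    int      nused;
} TablesyncOriginCache;

static void
cache_init(TablesyncOriginCache *c)
{
    memset(c, 0, sizeof(*c));
}

/* Insert an origin ID using linear probing; duplicates are ignored. */
static void
cache_add(TablesyncOriginCache *c, OriginId id)
{
    uint32_t i = ((uint32_t) id * 2654435761u) & CACHE_MASK;

    assert(id != INVALID_ORIGIN);
    assert(c->nused < CACHE_SLOTS - 1); /* keep one empty slot so probes end */

    while (c->slots[i] != INVALID_ORIGIN)
    {
        if (c->slots[i] == id)
            return;
        i = (i + 1) & CACHE_MASK;
    }
    c->slots[i] = id;
    c->nused++;
}

static bool
cache_contains(const TablesyncOriginCache *c, OriginId id)
{
    uint32_t i = ((uint32_t) id * 2654435761u) & CACHE_MASK;

    if (id == INVALID_ORIGIN)
        return false;
    while (c->slots[i] != INVALID_ORIGIN)
    {
        if (c->slots[i] == id)
            return true;
        i = (i + 1) & CACHE_MASK;
    }
    return false;
}

/*
 * Decide whether an update should be reported as an origin-differs conflict.
 * row_origin is the origin recorded for the existing local row; apply_origin
 * is the origin of the apply worker performing the update.
 */
static bool
should_report_origin_differs(const TablesyncOriginCache *c,
                             OriginId row_origin, OriginId apply_origin)
{
    if (row_origin == apply_origin)
        return false;                   /* same origin: no conflict */
    if (cache_contains(c, row_origin))
        return false;                   /* written by our own tablesync */
    return true;                        /* local backend or another origin */
}
```

In this sketch, a row whose recorded origin is one of the cached tablesync origins is treated like a row written by the apply worker itself, while a row with no origin (a local backend change) or an unrelated origin is still reported. The lookup is a short probe over a small in-memory array, which is the reason the message expects the performance impact to be limited.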