RE: Orphaned records in pg_replication_origin_status after subscription drop - Mailing list pgsql-hackers

From Zhijie Hou (Fujitsu)
Subject RE: Orphaned records in pg_replication_origin_status after subscription drop
Date
Msg-id TY4PR01MB16907837DE56FBF456FD4213394B4A@TY4PR01MB16907.jpnprd01.prod.outlook.com
Whole thread Raw
In response to Re: Orphaned records in pg_replication_origin_status after subscription drop  (Michael Paquier <michael@paquier.xyz>)
Responses Re: Orphaned records in pg_replication_origin_status after subscription drop
List pgsql-hackers
On Monday, December 22, 2025 7:01 AM Michael Paquier <michael@paquier.xyz> wrote:
> On Sat, Dec 20, 2025 at 02:55:15PM +0530, Amit Kapila wrote:
> > As of today, I can't think of a case where next time (restart after
> > error) we won't generate the same origin_name but I think this will
> > add the dependency that each time the origin name should be generated
> > the same.
>
> ReplicationOriginNameForLogicalRep() would generate the origin name as
> pg_suboid_relid, so we would always reuse the same origin name on restart
> as long as the subscription is not recreated, wouldn't we?
>
> > Also, moving just repl_origin_create part (leaving other things like
> > origin_advance at its location) would need some explanation in
> > comments which is that it has some dependency on DropSubscription and
> > cleanup. Anyway, if we want to go with creating origin in a separate
> > transaction than COPY, then we should change few comments like: "It is
> > possible that the origin is not yet created for tablesync worker, this
> > can happen for the states before SUBREL_STATE_FINISHEDCOPY." in the
> > code.
>
> Hmm.  So...  Do you think that it should be OK to just create a new transaction
> before we open the transaction that locks
> MyLogicalRepWorker->relid (one that opens a BEGIN READ ONLY on the
> remote side)?  And I guess that we would just move the replorigin_by_name(),
> replorigin_create() and ERROR into this new transaction?  Setting up
> replorigin_session_setup() and replorigin_session_origin could then be
> delayed until we have done the
> replorigin_advance() in BEGIN READ ONLY transaction?  By that I mean to
> leave the replorigin_advance() position untouched.  I have studied this code
> quite a bit.  I "think" that something among these lines would work, but I am
> not 100% sure if things are OK based the relstate we store in each of these
> phases.  If you have an argument that invalidates these lines, please feel free!

I think the solution you outlined works. One nitpick is that, instead of
starting a new transaction, we could create the origin within the same
transaction that updates the DATASYNC states, thereby ensuring the origin
information is available as soon as the catalog is updated. I think the bug
won't happen as long as the origin is created in a transaction separate
from the one setting up the shared memory state.

Additionally, if we relocate the origin creation code, we need to remove the
ERROR report. This is because the origin may already exist if a table sync
restarts due to an ERROR during the initial COPY. This should be safe since the
origin is created with the reserved name "pg_xx," preventing users from creating
another origin with the same name.

I'm attaching a small patch for reference. To verify the fix, I've added a
simple test in 004_sync.pl, which was introduced alongwith ce0fdbfe9722. This
test can reproduce the issue (without the fix) by intentionally causing the
initial COPY to fail.

Best Regards,
Hou zj





Attachment

pgsql-hackers by date:

Previous
From: Chao Li
Date:
Subject: Re: A few patches to clarify snapshot management, part 2
Next
From: shveta malik
Date:
Subject: Re: Logical Replication of sequences