Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns
Date
Msg-id CAD21AoAxuMzBAkOYD8ggxwNbobRx=sDPmy_JnYMkKP2sesu6+w@mail.gmail.com
Whole thread Raw
In response to Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses RE: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns
Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns
List pgsql-hackers
On Mon, Jul 25, 2022 at 10:45 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Sat, Jul 23, 2022 at 8:32 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Fri, Jul 22, 2022 at 11:48 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > On Wed, Jul 20, 2022 at 5:50 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > On Wed, Jul 20, 2022 at 1:28 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > > >
> > > >
> > > > This is required if we don't want to introduce a new set of functions
> > > > as you proposed above. I am not sure which one is better w.r.t back
> > > > patching effort later but it seems to me using flag stuff would make
> > > > future back patches easier if we make any changes in
> > > > SnapBuildCommitTxn.
> > >
> > > Understood.
> > >
> > > I've implemented this idea as well for discussion. Both patches have
> > > the common change to remember the initial running transactions and to
> > > purge them when decoding xl_running_xacts records. The difference is
> > > how to mark the transactions as needing to be added to the snapshot.
> > >
> > > In v7-0001-Fix-catalog-lookup-with-the-wrong-snapshot-during.patch,
> > > when the transaction is in the initial running xact list and its
> > > commit record has XINFO_HAS_INVAL flag, we mark both the top
> > > transaction and its all subtransactions as containing catalog changes
> > > (which also means to create ReorderBufferTXN entries for them). These
> > > transactions are added to the snapshot in SnapBuildCommitTxn() since
> > > ReorderBufferXidHasCatalogChanges () for them returns true.
> > >
> > > In poc_mark_top_txn_has_inval.patch, when the transaction is in the
> > > initial running xacts list and its commit record has XINFO_HAS_INVALS
> > > flag, we set a new flag, say RBTXN_COMMIT_HAS_INVALS, only to the top
> > > transaction.
> > >
> >
> > It seems that the patch has missed the part to check if the xid is in
> > the initial running xacts list?
>
> Oops, right.
>
> >
> > > In SnapBuildCommitTxn(), we add all subtransactions to
> > > the snapshot without checking ReorderBufferXidHasCatalogChanges() for
> > > subtransactions if its top transaction has the RBTXN_COMMIT_HAS_INVALS
> > > flag.
> > >
> > > A difference between the two ideas is the scope of changes: the former
> > > changes only snapbuild.c but the latter changes both snapbuild.c and
> > > reorderbuffer.c. Moreover, while the former uses the existing flag,
> > > the latter adds a new flag to the reorder buffer for dealing with only
> > > this case. I think the former idea is simpler in terms of that. But,
> > > an advantage of the latter idea is that the latter idea can save to
> > > create ReorderBufferTXN entries for subtransactions.
> > >
> > > Overall I prefer the former for now but I'd like to hear what others think.
> > >
> >
> > I agree that the latter idea can have better performance in extremely
> > special scenarios but introducing a new flag for the same sounds a bit
> > ugly to me. So, I would also prefer to go with the former idea,
> > however, I would also like to hear what Horiguchi-San and others have
> > to say.
>
> Agreed.
>
> >
> > Few comments on v7-0001-Fix-catalog-lookup-with-the-wrong-snapshot-during:
> > 1.
> > +void
> > +SnapBuildInitialXactSetCatalogChanges(SnapBuild *builder, TransactionId xid,
> > +   int subxcnt, TransactionId *subxacts,
> > +   XLogRecPtr lsn)
> > +{
> >
> > I think it is better to name this function as
> > SnapBuildXIDSetCatalogChanges as we use this to mark a particular
> > transaction as having catalog changes.
> >
> > 2. Changed/added a few comments in the attached.
>
> Thank you for the comments.
>
> I've attached updated version patches for the master and back branches.

I've attached the patch for REl15 that I forgot.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/

Attachment

pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Refactor to make use of a common function for GetSubscriptionRelations and GetSubscriptionNotReadyRelations.
Next
From: Heikki Linnakangas
Date:
Subject: Re: ReadRecentBuffer() is broken for local buffer