Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns
Date
Msg-id CAD21AoC8NLGJ-Q+u60cGPSnxXqexEFPLo-5-offsgCE_ikVxRw@mail.gmail.com
Whole thread Raw
In response to Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns
List pgsql-hackers
On Wed, Jul 20, 2022 at 5:50 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Jul 20, 2022 at 1:28 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Wed, Jul 20, 2022 at 2:19 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Wed, Jul 20, 2022 at 9:01 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > > Another idea would be to have functions, say
> > > > SnapBuildCommitTxnWithXInfo() and SnapBuildCommitTxn_ext(). The latter
> > > > does actual work of handling transaction commits and both
> > > > SnapBuildCommitTxn() and SnapBuildCommit() call
> > > > SnapBuildCommitTxnWithXInfo() with different arguments.
> > > >
> > >
> > > Do you want to say DecodeCommit() instead of SnapBuildCommit() in
> > > above para?
> >
> > I meant that we will call like DecodeCommit() ->
> > SnapBuildCommitTxnWithXInfo() -> SnapBuildCommitTxn_ext(has_invals =
> > true) -> SnapBuildXidHasCatalogChanges(has_invals = true) -> ... If
> > SnapBuildCommitTxn() gets called, it calls SnapBuildCommitTxn_ext()
> > with has_invals = false and behaves the same as before.
> >
>
> Okay, understood. This will work.
>
> > > Yet another idea could be to have another flag
> > > RBTXN_HAS_INVALS which will be set by DecodeCommit for top-level TXN.
> > > Then, we can retrieve it even for each of the subtxn's if and when
> > > required.
> >
> > Do you mean that when checking if the subtransaction has catalog
> > changes, we check if its top-level XID has this new flag?
> >
>
> Yes.
>
> > Why do we
> > need the new flag?
> >
>
> This is required if we don't want to introduce a new set of functions
> as you proposed above. I am not sure which one is better w.r.t back
> patching effort later but it seems to me using flag stuff would make
> future back patches easier if we make any changes in
> SnapBuildCommitTxn.

Understood.

I've implemented this idea as well for discussion. Both patches have
the common change to remember the initial running transactions and to
purge them when decoding xl_running_xacts records. The difference is
how to mark the transactions as needing to be added to the snapshot.

In v7-0001-Fix-catalog-lookup-with-the-wrong-snapshot-during.patch,
when the transaction is in the initial running xact list and its
commit record has XINFO_HAS_INVAL flag, we mark both the top
transaction and its all subtransactions as containing catalog changes
(which also means to create ReorderBufferTXN entries for them). These
transactions are added to the snapshot in SnapBuildCommitTxn() since
ReorderBufferXidHasCatalogChanges () for them returns true.

In poc_mark_top_txn_has_inval.patch, when the transaction is in the
initial running xacts list and its commit record has XINFO_HAS_INVALS
flag, we set a new flag, say RBTXN_COMMIT_HAS_INVALS, only to the top
transaction. In SnapBuildCommitTxn(), we add all subtransactions to
the snapshot without checking ReorderBufferXidHasCatalogChanges() for
subtransactions if its top transaction has the RBTXN_COMMIT_HAS_INVALS
flag.

A difference between the two ideas is the scope of changes: the former
changes only snapbuild.c but the latter changes both snapbuild.c and
reorderbuffer.c. Moreover, while the former uses the existing flag,
the latter adds a new flag to the reorder buffer for dealing with only
this case. I think the former idea is simpler in terms of that. But,
an advantage of the latter idea is that the latter idea can save to
create ReorderBufferTXN entries for subtransactions.

Overall I prefer the former for now but I'd like to hear what others think.

FWIW, I didn't try the idea of adding wrapper functions since it would
be costly in terms of back patching effort in the future.

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/

Attachment

pgsql-hackers by date:

Previous
From: Kyotaro Horiguchi
Date:
Subject: Re: Unprivileged user can induce crash by using an SUSET param in PGOPTIONS
Next
From: Michael Paquier
Date:
Subject: Re: Strange failures on chipmunk