Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns
Date
Msg-id CAD21AoAn9T6mS8sMF5=tFV66Wru92s933ihRVunLPEU_+0=Xiw@mail.gmail.com
Whole thread Raw
In response to Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns
List pgsql-hackers
On Mon, May 23, 2022 at 2:39 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, May 23, 2022 at 10:03 AM Kyotaro Horiguchi
> <horikyota.ntt@gmail.com> wrote:
> >
> > At Sat, 21 May 2022 15:35:58 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in
> > > I think if we don't have any better ideas then we should go with
> > > either this or one of the other proposals in this thread. The other
> > > idea that occurred to me is whether we can somehow update the snapshot
> > > we have serialized on disk about this information. On each
> > > running_xact record when we serialize the snapshot, we also try to
> > > purge the committed xacts (via SnapBuildPurgeCommittedTxn). So, during
> > > that we can check if there are committed xacts to be purged and if we
> > > have previously serialized the snapshot for the prior running xact
> > > record, if so, we can update it with the list of xacts that have
> > > catalog changes. If this is feasible then I think we need to somehow
> > > remember the point where we last serialized the snapshot (maybe by
> > > using builder->last_serialized_snapshot). Even, if this is feasible we
> > > may not be able to do this in back-branches because of the disk-format
> > > change required for this.
> > >
> > > Thoughts?

It seems to work, could you draft the patch?

> >
> > I didn't look it closer, but it seems to work. I'm not sure how much
> > spurious invalidations at replication start impacts on performance,
> > but it is promising if the impact is significant.
> >
>
> It seems Sawada-San's patch is doing at each commit not at the start
> of replication and I think that is required because we need this each
> time for replication restart. So, I feel this will be an ongoing
> overhead for spurious cases with the current approach.
>
> >  That being said I'm
> > a bit negative for doing that in post-beta1 stage.
> >
>
> Fair point. We can use the do it early in PG-16 if the approach is
> feasible, and backpatch something on lines of what Sawada-San or you
> proposed.

+1.

I proposed two approaches: [1] and [2,] and I prefer [1].
Horiguchi-san's idea[3] also looks good but I think it's better to
somehow deal with the problem he mentioned:

> One problem with this is that change creates the case where multiple
> ReorderBufferTXNs share the same first_lsn.  I haven't come up with a
> clean idea to avoid relaxing the restriction of AssertTXNLsnOrder..

Regards,

[1] https://www.postgresql.org/message-id/CAD21AoAn-k6OpZ6HSAH_G91tpTXR6KYvkf663kg6EqW-f6sz1w%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAD21AoD00wV4gt-53ze%2BZB8n4bqJrdH8J_UnDHddy8S2A%2Ba25g%40mail.gmail.com
[3] https://www.postgresql.org/message-id/20211008.165055.1621145185927268721.horikyota.ntt%40gmail.com

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/



pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: Enforce "max_wal_size/ min_wal_size must be at least twice wal_segment_size" limit while setting GUCs
Next
From: Michael Paquier
Date:
Subject: Re: Patch: Don't set LoadedSSL unless secure_initialize succeeds