Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns
Date
Msg-id CAA4eK1KxnYuhU5AyusDAP1SKtVmwAS1wYJ+VNL155x=swbx+gA@mail.gmail.com
Whole thread Raw
In response to Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns
List pgsql-hackers
On Tue, Jul 12, 2022 at 1:13 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Tue, Jul 12, 2022 at 3:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Tue, Jul 12, 2022 at 11:38 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > On Tue, Jul 12, 2022 at 10:28 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > >
> > > >
> > > > I'm doing benchmark tests and will share the results.
> > > >
> > >
> > > I've done benchmark tests to measure the overhead introduced by doing
> > > bsearch() every time when decoding a commit record. I've simulated a
> > > very intensified situation where we decode 1M commit records while
> > > keeping builder->catchange.xip array but the overhead is negilible:
> > >
> > > HEAD: 584 ms
> > > Patched: 614 ms
> > >
> > > I've attached the benchmark script I used. With increasing
> > > LOG_SNAPSHOT_INTERVAL_MS to 90000, the last decoding by
> > > pg_logicla_slot_get_changes() decodes 1M commit records while keeping
> > > catalog modifying transactions.
> > >
> >
> > Thanks for the test. We should also see how it performs when (a) we
> > don't change LOG_SNAPSHOT_INTERVAL_MS,
>
> What point do you want to see in this test? I think the performance
> overhead depends on how many times we do bsearch() and how many
> transactions are in the list.
>

Right, I am not expecting any visible performance difference in this
case. This is to ensure that we are not incurring any overhead in the
more usual scenarios (or default cases). As per my understanding, the
purpose of increasing the value of LOG_SNAPSHOT_INTERVAL_MS is to
simulate a stress case for the changes made by the patch, and keeping
its value default will test the more usual scenarios.

> I increased this value to easily
> simulate the situation where we decode many commit records while
> keeping catalog modifying transactions. But even if we don't change
> this value, the result would not change if we don't change how many
> commit records we decode.
>
> > and (b) we have more DDL xacts
> > so that the array to search is somewhat bigger
>
> I've done the same performance tests while creating 64 catalog
> modifying transactions. The result is:
>
> HEAD: 595 ms
> Patched: 628 ms
>
> There was no big overhead.
>

Yeah, especially considering you have simulated a stress case for the patch.

-- 
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: James Vanns
Date:
Subject: Re: Weird behaviour with binary copy, arrays and column count
Next
From: "shiy.fnst@fujitsu.com"
Date:
Subject: RE: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns