Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns
Date
Msg-id CAD21AoDtvpeP4gDZvXbAGPbhYAj=1-wbpwQUpgEwmNWua8jPFA@mail.gmail.com
Whole thread Raw
In response to Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Tue, Jul 12, 2022 at 7:59 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Jul 12, 2022 at 2:53 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Tue, Jul 12, 2022 at 5:58 PM shiy.fnst@fujitsu.com
> > <shiy.fnst@fujitsu.com> wrote:
> > >
> > >
> > > It happened when executing the following code because it tried to free a NULL
> > > pointer (catchange_xip).
> > >
> > >         /* be tidy */
> > >         if (ondisk)
> > >                 pfree(ondisk);
> > > +       if (catchange_xip)
> > > +               pfree(catchange_xip);
> > >  }
> > >
> > > It seems to be related to configure option. I could reproduce it when using
> > > `./configure --enable-debug`.
> > > But I couldn't reproduce with `./configure --enable-debug CFLAGS="-Og -ggdb"`.
> >
> > Hmm, I could not reproduce this problem even if I use ./configure
> > --enable-debug. And it's weird that we checked if catchange_xip is not
> > null but we did pfree for it:
> >
>
> Yeah, this looks weird to me as well but one difference in running
> tests could be the timing of WAL LOG for XLOG_RUNNING_XACTS. That may
> change the timing of SnapBuildSerialize. The other thing we can try is
> by checking the value of catchange_xcnt before pfree.

Yeah, we can try that.

While reading the code, I realized that we try to pfree both ondisk
and catchange_xip also when we jumped to 'out:':

out:
    ReorderBufferSetRestartPoint(builder->reorder,
                                 builder->last_serialized_snapshot);
    /* be tidy */
    if (ondisk)
        pfree(ondisk);
    if (catchange_xip)
        pfree(catchange_xip);

But we use both ondisk and catchange_xip only if we didn't jump to
'out:'. If this problem is related to compiler optimization with
'goto' statement, moving them before 'out:' might be worth trying.

>
> BTW, I think ReorderBufferGetCatalogChangesXacts should have an Assert
> to ensure rb->catchange_ntxns and xcnt are equal. We can probably then
> avoid having xcnt_p as an out parameter as the caller can use
> rb->catchange_ntxns instead.
>

Agreed.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/



pgsql-hackers by date:

Previous
From: Ranier Vilela
Date:
Subject: Re: Reducing Memory Consumption (aset and generation)
Next
From: David Rowley
Date:
Subject: Re: Some clean-up work in get_cheapest_group_keys_order()