Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns
Date
Msg-id CAA4eK1JWaY5t+20iteUnPZy0E+nJi4KPqH-wEYyKx7YkcHmEqA@mail.gmail.com
Whole thread Raw
In response to Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns
Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns
List pgsql-hackers
On Tue, Jul 19, 2022 at 6:34 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Mon, Jul 18, 2022 at 1:12 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Fri, Jul 15, 2022 at 8:09 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > This patch should have the fix for the issue that Shi yu reported. Shi
> > > yu, could you please test it again with this patch?
> > >
> >
> > Can you explain the cause of the failure and your fix for the same?
>
> @@ -1694,6 +1788,8 @@ out:
>     /* be tidy */
>     if (ondisk)
>         pfree(ondisk);
> +   if (catchange_xip)
> +       pfree(catchange_xip);
>
> Regarding the above code in the previous version patch, looking at the
> generated assembler code shared by Shi yu offlist, I realized that the
> “if (catchange_xip)” is removed (folded) by gcc optimization. This is
> because we dereference catchange_xip before null-pointer check as
> follow:
>
> +   /* copy catalog modifying xacts */
> +   sz = sizeof(TransactionId) * catchange_xcnt;
> +   memcpy(ondisk_c, catchange_xip, sz);
> +   COMP_CRC32C(ondisk->checksum, ondisk_c, sz);
> +   ondisk_c += sz;
>
> Since sz is 0 in this case, memcpy doesn’t do anything actually.
>
> By checking the assembler code, I’ve confirmed that gcc does the
> optimization for these code and setting
> -fno-delete-null-pointer-checks flag prevents the if statement from
> being folded. Also, I’ve confirmed that adding the check if
> "catchange.xcnt > 0” before the null-pointer check also can prevent
> that. Adding a check  if "catchange.xcnt > 0” looks more robust. I’ve
> added a similar check for builder->committed.xcnt as well for
> consistency. builder->committed.xip could have no transactions.
>

Good work. I wonder without comments this may create a problem in the
future. OTOH, I don't see adding a check "catchange.xcnt > 0" before
freeing the memory any less robust. Also, for consistency, we can use
a similar check based on xcnt in the SnapBuildRestore to free the
memory in the below code:
+ /* set catalog modifying transactions */
+ if (builder->catchange.xip)
+ pfree(builder->catchange.xip);

--
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: Kyotaro Horiguchi
Date:
Subject: Re: In-placre persistance change of a relation
Next
From: Michael Paquier
Date:
Subject: Re: Windows now has fdatasync()