Thread: Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns
Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns
From
Kyotaro Horiguchi
Date:
At Tue, 19 Jul 2022 17:31:07 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
> On Tue, Jul 19, 2022 at 4:35 PM Kyotaro Horiguchi
> <horikyota.ntt@gmail.com> wrote:
> > At Tue, 19 Jul 2022 10:17:15 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in
> > > Good work. I wonder without comments this may create a problem in the
> > > future. OTOH, I don't see adding a check "catchange.xcnt > 0" before
> > > freeing the memory any less robust. Also, for consistency, we can use
> > > a similar check based on xcnt in the SnapBuildRestore to free the
> > > memory in the below code:
> > > +   /* set catalog modifying transactions */
> > > +   if (builder->catchange.xip)
> > > +       pfree(builder->catchange.xip);
> >
> > But xip must be positive there. We can add a comment that explains that.
> >
>
> Yes, if we add the comment for it, probably we need to explain a gcc's
> optimization but it seems to be too much to me.

Ah, sorry. I confused this with another place, in SnapBuildPurgeCommittedTxn.
I agree with you that we don't need an additional comment *there*.

> > +   catchange_xip = ReorderBufferGetCatalogChangesXacts(builder->reorder);
> >
> > catchange_xip is allocated in the current context, but ondisk is
> > allocated in builder->context. I see it as kind of inconsistent (even if
> > the current context is the same as builder->context).
>
> Right. I thought that since the lifetime of catchange_xip is short,
> until the end of SnapBuildSerialize() function we didn't need to
> allocate it in builder->context. But given ondisk, we need to do that
> for catchange_xip as well. Will fix it.

Thanks.

> > +   if (builder->committed.xcnt > 0)
> > +   {
> >
> > It seems to me committed.xip is always non-null, so we don't need this.
> > I don't strongly object to doing that, though.
>
> But committed.xcnt could be 0, right? We don't need to copy anything
> by calling memcpy with size = 0 in this case. Also, it looks more
> consistent with what we do for catchange_xcnt.

Mmm. The patch changed that behavior. AllocateSnapshotBuilder always
allocates the array with a fixed size. SnapBuildAddCommittedTxn still
assumes builder->committed.xip is non-NULL. SnapBuildRestore *kept*
ondisk.builder.committed.xip populated with a valid array pointer. But
the patch allows committed.xip to be NULL, and in that case
SnapBuildAddCommittedTxn calls repalloc(NULL), which triggers an
assertion failure.

> > +   Assert((xcnt > 0) && (xcnt == rb->catchange_ntxns));
> >
> > (xcnt > 0) is obvious here (otherwise it means dlist_foreach is broken..).
> > (xcnt == rb->catchange_ntxns) is not what should be checked here. The
> > assert just requires that catchange_txns and catchange_ntxns are
> > consistent so it should be checked just after dlist_empty.. I think.
> >
>
> If we want to check if catchange_txns and catchange_ntxns are
> consistent, should we check (xcnt == rb->catchange_ntxns) as well, no?
> This function requires the caller to use rb->catchange_ntxns as the
> length of the returned array. I think this assertion ensures that the
> actual length of the array is consistent with the length we
> pre-calculated.

Sorry again. Please forget the comment about xcnt == rb->catchange_ntxns.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
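
For readers following the Assert discussion in the message above, here is a minimal standalone sketch of the idea: a function copies xids out of a list into an array whose length the caller takes from a separately maintained counter, so the function asserts that the number of entries it actually visited matches that counter. Everything here is a hypothetical stand-in, not the patch itself: TxnList, TxnNode, txn_list_push and txn_list_to_array are invented names, a plain singly linked list replaces dlist, and malloc replaces palloc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t TransactionId;

/* Hypothetical stand-in for an entry in rb->catchange_txns. */
typedef struct TxnNode
{
    TransactionId   xid;
    struct TxnNode *next;
} TxnNode;

/* Hypothetical stand-in for the list plus its pre-calculated length. */
typedef struct TxnList
{
    TxnNode *head;
    size_t   ntxns;     /* plays the role of rb->catchange_ntxns */
} TxnList;

/* Push a node at the head and keep the counter in sync. */
static void
txn_list_push(TxnList *list, TxnNode *node)
{
    node->next = list->head;
    list->head = node;
    list->ntxns++;
}

/*
 * Return a malloc'ed array of the xids in the list, or NULL if the list
 * is empty. The caller uses list->ntxns as the array length, so the
 * final assertion checks that the walk produced exactly that many.
 */
static TransactionId *
txn_list_to_array(const TxnList *list)
{
    TransactionId *xids;
    size_t         xcnt = 0;

    if (list->head == NULL)
    {
        /* list and counter must agree on emptiness */
        assert(list->ntxns == 0);
        return NULL;
    }

    assert(list->ntxns > 0);
    xids = malloc(list->ntxns * sizeof(TransactionId));
    assert(xids != NULL);

    for (const TxnNode *n = list->head; n != NULL; n = n->next)
    {
        assert(xcnt < list->ntxns);     /* never write past the array */
        xids[xcnt++] = n->xid;
    }

    /* actual length must match the pre-calculated length */
    assert(xcnt == list->ntxns);
    return xids;
}
```

The empty-list check corresponds to the "just after dlist_empty" suggestion, and the final assertion corresponds to the (xcnt == rb->catchange_ntxns) check; in this sketch both are kept because they guard different things.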
Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns
From
Masahiko Sawada
Date:
On Wed, Jul 20, 2022 at 9:58 AM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
>
> At Tue, 19 Jul 2022 17:31:07 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
> > On Tue, Jul 19, 2022 at 4:35 PM Kyotaro Horiguchi
> > <horikyota.ntt@gmail.com> wrote:
> > > At Tue, 19 Jul 2022 10:17:15 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in
> > > > Good work. I wonder without comments this may create a problem in the
> > > > future. OTOH, I don't see adding a check "catchange.xcnt > 0" before
> > > > freeing the memory any less robust. Also, for consistency, we can use
> > > > a similar check based on xcnt in the SnapBuildRestore to free the
> > > > memory in the below code:
> > > > +   /* set catalog modifying transactions */
> > > > +   if (builder->catchange.xip)
> > > > +       pfree(builder->catchange.xip);
> > >
> > > But xip must be positive there. We can add a comment that explains that.
> > >
> >
> > Yes, if we add the comment for it, probably we need to explain a gcc's
> > optimization but it seems to be too much to me.
>
> Ah, sorry. I confused this with another place, in SnapBuildPurgeCommittedTxn.
> I agree with you that we don't need an additional comment *there*.
>
> > > +   catchange_xip = ReorderBufferGetCatalogChangesXacts(builder->reorder);
> > >
> > > catchange_xip is allocated in the current context, but ondisk is
> > > allocated in builder->context. I see it as kind of inconsistent (even if
> > > the current context is the same as builder->context).
> >
> > Right. I thought that since the lifetime of catchange_xip is short,
> > until the end of SnapBuildSerialize() function we didn't need to
> > allocate it in builder->context. But given ondisk, we need to do that
> > for catchange_xip as well. Will fix it.
>
> Thanks.
>
> > > +   if (builder->committed.xcnt > 0)
> > > +   {
> > >
> > > It seems to me committed.xip is always non-null, so we don't need this.
> > > I don't strongly object to doing that, though.
> >
> > But committed.xcnt could be 0, right? We don't need to copy anything
> > by calling memcpy with size = 0 in this case. Also, it looks more
> > consistent with what we do for catchange_xcnt.
>
> Mmm. The patch changed that behavior. AllocateSnapshotBuilder always
> allocates the array with a fixed size. SnapBuildAddCommittedTxn still
> assumes builder->committed.xip is non-NULL. SnapBuildRestore *kept*
> ondisk.builder.committed.xip populated with a valid array pointer. But
> the patch allows committed.xip to be NULL, and in that case
> SnapBuildAddCommittedTxn calls repalloc(NULL), which triggers an
> assertion failure.

IIUC the patch doesn't allow committed.xip to be NULL, since we don't
overwrite it if builder->committed.xcnt is 0 (i.e., when
ondisk.builder.committed.xip is NULL):

    builder->committed.xcnt = ondisk.builder.committed.xcnt;
    /* We only allocated/stored xcnt, not xcnt_space xids ! */
    /* don't overwrite preallocated xip, if we don't have anything here */
    if (builder->committed.xcnt > 0)
    {
        pfree(builder->committed.xip);
        builder->committed.xcnt_space = ondisk.builder.committed.xcnt;
        builder->committed.xip = ondisk.builder.committed.xip;
    }

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
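
The invariant argued for in the message above (the preallocated array survives a restore with zero xids, so a later append never sees a NULL pointer) can be checked in isolation with a small sketch like the following. This is only an illustration under stated assumptions: CommittedXids, committed_init, committed_restore and committed_add are hypothetical names, malloc/realloc/free stand in for palloc/repalloc/pfree, and the assert in committed_add plays the role of repalloc rejecting a NULL pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t TransactionId;

/* Hypothetical stand-in for the builder->committed fields. */
typedef struct CommittedXids
{
    size_t         xcnt;        /* xids currently stored */
    size_t         xcnt_space;  /* slots allocated in xip */
    TransactionId *xip;         /* expected to stay non-NULL after init */
} CommittedXids;

/* Preallocate a fixed-size array, as the builder allocation is described. */
static void
committed_init(CommittedXids *c, size_t initial_space)
{
    assert(initial_space > 0);
    c->xcnt = 0;
    c->xcnt_space = initial_space;
    c->xip = malloc(initial_space * sizeof(TransactionId));
    assert(c->xip != NULL);
}

/*
 * Restore from a deserialized array. Ownership of ondisk_xip is taken
 * only when it holds something; with ondisk_xcnt == 0 the preallocated
 * array is left in place, so xip never becomes NULL.
 */
static void
committed_restore(CommittedXids *c, TransactionId *ondisk_xip,
                  size_t ondisk_xcnt)
{
    c->xcnt = ondisk_xcnt;
    if (ondisk_xcnt > 0)
    {
        free(c->xip);
        c->xcnt_space = ondisk_xcnt;
        c->xip = ondisk_xip;
    }
}

/* Append one xid, doubling the array when it is full. */
static void
committed_add(CommittedXids *c, TransactionId xid)
{
    /* stand-in for repalloc() not accepting a NULL pointer */
    assert(c->xip != NULL);

    if (c->xcnt == c->xcnt_space)
    {
        size_t         new_space = c->xcnt_space * 2;
        TransactionId *p = realloc(c->xip, new_space * sizeof(TransactionId));

        assert(p != NULL);
        c->xip = p;
        c->xcnt_space = new_space;
    }
    c->xip[c->xcnt++] = xid;
}
```

Dropping the `ondisk_xcnt > 0` guard in committed_restore and assigning ondisk_xip unconditionally would reproduce the failure mode raised earlier in the thread: a restore with no xids would leave xip NULL, and the next committed_add would trip the assertion.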