Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns |
Date | |
Msg-id | CAA4eK1Jg+u=ouXbC5EVLVnQ4sUhrR1P+p0LQSjPFmMP4jNYmuw@mail.gmail.com Whole thread Raw |
In response to | Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns (Masahiko Sawada <sawada.mshk@gmail.com>) |
Responses |
Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns
|
List | pgsql-hackers |
On Wed, Jul 6, 2022 at 7:38 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > I'll post a new version patch in the next email with replying to other comments. > Okay, thanks for working on this. Few comments/suggestions on poc_remember_last_running_xacts_v2 patch: 1. +ReorderBufferSetLastRunningXactsCatalogChanges(ReorderBuffer *rb, TransactionId xid, + uint32 xinfo, int subxcnt, + TransactionId *subxacts, XLogRecPtr lsn) +{ ... ... + + test = bsearch(&xid, rb->last_running_xacts, rb->n_last_running_xacts, + sizeof(TransactionId), xidComparator); + + if (test == NULL) + { + for (int i = 0; i < subxcnt; i++) + { + test = bsearch(&subxacts[i], rb->last_running_xacts, rb->n_last_running_xacts, + sizeof(TransactionId), xidComparator); ... Is there ever a possibility that the top transaction id is not in the running_xacts list but one of its subxids is present? If yes, it is not very obvious at least to me so adding a comment here could be useful. If not, then why do we need this additional check for each of the sub-transaction ids? 2. @@ -627,6 +647,15 @@ DecodeCommit(LogicalDecodingContext *ctx, XLogRecordBuffer *buf, commit_time = parsed->origin_timestamp; } + /* + * Set the last running xacts as containing catalog change if necessary. + * This must be done before SnapBuildCommitTxn() so that we include catalog + * change transactions to the historic snapshot. + */ + ReorderBufferSetLastRunningXactsCatalogChanges(ctx->reorder, xid, parsed->xinfo, + parsed->nsubxacts, parsed->subxacts, + buf->origptr); + SnapBuildCommitTxn(ctx->snapshot_builder, buf->origptr, xid, parsed->nsubxacts, parsed->subxacts); As mentioned previously as well, marking it before SnapBuildCommitTxn has one disadvantage, we sometimes do this work even if the snapshot state is SNAPBUILD_START or SNAPBUILD_BUILDING_SNAPSHOT in which case SnapBuildCommitTxn wouldn't do anything. Can we instead check whether the particular txn has invalidations and is present in the last_running_xacts list along with the check ReorderBufferXidHasCatalogChanges? I think that has the additional advantage that we don't need this additional marking if the xact is already marked as containing catalog changes. 3. 1. + /* + * We rely on HEAP2_NEW_CID records and XACT_INVALIDATIONS to know + * if the transaction has changed the catalog, and that information + * is not serialized to SnapBuilder. Therefore, if the logical + * decoding decodes the commit record of the transaction that actually + * has done catalog changes without these records, we miss to add + * the xid to the snapshot so up creating the wrong snapshot. The part of the sentence "... snapshot so up creating the wrong snapshot." is not clear. In this comment, at one place you have used two spaces after a full stop, and at another place, there is one space. I think let's follow nearby code practice to use a single space before a new sentence. 4. +void +ReorderBufferProcessLastRunningXacts(ReorderBuffer *rb, xl_running_xacts *running) +{ + /* Quick exit if there is no longer last running xacts */ + if (likely(rb->n_last_running_xacts == 0)) + return; + + /* First call, build the last running xact list */ + if (rb->n_last_running_xacts == -1) + { + int nxacts = running->subxcnt + running->xcnt; + Size sz = sizeof(TransactionId) * nxacts;; + + rb->last_running_xacts = MemoryContextAlloc(rb->context, sz); + memcpy(rb->last_running_xacts, running->xids, sz); + qsort(rb->last_running_xacts, nxacts, sizeof(TransactionId), xidComparator); + + rb->n_last_running_xacts = nxacts; + + return; + } a. Can we add the function header comments for this function? b. We seem to be tracking the running_xact information for the first running_xact record after start/restart. The name last_running_xacts doesn't sound appropriate for that, how about initial_running_xacts? 5. + /* + * Purge xids in the last running xacts list if we can do that for at least + * one xid. + */ + if (NormalTransactionIdPrecedes(rb->last_running_xacts[0], + running->oldestRunningXid)) I think it would be a good idea to add a few lines here explaining why it is safe to purge. IIUC, it is because the commit for those xacts would have already been processed and we don't need such a xid anymore. 6. As per the discussion above in this thread having XACT_XINFO_HAS_INVALS in the commit record doesn't indicate that the xact has catalog changes, so can we add somewhere in comments that for such a case we can't distinguish whether the txn has catalog change but we still mark the txn has catalog changes? Can you please share one example for this case? -- With Regards, Amit Kapila.
pgsql-hackers by date: