Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns |
Date | |
Msg-id | CAD21AoCEF6FAvRRP6LKzq2-2oVvpdQbmZDwzRAdseA9AA2mE-Q@mail.gmail.com Whole thread Raw |
In response to | Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns (Amit Kapila <amit.kapila16@gmail.com>) |
Responses |
RE: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns
|
List | pgsql-hackers |
On Wed, Jul 6, 2022 at 3:01 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Jul 6, 2022 at 7:38 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > I'll post a new version patch in the next email with replying to other comments. > > > > Okay, thanks for working on this. Few comments/suggestions on > poc_remember_last_running_xacts_v2 patch: > > 1. > +ReorderBufferSetLastRunningXactsCatalogChanges(ReorderBuffer *rb, > TransactionId xid, > + uint32 xinfo, int subxcnt, > + TransactionId *subxacts, XLogRecPtr lsn) > +{ > ... > ... > + > + test = bsearch(&xid, rb->last_running_xacts, rb->n_last_running_xacts, > + sizeof(TransactionId), xidComparator); > + > + if (test == NULL) > + { > + for (int i = 0; i < subxcnt; i++) > + { > + test = bsearch(&subxacts[i], rb->last_running_xacts, rb->n_last_running_xacts, > + sizeof(TransactionId), xidComparator); > ... > > Is there ever a possibility that the top transaction id is not in the > running_xacts list but one of its subxids is present? If yes, it is > not very obvious at least to me so adding a comment here could be > useful. If not, then why do we need this additional check for each of > the sub-transaction ids? I think there is no possibility. The check for subtransactions is not necessary. > > 2. > @@ -627,6 +647,15 @@ DecodeCommit(LogicalDecodingContext *ctx, > XLogRecordBuffer *buf, > commit_time = parsed->origin_timestamp; > } > > + /* > + * Set the last running xacts as containing catalog change if necessary. > + * This must be done before SnapBuildCommitTxn() so that we include catalog > + * change transactions to the historic snapshot. > + */ > + ReorderBufferSetLastRunningXactsCatalogChanges(ctx->reorder, xid, > parsed->xinfo, > + parsed->nsubxacts, parsed->subxacts, > + buf->origptr); > + > SnapBuildCommitTxn(ctx->snapshot_builder, buf->origptr, xid, > parsed->nsubxacts, parsed->subxacts); > > As mentioned previously as well, marking it before SnapBuildCommitTxn > has one disadvantage, we sometimes do this work even if the snapshot > state is SNAPBUILD_START or SNAPBUILD_BUILDING_SNAPSHOT in which case > SnapBuildCommitTxn wouldn't do anything. Can we instead check whether > the particular txn has invalidations and is present in the > last_running_xacts list along with the check > ReorderBufferXidHasCatalogChanges? I think that has the additional > advantage that we don't need this additional marking if the xact is > already marked as containing catalog changes. Agreed. > > 3. > 1. > + /* > + * We rely on HEAP2_NEW_CID records and XACT_INVALIDATIONS to know > + * if the transaction has changed the catalog, and that information > + * is not serialized to SnapBuilder. Therefore, if the logical > + * decoding decodes the commit record of the transaction that actually > + * has done catalog changes without these records, we miss to add > + * the xid to the snapshot so up creating the wrong snapshot. > > The part of the sentence "... snapshot so up creating the wrong > snapshot." is not clear. In this comment, at one place you have used > two spaces after a full stop, and at another place, there is one > space. I think let's follow nearby code practice to use a single space > before a new sentence. Agreed. > > 4. > +void > +ReorderBufferProcessLastRunningXacts(ReorderBuffer *rb, > xl_running_xacts *running) > +{ > + /* Quick exit if there is no longer last running xacts */ > + if (likely(rb->n_last_running_xacts == 0)) > + return; > + > + /* First call, build the last running xact list */ > + if (rb->n_last_running_xacts == -1) > + { > + int nxacts = running->subxcnt + running->xcnt; > + Size sz = sizeof(TransactionId) * nxacts;; > + > + rb->last_running_xacts = MemoryContextAlloc(rb->context, sz); > + memcpy(rb->last_running_xacts, running->xids, sz); > + qsort(rb->last_running_xacts, nxacts, sizeof(TransactionId), xidComparator); > + > + rb->n_last_running_xacts = nxacts; > + > + return; > + } > > a. Can we add the function header comments for this function? Updated. > b. We seem to be tracking the running_xact information for the first > running_xact record after start/restart. The name last_running_xacts > doesn't sound appropriate for that, how about initial_running_xacts? Sound good, updated. > > 5. > + /* > + * Purge xids in the last running xacts list if we can do that for at least > + * one xid. > + */ > + if (NormalTransactionIdPrecedes(rb->last_running_xacts[0], > + running->oldestRunningXid)) > > I think it would be a good idea to add a few lines here explaining why > it is safe to purge. IIUC, it is because the commit for those xacts > would have already been processed and we don't need such a xid > anymore. Right, updated. > > 6. As per the discussion above in this thread having > XACT_XINFO_HAS_INVALS in the commit record doesn't indicate that the > xact has catalog changes, so can we add somewhere in comments that for > such a case we can't distinguish whether the txn has catalog change > but we still mark the txn has catalog changes? Agreed. > Can you please share one example for this case? I think it depends on what we did in the transaction but one example I have is that a commit record for ALTER DATABASE has only a snapshot invalidation message: =# alter database postgrse set log_statement to 'all'; ALTER DATABASE $ pg_waldump $PGDATA/pg_wal/000000010000000000000001 | tail -1 rmgr: Transaction len (rec/tot): 66/ 66, tx: 821, lsn: 0/019B50A8, prev 0/019B5070, desc: COMMIT 2022-07-11 21:38:44.036513 JST; inval msgs: snapshot 2964 I've attached an updated patch, please review it. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Attachment
pgsql-hackers by date: