Re: Duplicated LSN in ReorderBuffer - Mailing list pgsql-hackers
From: Andres Freund
Subject: Re: Duplicated LSN in ReorderBuffer
Msg-id: 20190727011508.io6q6sum4tzvopad@alap3.anarazel.de
In response to: Re: Duplicated LSN in ReorderBuffer (Alvaro Herrera <alvherre@2ndquadrant.com>)
List: pgsql-hackers

Hi,

Petr, Simon, see the potential issue related to fast forward at the bottom.

On 2019-07-26 18:46:35 -0400, Alvaro Herrera wrote:
> On 2019-Jul-09, Masahiko Sawada wrote:
> > I think the cause of this bug would be that a ReorderBufferTXN entry
> > of sub transaction is created as top-level transaction. And this
> > happens because we skip to decode ASSIGNMENT during the state of
> > snapshot builder < SNAPBUILD_FULL.
>
> That explanation seems to make sense.

Yea. The comment that "in the assignment case we'll not decode those
xacts" is true, but it misses that we *do* currently process
XLOG_HEAP2_NEW_CID records for transactions that started before reaching
FULL_SNAPSHOT.

Thinking about it, it was not immediately clear to me that it is necessary
to process XLOG_HEAP2_NEW_CID at that stage. We only need the cid mapping
when decoding the contents of the transaction that the XLOG_HEAP2_NEW_CID
record was about - which will not happen if it started before
SNAPBUILD_FULL. Except that such records are also responsible for
signalling that a transaction performed catalog modifications (cf. the
ReorderBufferXidSetCatalogChanges() call), which in turn is important for
SnapBuildCommitTxn() to know whether that transaction needs to be included
in historic snapshots.

So unless I am missing something - which is entirely possible, I've had
this code thoroughly swapped out - that means we only need to process
XLOG_HEAP2_NEW_CID before SNAPBUILD_FULL if there can be transactions with
relevant catalog changes that don't have any invalidation messages.

After thinking about it for a bit, that's not guaranteed, however. For
one, even for system catalog tables, looking at CacheInvalidateHeapTuple()
et al there can be catalog modifications that create neither a snapshot
invalidation message nor a catcache one. There's also the remote scenario
that we might only be modifying a toast relation. But more importantly,
the only modified table could be a user-defined catalog table (i.e. WITH
(user_catalog_table = true)), which in all likelihood won't cause
invalidation messages.

So I think it currently is required to process NEW_CID records - although
we don't need to actually execute the ReorderBufferAddNewTupleCids() etc.
calls.

Perhaps the right fix for the future would actually be to not rely on
NEW_CID for recognizing such transactions, but instead have an xact.c
marker that signals whether a transaction performed catalog modifications.
Hm, need to think more about this.

> > Instead, I wonder if we can decode ASSIGNMENT even when the state of
> > snapshot builder < SNAPBUILD_FULL_SNAPSHOT. That way, the
> > ReorderBufferTXN entries of both top transaction and sub transaction
> > are created properly before we decode NEW_CID.
>
> Yeah, that seems a sensible remediation to me.

That does seem like a reasonable approach. I can see two alternatives:

1) Change SnapBuildProcessNewCid()'s ReorderBufferXidSetCatalogChanges()
   call to reference the toplevel xid. That has the disadvantage that if
   the subtransaction that performed DDL rolls back, the main transaction
   will still be treated as a catalog transaction - I have a hard time
   seeing that being common, however. That'd then also require
   SnapBuildProcessNewCid(), when < SNAPBUILD_FULL_SNAPSHOT, to return
   before processing any data assigned to subtransactions, which would be
   good, because that data is currently stored in memory unnecessarily.

2) We could simply assign the subtransaction to the parent using
   ReorderBufferAssignChild() in SnapBuildProcessNewCid() or its caller.
   That ought to also fix the bug. It also has the advantage that we can
   save some memory for transactions that have some subtransactions, but
   fewer than the ASSIGNMENT limit, because it allows us to avoid keeping
   a separate base snapshot for them (cf.
   ReorderBufferTransferSnapToParent()). Like 1), that could be combined
   with adding an early return when < SNAPBUILD_FULL_SNAPSHOT, after
   ReorderBufferXidSetCatalogChanges(), but in contrast to 1) I don't
   think it'd be required for correctness.

Both of these would have the advantage that we would only track additional
information for transactions that have modified the catalog, whereas the
proposal to process ASSIGNMENT earlier would mean that we additionally
track all transactions with more than 64 children.

So provided that I didn't mis-analyze here, I think both of my
alternatives are preferable? I think 2) is simpler?
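For 2) I'm thinking of something very roughly like the below - entirely
untested and only meant as a sketch, and it doesn't matter much whether it
ends up in SnapBuildProcessNewCid() itself or in its caller:

    /* in SnapBuildProcessNewCid(), before doing anything else: */

    /*
     * If the record was logged by a subtransaction, make sure it is known
     * to belong to its toplevel transaction, even if the corresponding
     * ASSIGNMENT record was skipped because the snapshot wasn't fully
     * built yet.
     */
    if (xlrec->top_xid != xid)
        ReorderBufferAssignChild(builder->reorder, xlrec->top_xid, xid, lsn);

    ReorderBufferXidSetCatalogChanges(builder->reorder, xid, lsn);

    /*
     * Optional: before a full snapshot has been reached the transaction
     * cannot be decoded anyway, so don't bother accumulating the tuplecid
     * data.
     */
    if (builder->state < SNAPBUILD_FULL_SNAPSHOT)
        return;

    /* rest of the function unchanged */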
> /*
> - * No point in doing anything yet, data could not be decoded anyway. It's
> - * ok not to call ReorderBufferProcessXid() in that case, except in the
> - * assignment case there'll not be any later records with the same xid;
> - * and in the assignment case we'll not decode those xacts.
> + * If the snapshot isn't yet fully built, we cannot decode anything, so
> + * bail out.
> + *
> + * However, it's critical to process XLOG_XACT_ASSIGNMENT records even
> + * when the snapshot is being built: it is possible to get later records
> + * that require subxids to be properly assigned.
> */

I think I would want this comment to be slightly more expansive. It's not
exactly obvious why such records would exist, at least to me. I can't
quite come up with something much shorter than the above braindump right
now, however. I'll try to come up with something more concise. Probably
worthwhile to add that somewhere, even if we go for one of my alternative
proposals.

This actually made me look at the nearby changes due to

commit 9c7d06d60680c7f00d931233873dee81fdb311c6
Author: Simon Riggs <simon@2ndQuadrant.com>
Date:   2018-01-17 11:38:34 +0000

    Ability to advance replication slots

and uhm, I'm not sure they're fully baked. Something like:

    /*
     * If we don't have snapshot or we are just fast-forwarding, there is no
     * point in decoding changes.
     */
    if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT ||
        ctx->fast_forward)
        return;

    case XLOG_HEAP2_MULTI_INSERT:
        if (!ctx->fast_forward &&
            SnapBuildProcessChange(builder, xid, buf->origptr))
            DecodeMultiInsert(ctx, buf);
        break;

is not very suggestive of that (note that the second fast_forward check is
already redundant with the first).

And related to the point of the theorizing above, I don't think skipping
XLOG_HEAP2_NEW_CID entirely when forwarding is correct. As a NEW_CID
record does not imply an invalidation message, as discussed above, we'll
afaict compute wrong snapshots when such transactions are encountered
during forwarding. And we'll then log those snapshots to disk, which means
the slot cannot safely be used for actual decoding anymore - as we'll use
that snapshot when starting up decoding without fast forwarding.

Greetings,

Andres Freund
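PS: To make the fast-forward concern a bit more concrete - I'd expect the
relevant part of DecodeHeap2Op() to end up looking very roughly like the
below (again entirely untested, just a sketch): don't bail out early merely
because we're fast forwarding, so NEW_CID records still reach
SnapBuildProcessNewCid(), and instead skip only the actual data decoding:

    /*
     * If we don't have a snapshot yet there's no point in decoding data
     * changes. But don't return early just for fast forwarding: NEW_CID
     * records still need to be processed so catalog modifying transactions
     * are reflected in the snapshots we build (and serialize).
     */
    if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT)
        return;

    switch (info)
    {
        case XLOG_HEAP2_MULTI_INSERT:
            if (!ctx->fast_forward &&
                SnapBuildProcessChange(builder, xid, buf->origptr))
                DecodeMultiInsert(ctx, buf);
            break;

        case XLOG_HEAP2_NEW_CID:
            {
                xl_heap_new_cid *xlrec;

                xlrec = (xl_heap_new_cid *) XLogRecGetData(buf->record);
                SnapBuildProcessNewCid(builder, xid, buf->origptr, xlrec);
                break;
            }

            /* remaining cases unchanged */
    }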