Re: Changeset Extraction v7.5 - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Changeset Extraction v7.5 |
Date | |
Msg-id | CA+TgmoYTc0gHm_ca=wDWqc8TLY1VQ9vx6jxPSSE6=OifAtW=cQ@mail.gmail.com Whole thread Raw |
In response to | Re: Changeset Extraction v7.5 (Andres Freund <andres@2ndquadrant.com>) |
Responses |
Re: Changeset Extraction v7.5
(Andres Freund <andres@2ndquadrant.com>)
Re: Changeset Extraction v7.5 (Andres Freund <andres@2ndquadrant.com>) |
List | pgsql-hackers |
On Fri, Feb 7, 2014 at 2:35 PM, Andres Freund <andres@2ndquadrant.com> wrote: > attached you can find the next version of the patchset. As usual, I'm going to be reviewing patch 1. The definition of "patch 1" has changed quite a few times over the past year, but that's usually the one I'm reviewing. + * contents of records in here xexcept turning them into a more usable Typo. + /* + * XXX: There doesn't seem to be a usecase for decoding + * HEAP_NEWPAGE's. Its only used in various indexam's and CLUSTER, + * neither of which should be relevant for the logical + * changestream. + */ There's a level of uncertainty here that doesn't seem consistent with calling this a finished patch. It's also not a complete list of places where log_newpage() is called, but frankly I don't think that should be the aim of this comment. The only relevant question is whether we ever use XLOG_HEAP_NEWPAGE to log heap changes that are relevant to logical replication. I think we don't. + /* FIXME: skip if wrong db? */ It's time to fish or cut bait. + /* + * XXX: As a future feature, we could replay the transaction and + * prepare it as well, allowing for 2PC via logical decoding. + */ Let's try to avoid using XXX (or FIXME) for things that really mean TODO. I think this comment deserves to be expanded a bit, too. Maybe something like: "Right now, logical decoding ignores PREPARE TRANSACTION and simply decodes the subsequent COMMIT TRANSACTION or ROLLBACK TRANSACTION just as it would a regular COMMIT or ROLLBACK. In the future, we might want to change this. Decoding PREPARE might enable future code to prepare each locally prepared transaction on the remote side before doing a COMMIT TRANSACTION locally, allowing for logical synchronous replication." + /* + * If the record wasn't part of a transaction, it will not have + * caused invalidations and thus isn't important when building + * snapshots. If it was part of a transaction, that transaction + * just performed DDL because those are the only codepaths using + * inplace updates. + */ Under what circumstances do we issue in-place updates not associated with a transaction? And under what circumstances do we issue in-place updates that ARE associated with a transaction? + * XXX: At some point we might want to execute the transaction's The XXX again seems needless; the comment is fine as it stands. + /* + * Abort all transactions that we keep track of that are older + * than ->oldestRunningXid. This is the most convenient spot I think writing ->membername is poor commenting style. Just leave out the arrow, or write "the WAL record's oldestRunningXid." +/* + * Get the data from the various forms of commit records and pass it + * on to snapbuild.c and reorderbuffer.c + */ This is a lousy comment. I suggest something like: "Currently, each transaction is decoded only once it commit, so the arrival of a commit record means that we can now decode the changes made by this toplevel transaction and all of its committed subtransactions, unless we have to skip it because the replication system isn't fully initialized yet.Whether decoding the transaction or not, we must takenote of any invalidations it issues, as those will affect the snapshot used for decoding of *other* transactions." +/* + * Get the data from the various forms of abort records and pass it on to + * snapbuild.c and reorderbuffer.c + */ Suggest: "When a transaction abort is detected, we throw away any data we've stashed away for possible future decoding of that transaction. Knowledge of the abort may also help us establish our initial snapshot when logical decoding is first initiated." +/* + * Set the xmin required for decoding snapshots for the specific decoding + * slot. + */ +void +IncreaseLogicalXminForSlot(XLogRecPtr lsn, TransactionId xmin) I'm thinking this and everything that follows, up through LogicalDecodingCountDBSlots, probably should be moved to slot.c. + /* XXX: Add the current LSN? */ +1. + /* shorter lines... */ + slot = MyReplicationSlot; If you're going to do this, which seems like it's probably a good idea, do it at the top of the function and use it all the way through instead of doing it in the middle. + if (MyReplicationSlot == NULL) + elog(ERROR, "need a current slot"); + + if (is_init && start_lsn != InvalidXLogRecPtr) + elog(ERROR, "Cannot INIT_LOGICAL_REPLICATION at a specified LSN"); + + if (is_init && plugin == NULL) + elog(ERROR, "Cannot INIT_LOGICAL_REPLICATION without a specified plugin"); One of these error messages is not like the others. + context = AllocSetContextCreate(CurrentMemoryContext, +"Changeset Extraction Context", +ALLOCSET_DEFAULT_MINSIZE, +ALLOCSET_DEFAULT_INITSIZE, +ALLOCSET_DEFAULT_MAXSIZE); I have my doubts about whether it's wise to make this the child of CurrentMemoryContext. Certainly, if we do that, then expectations around what that context is need to be documented. Short-lived contexts are presumably unsuitable. + * Lets start with enough information if we can, so log a standby + * snapshot and start decoding at exactl that position. Let's. Exactly. + * the xlog records didn't result in anyting relevant for anything. + elog(LOG, "cannot stream from %X/%X, minimum is %X/%X, forwarding", This isn't very clear, and I don't think it follows style guidelines either. + /* register output plugin name with slot */ + strncpy(NameStr(MyReplicationSlot->data.plugin), plugin, + NAMEDATALEN); + NameStr(MyReplicationSlot->data.plugin)[NAMEDATALEN - 1] = '\0'; Hmm. Shouldn't this be delegated to something in slot.c? Why don't we need the lock? Why don't we need to fsync() the change? + /* + * Acquire the current global xmin value and directly set the logical + * xmin before releasing the lock if necessary. We do this so wal + * decoding is guaranteed to have all catalog rows produced by xacts + * with an xid > walsnd->xmin available. + * + * We can't let ReplicationSlotsComputeRequiredXmin() lock the + * procarray as that acquires ProcArrayLock separately which would + * open a short window for the global xmin to advance above our xmin. + */ + LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE); + slot->effective_catalog_xmin = GetOldestSafeDecodingTransactionId(); + slot->data.catalog_xmin = slot->effective_catalog_xmin; + + ReplicationSlotsComputeRequiredXmin(true); + + LWLockRelease(ProcArrayLock); I don't understand this. + (errmsg("changeset extraction started, extracting changes after %X/%X, reading from %X/%X", Needs some style work. What does "extracting changes after %X/%X" really mean? Also, how about including the slot name in there? + * Performoutput plugin write into tuplestore. Space. + /* + * XXX: maybe we ought to assert ctx->out is in database encoding when + * we're writing textual output. + */ Good idea. + /* + * FIXME: we're going to have to do something more intelligent about + * timelines on standby's. Use readTimeLineHistory() and + * tliOfPointInHistory() to get the proper LSN? + */ So what's the plan for that? + /* + * XXX: It'd be way nicer to be able to use the walsender waiting logic + * here, but that's not available in all environments. + */ I don't understand this. pg_create_decoding_replication_slot() should go in slotfuncs.c. +/* number of changes kept in memory, per transaction */ +const Size max_memtries = 4096; + +/* Size of the slab caches used for frequently allocated objects */ +const Size max_cached_changes = 4096 * 2; +const Size max_cached_tuplebufs = 4096 * 2; /* ~8MB */ +const Size max_cached_transactions = 512; Hmm. Is max_memtries the number of "tries" you keep in "mem"? Or is "tries" an inadvertent abbreviation for "entries" or something? Also, there's no real discussion here of the logic behind these values, or the performance or memory impact of changing them. + * Free an ReorderBufferTXN. Deallocation might be delayed for efficiency + * purposes. Wow, so we have a bespoke dllist for caching these objects, and that's actually material to performance? What is the allocation/deallocation rate here? +/* + * FIXME: better comment and/or name + */ If not now, then when? + ReorderBufferChange *next_change = + dlist_container(ReorderBufferChange, node, next); Formatting. + * ->subxip contains all txids that belong to our transaction which we snap->subxip + * XXX: ->nsubxcnt can be out of date when subtransactions abort, count + * manually. Why is this an XXX? + if (GetTopTransactionIdIfAny() != InvalidTransactionId) + elog(ERROR, "cannot replay using sub, already allocated xid %u", + GetTopTransactionIdIfAny()); Message style. "cannot replay using top", too. Also, what makes this elog() material? If decoding can be performed from a user session this seems likely to be (blech!) user-facing. + /* XXX: we could skip snapshots in non toplevel txns */ TODO. Or fix it. + /* + * don't do a ReorderBufferCleanupTXN here, with the vague idea of + * allowing to retry decoding. + */ It's a bit late in the cycle for such vagueness. + * Rejigger change->newtuple to point to in-memory toast tuples instead to + * on-disk toast tuples that may not longer exist (think DROP TABLE or VACUUM). + * + * We cannot replace unchanged toast tuples though, so those will still point + * to on-disk toast data. Why is that OK? + * location indicated by 'lsn'. Returns true if successfull, false otherwise. Extra "l". My eyes are glazing over, so I'm stopping here for now. Generally, I think there's a lot of good stuff in here, but I think you need to make a serious pass through here and try to get rid of all the things marked XXX and FIXME, either by changing them to say TODO (or nothing), or by deciding that they're OK and removing the comment, or by fixing them. It's fine to have some XXX comments on points where you're hoping for reviewer feedback, but the time to have notes in there for yourself is long past. Also, you need to either go through and make a studious attempt to fix all the typographical errors and message style issues, or you need to find someone else who can help you with that. Preferably not me, because I'd rather focus on things of somewhat greater significance. Generally, I find decode.c to be relatively straightforward, and it's even got a nice long header comment explaining what it does. logical.c, on the other hand, has a one-line header comment. As I mentioned above, I think a lot of the stuff in this file properly belongs elsewhere, so maybe that's not so bad. But it could probably use at least a little more work. To the extent that it's possible, the documentation should be incorporated into the patches that introduce the corresponding facilities rather than being a separate patch all of its own. The meat of the patch seems to be reorderbuffer.c. That file needs more and better explanations of what is being done and why it is being done. For example: +/* + * Abort a transaction that possibly has previous changes. Needs to be done + * independently for toplevel and subtransactions. + */ +void +ReorderBufferAbort(ReorderBuffer *rb, TransactionId xid, XLogRecPtr lsn) It's already clear from the name of the function and how it's used elsewhere that it gets called when a transaction aborts. Mentioning that the function gets invoked for both toplevel and subtransactions is, surely, worthwhile. But the function has no comments explaining what it's doing in response to that abort, or why it's doing those things, or why it's doing those things in that order rather than some other order. Here is an example of a better comment: +/* + * Setup the base snapshot of a transaction. That is the snapshot that is used + * to decode all changes until either this transaction modifies the catalog or + * another catalog modifying transaction commits. + */ Now, the grammar there might need a tad of work, but there's a lot of useful information in that comment. It would be even better if there were a comment explaining, either here or in the caller, how we decide *when* to call this function. Both reorderbuffer.c and snapbuild.c contain a significant number of functions that are quite short. I can't decide whether this is excellent attention to abstraction boundaries or a sign that the abstraction boundary is too permeable in the first place. This email is a bit down in the trenches; I will try to write another with some higher-level considerations. But I think there is plenty of stuff here for you to get started fixing. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: