Home > mailing lists
Re: Changeset Extraction v7.6.1 - Mailing list pgsql-hackers

From	Robert Haas
Subject	Re: Changeset Extraction v7.6.1
Date	February 15, 2014 22:29:10
Msg-id	CA+TgmoYwUTK=qerv4OOsiAyw0d6G=CZ7_0+X6T3ZX36Xxx2u7w@mail.gmail.com Whole thread
In response to	Re: Changeset Extraction v7.6.1 (Andres Freund <andres@2ndquadrant.com>)
Responses	Re: Changeset Extraction v7.6.1 Re: Changeset Extraction v7.6.1 Re: Changeset Extraction v7.6.1
List	pgsql-hackers
Tree view
On Fri, Feb 14, 2014 at 4:55 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> [ new patches ]

0001 already needs minor

+ * copied stuff from tuptoaster.c. Perhaps there should be toast_internal.h?

Yes, please.  If you can submit a separate patch creating this file
and relocating this stuff there, I will commit it.

+       /*
+        * XXX: It's impolite to ignore our argument and keep decoding until the
+        * current position.
+        */

Eh, what?

+        * We misuse the original meaning of SnapshotData's xip and
subxip fields
+        * to make the more fitting for our needs.
[...]
+        * XXX: Do we want extra fields instead of misusing existing
ones instead?

If we're going to do this, then it surely needs to be documented in
snapshot.h.  On the second question, you're not the first hacker to
want to abuse the meanings of the existing fields; SnapshotDirty
already does it.  It's tempting to think we need a more principled
approach to this, like what we've done with Node i.e. typedef enum ...
SnapshotType; and then a separate struct definition for each kind, all
beginning with SnapshotType type.

+       /*
+        * XXX: Timeline handling/naming. Do we need to include the timeline in
+        * snapshot's name? Outside of very obscure, user triggered, cases every
+        * LSN should correspond to exactly one timeline?
+        */

I can't really comment intelligently on this, so you need to figure it
out.  My off-the-cuff thought is that erring on the side of including
it couldn't hurt anything.

+ * XXX: use hex format for the LSN as well?

Yes, please.

+                               /* prepared abort contain a normal
commit abort... */

contains.

+                               /*
+                                * Abort all transactions that we keep
track of that are older
+                                * than the record's oldestRunningXid.
This is the most
+                                * convenient spot for doing so since,
in contrast to shutdown
+                                * or end-of-recovery checkpoints, we
have sufficient
+                                * knowledge to deal with prepared
transactions here.
+                                */

I have no real quibble with this, but I think the comment could be
clarified slightly to state *what* knowledge we have here that we
wouldn't have there.

+                       /* only crash recovery/replication needs to care */

I believe I know what you're getting at here, but the next guy might
not.  I suggest: "Although these records only exist to serve the needs
of logical decoding, all the work happens as part of crash or archive
recovery, so we don't need to do anything here."

+                        * Treat HOT update as normal updates, there
is no useful

s/, t/. T/

+                        * There are cases in which inplace updates
are used without xids
+                        * assigned (like VACUUM), there are others
(like CREATE INDEX
+                        * CONCURRENTLY) where an xid is present. If
an xid was assigned

In-place updates can be used either by XID-bearing transactions (e.g.
in CREATE INDEX CONCURRENTLY) or by XID-less transactions (e.g.
VACUUM).  In the former case, ...

+                        * redundant because the commit will do that
as well, but one
+                        * we'll support decoding in-progress
relations, this will be

s/one/once/
s/we'll/we/

+                       /* we don't care about row level locks for now */
+               case XLOG_HEAP_LOCK:
+                       break;

The position of the comment isn't consistent with the comments for the
other WAL record type in this section; that is, it's above rather than
below the case.

+        * transaction's contents as the various caches need to always be

I think you should use "since" or "because" rather than "as" here, and
maybe put a comma before it.

+        * the transaction's invalidations. This currently won't be needed if
+        * we're just skipping over the transaction, since that currently only
+        * happens when starting decoding, after we invalidated all caches, but
+        * transactions in other databases might have touched shared relations.

I'm having trouble understanding what this means, especially the part
after the "but".

+ * Read a HeapTuple as WAL logged by heap_insert, heap_update and
+ * heap_delete, but not by heap_multi_insert into a tuplebuf.

"but not by heap_multi_insert" needs punctuation both before and
after.  You can just add a comma after, or change it into a
parenthetical phrase.

As the above comments probably make clear, I'm pretty much happy with decode.c.

+       /* TODO: We got to change that someday soon.. */

Two periods.  Maybe "We need to change this some day soon." - and then
follow that with a paragraph explaining what roughly what would need
to be done.

+       /* shorter lines... */
+       slot = MyReplicationSlot;
+
+       /* first some sanity checks that are unlikely to be violated */
+       if (MyReplicationSlot == NULL)
+               elog(ERROR, "cannot perform logical decoding without a
acquired slot");

Can test slot.

+       /* make sure the passed slot is suitable, these are user
facing errors */

Make sure the passed slot is suitable.  These are user-facing errors.

+       if (IsTransactionState() &&
+               GetTopTransactionIdIfAny() != InvalidTransactionId)
+               ereport(ERROR,
+                               (errcode(ERRCODE_ACTIVE_SQL_TRANSACTION),
+                                errmsg("cannot perform changeset
extraction in transaction that has performed writes")));

This is sort of an awkward restriction, as it makes it hard to compose
this feature with others.  What underlies the restriction, can we get
rid of it, and if not can we include a comment here explaining why it
exists?

+                        * the xlog records didn't result in anyting
relevant for

anything.

+               /* register output plugin name with slot */
+               strncpy(NameStr(slot->data.plugin), plugin,
+                               NAMEDATALEN);
+               NameStr(slot->data.plugin)[NAMEDATALEN - 1] = '\0';

If it's safe to do this without a lock, I don't know why.

More broadly, I wonder why this is_init stuff is in here at all.
Maybe the stuff that happens in the is_init case should be done in the
caller, or another helper function.

+               /* prevent WAL removal as fast as possible */
+               ReplicationSlotsComputeRequiredLSN();

If there's a race here, can't we rejigger the order of operations to
eliminate it?  Or is that just too hard and not worth it?

+begin_txn_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn)
+       state.callback_name = "pg_decode_begin_txn";
+       ctx->callbacks.begin_cb(ctx, txn);

I feel that there's a certain lack of naming consistency between these
things.  Can we improve that?  (and similarly for parallel cases)

+pg_create_decoding_replication_slot(PG_FUNCTION_ARGS)

I thought we were going to have physical and logical slots, not
physical and decoding slots.

+       /* make sure we don't end up with an unreleased slot */
+       PG_TRY();
+       {
...
+       PG_CATCH();
+       {
+               ReplicationSlotRelease();
+               ReplicationSlotDrop(NameStr(*name));
+               PG_RE_THROW();
+       }
+       PG_END_TRY();

I don't think this is a very good idea.  The problem with doing things
during error recovery that can themselves fail is that you'll lose the
original error, which is not cool, and maybe even blow out the error
stack.  Many people have confuse PG_TRY()/PG_CATCH() with an
exception-handling system, but it's not.  One way to fix this is to
put some of the initialization logic in ReplicationSlotCreate() just
prior to calling CreateSlotOnDisk().  If the work that needs to be
done is too complex or protracted to be done there, then I think that
it should be pulled out of the act of creating the replication slot
and made to happen as part of first use, or as a separate operation
like PrepareForLogicalDecoding.

+        * When the client has confirmed flushes >= candidate_xmin_after we can

candidate_xmin_after is not otherwise referenced; incomplete identifier rename?

Nothing in patch 1 sets PROC_IN_LOGICAL_DECODING.  Is that right?

+ProcArraySetReplicationSlotXmin(TransactionId xmin, TransactionId catalog_xmin,
+                                                               bool
already_locked)

Maybe Assert(!already_locked || LWLockHeldByMe(ProcArrayLock))

+void
+ProcArrayGetReplicationSlotXmin(TransactionId *xmin,
+
TransactionId *catalog_xmin)
+{
+       LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);

If we need the lock in exclusive mode here, there should be a comment
explaining why.  And regardless, there should probably be some sort of
header comment.

+               /*
+                * Acquire spinlock so other backends are guaranteed
to see this in
+                * time - we cannot generally acquire the lwlock here
since we might
+                * be still holding it in an error path.
+                */
+               SpinLockAcquire(&MyReplicationSlot->mutex);
+               MyReplicationSlot->active = false;
+               SpinLockRelease(&MyReplicationSlot->mutex);
+
+               /* might not have been set when we've been a plain slot */
+               LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+               MyPgXact->vacuumFlags &= ~PROC_IN_LOGICAL_DECODING;
+               LWLockRelease(ProcArrayLock);

OK, a couple of things.  First, the comment for the first chunk says
we can't acquire the LWLock here, and then the second part acquires an
LWLock.  Second, if we're about to dump our PGPROC altogether, why do
we need to update vacuumFlags first?  Third, this isn't actually
called anywhere.

+ * At some point in the future it probaly makes sense to have a more elaborate
+ * resource management here, but it's not entirely clear how that would look
+ * like.

s/how/what/

ReorderBufferGetTXN() should get a comment about the performance
impact of this.  There's a tiny bit there in ReorderBufferReturnTXN()
but it should be better called out.  Should these call the valgrind
macros to make the memory inaccessible while it's being held in cache?

My eyes are starting to glaze over, so more later.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
pgsql-hackers by date:
From: Tom Lane
Date: 15 February 2014, 22:26:38
Subject: Re: narwhal and PGDLLIMPORT
From: Andres Freund
Date: 15 February 2014, 22:37:18
Subject: Re: narwhal and PGDLLIMPORT
Re: Changeset Extraction v7.6.1 - Mailing list pgsql-hackers

Previous

Next