Re: Fix slot's xmin advancement and subxact's lost snapshots in decoding. - Mailing list pgsql-hackers

From Arseny Sher
Subject Re: Fix slot's xmin advancement and subxact's lost snapshots in decoding.
Date
Msg-id 87k1qlve3d.fsf@ars-thinkpad
Whole thread Raw
In response to Re: Fix slot's xmin advancement and subxact's lost snapshots indecoding.  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Responses Re: Fix slot's xmin advancement and subxact's lost snapshots indecoding.
List pgsql-hackers
Alvaro Herrera <alvherre@2ndquadrant.com> writes:

> Firstly -- this is top-notch detective work, kudos and thanks for the
> patch and test cases.  (I verified that both fail before the code fix.)

Thank you!

> Here's a v3.  I applied a lot of makeup in order to try to understand
> what's going on.  I *think* I have a grasp on the original code and your
> bugfix, not terribly firm I admit.

Well, when I started digging it, I have found logical decoding code
somewhat underdocumented. If you or any other committer will consider
the patch trying to improve the comments, I can prepare one.

> * you don't need to Assert that things are not NULL if you're
>   immediately going to dereference them.

Indeed.

> * I think setting snapshot_base to NULL in ReorderBufferCleanupTXN is
>   pointless, since the struct is gonna be freed shortly afterwards.

Yeah, but it is a general style of the original code, e.g. see
/* cosmetic... */
comments. It probably makes the picture a bit more aesthetic and
consistent, kind of final chord, though I don't mind removing it.

> * I rewrote many comments (both existing and some of the ones your patch
>   adds), and added lots of comments where there were none.

Now the code is nicer, thanks.

> * I'm a bit unsure about the comment atop ReorderBufferSetBaseSnapshot.
>   Obviously, the bit within the #if 0/#endif I'm going to remove before
>   push.

It looks like you've started editing that bit and didn't finish.

>   I don't understand why it says "Needs to be called before any
>   changes are added with ReorderBufferQueueChange"; but if you edit that
>   function and add an assert that the base snapshot is set, it crashes
>   pretty quickly in the test_decoding tests.  (The supposedly bogus
>   comment was there before your patch -- I'm not saying your comment
>   addition is faulty.)

That's because REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID changes are
queued whenever we have read that xact has modified the catalog,
regardless of base snapshot existence. Even if there are no changes yet,
we will need it for correct visibility of catalog, so I am inclined to
remove the assert and comment or rephrase the latter with 'any
*data-carrying* changes'.

> * I also noticed that we're doing subtxn cleanup one by one in both
>   ReorderBufferAssignChild and ReorderBufferCommitChild, which means the
>   top-level txn is sought in the hash table over and over, which seems a
>   bit silly.  Not this patch's problem to fix ...

There is already one-entry cache in ReorderBufferTXNByXid. We could add
'don't cache me' flag to it and use it with subxacts, or simply pull
top-level reorderbuffer out of loops.

I'm fine with the rest of your edits. One more little comment:

@@ -543,9 +546,10 @@ ReorderBufferQueueChange(ReorderBuffer *rb, TransactionId xid, XLogRecPtr lsn,
     ReorderBufferTXN *txn;

     txn = ReorderBufferTXNByXid(rb, xid, true, NULL, lsn, true);
+    Assert(txn->base_snapshot != NULL || txn->is_known_as_subxact);

     change->lsn = lsn;
-    Assert(InvalidXLogRecPtr != lsn);
+    Assert(!XLogRecPtrIsInvalid(lsn));
     dlist_push_tail(&txn->changes, &change->node);
     txn->nentries++;
     txn->nentries_mem++;

Since we do that, probably we should replace all lsn validity checks
with XLogRectPtrIsInvalid?

--
Arseny Sher
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: Few cosmetic suggestions for commit 16828d5c (Fast Alter TableAdd Column...)
Next
From: Alvaro Herrera
Date:
Subject: Re: Performance regression with PostgreSQL 11 and partitioning