Re: [HACKERS] logical decoding of two-phase transactions - Mailing list pgsql-hackers

From Arseny Sher
Subject Re: [HACKERS] logical decoding of two-phase transactions
Date
Msg-id 87a7k2996m.fsf@ars-thinkpad
Whole thread Raw
In response to Re: [HACKERS] logical decoding of two-phase transactions  (Nikhil Sontakke <nikhils@2ndquadrant.com>)
Responses Re: [HACKERS] logical decoding of two-phase transactions
List pgsql-hackers
Nikhil Sontakke <nikhils@2ndquadrant.com> writes:

> I'd like to believe that the latest patch set tries to address some
> (if not all) of your concerns. Can you please take a look and let me
> know?

Hi, sure.

General things:

- Earlier I said that there is no point of sending COMMIT PREPARED if
  decoding snapshot became consistent after PREPARE, i.e. PREPARE hadn't
  been sent. I realized since then that such use cases actually exist:
  prepare might be copied to the replica by e.g. basebackup or something
  else earlier. Still, a plugin must be able to easily distinguish these
  too early PREPARES without doing its own bookkeeping (remembering each
  PREPARE it has seen). Fortunately, turns out this we can make it
  easy. If during COMMIT PREPARED / ABORT PREPARED record decoding we
  see that ReorderBufferTXN with such xid exists, it means that either
  1) plugin refused to do replay of this xact at PREPARE or 2) PREPARE
  was too early in the stream. Otherwise xact would be replayed at
  PREPARE processing and rbtxn purged immediately after. I think we
  should add this to the documentation of filter_prepare_cb. Also, to
  this end we need to add an argument to this callback specifying at
  which context it was called: during prepare / commit prepared / abort
  prepared.  Also, for this to work, ReorderBufferProcessXid must be
  always called at PREPARE, not only when 2PC decoding is disabled.

- BTW, ReorderBufferProcessXid at PREPARE should be always called
  anyway, because otherwise if xact is empty, we will not prepare it
  (and call cb), even if the output plugin asked us not to filter it
  out. However, we will call commit_prepared cb, which is inconsistent.

- I find it weird that in DecodePrepare and in DecodeCommit you always
  ask the plugin whether to filter an xact, given that sometimes you
  know beforehand that you are not going to replay it: it might have
  already been replayed, might have wrong dbid, origin, etc. One
  consequence of this: imagine that notorious xact with PREPARE before
  point where snapshot became consistent and COMMIT PREPARED after that
  point. Even if filter_cb says 'I want 2PC on this xact', with current
  code it won't be replayed on PREPARE and rbxid will be destroyed with
  ReorderBufferForget. Now this xact is lost.

- Doing full-blown SnapBuildCommitTxn during PREPARE decoding is wrong,
  because xact effects must not yet be seen to others. I discussed this
  at length and described adjacent problems in [1].

- I still don't like that if 2PC xact was aborted and its replay
  stopped, prepare callback won't be called but abort_prepared would be.
  This either should be documented or fixed.


Second patch:

+    /* filter_prepare is optional, but requires two-phase decoding */
+    if ((ctx->callbacks.filter_prepare_cb != NULL) && (!opt->enable_twophase))
+        ereport(ERROR,
+                (errmsg("Output plugin does not support two-phase decoding, but "
+                        "registered filter_prepared callback.")));

I actually think that enable_twophase output plugin option is
redundant. If plugin author wants 2PC, he just provides
filter_prepare_cb callback and potentially others. I also don't see much
value in checking that exactly 0 or 3 callbacks were registred.


- You allow (commit|abort)_prepared_cb, prepare_cb callbacks to be not
  specified with enabled 2PC and call them without check that they
  actually exist.


-     executed within that transaction.
+     executed within that transaction. A transaction that is prepared for
+     a two-phase commit using <command>PREPARE TRANSACTION</command> will
+     also be decoded if the output plugin callbacks needed for decoding
+     them are provided. It is possible that the current transaction which
+     is being decoded is aborted concurrently via a <command>ROLLBACK PREPARED</command>
+     command. In that case, the logical decoding of this transaction will
+     be aborted too.

This should say explicitly that such 2PC xact will be decoded at PREPARE
record. Probably also add that otherwise it is decoded at CP
record. Probably also add "and abort_cb callback called" to the last
sentence.


+      The required <function>abort_cb</function> callback is called whenever
+      a transaction abort has to be initiated. This can happen if we are

This callback is not required in the code, and it would be indeed a bad
idea to demand it, breaking compatibility with existing plugins not
caring about 2PC.


+         * Otherwise call either PREPARE (for twophase transactions) or COMMIT
+         * (for regular ones).
+         */
+        if (rbtxn_rollback(txn))
+            rb->abort(rb, txn, commit_lsn);

This is dead code since we don't have decoding of in-progress xacts yet.


+        /*
+         * If there is a valid top-level transaction that's different from the
+         * two-phase one we are aborting, clear its reorder buffer as well.
+         */
+        if (TransactionIdIsNormal(xid) && xid != parsed->twophase_xid)
+            ReorderBufferAbort(ctx->reorder, xid, origin_lsn);

What is the aim of this? How xl_xid xid of commit prepared record can be
normal?


+    /*
+     * The transaction may or may not exist (during restarts for example).
+     * Anyways, 2PC transactions do not contain any reorderbuffers. So allow
+     * it to be created below.
+     */

Code around looks sane, but I think that restarts are irrelevant to
rbtxn existence at this moment: if we are going to COMMIT/ABORT PREPARED
it, it must have been replayed and rbtxn purged immediately after. The
only reason why rbtxn can exist here is invalidation addition
(ReorderBufferAddInvalidations) happening a couple of calls earlier.
Also, instead of misty '2PC transactions do not contain any
reorderbuffers' I would say something like 'create dummy
ReorderBufferTXN to pass it to the callback'.


- filter_prepare_cb callback existence is checked in both decode.c and
  in filter_prepare_cb_wrapper.


Third patch:

+/*
+ * An xid value pointing to a possibly ongoing or a prepared transaction.
+ * Currently used in logical decoding.  It's possible that such transactions
+ * can get aborted while the decoding is ongoing.
+ */

I would explain here that this xid is checked for abort after each
catalog scan, and sent for the details to SetupHistoricSnapshot.


Nitpicking:

First patch: I still don't think that these flags need a bitmask.

Second patch:

- I still think ReorderBufferCommitInternal name is confusing and should
  be renamed to something like ReorderBufferReplay.


     /* Do we know this is a subxact?  Xid of top-level txn if so */
     TransactionId toplevel_xid;
+    /* In case of 2PC we need to pass GID to output plugin */
+    char         *gid;

Better add here newline as between other fields.


+    txn->txn_flags |= RBTXN_PREPARE;
+    txn->gid = palloc(strlen(gid) + 1); /* trailing '\0' */
+    strcpy(txn->gid, gid);

pstrdup?


- ReorderBufferTxnIsPrepared and ReorderBufferPrepareNeedSkip do the
  same and should be merged with comments explaining that the answer
  must be stable.

+      The optional <function>commit_prepared_cb</function> callback is called whenever
+      a commit prepared transaction has been decoded. The <parameter>gid</parameter> field,

a commit prepared transaction *record* has been decoded?


Fourth patch:

Applying: Teach test_decoding plugin to work with 2PC
.git/rebase-apply/patch:347: trailing whitespace.
-- test savepoints
.git/rebase-apply/patch:424: trailing whitespace.
# get XID of the above two-phase transaction
warning: 2 lines add whitespace errors.


[1] https://www.postgresql.org/message-id/87zhxrwgvh.fsf%40ars-thinkpad

--
Arseny Sher
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: explain plans with information about (modified) gucs
Next
From: Tom Lane
Date:
Subject: Re: Libpq support to connect to standby server as priority