Re: [HACKERS] logical decoding of two-phase transactions - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: [HACKERS] logical decoding of two-phase transactions
Date
Msg-id CAA4eK1J_PfnENwgtTnrQoTcoWPKqDoWYaf12mTKQWtc9D=xAjQ@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] logical decoding of two-phase transactions  (Ajin Cherian <itsajin@gmail.com>)
Responses Re: [HACKERS] logical decoding of two-phase transactions
Re: [HACKERS] logical decoding of two-phase transactions
Re: [HACKERS] logical decoding of two-phase transactions
Re: [HACKERS] logical decoding of two-phase transactions
List pgsql-hackers
On Tue, Dec 8, 2020 at 2:01 PM Ajin Cherian <itsajin@gmail.com> wrote:
>
> On Tue, Dec 1, 2020 at 6:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > To skip it, we need to send GID in begin message and then on
> > subscriber-side, check if the prepared xact already exists, if so then
> > set a flag. The flag needs to be set in begin/start_stream and reset
> > in stop_stream/commit/abort. Using the flag, we can skip the entire
> > contents of the prepared xact. In ReorderFuffer-side also, we need to
> > get and set GID in txn even when we skip it because we need to send
> > the same at commit time. In this solution, we won't be able to send it
> > during normal start_stream because by that time we won't know GID and
> > I think that won't be required. Note that this is only required when
> > we skipped sending prepare, otherwise, we just need to send
> > Commit-Prepared at commit time.
>
> I have implemented these changes and tested the fix using the test
> setup I had shared above and it seems to be working fine.
> I have also tested restarts that simulate duplicate prepares being
> sent by the publisher and verified that it is handled correctly by the
> subscriber.
>

This implementation has a flaw in that it has used commit_lsn for the
prepare when we are sending prepare just before commit prepared. We
can't send the commit LSN with prepare because if the subscriber
crashes after prepare then it would update
replorigin_session_origin_lsn with that commit_lsn. Now, after the
restart, because we will use that LSN to start decoding, the Commit
Prepared will get skipped. To fix this, we need to remember the
prepare LSN and other information even when we skip prepare and then
use it while sending the prepare during commit prepared.

Now, after fixing this, I discovered another issue which is that we
allow adding a new snapshot to prepared transactions via
SnapBuildDistributeNewCatalogSnapshot. We can only allow it to get
added to in-progress transactions. If you comment out the changes
added in SnapBuildDistributeNewCatalogSnapshot then you will notice
one test failure which indicates this problem. This problem was not
evident before the bug-fix in the previous paragraph because you were
using commit-lsn even for the prepare and newly added snapshot change
appears to be before the end_lsn.

Some other assorted changes in various patches:
v31-0001-Extend-the-output-plugin-API-to-allow-decoding-o
1. I have changed the filter_prepare API to match the signature with
FilterByOrigin. I don't see the need for ReorderBufferTxn or xid in
the API.
2. I have expanded the documentation of 'Begin Prepare Callback' to
explain how a user can use it to detect already prepared transactions
and in which scenarios that can happen.
3. Added a few comments in the code atop begin_prepare_cb_wrapper to
explain why we are adding this new API.
4. Move the check whether the filter_prepare callback is defined from
filter_prepare_cb_wrapper to caller. This is similar to how
FilterByOrigin works.
5. Fixed various whitespace and cosmetic issues.
6. Update commit message to include two of the newly added APIs

v31-0002-Allow-decoding-at-prepare-time-in-ReorderBuffer
1. Changed the variable names and comments in DecodeXactOp.
2. A new API for FilterPrepare similar to FilterByOrigin and use that
instead of ReorderBufferPrepareNeedSkip.
3. In DecodeCommit, we need to update the reorderbuffer about the
surviving subtransactions for both ReorderBufferFinishPrepared and
ReorderBufferCommit because now both can process the transaction.
4. Because, now we need to remember the prepare info even when we skip
it, I have simplified ReorderBufferPrepare API by removing the extra
parameters as that information will be now available via
ReorderBufferTxn.
5. Updated comments at various places.

v31-0006-Support-2PC-txn-pgoutput
1. Added Asserts in streaming APIs on the subscriber-side to ensure
that we should never reach there for the already prepared transaction
case. We never need to stream the changes when we are skipping prepare
either because the snapshot was not consistent by that time or we have
already sent those changes before restart. Added the same Assert in
Begin and Commit routines because while skipping prepared txn, we must
not receive the changes of any other xact.
2.
+ /*
+ * Flags are determined from the state of the transaction. We know we
+ * always get PREPARE first and then [COMMIT|ROLLBACK] PREPARED, so if
+ * it's already marked as committed then it has to be COMMIT PREPARED (and
+ * likewise for abort / ROLLBACK PREPARED).
+ */
+ if (rbtxn_commit_prepared(txn))
+ flags = LOGICALREP_IS_COMMIT_PREPARED;
+ else if (rbtxn_rollback_prepared(txn))
+ flags = LOGICALREP_IS_ROLLBACK_PREPARED;
+ else
+ flags = LOGICALREP_IS_PREPARE;

I don't like clubbing three different operations under one message
LOGICAL_REP_MSG_PREPARE. It looks awkward to use new flags
RBTXN_COMMIT_PREPARED and RBTXN_ROLLBACK_PREPARED in ReordeBuffer so
that we can recognize these operations in corresponding callbacks. I
think setting any flag in ReorderBuffer should not dictate the
behavior in callbacks. Then also there are few things that are not
common to those APIs like the patch has an Assert to say that the txn
is marked with prepare flag for all three operations which I think is
not true for Rollback Prepared after the restart. We don't ensure to
set the Prepare flag if the Rollback Prepare happens after the
restart. Then, we have to introduce separate flags to distinguish
prepare/commit prepared/rollback prepared to distinguish multiple
operations sent as protocol messages. Also, all these operations are
mutually exclusive so it will be better to send separate messages for
each of these and I have changed it accordingly in the attached patch.

3. The patch has a duplicate code to send replication origins. I have
moved the common code to a separate function.

v31-0009-Support-2PC-txn-Subscription-option
1.
--- a/src/include/catalog/catversion.h
+++ b/src/include/catalog/catversion.h
@@ -53,6 +53,6 @@
  */

 /* yyyymmddN */
-#define CATALOG_VERSION_NO 202011251
+#define CATALOG_VERSION_NO 202011271

No need to change catversion as this gets changed frequently and that
leads to conflict in the patch. We can change it either in the final
version or normally committers take care of this. If you want to
remember it, maybe adding a line for it in the commit message should
be okay. For now, I have removed this from the patch.


-- 
With Regards,
Amit Kapila.

Attachment

pgsql-hackers by date:

Previous
From: Seino Yuki
Date:
Subject: Re: Feature improvement for pg_stat_statements
Next
From: Benoit Lobréau
Date:
Subject: Re: pg_shmem_allocations & documentation