Re: [HACKERS] logical decoding of two-phase transactions - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: [HACKERS] logical decoding of two-phase transactions |
Date | |
Msg-id | CAA4eK1J_PfnENwgtTnrQoTcoWPKqDoWYaf12mTKQWtc9D=xAjQ@mail.gmail.com Whole thread Raw |
In response to | Re: [HACKERS] logical decoding of two-phase transactions (Ajin Cherian <itsajin@gmail.com>) |
Responses |
Re: [HACKERS] logical decoding of two-phase transactions
Re: [HACKERS] logical decoding of two-phase transactions Re: [HACKERS] logical decoding of two-phase transactions Re: [HACKERS] logical decoding of two-phase transactions |
List | pgsql-hackers |
On Tue, Dec 8, 2020 at 2:01 PM Ajin Cherian <itsajin@gmail.com> wrote: > > On Tue, Dec 1, 2020 at 6:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > To skip it, we need to send GID in begin message and then on > > subscriber-side, check if the prepared xact already exists, if so then > > set a flag. The flag needs to be set in begin/start_stream and reset > > in stop_stream/commit/abort. Using the flag, we can skip the entire > > contents of the prepared xact. In ReorderFuffer-side also, we need to > > get and set GID in txn even when we skip it because we need to send > > the same at commit time. In this solution, we won't be able to send it > > during normal start_stream because by that time we won't know GID and > > I think that won't be required. Note that this is only required when > > we skipped sending prepare, otherwise, we just need to send > > Commit-Prepared at commit time. > > I have implemented these changes and tested the fix using the test > setup I had shared above and it seems to be working fine. > I have also tested restarts that simulate duplicate prepares being > sent by the publisher and verified that it is handled correctly by the > subscriber. > This implementation has a flaw in that it has used commit_lsn for the prepare when we are sending prepare just before commit prepared. We can't send the commit LSN with prepare because if the subscriber crashes after prepare then it would update replorigin_session_origin_lsn with that commit_lsn. Now, after the restart, because we will use that LSN to start decoding, the Commit Prepared will get skipped. To fix this, we need to remember the prepare LSN and other information even when we skip prepare and then use it while sending the prepare during commit prepared. Now, after fixing this, I discovered another issue which is that we allow adding a new snapshot to prepared transactions via SnapBuildDistributeNewCatalogSnapshot. We can only allow it to get added to in-progress transactions. If you comment out the changes added in SnapBuildDistributeNewCatalogSnapshot then you will notice one test failure which indicates this problem. This problem was not evident before the bug-fix in the previous paragraph because you were using commit-lsn even for the prepare and newly added snapshot change appears to be before the end_lsn. Some other assorted changes in various patches: v31-0001-Extend-the-output-plugin-API-to-allow-decoding-o 1. I have changed the filter_prepare API to match the signature with FilterByOrigin. I don't see the need for ReorderBufferTxn or xid in the API. 2. I have expanded the documentation of 'Begin Prepare Callback' to explain how a user can use it to detect already prepared transactions and in which scenarios that can happen. 3. Added a few comments in the code atop begin_prepare_cb_wrapper to explain why we are adding this new API. 4. Move the check whether the filter_prepare callback is defined from filter_prepare_cb_wrapper to caller. This is similar to how FilterByOrigin works. 5. Fixed various whitespace and cosmetic issues. 6. Update commit message to include two of the newly added APIs v31-0002-Allow-decoding-at-prepare-time-in-ReorderBuffer 1. Changed the variable names and comments in DecodeXactOp. 2. A new API for FilterPrepare similar to FilterByOrigin and use that instead of ReorderBufferPrepareNeedSkip. 3. In DecodeCommit, we need to update the reorderbuffer about the surviving subtransactions for both ReorderBufferFinishPrepared and ReorderBufferCommit because now both can process the transaction. 4. Because, now we need to remember the prepare info even when we skip it, I have simplified ReorderBufferPrepare API by removing the extra parameters as that information will be now available via ReorderBufferTxn. 5. Updated comments at various places. v31-0006-Support-2PC-txn-pgoutput 1. Added Asserts in streaming APIs on the subscriber-side to ensure that we should never reach there for the already prepared transaction case. We never need to stream the changes when we are skipping prepare either because the snapshot was not consistent by that time or we have already sent those changes before restart. Added the same Assert in Begin and Commit routines because while skipping prepared txn, we must not receive the changes of any other xact. 2. + /* + * Flags are determined from the state of the transaction. We know we + * always get PREPARE first and then [COMMIT|ROLLBACK] PREPARED, so if + * it's already marked as committed then it has to be COMMIT PREPARED (and + * likewise for abort / ROLLBACK PREPARED). + */ + if (rbtxn_commit_prepared(txn)) + flags = LOGICALREP_IS_COMMIT_PREPARED; + else if (rbtxn_rollback_prepared(txn)) + flags = LOGICALREP_IS_ROLLBACK_PREPARED; + else + flags = LOGICALREP_IS_PREPARE; I don't like clubbing three different operations under one message LOGICAL_REP_MSG_PREPARE. It looks awkward to use new flags RBTXN_COMMIT_PREPARED and RBTXN_ROLLBACK_PREPARED in ReordeBuffer so that we can recognize these operations in corresponding callbacks. I think setting any flag in ReorderBuffer should not dictate the behavior in callbacks. Then also there are few things that are not common to those APIs like the patch has an Assert to say that the txn is marked with prepare flag for all three operations which I think is not true for Rollback Prepared after the restart. We don't ensure to set the Prepare flag if the Rollback Prepare happens after the restart. Then, we have to introduce separate flags to distinguish prepare/commit prepared/rollback prepared to distinguish multiple operations sent as protocol messages. Also, all these operations are mutually exclusive so it will be better to send separate messages for each of these and I have changed it accordingly in the attached patch. 3. The patch has a duplicate code to send replication origins. I have moved the common code to a separate function. v31-0009-Support-2PC-txn-Subscription-option 1. --- a/src/include/catalog/catversion.h +++ b/src/include/catalog/catversion.h @@ -53,6 +53,6 @@ */ /* yyyymmddN */ -#define CATALOG_VERSION_NO 202011251 +#define CATALOG_VERSION_NO 202011271 No need to change catversion as this gets changed frequently and that leads to conflict in the patch. We can change it either in the final version or normally committers take care of this. If you want to remember it, maybe adding a line for it in the commit message should be okay. For now, I have removed this from the patch. -- With Regards, Amit Kapila.
Attachment
- v31-0001-Extend-the-output-plugin-API-to-allow-decoding-o.patch
- v31-0002-Allow-decoding-at-prepare-time-in-ReorderBuffer.patch
- v31-0003-Support-2PC-txn-tests-for-test_decoding.patch
- v31-0004-Support-2PC-txn-tests-for-concurrent-aborts.patch
- v31-0005-Support-2PC-txn-spoolfile.patch
- v31-0006-Support-2PC-txn-pgoutput.patch
- v31-0007-Support-2PC-txn-subscriber-tests.patch
- v31-0008-Support-2PC-documentation.patch
- v31-0009-Support-2PC-txn-Subscription-option.patch
pgsql-hackers by date: