Re: [HACKERS] logical decoding of two-phase transactions - Mailing list pgsql-hackers
From: Nikhil Sontakke
Subject: Re: [HACKERS] logical decoding of two-phase transactions
Msg-id: CAMGcDxeViP+R-OL7QhzUV9eKCVjURobuY1Zijik4Ay_Ddwo4Cg@mail.gmail.com
In response to: Re: [HACKERS] logical decoding of two-phase transactions (Andres Freund <andres@anarazel.de>)
Responses: Re: [HACKERS] logical decoding of two-phase transactions
List: pgsql-hackers
Hi Andres,

>> > First off: This patch has way too many different types of changes as
>> > part of one huge commit. This needs to be split into several
>> > pieces. First the cleanups (e.g. the fields -> flag changes), then the
>> > individual infrastructure pieces (like the twophase.c changes, best
>> > split into several pieces as well, the locking stuff), then the main
>> > feature, then support for it in the output plugin. Each should have an
>> > individual explanation about why the change is necessary and not a bad
>> > idea.
>>
>> Ok, I will break this patch into multiple logical pieces and re-submit.
>
> Thanks.

Attached are 5 patches split up from the original patch that I had
submitted earlier.

ReorderBufferTXN_flags_cleanup_1.patch:
Cleans up the ReorderBufferTXN bools and adds some new flags that the
following patches will need.

Logical_lock_unlock_api_2.patch:
Streaming the changes of uncommitted or prepared transactions runs the
risk of aborts (ROLLBACK PREPARED) happening while we are decoding.
That is not a problem for most transactions, but transactions which
make catalog changes need a consistent view of the metadata, so that
decoding does not behave in uncertain ways when such concurrent aborts
occur. We came up with the concept of a logical locking/unlocking API
to safeguard access to catalog tables; this patch contains the
implementation of that functionality.

2PC_gid_wal_and_2PC_origin_tracking_3.patch:
We now store the 2PC gid in the commit/abort records. This allows us
to send the proper gid to the downstream across restarts. We also want
to avoid receiving a prepared transaction again from the upstream, so
we use replorigin tracking across prepared transactions.

reorderbuffer_2PC_logic_4.patch:
Adds decoding logic to understand PREPARE-related WAL records and the
relevant reorderbuffer changes to deal with 2PC.
This includes logic to handle concurrent rollbacks while we are going
through the change buffers belonging to a prepared or uncommitted
transaction.

pgoutput_plugin_support_2PC_5.patch:
Logical protocol changes to apply and send changes via the internal
pgoutput output plugin. Includes a test case and the relevant
documentation changes.

Besides the above, you had feedback about the test_decoding plugin and
the use of sleep() etc. I will submit a follow-on patch for the
test_decoding plugin soon.

More comments inline below.

>> >> bool only_local;
>> >> + bool twophase_decoding;
>> >> + bool twophase_decode_with_catalog_changes;
>> >> + int decode_delay; /* seconds to sleep after every change record */
>> >
>> > This seems too big a crock to add just for testing. It'll also make the
>> > testing timing dependent...
>>
>> The idea *was* to make testing timing dependent. We wanted to simulate
>> the case when a rollback is issued by another backend while the
>> decoding is still ongoing. This allows that test case to be tested.
>
> What I mean is that this will be hell on the buildfarm because the
> different animals are differently fast.

Will handle this in the test_decoding plugin patch soon.

>> >> +/* Filter out unnecessary two-phase transactions */
>> >> +static bool
>> >> +pg_filter_prepare(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
>> >> +                  TransactionId xid, const char *gid)
>> >> +{
>> >> +    TestDecodingData *data = ctx->output_plugin_private;
>> >> +
>> >> +    /* treat all transactions as one-phase */
>> >> +    if (!data->twophase_decoding)
>> >> +        return true;
>> >> +
>> >> +    if (txn && txn_has_catalog_changes(txn) &&
>> >> +        !data->twophase_decode_with_catalog_changes)
>> >> +        return true;
>> >
>> > What? I'm INCREDIBLY doubtful this is a sane thing to expose to output
>> > plugins. As in, unless I hear a very very convincing reason I'm strongly
>> > opposed.
>>
>> These bools are specific to the test_decoding plugin.
Will handle in the test_decoding plugin patch soon.

> txn_has_catalog_changes() definitely isn't just exposed to
> test_decoding. I think you're making the output plugin interface
> massively more complicated in this patch and I think we need to push
> back on that.

>> Again, these are useful in testing decoding in various scenarios with
>> twophase decoding enabled/disabled. Testing decoding when catalog
>> changes are allowed/disallowed etc. Please take a look at
>> "contrib/test_decoding/sql/prepared.sql" for the various scenarios.
>
> I don't see how that addresses my concern in any sort of way.

Will handle in the test_decoding plugin patch soon.

>> >> +/*
>> >> + * Check if we should continue to decode this transaction.
>> >> + *
>> >> + * If it has aborted in the meanwhile, then there's no sense
>> >> + * in decoding and sending the rest of the changes, we might
>> >> + * as well ask the subscribers to abort immediately.
>> >> + *
>> >> + * This should be called if we are streaming a transaction
>> >> + * before it's committed or if we are decoding a 2PC
>> >> + * transaction. Otherwise we always decode committed
>> >> + * transactions.
>> >> + *
>> >> + * Additional checks can be added here, as needed
>> >> + */
>> >> +static bool
>> >> +pg_filter_decode_txn(LogicalDecodingContext *ctx,
>> >> +                     ReorderBufferTXN *txn)
>> >> +{
>> >> +    /*
>> >> +     * Due to caching, repeated TransactionIdDidAbort calls
>> >> +     * shouldn't be that expensive
>> >> +     */
>> >> +    if (txn != NULL &&
>> >> +        TransactionIdIsValid(txn->xid) &&
>> >> +        TransactionIdDidAbort(txn->xid))
>> >> +        return true;
>> >> +
>> >> +    /* if txn is NULL, filter it out */
>> >
>> > Why can this be NULL?
>>
>> Depending on parameters passed to the ReorderBufferTXNByXid()
>> function, the txn might be NULL in some cases, especially during
>> restarts.
>
> That a) isn't an explanation why that's ok b) reasoning why this ever
> needs to be exposed to the output plugin.
Removing this pg_filter_decode_txn() function. You are right, there's
no need to expose this function to the output plugin; we can make the
decision entirely inside the ReorderBuffer code.

>> >> static bool
>> >> pg_decode_filter(LogicalDecodingContext *ctx,
>> >>                  RepOriginId origin_id)
>> >> @@ -409,8 +622,18 @@ pg_decode_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
>> >> }
>> >> data->xact_wrote_changes = true;
>> >>
>> >> +    if (!LogicalLockTransaction(txn))
>> >> +        return;
>> >
>> > It really really can't be right that this is exposed to output plugins.
>>
>> This was discussed in the other thread
>> (http://www.postgresql-archive.org/Logical-Decoding-and-HeapTupleSatisfiesVacuum-assumptions-td5998294i20.html).
>> Any catalog access in any plugin needs to interlock with concurrent
>> aborts. This is only a problem if the transaction is a prepared one or
>> a yet-uncommitted one. In the majority of the remaining cases, this
>> function will do nothing at all.
>
> That doesn't address at all that it's not ok that the output plugin
> needs to handle this. Doing this in output plugins, the majority of
> which are external projects, means that a) the work needs to be done
> many times. b) we can't simply adjust the relevant code in a minor
> release, because every output plugin needs to be changed.

How do we know if the external project is going to access catalog
data? How do we ensure that the data they access is safe from
concurrent aborts if we are decoding uncommitted or prepared
transactions? We are providing a guideline here and recommending that
they use these APIs if they need to.

>> >> +    /* if decode_delay is specified, sleep with above lock held */
>> >> +    if (data->decode_delay > 0)
>> >> +    {
>> >> +        elog(LOG, "sleeping for %d seconds", data->decode_delay);
>> >> +        pg_usleep(data->decode_delay * 1000000L);
>> >> +    }
>> >
>> > Really not on board.
>>
>> Again, specific to the test_decoding plugin.
> Again, this is not a justification. People look at the code to write
> output plugins. Also see my above complaint about this going to be hell
> to get right on slow buildfarm members - we're going to crank up the
> sleep times to make it robust-ish.

Sure, as mentioned above, will come up with a different way for the
test_decoding plugin later.

>> >> +    XLogSetRecordFlags(XLOG_INCLUDE_ORIGIN);
>> >> +
>> >
>> > Can we perhaps merge a bit of the code with the plain commit path on
>> > this?
>>
>> Given that PREPARE ROLLBACK handling is totally separate from the
>> regular commit code paths, wouldn't it be a little difficult?
>
> Why? A helper function doing so ought to be doable.

Can you elaborate on what exactly you mean here?

>> >> @@ -1386,8 +1449,20 @@ FinishPreparedTransaction(const char *gid, bool isCommit)
>> >> /*
>> >>  * Validate the GID, and lock the GXACT to ensure that two backends do not
>> >>  * try to commit the same GID at once.
>> >>  *
>> >>  * During logical decoding, on the apply side, it's possible that a prepared
>> >>  * transaction got aborted while decoding. In that case, we stop the
>> >>  * decoding and abort the transaction immediately. However the ROLLBACK
>> >>  * prepared processing still reaches the subscriber. In that case it's ok
>> >>  * to have a missing gid
>> >>  */
>> >> -    gxact = LockGXact(gid, GetUserId());
>> >> +    gxact = LockGXact(gid, GetUserId(), missing_ok);
>> >> +    if (gxact == NULL)
>> >> +    {
>> >> +        Assert(missing_ok && !isCommit);
>> >> +        return;
>> >> +    }
>> >
>> > I'm very doubtful it is sane to handle this at such a low level.
>>
>> FinishPreparedTransaction() is called directly from ProcessUtility. If
>> not here, where else could we do this?
>
> I don't think this is something that ought to be handled at this layer
> at all. You should get an error in that case, the replay logic needs to
> handle that, not the low level 2pc code.

Removed the above changes.
The replay logic now checks if the GID still exists in the ROLLBACK
PREPARED codepath; if not, it returns immediately. In the COMMIT
PREPARED replay case, the GID obviously has to exist at the
downstream.

>> >> @@ -2358,6 +2443,13 @@ PrepareRedoAdd(char *buf, XLogRecPtr start_lsn, XLogRecPtr end_lsn)
>> >> Assert(TwoPhaseState->numPrepXacts < max_prepared_xacts);
>> >> TwoPhaseState->prepXacts[TwoPhaseState->numPrepXacts++] = gxact;
>> >>
>> >> +    if (origin_id != InvalidRepOriginId)
>> >> +    {
>> >> +        /* recover apply progress */
>> >> +        replorigin_advance(origin_id, hdr->origin_lsn, end_lsn,
>> >> +                           false /* backward */ , false /* WAL */ );
>> >> +    }
>> >> +
>> >
>> > It's unclear to me why this is necessary / a good idea?
>>
>> Keeping PREPARE handling as close to regular COMMIT handling seems
>> like a good idea, no?
>
> But this code *means* something? Explain to me why it's a good idea to
> advance, or don't do it.

We want to do this to use it as protection against receiving the
prepared transaction again.

Other than the above:

*) Changed the flags and added an "RB" prefix to all flags and macros.

*) Added a few fields to the existing xl_xact_parsed_commit record and
avoided creating an entirely new xl_xact_parsed_prepare record.

Regards,
Nikhils
--
Nikhil Sontakke                   http://www.2ndQuadrant.com/
PostgreSQL/Postgres-XL Development, 24x7 Support, Training & Services