Re: [HACKERS] logical decoding of two-phase transactions - Mailing list pgsql-hackers
From: Nikhil Sontakke
Subject: Re: [HACKERS] logical decoding of two-phase transactions
Date:
Msg-id: CAMGcDxf0YDPxgG3sU=0k8zTZniEe2RhT90v4BP__3a1P4iHpEQ@mail.gmail.com
In response to: Re: [HACKERS] logical decoding of two-phase transactions (Nikhil Sontakke <nikhils@2ndquadrant.com>)
Responses: Re: [HACKERS] logical decoding of two-phase transactions; Re: [HACKERS] logical decoding of two-phase transactions
List: pgsql-hackers
PFA, the latest patch for this functionality. It contains the following
changes compared to the earlier patch:

- Fixed a bunch of typos and comments.

- Modified HeapTupleSatisfiesVacuum() to return HEAPTUPLE_RECENTLY_DEAD
  if the transaction id is newer than OldestXmin. This is done only for
  catalog tables (htup->t_tableOid < (Oid) FirstNormalObjectId); a rough
  sketch of the check follows this list.

- Added a filter callback, filter_decode_txn_cb_wrapper(), to decide
  whether it is OK to decode the NEXT change record. For now this filter
  checks whether the XID involved has aborted; additional checks can be
  added here as needed. (A sketch of such a check appears at the end of
  this mail, after the quoted discussion.)

- Added an ABORT callback to the decoding process. This was not needed
  before, because we only ever decoded committed transactions. With 2PC
  transactions it is possible that, while we are decoding one, another
  backend issues a concurrent ROLLBACK PREPARED. When
  filter_decode_txn_cb_wrapper() gets called, it will tell us not to
  decode the next change record. In that case we need to send an ABORT
  to the subscriber (and not ROLLBACK PREPARED, because we are yet to
  issue PREPARE to the subscriber).

- Added all the functionality needed to read the abort command and apply
  it on the remote subscriber.

- Added functionality in ReorderBufferCommit() to abort midway based on
  the feedback from filter_decode_txn_cb_wrapper().

- Modified LockGXact() and FinishPreparedTransaction() to allow a missing
  GID in the case of ROLLBACK PREPARED. Currently this can only happen in
  the logical apply code path. We still send the rollback to the
  subscriber because it is difficult to tell on the provider whether the
  transaction was aborted midway through decoding or is in the PREPARED
  state on the subscriber. All other cases error out as before.

- Completely removed the snapshot addition/deletion code used while
  decoding. That is not needed at all while decoding an ongoing
  transaction. The entries in the snapshot are needed for future
  transactions to be able to decode older transactions; for 2PC
  transactions, we don't need to decode them till COMMIT PREPARED gets
  called. This has simplified all that unwanted snapshot push/pop code,
  which is nice.
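To make the HeapTupleSatisfiesVacuum() change concrete, here is a minimal
sketch of the catalog-only check, written against the PG10-era headers.
The helper name and its standalone form are purely illustrative; in the
patch the logic sits inside HeapTupleSatisfiesVacuum() itself:

/*
 * Illustrative sketch only: the helper name and standalone form are
 * invented here; the real check lives inside HeapTupleSatisfiesVacuum().
 */
#include "postgres.h"

#include "access/htup_details.h"
#include "access/transam.h"
#include "utils/tqual.h"

static HTSV_Result
catalog_dead_or_recently_dead(HeapTuple htup, TransactionId OldestXmin)
{
    TransactionId xmin = HeapTupleHeaderGetRawXmin(htup->t_data);

    /*
     * For an aborted transaction's catalog tuples whose xmin is not yet
     * older than OldestXmin, report RECENTLY_DEAD instead of DEAD, so an
     * in-progress logical decode can still look up the catalog rows it
     * needs.
     */
    if (htup->t_tableOid < (Oid) FirstNormalObjectId &&
        !TransactionIdPrecedes(xmin, OldestXmin))
        return HEAPTUPLE_RECENTLY_DEAD;

    return HEAPTUPLE_DEAD;
}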
Regards,
Nikhils

On 30 November 2017 at 16:08, Nikhil Sontakke <nikhils@2ndquadrant.com> wrote:
> Hi,
>
>>> So perhaps a better approach would be to not return HEAPTUPLE_DEAD
>>> if the transaction id is newer than the OldestXmin (the same logic we
>>> use for deleted tuples of committed transactions) in
>>> HeapTupleSatisfiesVacuum(), even for aborted transactions. I also
>>> briefly checked HOT pruning, and AFAICS the normal HOT pruning (the
>>> one not called by vacuum) also uses the xmin as authoritative even
>>> for aborted txes, so nothing needs to be done there, probably.
>>>
>>> In case we are worried that this affects cleanup of, for example,
>>> large aborted COPY transactions, and we think that's worth worrying
>>> about, we could limit the new OldestXmin-based logic to catalog
>>> tuples only, as those are the only ones we need available in decoding.
>>
>> Yeah, if it's limited to catalog tuples only then that sounds good. I
>> was quite concerned about how it'd impact vacuuming otherwise, but if
>> limited to catalogs, about the only impact should be on workloads that
>> create lots of TEMPORARY tables and then ROLLBACK - and not much on
>> those.
>>
> Based on these discussions, I think there are two separate issues here:
>
> 1) Make HeapTupleSatisfiesVacuum() behave differently for recently
> aborted catalog tuples.
>
> 2) Invent a mechanism to stop a specific logical decoding activity in
> the middle. The reason to stop it could be a concurrent abort, a global
> transaction manager deciding to roll back, or any other reason.
>
> ISTM that, for (2), if (1) is able to leave the recently aborted tuples
> around for a little while (we only really need them while the decode of
> the current change record is ongoing), then we could accomplish it via
> a callback. This callback should be called before commencing the decode
> and network send of each change record. In the case of in-core logical
> decoding, the callback for pgoutput could check whether the transaction
> has aborted (a call to TransactionIdDidAbort() or a similar function);
> additional logic can be added as needed for various scenarios. If it
> has aborted, we will abandon decoding and send an ABORT to the
> subscribers before returning.
>
> Regards,
> Nikhils

--
Nikhil Sontakke                   http://www.2ndQuadrant.com/
PostgreSQL/Postgres-XL Development, 24x7 Support, Training & Services
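For reference, a minimal sketch of the per-change-record abort check
discussed above, both in the change list and in the quoted mail. The
function name and signature below are assumptions for illustration; the
patch routes this decision through filter_decode_txn_cb_wrapper() in the
output-plugin callback interface:

#include "postgres.h"

#include "access/transam.h"
#include "replication/logical.h"

/*
 * Sketch of a per-change filter: return true if decoding of transaction
 * 'xid' should be abandoned, e.g. because a concurrent ROLLBACK PREPARED
 * aborted it while its changes were being decoded.  Name and signature
 * are illustrative assumptions.
 */
static bool
pgoutput_filter_decode_txn(LogicalDecodingContext *ctx, TransactionId xid)
{
    (void) ctx;                 /* context unused in this sketch */

    /*
     * TransactionIdDidAbort() consults the commit log; additional checks
     * (for example, input from an external transaction manager) could be
     * added here.
     */
    if (TransactionIdIsValid(xid) && TransactionIdDidAbort(xid))
        return true;

    return false;
}

Calling a check like this before decoding and sending each change record
is what lets ReorderBufferCommit() bail out midway and emit an ABORT to
the subscriber instead of continuing with the prepared transaction.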
Attachment