Re: Undo worker and transaction rollback - Mailing list pgsql-hackers
From | Dilip Kumar |
---|---|
Subject | Re: Undo worker and transaction rollback |
Date | |
Msg-id | CAFiTN-vZan-=ZtbbBaxEMDt6aj9Z79kjRe_GKovRgBEHXM+6ow@mail.gmail.com Whole thread Raw |
In response to | Re: Undo worker and transaction rollback (Amit Kapila <amit.kapila16@gmail.com>) |
List | pgsql-hackers |
On Wed, Jan 23, 2019 at 9:39 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Jan 11, 2019 at 9:39 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Mon, Dec 31, 2018 at 11:04 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > >> > >> On Tue, Dec 4, 2018 at 3:13 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > >> > >> After the latest change in undo interface patch[1], this patch need to > >> be rebased. Attaching the rebased version of the patch. > >> > >> [1]https://www.postgresql.org/message-id/CAFiTN-tfquvm6tnWFaJNYYz-vSY%3D%2BSQTMv%2BFv1jPozTrW4mtKw%40mail.gmail.com > >> > > > > I have rebased this patch on top of latest version of undolog and undo-interface patch set [1] > > > > I have started reviewing your patch and below are some initial > comments. To be clear to everyone, there is some part of this patch > for which I have also written code and I am also involved in the > high-level design of this work, but I have never done the detailed > review of this patch which I am planning to do now. I think it will > be really good if some other hackers also review this patch especially > the interaction with transaction machinery and how the error rollbacks > work. I have few comments below in that regard as well and I have > requested to add more comments to explain that part of the patch so > that it will be easier to understand how this works. Thanks for review please find my reply inline. > > Few comments: > ------------------------ > 1. > +void > +undoaction_desc(StringInfo buf, XLogReaderState *record) > +{ > + char *rec = XLogRecGetData(record); > + uint8 info = XLogRecGetInfo(record) & ~XLR_INFO_MASK; > + > + if (info == XLOG_UNDO_PAGE) > + { > + uint8 *flags = (uint8 *) rec; > + > + appendStringInfo(buf, "page_contains_tpd_slot: %c ", > + (*flags & XLU_PAGE_CONTAINS_TPD_SLOT) ? 'T' : 'F'); > + appendStringInfo(buf, "is_page_initialized: %c ", > + (*flags & XLU_INIT_PAGE) ? 'T' : 'F'); > + if (*flags & XLU_PAGE_CONTAINS_TPD_SLOT) > > Do we need handling of TPD in this patch? This is mainly tied to > zheap, so, I think we can exclude it from this patch. Fixed > > 2. > const char * > +undoaction_identify(uint8 info) > +{ > + const char *id = NULL; > + > + switch (info & ~XLR_INFO_MASK) > + { > + case XLOG_UNDO_APPLY_PROGRESS: > + id = "UNDO APPLY PROGRESS"; > + break; > + } > + > + return id; > +} > > Don't we need to handle the case of XLOG_UNDO_PAGE in this function? Fixed > > 3. > @@ -1489,6 +1504,34 @@ FinishPreparedTransaction(const char *gid, bool isCommit) > { > .. > + /* > + * Perform undo actions, if there are undologs for this transaction. > + * We need to perform undo actions while we are still in transaction. > + * Never push rollbacks of temp tables to undo worker. > + */ > + for (i = 0; i < UndoPersistenceLevels; i++) > + { > + if (end_urec_ptr[i] != InvalidUndoRecPtr && !isCommit) > + { > + bool result = false; > + uint64 rollback_size = 0; > + > + if (i != UNDO_TEMP) > + rollback_size = end_urec_ptr[i] - start_urec_ptr[i]; > + > + if (rollback_size >= rollback_overflow_size * 1024 * 1024) > + result = PushRollbackReq(end_urec_ptr[i], start_urec_ptr[i], InvalidOid); > > The comment and code don't seem to be completely in sync. It seems > for rollback_overflow_size as 0, we will push the undo for temp tables > as well. I think you should have a stronger check here. Yeah, it's a problem. I have fixed it. > > 4. > + /* > + * We need the locations of start and end undo record pointers when rollbacks > + * are to be performed for prepared transactions using zheap relations. > + */ > + UndoRecPtr start_urec_ptr[UndoPersistenceLevels]; > + UndoRecPtr end_urec_ptr[UndoPersistenceLevels]; > } TwoPhaseFileHeader; > > > I think you can expand this comment a bit by telling the reason why > you need to store these in the file. It seems it is because you want > to rollback prepared transactions after recovery. Done > > 5. > @@ -284,10 +284,21 @@ SetTransactionIdLimit(TransactionId > oldest_datfrozenxid, Oid oldest_datoid) > TransactionId xidStopLimit; > TransactionId xidWrapLimit; > TransactionId curXid; > + TransactionId oldestXidHavingUndo; > > Assert(TransactionIdIsNormal(oldest_datfrozenxid)); > > /* > + * To determine the last safe xid that can be allocated, we need to > + * consider oldestXidHavingUndo. The oldestXidHavingUndo will be only > + * valid for zheap storage engine, so it won't impact any other storage > + * engine. > + */ > + oldestXidHavingUndo = pg_atomic_read_u32(&ProcGlobal->oldestXidHavingUndo); > + if (TransactionIdIsValid(oldestXidHavingUndo)) > + oldest_datfrozenxid = Min(oldest_datfrozenxid, oldestXidHavingUndo); > + > > Here, I think we need to mention the reason as well in the comments > why we need to care about oldestXidHavingUndo. Done > > 6. > +/* Location in undo log from where to start applying the undo actions. */ > +static UndoRecPtr UndoActionStartPtr[UndoPersistenceLevels] = > + > {InvalidUndoRecPtr, > + > InvalidUndoRecPtr, > + > InvalidUndoRecPtr}; > + > +/* Location in undo log up to which undo actions need to be applied. */ > +static UndoRecPtr UndoActionEndPtr[UndoPersistenceLevels] = > + > {InvalidUndoRecPtr, > + > InvalidUndoRecPtr, > + > InvalidUndoRecPtr}; > + > +/* Do we need to perform any undo actions? */ > +static bool PerformUndoActions = false; > > Normally, we track to and from undo locations for each transaction > state. I think these are used for error rollbacks which have > different handling. If so, can we write a few comments here to > explain how that works and then it will be easier to understand the > purpose of these variables? Done > > 7. > +static void > +AtAbort_Rollback(void) > { > .. > + /* > + * If we are in a valid transaction state then execute the undo action here > + * itself, otherwise we have already stored the required information for > + * executing the undo action later. > + */ > + if (CurrentTransactionState->state == TRANS_INPROGRESS) > .. > } > > As such this code is okay, but it is better to use IsTransactionState > as that is what we commonly used throughout the code for the same > purpose. The same change is required in AtSubAbort_Rollback. > AtAbort_Rollback is removed now as per comment 9. > 8. > +static void > +AtAbort_Rollback(void) > { > .. > + /* > + * Remember the required information for performing undo actions. So that > + * if there is any failure in executing the undo action we can execute > + * it later. > + */ > + memcpy (UndoActionStartPtr, latest_urec_ptr, sizeof(UndoActionStartPtr)); > + memcpy (UndoActionEndPtr, s->start_urec_ptr, sizeof(UndoActionEndPtr)); > + > + /* > + * If we are in a valid transaction state then execute the undo action here > + * itself, otherwise we have already stored the required information for > + * executing the undo action later. > + */ > + if (CurrentTransactionState->state == TRANS_INPROGRESS) > .. > } > > It is not clear from above commentary and code how the rollbacks will > work if we get the error while executing undo actions. Basically, how > will PerformUndoActions will be set in such a case? I think you are > going to set it during AbortCurrentTransaction, but if that is the > case, can't we set the values of UndoActionStartPtr and > UndoActionEndPtr at that time? IIUC, these two variables are used > mainly when we want to postpone executing undo actions when we are not > in a good transaction state (like during error rollbacks) and > PerformUndoActions will indicate whether there are any pending > actions, so I feel these three variables should be set together. If > that is not the case, I think we need to add more comments to explain > the same. AtAbort_Rollback is removed now as per comment 9. > > 9. > @@ -2594,6 +2676,7 @@ AbortTransaction(void) > .. > /* > * set the current transaction state information appropriately during the > * abort processing > */ > s->state = TRANS_ABORT; > */ > AfterTriggerEndXact(false); /* 'false' means it's abort */ > AtAbort_Portals(); > + AtAbort_Rollback(); > > .. > > +AtAbort_Rollback(void) > { > .. > + /* > + * If we are in a valid transaction state then execute the undo action here > + * itself, otherwise we have already stored the required information for > + * executing the undo action later. > + */ > + if (CurrentTransactionState->state == TRANS_INPROGRESS) > > The function AtAbort_Rollback is called only from one place > AbortTransaction in the patch and by that time we have already set the > transaction state as TRANS_ABORT, so it is never going to get the > chance to execute the undo actions. How will it work when the user has > specifically given Rollback and we want to execute the undo actions? > Why can't we do this in UserAbortTransactionBlock? There is a comment > indicating the correctness of this method "/* XXX: TODO: check this > logic, which was moved out of UserAbortTransactionBlock */", but it is > not very clear why we need to do this in abort path only. I agree with your point. Now we are handling in UserAbortTransactionBlock. > > 10. > @@ -4811,6 +5243,7 @@ AbortSubTransaction(void) > s->parent->subTransactionId, > s->curTransactionOwner, > s->parent->curTransactionOwner); > + AtSubAbort_Rollback(s); > > I see a similar problem as mentioned in point-9 in > AtSubAbort_Rollback. I think there is one problem here which is that > if executing the undo actions are postponed during Rollback To > Savepoint, then we would have released the locks and some other > backend which was supposed to wait on subtransaction will not wait and > might update the tuple or in some cases need to perform actions. I > understand this is more of a zheap specific stuff, but I feel in > general also we should execute undo actions of Rollback To Savepoint > immediately; if not, you might also need to invent some mechanism to > transfer (to and from) undo record pointers from the sub-transaction > state. Moved execution to RollbackToSavepoint function > > 11. > @@ -2803,6 +2886,12 @@ void > CommitTransactionCommand(void) > { > .. > + > + /* > + * Update the end undo record pointer if it's not valid with > + * the currently popped transaction's end undo record pointer. > + * This is particularly required when the first command of > + * the transaction is of type which does not require an undo, > + * e.g. savepoint x. > + * Accordingly, update the start undo record pointer. > + */ > + for (i = 0; i < UndoPersistenceLevels; i++) > + { > + if (!UndoRecPtrIsValid(end_urec_ptr[i])) > + end_urec_ptr[i] = s->latest_urec_ptr[i]; > + > + if (UndoRecPtrIsValid(s->start_urec_ptr[i])) > + start_urec_ptr[i] = s->start_urec_ptr[i]; > + } > .. > } > > I am not able to understand the above piece of code. Basically, you > have written in comments that this is required when the first command > in a transaction is of the type which doesn't generate undo, but there > is no explanation why it is required for that case. Also, the comment > and code don't seem to match because of below points: > (a) The comment says "Update the end undo record pointer if it's not > valid with the currently popped transaction's end undo record > pointer." and the code is not checking current transactions end > pointer validity, so it is not clear to me what you want to say here. end undo record pointer (recently renamed to latest_urec_ptr) is the first valid latest_urec_ptr while popping out the transaction from transaction stack. So once we got a valid latest we don't need to update, but I agree there are current extra statement getting executed i.e. even if the current transaction's latest_urec_ptr is not valid it's getting assiged. So added extra check to avoid extra assignment cycles. > (b) The comment says "Accordingly, update the start undo record > pointer.". However from start undo record pointer, you are checking > the current state's start pointer which is different from what you do > for end pointer. Start pointer we have to overwrite every time so that we can reach to the first start pointer of the transaction > (c) How will the saved start_urec_ptr and end_urec_ptr be used after this? I have tried to explain it in the comment. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Attachment
pgsql-hackers by date: