Thread: An improvement of ProcessTwoPhaseBuffer logic
Dear Hackers, I would like to discuss ProcessTwoPhaseBuffer function. It reads two-phase transaction states from disk or the WAL. It takesxid as well as some other input parameters and executes the following steps: Step #1: Check if xid is committed or aborted in clog (TransactionIdDidCommit, TransactionIdDidAbort) Step #2: Check if xid is not equal or greater than ShmemVariableCache->nextXid Step #3: Read two-phase state for the specified xid from memory or the corresponding file and returns it In some, very rare scenarios, the postgres instance will newer recover because of such logic. Imagine, that the two_phasedirectory contains some files with two-phase states of transactions of distant future. I assume, it can happen ifsome WAL segments are broken and ignored (as well as clog data) but two_phase directory was not broken. In recovery, postgresqlreads all the files in two_phase and tries to recover two-phase states. The problem appears in the functions TransactionIdDidCommit or TransactionIdDidAbort. These functions may fail with the FATALmessage like below when no clog state on disk is available for the xid: FATAL: could not access status of transaction 286331153 DETAIL: Could not open file "pg_xact/0111": No such file or directory. Such error do not allow the postgresql instance to be started. My guess, if to swap Step #1 with Step #2 such error will disappear because transactions will be filtered when comparingxid with ShmemVariableCache->nextXid before accessing clog. The function will be more robust. In general, it worksbut I'm not sure that such logic will not break some rare boundary cases. Another solution is to catch and ignore sucherror, but the original solution is the simpler one. I appreciate any thoughts concerning this topic. May be, you knowsome cases when such change in logic is not relevant? Thank you in advance! With best regards, Vitaly
Hi Michael, Thank you for the explanation and the patch! I'm happy, that I seem to be on the right way. On Wednesday, December 25, 2024 08:04 MSK, Michael Paquier <michael@paquier.xyz> wrote: > Vitaly, have you seen that in the wild as an effect of future 2PC files? I haven't heard about this problem in production, but I've encountered it when I did some development in two-phase functionality.I fixed it by swapping the if blocks and it solved my problem. It is pretty easy to reproduce it on the master branch: 1. Start an instance with enabled prepared transactions 2. Create a simple prepared transaction 3. Do checkpoint to write the transaction two-phase state into a file in pg_twophase subdirectory 4. Copy the created file, change its name to reflect a future xid (in my case: cp 00000000000002E8 00000000000FF2E8) 5. Commit the prepared transaction 6. Stop the instance with -m immediate 7. Start the instance After starting, you can get an error like "could not start server". In the log file you can find a message like: LOG: database system was not properly shut down; automatic recovery in progress FATAL: could not access status of transaction 1045224 DETAIL: Could not read from file "pg_xact/0000" at offset 253952: read too few bytes. 2024-12-25 18:38:30.606 MSK [795557] LOG: startup process (PID 795560) exited with exit code 1 I tried your patch and it seems the server is started successfully. But I've found another problem in my synthetic test -it can not remove the file with the following message: LOG: database system was not properly shut down; automatic recovery in progress WARNING: removing future two-phase state file for transaction 1045224 WARNING: could not remove file "pg_twophase/FFFFFFFF000FF2E8": No such file or directory LOG: redo starts at 0/1762978 The fill will never be removed automatically. I guess, it is because we incorrectly calculate the two-phase file name using TwoPhaseFilePath in RemoveTwoPhaseFile in thisscenario. It can be fixed if to pass file path directly from RecoverPreparedTransactions or StandbyRecoverPreparedTransactioninto ProcessTwoPhaseBuffer -> RemoveTwoPhaseFile. I did it in the proposed patch https://www.postgresql.org/message-id/cedbe-65e0c000-1-6db17700%40133269862(it is incomplete now). With best regards, Vitaly
Hi Michael, > Here are two patches to address both issues: Thank you for the preparing the patches and test simplification. My bad, I overcompilcated it. I concerned about twophase file name generation. The function TwoPhaseFilePath() is pretty straitforward and unambiguousin 16 and earlier versions. The logic of this function in 17+ seems to be more complex. I do not understand itclearly. But, I guess, it will work incorrectly after turning to a newer epoch, because the epoch is calculated from TransamVariables->nextXid,but not from problem xid. The same problem may happen if we are in epoch 1 or greater. It willproduce a wrong file name, because the epoch will be obtained from the last xid, not from file name xid. In another words,the function AdjustToFullTransactionId assumes that if xid > TransamVariables->nextXid, then the xid from the previousepoch. I may be not the case in our scenario. > I suspect that this can be still dangerous as-is while complicating the code with more possible paths for the removal ofthe 2PC files Agree, but we may pass file name into TwoPhaseFilePath if any, instead of creation of two functions as in the patch. Thecost - two new if conditions. Using file names is pretty safe. Once we read the file and extract xid from its name, justpass this file name to TwoPhaseFilePath(). If not, try to generate it. Anyway, I do not insist on it, just try to discuss. With best regards, Vitaly
On Monday, December 30, 2024 04:08 MSK, Michael Paquier <michael@paquier.xyz> wrote: > Instead, I have gone with a new > FullTransactionIdFromCurrentEpoch() that replaces > AdjustToFullTransactionId(). It cleans up most of the calls to > ReadNextFullTransactionId() compared to the previous patch. It is > true that these couple with calls to ProcessTwoPhaseBuffer(), but the > result felt OK this way in the scope of this fix. Could you please send the latest version of the patch, if any? I haven't found any new patches in the latest email. Happy New Year! With best regards, Vitaly
It seems, there are much deeper problems with twophase transactions as I thought. I'm interested in fixing twophase transactions,because I support a solution which actively uses twophase transactions. I'm interested to get more deeply intothe twophase functionality. Below, I just want to clarify some ideas behind twophase transactions. I appreciate, if youcomment my point of view, or just ignore this email if you find it too naive and boring. Two phase files are created after checkpoint to keep twophase states on disk after WAL truncation. For transactions, thatare inside the checkpoint horizon, we do not create two phase files because the states are currently stored in the WAL. Based on the thesis above, I guess, we have to read only those twophase files which are related to the transactions beforethe latest checkpoint. Its full xid should be lesser than TransamVariables->nextXid (which is the same as ControlFile->checkPointCopy.nextXidat the moment of StartupXLOG -> restoreTwoPhaseData call). The files with greater (orequal) full xids, should be ignored and removed. That's all what we need in restoreTwoPhaseData, I believe. In the current implementation, such check is applied in restoreTwoPhaseData -> ProcessTwoPhaseBuffer but after checking forxid in CLOG. I'm not sure, why we check CLOG here. Once we truncate the WAL on checkpoint and save twophase states intopg_twophase, these files must store states of real transactions from past. I mean, if someone creates a stub file withfull xid < TransamVariables->nextXid, we have no means (except CLOG ?) to check that this file belongs to a real transactionfrom past. CLOG check seems to be a weak attempt to deal with it. At this point, I'm not sure that CLOG may containstates for all full xids of existing twophase files. I guess, we should call restoreTwoPhaseData at start of recoverybut we shouldn't check CLOG at this stage. May be it is reasonable to check some not so old xids which are greaterthan the current CLOG horizon, but I'm not sure how CLOG segments are managed and how the horizon is moving. There is another question about the loading order of twophase files but I think it doesn't matter in which order we loadthese files. But I would prefer to load it in full xid ascending order. On Tuesday, January 28, 2025 08:02 MSK, Michael Paquier <michael@paquier.xyz> wrote: > Noah's > proposal at [1] is much closer to the long-term picture that would > look adapted. > - The CLOG lookups that can happen in ProcessTwoPhaseBuffer() during > recovery while a consistent state is not reached are still possible > (planning to start a different thread about this specific issue). > > [1]: https://www.postgresql.org/message-id/20250116205254.65.nmisch@google.com Agree, thank you, but my simple patch with some adjustments and swapping of checks in ProcessTwoPhaseBuffer may be back-ported.It doesn't fix all the problems but may help to fix the problem with twophase files related to broken latestWAL segments. With best regards, Vitaly