Re: Issues with 2PC at recovery: CLOG lookups and GlobalTransactionData - Mailing list pgsql-hackers
From | Noah Misch |
---|---|
Subject | Re: Issues with 2PC at recovery: CLOG lookups and GlobalTransactionData |
Date | |
Msg-id | 20250706180711.16.nmisch@google.com Whole thread Raw |
In response to | Re: Issues with 2PC at recovery: CLOG lookups and GlobalTransactionData (Michael Paquier <michael@paquier.xyz>) |
Responses |
Re: Issues with 2PC at recovery: CLOG lookups and GlobalTransactionData
|
List | pgsql-hackers |
On Fri, Jun 20, 2025 at 09:02:00AM +0900, Michael Paquier wrote: > On Fri, Jun 06, 2025 at 03:34:13PM -0700, Noah Misch wrote: > > Shared memory should (not "must") omit a GXACT whose prepare_start_lsn lies > > beyond the point recovery has reached. In other words, recovery should keep > > anachronistic information out of shared memory. For example, one could > > imagine a straw man implementation in which, before reaching consistency, we > > load each pg_twophase file that does pass its CRC check. I'd dislike that, > > because it would facilitate anachronisms involving the > > GlobalTransactionData.gid field. We could end up with two GXACT having the > > same gid, one being the present (according to recovery progress) instance of > > that gid and another being a future instance. The system might not > > malfunction today, but I'd consider that fragile. Anachronistic entries might > > cause recovery to need more shared memory than max_prepared_transactions has > > allocated. > > So this comes down to attempting to translate the information from the > 2PC files to shared memory at a point earlier than what we could. > Thinking more about this point, there is something that I can see us > potentially do here: how about tracking the file names in shared > memory when we go through them in restoreTwoPhaseData()? That > addresses my point about doing only one scan of pg_twophase/ at the > beginning of recovery. I have not studied in details how doable it > is, A challenge is that the number of files in pg_twophase can exceed max_prepared_transactions if we've not yet reached consistency. That happens with a sequence like this: - exactly max_prepared_transactions files are in pg_twophase - backup process copies pg_twophase/$file1, then pauses before the next readdir - some backend deletes pg_twophase/$file1 - checkpointer creates pg_twophase/$fileN - backup process wakes up and copies remaining pg_twophase files That's the only problem coming to mind. If that problem doesn't cancel out the benefits of scanning pg_twophase early, you may as well do it. > By the way, what do you think we should do with 0001 at this stage > (your refactoring work with some changes I've sprinkled)? I was > looking at it this morning with fresh eyes and my opinion about it is > that it improves the situation in all these 2PC callers on HEAD now > that we need to deal with fxids for the file names, so that's an > independent piece. Perhaps this should be applied once v19 opens for > business while we sort out the rest that 0002 was trying to cover? That looks harmless. > Note: this introduced two calls to FullTransactionIdFromAllowableAt() > in StandbyTransactionIdIsPrepared() and PrepareRedoAdd(), which worried > me a bit on second look and gave me a pause: > - The call in StandbyTransactionIdIsPrepared() should use > AdjustToFullTransactionId(), ReadTwoPhaseFile() calls > TwoPhaseFilePath(), which uses AdjustToFullTransactionId() on HEAD. > - The call in PrepareRedoAdd() slightly worried me, but I think that > we're OK with using FullTransactionIdFromAllowableAt(), as we are in > the case of a 2PC transaction recovered from WAL, with the XID coming > from the file's header. I think that we'd better put an > Assert(InRecovery) in this path, just for safety. > > What do you think? It's an appropriate level of risk-taking.
pgsql-hackers by date: