Re: Issues with 2PC at recovery: CLOG lookups and GlobalTransactionData - Mailing list pgsql-hackers

From Noah Misch
Subject Re: Issues with 2PC at recovery: CLOG lookups and GlobalTransactionData
Date
Msg-id 20250706180711.16.nmisch@google.com
Whole thread Raw
In response to Re: Issues with 2PC at recovery: CLOG lookups and GlobalTransactionData  (Michael Paquier <michael@paquier.xyz>)
Responses Re: Issues with 2PC at recovery: CLOG lookups and GlobalTransactionData
List pgsql-hackers
On Fri, Jun 20, 2025 at 09:02:00AM +0900, Michael Paquier wrote:
> On Fri, Jun 06, 2025 at 03:34:13PM -0700, Noah Misch wrote:
> > Shared memory should (not "must") omit a GXACT whose prepare_start_lsn lies
> > beyond the point recovery has reached.  In other words, recovery should keep
> > anachronistic information out of shared memory.  For example, one could
> > imagine a straw man implementation in which, before reaching consistency, we
> > load each pg_twophase file that does pass its CRC check.  I'd dislike that,
> > because it would facilitate anachronisms involving the
> > GlobalTransactionData.gid field.  We could end up with two GXACT having the
> > same gid, one being the present (according to recovery progress) instance of
> > that gid and another being a future instance.  The system might not
> > malfunction today, but I'd consider that fragile.  Anachronistic entries might
> > cause recovery to need more shared memory than max_prepared_transactions has
> > allocated.
> 
> So this comes down to attempting to translate the information from the
> 2PC files to shared memory at a point earlier than what we could.
> Thinking more about this point, there is something that I can see us
> potentially do here: how about tracking the file names in shared
> memory when we go through them in restoreTwoPhaseData()?  That
> addresses my point about doing only one scan of pg_twophase/ at the
> beginning of recovery.  I have not studied in details how doable it
> is,

A challenge is that the number of files in pg_twophase can exceed
max_prepared_transactions if we've not yet reached consistency.  That happens
with a sequence like this:

- exactly max_prepared_transactions files are in pg_twophase
- backup process copies pg_twophase/$file1, then pauses before the next readdir
- some backend deletes pg_twophase/$file1
- checkpointer creates pg_twophase/$fileN
- backup process wakes up and copies remaining pg_twophase files

That's the only problem coming to mind.  If that problem doesn't cancel out
the benefits of scanning pg_twophase early, you may as well do it.

> By the way, what do you think we should do with 0001 at this stage
> (your refactoring work with some changes I've sprinkled)?  I was
> looking at it this morning with fresh eyes and my opinion about it is
> that it improves the situation in all these 2PC callers on HEAD now
> that we need to deal with fxids for the file names, so that's an
> independent piece.  Perhaps this should be applied once v19 opens for
> business while we sort out the rest that 0002 was trying to cover?

That looks harmless.

> Note: this introduced two calls to FullTransactionIdFromAllowableAt()
> in StandbyTransactionIdIsPrepared() and PrepareRedoAdd(), which worried
> me a bit on second look and gave me a pause:
> - The call in StandbyTransactionIdIsPrepared() should use
> AdjustToFullTransactionId(), ReadTwoPhaseFile() calls
> TwoPhaseFilePath(), which uses AdjustToFullTransactionId() on HEAD.
> - The call in PrepareRedoAdd() slightly worried me, but I think that
> we're OK with using FullTransactionIdFromAllowableAt(), as we are in
> the case of a 2PC transaction recovered from WAL, with the XID coming
> from the file's header.  I think that we'd better put an
> Assert(InRecovery) in this path, just for safety.
> 
> What do you think?

It's an appropriate level of risk-taking.



pgsql-hackers by date:

Previous
From: Noah Misch
Date:
Subject: Re: [PATCH] Add support for displaying database service in psql prompt
Next
From: Tom Lane
Date:
Subject: Re: A recent message added to pg_upgade