On Wed, Mar 25, 2026 at 08:39:07AM +0800, Andy Fan wrote:
> I found a similar but not exactly the same case from 2014 [1] which
> might be helpful to recall a broader understanding of this area.
>
> [1] https://www.postgresql.org/message-id/534AF601.1030007%40vmware.com

Incorrect shared state when an ERROR happens at an arbitrary location
is usually bad, yes.

For this one, your suggestion of delaying the end of the critical
section that starts at StartPrepare() and ends in EndPrepare() is not an
acceptable solution as far as I can see, unfortunately: it would mean
doing a SyncRepWaitForLSN() while in a critical section, and I doubt
we'd want to do that. Anyway, I doubt that this one is worth caring
about. The current 2PC locking scheme means, as far as I remember, that
it is not really possible for an external command to interact with a
specific session between the EndPrepare() and the PostPrepare_Locks()
calls.

To put it another way, let's imagine that we use a breakpoint
between these two calls (or a wait injection point if you automate
that). Is it possible for a second backend to mess with the state of
the first backend waiting until its locks are transferred to the dummy
PGPROC entry? That's what the 2014 thread is about: there was a race
condition reachable between two sessions. If the answer to this
question is yes, I'd agree that this is something that deserves a
closer look. And before you ask: attempting to interact with a 2PC
state from a second session with a first session waiting between these
two points would not work: the 2PC entry is locked, and that lock is only
released after EndPrepare() and PostPrepare_Locks(), at
PostPrepare_Twophase(). Any attempt to access this entry fails because
the first backend is marked as locking it: a second backend trying to
lock it would error out, complaining that the prepared transaction with
that GID is "busy".
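
The "busy" behavior described above can be pictured with a toy model. This
is only a sketch under stated assumptions, not the twophase.c code: the
ToyGXact struct, the toy_* functions and TOY_INVALID_BACKEND are
hypothetical names, and the real code raises an ERROR where this sketch
returns false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical marker meaning "no backend holds this entry". */
#define TOY_INVALID_BACKEND (-1)

/* Toy stand-in for a shared 2PC entry; not the real structure. */
typedef struct ToyGXact
{
    const char *gid;             /* global identifier of the prepared xact */
    int         locking_backend; /* backend working on it, or invalid */
} ToyGXact;

/*
 * Try to lock the entry on behalf of my_backend.  Returns false and
 * reports "busy" if any backend is already marked as locking it.
 */
static bool
toy_lock_gxact(ToyGXact *gxact, int my_backend)
{
    assert(gxact != NULL);

    if (gxact->locking_backend != TOY_INVALID_BACKEND)
    {
        fprintf(stderr,
                "prepared transaction with identifier \"%s\" is busy\n",
                gxact->gid);
        return false;
    }

    gxact->locking_backend = my_backend;
    return true;
}

/* Release the entry; only the backend that locked it may do so. */
static void
toy_unlock_gxact(ToyGXact *gxact, int my_backend)
{
    assert(gxact != NULL);
    assert(gxact->locking_backend == my_backend);

    gxact->locking_backend = TOY_INVALID_BACKEND;
}
```

In this model, while backend 1 sits between its lock and unlock calls (the
window between EndPrepare() and PostPrepare_Twophase() in the discussion
above), toy_lock_gxact() from backend 2 returns false; it only succeeds
once backend 1 has called toy_unlock_gxact().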

SyncRepWaitForLSN() would be a problematic pattern between
EndPrepare() and PostPrepare_Locks(), but we never ERROR there on
purpose: even if we cancel while waiting for a transaction commit we'd
just get a WARNING, meaning that we'd be able to transfer our locks
anyway.

Or perhaps you have a realistic scenario where it is possible to mess
with the shared state, other than an elog(ERROR) forced between these
two points?
--
Michael