Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data - Mailing list pgsql-bugs

From Noah Misch
Subject Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data
Date
Msg-id 20210920044143.GA3414844@rfd.leadboat.com
Whole thread Raw
In response to Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data  (Noah Misch <noah@leadboat.com>)
Responses Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data  (Andrey Borodin <x4mmm@yandex-team.ru>)
List pgsql-bugs
On Mon, Aug 23, 2021 at 10:38:00PM +0500, Andrey Borodin wrote:
> --- a/src/backend/access/transam/twophase.c
> +++ b/src/backend/access/transam/twophase.c
> @@ -459,14 +459,15 @@ MarkAsPreparingGuts(GlobalTransaction gxact, TransactionId xid, const char *gid,
>      proc->pgprocno = gxact->pgprocno;
>      SHMQueueElemInit(&(proc->links));
>      proc->waitStatus = PROC_WAIT_STATUS_OK;
> -    /* We set up the gxact's VXID as InvalidBackendId/XID */
> -    proc->lxid = (LocalTransactionId) xid;
> +    /* We set up the gxact's VXID as real for CIC purposes */
> +    proc->lxid = MyProc->lxid;

This breaks the case where the server restarted after PREPARE TRANSACTION.
MyProc->lxid is 0 in the startup process, and LocalTransactionIdIsValid(0) is
false.  I'm attaching a test case addition.  Can you repair this?

>      proc->xid = xid;
>      Assert(proc->xmin == InvalidTransactionId);
>      proc->delayChkpt = false;
>      proc->statusFlags = 0;
>      proc->pid = 0;
> -    proc->backendId = InvalidBackendId;
> +    /* May be backendId of startup process */
> +    proc->backendId = MyBackendId;

Incidentally, MyBackendId of startup process depends on other facts.  When
using hot standby, InitRecoveryTransactionEnvironment() sets MyBackendId=1.
Otherwise, including clean startup of a non-standby node, MyBackendId is
InvalidBackendId.  This may be harmless.  I didn't know about it.

On Tue, Sep 07, 2021 at 11:45:15PM -0700, Noah Misch wrote:
> On Sun, Aug 29, 2021 at 11:38:03PM +0500, Andrey Borodin wrote:
> > > 29 авг. 2021 г., в 23:09, Andres Freund <andres@anarazel.de> написал(а):
> > >>> It seems like it's going to add a substantial amount of work even when
> > >>> no 2PC xacts are involved?
> > >> Only if 2PCs are enabled.
> > > 
> > > I don't think that's good enough. Plenty of systems have 2PC enabled but very
> > > few if any transactions end up as 2PC ones.
> 
> > Best optimisation I can imagine here is to gather all vxids with unknown xids and search for them in one call to
TwoPhaseGetXidByVXid()with one LWLockAcquire(TwoPhaseStateLock, LW_SHARED).
 
> > 
> > Does it worth the complexity?
> 
> https://www.postgresql.org/search/?m=1&q=TwoPhaseStateLock&l=&d=-1&s=r
> suggests this is the first postgresql.org discussion of TwoPhaseStateLock as a
> bottleneck.  Nonetheless, if Andres Freund finds it's worth the complexity,
> then I'm content with it.  I'd certainly expect some performance benefit.
> Andres, what do you think?

A few more benefits (beyond lock contention) come to mind:

- Looking at the three VirtualXactLock() callers, waiting for final
  disposition of prepared transactions is necessary for
  WaitForLockersMultiple(), disadvantageous for WaitForOlderSnapshots(), and
  dead code for ResolveRecoveryConflictWithVirtualXIDs().  In
  WaitForOlderSnapshots(), PREPARE is as good as COMMIT/ABORT, because a
  prepared transaction won't do further database reads.  Waiting on the
  prepared transaction there could give CIC an arbitrarily-long, needless
  delay.  ResolveRecoveryConflictWithVirtualXIDs() will never wait on a
  prepared transaction, because prepared transactions hold no locks during
  recovery.  (If a prepared transaction originally acquired
  AccessExclusiveLock, the startup process holds that lock on its behalf.)
  Coordinating the XID search at a higher layer would let us change
  WaitForLockersMultiple() without changing the others.

- v13 WaitPreparedXact() experiences starvation when a steady stream of
  prepared transactions have the same VXID.  Since VXID reuse entails
  reconnecting, starvation will be unnoticeable in systems that follow best
  practices around connection lifespan.  The 2021-08-23 patch version didn't
  have that hazard.

None of those benefits clearly justify the complexity, but they're relevant to
the decision.

Attachment

pgsql-bugs by date:

Previous
From: Tom Lane
Date:
Subject: Re: Query planning on partitioned table causes postgres 13.4 to consume all memory
Next
From: PG Bug reporting form
Date:
Subject: BUG #17196: old_snapshot_threshold is not honored if there is a transaction