Re: remove spurious CREATE INDEX CONCURRENTLY wait - Mailing list pgsql-hackers

From: Alvaro Herrera
Subject: Re: remove spurious CREATE INDEX CONCURRENTLY wait
Msg-id: 20201123153005.GA580@alvherre.pgsql
In response to: Re: remove spurious CREATE INDEX CONCURRENTLY wait (Andres Freund <andres@anarazel.de>)
Responses: Re: remove spurious CREATE INDEX CONCURRENTLY wait (Tom Lane <tgl@sss.pgh.pa.us>)
           Re: remove spurious CREATE INDEX CONCURRENTLY wait (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers
On 2020-Nov-18, Andres Freund wrote:

> In 13 this is:
>         LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
>         MyPgXact->vacuumFlags |= PROC_IN_VACUUM;
>         if (params->is_wraparound)
>             MyPgXact->vacuumFlags |= PROC_VACUUM_FOR_WRAPAROUND;
>         LWLockRelease(ProcArrayLock);
> 
> Lowering this to a shared lock doesn't seem right, at least without a
> detailed comment explaining why it's safe. Because GetSnapshotData() etc
> look at all procs with just an LW_SHARED ProcArrayLock, changing
> vacuumFlags without a lock means that two concurrent horizon
> computations could come to a different result.
> 
> I'm not saying it's definitely wrong to relax things here, but I'm not
> sure we've evaluated it sufficiently.

True.  Let's evaluate it.

I changed three spots where statusFlags bits are written:

a) StartupDecodingContext: sets PROC_IN_LOGICAL_DECODING
b) ReplicationSlotRelease: removes PROC_IN_LOGICAL_DECODING
c) vacuum_rel: sets PROC_IN_VACUUM and potentially
   PROC_VACUUM_FOR_WRAPAROUND

Who reads these flags?

PROC_IN_LOGICAL_DECODING is read by:
 * ComputeXidHorizons
 * GetSnapshotData

PROC_IN_VACUUM is read by:
 * GetCurrentVirtualXIDs
 * ComputeXidHorizons
 * GetSnapshotData
 * ProcArrayInstallImportedXmin

PROC_VACUUM_FOR_WRAPAROUND is read by:
 * ProcSleep


ProcSleep:  no bug here.
The flags are checked to see if we should kill() the process (used when
autovac blocks some other process).  The lock here appears to be
inconsequential, since we release it before we do kill(); so strictly
speaking, there is still a race condition where the autovac process
could reset the flag after we read it and before we get a chance to
signal it.  The lock level autovac uses to set the flag is not relevant
either.
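
For context, the check in ProcSleep looks roughly like this (paraphrased
from memory, not copied verbatim, so details may be off):

    /* paraphrase of ProcSleep's autovacuum-cancel logic */
    uint8       statusFlags;

    LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
    statusFlags = ProcGlobal->statusFlags[autovac->pgxactoff];

    if ((statusFlags & PROC_IS_AUTOVACUUM) &&
        !(statusFlags & PROC_VACUUM_FOR_WRAPAROUND))
    {
        int         pid = autovac->pid;

        LWLockRelease(ProcArrayLock);

        /*
         * By now the autovac worker may have reset its flags or even
         * exited, so the signal is inherently racy anyway.
         */
        kill(pid, SIGINT);
    }
    else
        LWLockRelease(ProcArrayLock);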

ProcArrayInstallImportedXmin:
This one is just searching for a matching backend; not affected by the
flags.

PROC_IN_LOGICAL_DECODING:
Oddly enough, I think the reset of PROC_IN_LOGICAL_DECODING in
ReplicationSlotRelease might be the most problematic one of the lot.
That's because a proc's xmin that had been ignored all along by
ComputeXidHorizons will now be included in the computation.  Adding
asserts that proc->xmin and proc->xid are InvalidXid by the time we
reset the flag, I got hits in pg_basebackup, test_decoding and
subscription tests.  I think it's OK for ComputeXidHorizons (since it
just means that a vacuum that computes its horizon a little later will
remove fewer rows).  But in GetSnapshotData it is just not correct for
the Xmin to go backwards.

Therefore it seems to me that this code has a bug independently of the
lock level used.
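
(For clarity, the experiment was essentially to stick something like
this into ReplicationSlotRelease, just before the flag is cleared:)

    /* these assertions fired in the tests mentioned above */
    Assert(!TransactionIdIsValid(MyProc->xid));
    Assert(!TransactionIdIsValid(MyProc->xmin));

    MyProc->statusFlags &= ~PROC_IN_LOGICAL_DECODING;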


GetCurrentVirtualXIDs, ComputeXidHorizons, GetSnapshotData:

In these cases, what we need is for the code to compute some xmin (or
an equivalent value) based on a set of transactions that excludes those
marked with the flags.  The behavior we want is that if some
transaction is marked as being in a vacuum, we ignore its Xid/Xmin *if
there is one*.  In other words, if there's no Xid/Xmin, then the flag
is not important.  So if we can ensure that the flag is set first, and
the Xid/Xmin is installed later, that's sufficient for correctness and
we don't need to hold exclusive lock.  But if we can't ensure that,
then we must use exclusive lock, because otherwise we risk another
process seeing our Xid first and not our flag, which would be bad.

In other words, my conclusion is that there definitely is a bug here and
I am going to restore the use of exclusive lock for setting the
statusFlags.
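
Concretely, on master that means the write sites stay looking more or
less like this (the mirrored ProcGlobal->statusFlags[] copy is updated
under the same lock):

    /* vacuum_rel, with the exclusive lock kept */
    LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
    MyProc->statusFlags |= PROC_IN_VACUUM;
    if (params->is_wraparound)
        MyProc->statusFlags |= PROC_VACUUM_FOR_WRAPAROUND;
    ProcGlobal->statusFlags[MyProc->pgxactoff] = MyProc->statusFlags;
    LWLockRelease(ProcArrayLock);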


GetSnapshotData has an additional difficulty -- we do the
UINT32_ACCESS_ONCE(ProcGlobal->xids[i]) read *before* testing the bit.
So it's not valid to set the bit without locking out GetSnapshotData
(GSD), regardless of any barriers we had; if we want this to be safe,
we'd have to change this so that the flag is read first, and we read
the xid only afterwards, with a read barrier.

I *think* we could relax the lock if we had a write barrier in
between: set the flag first, issue a write barrier, set the Xid.
(I have to admit I'm confused about what needs to happen in the read
case: read the bit first and potentially skip the PGPROC entry; but
can we instead read the Xid ahead of reading the flag, and if we do
read the Xid afterwards, do we need a read barrier in between?)
Given this uncertainty, I'm not proposing to relax the lock from
exclusive to shared.
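
Just to make the idea concrete anyway, the relaxed scheme would be
something like the sketch below -- purely hypothetical, given the open
question about the read side:

    /*
     * Writer (hypothetical): the flag would be set holding only
     * LW_SHARED on ProcArrayLock, relying on ordering instead.
     */
    MyProc->statusFlags |= PROC_IN_VACUUM;
    ProcGlobal->statusFlags[MyProc->pgxactoff] = MyProc->statusFlags;
    pg_write_barrier();
    /* ... only afterwards would the backend install an Xid or Xmin */

    /* Reader (hypothetical), e.g. in GetSnapshotData */
    uint8       flags = ProcGlobal->statusFlags[i];

    pg_read_barrier();
    if (!(flags & (PROC_IN_LOGICAL_DECODING | PROC_IN_VACUUM)))
    {
        TransactionId xid = UINT32_ACCESS_ONCE(ProcGlobal->xids[i]);

        /* ... consider xid for the snapshot / horizon ... */
    }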


I didn't get around to reading ComputeXidHorizons, but it seems like
it'd have the same problem as GSD.


