Thread: remove spurious CREATE INDEX CONCURRENTLY wait

remove spurious CREATE INDEX CONCURRENTLY wait

From
Alvaro Herrera
Date:
I previously[1] posted a patch to have multiple CREATE INDEX CONCURRENTLY
runs not wait for the slowest of them.  This is an update of that, with
minor conflicts fixed, posted on a fresh thread.

To recap: currently, any CREATE INDEX CONCURRENTLY will wait for all
other CICs running concurrently to finish, because they can't be
distinguished amidst other old snapshots.  We can change things by
having CIC set a special flag in PGPROC (like PROC_IN_VACUUM) indicating
that it's doing CIC; other CICs will see that flag and will know that
they don't need to wait for those processes.  With this, CICs on small
tables don't have to wait for CICs on large tables to complete.
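
In code terms the idea is roughly this (a sketch only; the flag name is
illustrative, the details are in the attached patch):

    /* in each CIC transaction, advertise that we are doing CIC */
    LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
    MyProc->vacuumFlags |= PROC_IN_CIC;     /* hypothetical flag name */
    LWLockRelease(ProcArrayLock);

WaitForOlderSnapshots() can then pass that flag to
GetCurrentVirtualXIDs() so that backends advertising it are skipped.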

[1] https://postgr.es/m/20200805021109.GA9079@alvherre.pgsql


-- 
Álvaro Herrera                            http://www.linkedin.com/in/alvherre
"Escucha y olvidarás; ve y recordarás; haz y entenderás" (Confucio)

Attachment

Re: remove spurious CREATE INDEX CONCURRENTLY wait

From
Alvaro Herrera
Date:
+ James Coleman

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: remove spurious CREATE INDEX CONCURRENTLY wait

From
Tom Lane
Date:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> To recap: currently, any CREATE INDEX CONCURRENTLY will wait for all
> other CICs running concurrently to finish, because they can't be
> distinguished amidst other old snapshots.  We can change things by
> having CIC set a special flag in PGPROC (like PROC_IN_VACUUM) indicating
> that it's doing CIC; other CICs will see that flag and will know that
> they don't need to wait for those processes.  With this, CICs on small
> tables don't have to wait for CICs on large tables to complete.

Hm.  +1 for improving this, if we can, but ...

It seems clearly unsafe to ignore a CIC that is in active index-building;
a snapshot held for that purpose is just as real as any other.  It *might*
be all right to ignore a CIC that is just waiting, but you haven't made
any argument in the patch comments as to why that's safe either.
(Moreover, at the points where we're just waiting, I don't think we have
a snapshot, so another CIC's WaitForOlderSnapshots shouldn't wait for us
anyway.)

Actually, it doesn't look like you've touched the comments at all.
WaitForOlderSnapshots' header comment has a long explanation of why
it's safe to ignore certain processes.  That certainly needs to be
updated by any patch that's going to change the rules.

BTW, what about REINDEX CONCURRENTLY?

            regards, tom lane



Re: remove spurious CREATE INDEX CONCURRENTLY wait

From
James Coleman
Date:
On Mon, Aug 10, 2020 at 8:37 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> > To recap: currently, any CREATE INDEX CONCURRENTLY will wait for all
> > other CICs running concurrently to finish, because they can't be
> > distinguished amidst other old snapshots.  We can change things by
> > having CIC set a special flag in PGPROC (like PROC_IN_VACUUM) indicating
> > that it's doing CIC; other CICs will see that flag and will know that
> > they don't need to wait for those processes.  With this, CICs on small
> > tables don't have to wait for CICs on large tables to complete.
>
> Hm.  +1 for improving this, if we can, but ...
>
> It seems clearly unsafe to ignore a CIC that is in active index-building;
> a snapshot held for that purpose is just as real as any other.  It *might*
> be all right to ignore a CIC that is just waiting, but you haven't made
> any argument in the patch comments as to why that's safe either.
> (Moreover, at the points where we're just waiting, I don't think we have
> a snapshot, so another CIC's WaitForOlderSnapshots shouldn't wait for us
> anyway.)

Why is a CIC in active index-building something we need to wait for?
Wouldn't it fall under a similar kind of logic to the other snapshot
types we can explicitly ignore? CIC can't be run in a manual
transaction, so the snapshot it holds won't be used to perform
arbitrary operations (i.e., the reason why a manual ANALYZE can't be
ignored).

> Actually, it doesn't look like you've touched the comments at all.
> WaitForOlderSnapshots' header comment has a long explanation of why
> it's safe to ignore certain processes.  That certainly needs to be
> updated by any patch that's going to change the rules.

Agreed that the comment needs to be updated to discuss the
(im)possibility of arbitrary operations within a snapshot held by CIC.

James



Re: remove spurious CREATE INDEX CONCURRENTLY wait

From
Tom Lane
Date:
James Coleman <jtc331@gmail.com> writes:
> Why is a CIC in active index-building something we need to wait for?
> Wouldn't it fall under a similar kind of logic to the other snapshot
> types we can explicitly ignore? CIC can't be run in a manual
> transaction, so the snapshot it holds won't be used to perform
> arbitrary operations (i.e., the reason why a manual ANALYZE can't be
> ignored).

Expression indexes that call user-defined functions seem like a
pretty serious risk factor for that argument.  Those are exactly
the same expressions that ANALYZE will evaluate, as a result of
which we judge it unsafe to ignore.  Why would CIC be different?

            regards, tom lane



Re: remove spurious CREATE INDEX CONCURRENTLY wait

From
James Coleman
Date:
On Tue, Aug 11, 2020 at 11:14 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> James Coleman <jtc331@gmail.com> writes:
> > Why is a CIC in active index-building something we need to wait for?
> > Wouldn't it fall under a similar kind of logic to the other snapshot
> > types we can explicitly ignore? CIC can't be run in a manual
> > transaction, so the snapshot it holds won't be used to perform
> > arbitrary operations (i.e., the reason why a manual ANALYZE can't be
> > ignored).
>
> Expression indexes that call user-defined functions seem like a
> pretty serious risk factor for that argument.  Those are exactly
> the same expressions that ANALYZE will evaluate, as a result of
> which we judge it unsafe to ignore.  Why would CIC be different?

The comments for WaitForOlderSnapshots() don't discuss that problem at
all; as far as ANALYZE goes they only say:

* Manual ANALYZEs, however, can't be excluded because they
* might be within transactions that are going to do arbitrary operations
* later.

But nonetheless over in the thread Álvaro linked to we'd discussed
these issues already. Andres in [1] and [2] believed that:

> For the snapshot waits we can add a procarray flag
> (alongside PROCARRAY_VACUUM_FLAG) indicating that the backend is
> doing CIC. Which WaitForOlderSnapshots() can then use to ignore those CICs,
> which is safe, because those transactions definitely don't insert into
> relations targeted by CIC. The change to WaitForOlderSnapshots() would
> just be to pass the new flag to GetCurrentVirtualXIDs, I think.

and

> What I was thinking of was a new flag, with a distinct value from
> PROC_IN_VACUUM. It'd currently just be specified in the
> GetCurrentVirtualXIDs() calls in WaitForOlderSnapshots(). That'd avoid
> needing to wait for other CICs on different relations. Since CIC is not
> permitted on system tables, and CIC doesn't do DML on normal tables, it
> seems fairly obviously correct to exclude other CICs.
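
Concretely, that would amount to something like this in
WaitForOlderSnapshots() (a sketch; PROC_IN_CIC being the hypothetical
new flag):

    old_snapshots = GetCurrentVirtualXIDs(limitXmin, true, false,
                                          PROC_IN_VACUUM | PROC_IN_CIC,
                                          &n_old_snapshots);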

In [3] I'd brought up that a function index can do arbitrary
operations (including, terribly, a query of another table), and Andres
(in [4]) noted that:

> Well, even if we consider this an actual problem, we could still use
> PROC_IN_CIC for non-expression non-partial indexes (index operator
> themselves better ensure this isn't a problem, or they're ridiculously
> broken already - they can get called during vacuum).

But he went on to describe how this is a problem with all expression
indexes (even if those expressions don't do dangerous things) because
of syscache lookups, but that ideally for expression indexes we'd have
a way to use a different (or more frequently taken) snapshot for the
purpose of computing those expressions. That's a significantly more
involved patch though.

So from what I understand, everything that I'd claimed in my previous
message still holds true for non-expression/non-partial indexes. Is
there something else I'm missing?

James

1: https://www.postgresql.org/message-id/20200325191935.jjhdg2zy5k7ath5v%40alap3.anarazel.de
2: https://www.postgresql.org/message-id/20200325195841.gq4hpl25t6pxv3gl%40alap3.anarazel.de
3: https://www.postgresql.org/message-id/CAAaqYe_fveT_tvKonVt1imujOURUUMrydGeaBGahqLLy9D-REw%40mail.gmail.com
4: https://www.postgresql.org/message-id/20200416221207.wmnzbxvjqqpazeob%40alap3.anarazel.de



Re: remove spurious CREATE INDEX CONCURRENTLY wait

From
Alvaro Herrera
Date:
On 2020-Aug-11, James Coleman wrote:

> In [3] I'd brought up that a function index can do arbitrary
> operations (including, terribly, a query of another table), and Andres
> (in [4]) noted that:
> 
> > Well, even if we consider this an actual problem, we could still use
> > PROC_IN_CIC for non-expression non-partial indexes (index operator
> > themselves better ensure this isn't a problem, or they're ridiculously
> > broken already - they can get called during vacuum).
> 
> But he went on to describe how this is a problem with all expression
> indexes (even if those expressions don't do dangerous things) because
> of syscache lookups, but that ideally for expression indexes we'd have
> a way to use a different (or more frequently taken) snapshot for the
> purpose of computing those expressions. That's a significantly more
> involved patch though.

So the easy first patch here is to add the flag as PROC_IN_SAFE_CIC,
which is set only for CIC when it's used for indexes that are neither
on expressions nor partial.  Then ignore those in WaitForOlderSnapshots.
The attached patch does it that way.  (Also updated per recent
conflicts).
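
The safety determination itself is cheap.  In the patch it is
essentially:

    /* Determine whether we can set the "safe" flag */
    safe_index = indexInfo->ii_Expressions == NIL &&
        indexInfo->ii_Predicate == NIL;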

I did not set the flag in REINDEX CONCURRENTLY, but as I understand it
can be done too, since in essence it's the same thing as a CIC from a
snapshot management point of view.

Also, per [1], ISTM this flag could be used to tell lazy VACUUM to
ignore the Xmin of this process too, which the previous formulation
(where all CICs were so marked) could not.  This patch doesn't do that
yet, but it seems the natural next step to take.

[1] https://postgr.es/m/20191101203310.GA12239@alvherre.pgsql

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: remove spurious CREATE INDEX CONCURRENTLY wait

From
Michael Paquier
Date:
On Wed, Aug 19, 2020 at 02:16:46PM -0400, Alvaro Herrera wrote:
> I did not set the flag in REINDEX CONCURRENTLY, but as I understand it
> can be done too, since in essence it's the same thing as a CIC from a
> snapshot management point of view.

Yes, I see no problems for REINDEX CONCURRENTLY as well, as long as
there are no predicates or expressions involved.  The transactions
that should be patched are all started in ReindexRelationConcurrently.
The transaction of index_concurrently_swap() cannot set that up,
though.  The only thing to be careful about is making sure that
safe_flag is correct depending on the list of indexes worked on.

> Also, per [1], ISTM this flag could be used to tell lazy VACUUM to
> ignore the Xmin of this process too, which the previous formulation
> (where all CICs were so marked) could not.  This patch doesn't do that
> yet, but it seems the natural next step to take.
>
> [1] https://postgr.es/m/20191101203310.GA12239@alvherre.pgsql

Could we consider renaming vacuumFlags?  With more flags associated
with a PGPROC entry that are not related to vacuum, the current naming
makes things confusing.  Something like statusFlags could fit better
in the picture?
--
Michael

Attachment

Re: remove spurious CREATE INDEX CONCURRENTLY wait

From
Dmitry Dolgov
Date:
> On Thu, Aug 20, 2020 at 03:11:19PM +0900, Michael Paquier wrote:
> On Wed, Aug 19, 2020 at 02:16:46PM -0400, Alvaro Herrera wrote:
> > I did not set the flag in REINDEX CONCURRENTLY, but as I understand it
> > can be done too, since in essence it's the same thing as a CIC from a
> > snapshot management point of view.
>
> Yes, I see no problems for REINDEX CONCURRENTLY as well, as long as
> there are no predicates or expressions involved.  The transactions
> that should be patched are all started in ReindexRelationConcurrently.
> The transaction of index_concurrently_swap() cannot set that up,
> though.  The only thing to be careful about is making sure that
> safe_flag is correct depending on the list of indexes worked on.

Hi,

After looking through the thread and reading the patch it seems good,
and there are only a few minor questions:

* Doing the same for REINDEX CONCURRENTLY, which does make sense. In
  fact it's already mentioned in the comments as done, which is a bit
  confusing.

* Naming; to be more precise, what Michael suggested:

> Could we consider renaming vacuumFlags?  With more flags associated
> with a PGPROC entry that are not related to vacuum, the current naming
> makes things confusing.  Something like statusFlags could fit better
> in the picture?

  which sounds reasonable, and a similar one about the flag name
  PROC_IN_SAFE_CIC - if it covers both CREATE INDEX and REINDEX
  CONCURRENTLY, maybe just PROC_IN_SAFE_IC?

Any plans about those questions? I imagine those are the only missing
parts.



Re: remove spurious CREATE INDEX CONCURRENTLY wait

From
Dmitry Dolgov
Date:
> On Tue, Nov 03, 2020 at 07:14:47PM +0100, Dmitry Dolgov wrote:
> > On Thu, Aug 20, 2020 at 03:11:19PM +0900, Michael Paquier wrote:
> > On Wed, Aug 19, 2020 at 02:16:46PM -0400, Alvaro Herrera wrote:
> > > I did not set the flag in REINDEX CONCURRENTLY, but as I understand it
> > > can be done too, since in essence it's the same thing as a CIC from a
> > > snapshot management point of view.
> >
> > Yes, I see no problems for REINDEX CONCURRENTLY as well, as long as
> > there are no predicates or expressions involved.  The transactions
> > that should be patched are all started in ReindexRelationConcurrently.
> > The transaction of index_concurrently_swap() cannot set that up,
> > though.  The only thing to be careful about is making sure that
> > safe_flag is correct depending on the list of indexes worked on.
>
> Hi,
>
> After looking through the thread and reading the patch it seems good,
> and there are only a few minor questions:
>
> * Doing the same for REINDEX CONCURRENTLY, which does make sense. In
>   fact it's already mentioned in the comments as done, which is a bit
>   confusing.

Just to give it a shot, would the attached change be enough?

Attachment

Re: remove spurious CREATE INDEX CONCURRENTLY wait

From
Michael Paquier
Date:
On Mon, Nov 09, 2020 at 04:47:43PM +0100, Dmitry Dolgov wrote:
> > On Tue, Nov 03, 2020 at 07:14:47PM +0100, Dmitry Dolgov wrote:
> > > On Thu, Aug 20, 2020 at 03:11:19PM +0900, Michael Paquier wrote:
> > > On Wed, Aug 19, 2020 at 02:16:46PM -0400, Alvaro Herrera wrote:
> > > > I did not set the flag in REINDEX CONCURRENTLY, but as I understand it
> > > > can be done too, since in essence it's the same thing as a CIC from a
> > > > snapshot management point of view.
> > >
> > > Yes, I see no problems for REINDEX CONCURRENTLY as well, as long as
> > > there are no predicates or expressions involved.  The transactions
> > > that should be patched are all started in ReindexRelationConcurrently.
> > > The transaction of index_concurrently_swap() cannot set that up,
> > > though.  The only thing to be careful about is making sure that
> > > safe_flag is correct depending on the list of indexes worked on.
> >
> > Hi,
> >
> > After looking through the thread and reading the patch it seems good,
> > and there are only a few minor questions:
> >
> > * Doing the same for REINDEX CONCURRENTLY, which does make sense. In
> >   fact it's already mentioned in the comments as done, which is a bit
> >   confusing.
>
> Just to give it a shot, would the attached change be enough?

Could it be possible to rename vacuumFlags to statusFlags first?  I
did not see any objection to this suggestion.

> +        LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
> +        MyProc->vacuumFlags |= PROC_IN_SAFE_IC;
> +        ProcGlobal->vacuumFlags[MyProc->pgxactoff] = MyProc->vacuumFlags;
> +        LWLockRelease(ProcArrayLock);

I can't help noticing that you are repeating the same code pattern
eight times.  I think that this should be in its own routine, and that
we had better document that this should be called just after starting
a transaction, with an assertion enforcing that.
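
Something like this, perhaps (a sketch; the routine and flag names are
only illustrative):

    /*
     * Mark this transaction as safe for other concurrent index builds
     * to ignore.  Must be called just after starting a transaction,
     * before it has acquired an xid or a snapshot.
     */
    static inline void
    set_safe_index_flag(void)
    {
        Assert(MyProc->xid == InvalidTransactionId &&
               MyProc->xmin == InvalidTransactionId);

        LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
        MyProc->vacuumFlags |= PROC_IN_SAFE_IC;
        ProcGlobal->vacuumFlags[MyProc->pgxactoff] = MyProc->vacuumFlags;
        LWLockRelease(ProcArrayLock);
    }
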
--
Michael

Attachment

Re: remove spurious CREATE INDEX CONCURRENTLY wait

From
Tom Lane
Date:
Michael Paquier <michael@paquier.xyz> writes:
>> +        LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
>> +        MyProc->vacuumFlags |= PROC_IN_SAFE_IC;
>> +        ProcGlobal->vacuumFlags[MyProc->pgxactoff] = MyProc->vacuumFlags;
>> +        LWLockRelease(ProcArrayLock);

> I can't help noticing that you are repeating the same code pattern
> eight times.  I think that this should be in its own routine, and that
> we had better document that this should be called just after starting
> a transaction, with an assertion enforcing that.

Do we really need exclusive lock on the ProcArray to make this flag
change?  That seems pretty bad from a concurrency standpoint.

            regards, tom lane



Re: remove spurious CREATE INDEX CONCURRENTLY wait

From
Michael Paquier
Date:
On Mon, Nov 09, 2020 at 08:32:13PM -0500, Tom Lane wrote:
> Do we really need exclusive lock on the ProcArray to make this flag
> change?  That seems pretty bad from a concurrency standpoint.

Any place where we update vacuumFlags acquires an exclusive LWLock on
ProcArrayLock.  That's held for a very short time, so IMO it won't
matter much in practice, particularly if you compare that with the
potential gains related to the existing wait phases.
--
Michael

Attachment

Re: remove spurious CREATE INDEX CONCURRENTLY wait

From
Tom Lane
Date:
Michael Paquier <michael@paquier.xyz> writes:
> On Mon, Nov 09, 2020 at 08:32:13PM -0500, Tom Lane wrote:
>> Do we really need exclusive lock on the ProcArray to make this flag
>> change?  That seems pretty bad from a concurrency standpoint.

> Any place where we update vacuumFlags acquires an exclusive LWLock on
> ProcArrayLock.  That's held for a very short time, so IMO it won't
> matter much in practice, particularly if you compare that with the
> potential gains related to the existing wait phases.

Not sure I believe that it doesn't matter much in practice.  If there's
a steady stream of shared ProcArrayLock acquisitions (for snapshot
acquisition) then somebody wanting exclusive lock will create a big
hiccup, whether they hold it for a short time or not.

            regards, tom lane



Re: remove spurious CREATE INDEX CONCURRENTLY wait

From
Alvaro Herrera
Date:
On 2020-Nov-09, Tom Lane wrote:

> Michael Paquier <michael@paquier.xyz> writes:
> > On Mon, Nov 09, 2020 at 08:32:13PM -0500, Tom Lane wrote:
> >> Do we really need exclusive lock on the ProcArray to make this flag
> >> change?  That seems pretty bad from a concurrency standpoint.
> 
> > Any place where we update vacuumFlags acquires an exclusive LWLock on
> > ProcArrayLock.  That's held for a very short time, so IMO it won't
> > matter much in practice, particularly if you compare that with the
> > potential gains related to the existing wait phases.
> 
> Not sure I believe that it doesn't matter much in practice.  If there's
> a steady stream of shared ProcArrayLock acquisitions (for snapshot
> acquisition) then somebody wanting exclusive lock will create a big
> hiccup, whether they hold it for a short time or not.

Yeah ... it would be much better if we can make it use atomics instead.
Currently it's a uint8, and in PGPROC itself it's probably not a big
deal to enlarge that, but I fear that quadrupling the size of the
mirroring array in PROC_HDR might be bad for performance.  However,
maybe if we use atomics to access it, then we don't need to mirror it
anymore?  That would need some benchmarking of GetSnapshotData.
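
If the flag word became a pg_atomic_uint32 in PGPROC, the update itself
could be lock-free, something like this sketch:

    /* sketch: no ProcArrayLock needed for the flag update itself */
    pg_atomic_fetch_or_u32(&MyProc->vacuumFlags, PROC_IN_SAFE_IC);

Whether the mirrored array could then go away is precisely what the
benchmarking would have to tell us.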




Re: remove spurious CREATE INDEX CONCURRENTLY wait

From
Michael Paquier
Date:
On Mon, Nov 09, 2020 at 11:31:15PM -0300, Alvaro Herrera wrote:
> Yeah ... it would be much better if we can make it use atomics instead.
> Currently it's a uint8, and in PGPROC itself it's probably not a big
> deal to enlarge that, but I fear that quadrupling the size of the
> mirroring array in PROC_HDR might be bad for performance.  However,
> maybe if we use atomics to access it, then we don't need to mirror it
> anymore?  That would need some benchmarking of GetSnapshotData.

Hmm.  If you worry about the performance impact here, it is possible
to do a small performance test without this patch.  vacuum_rel() sets
the flag for a non-full VACUUM, so one backend running a manual
VACUUM in a loop on an empty relation could apply some pressure on any
benchmark, even a simple pgbench.
--
Michael

Attachment

Re: remove spurious CREATE INDEX CONCURRENTLY wait

From
Tom Lane
Date:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> Yeah ... it would be much better if we can make it use atomics instead.

I was thinking more like "do we need any locking at all".

Assuming that a proc's vacuumFlags can be set by only the process itself,
there's no write conflicts to worry about.  On the read side, there's a
hazard that onlookers will not see the PROC_IN_SAFE_IC flag set; but
that's not any different from what the outcome would be if they looked
just before this stanza executes.  And even if they don't see it, at worst
we lose the optimization being proposed.

There is a question of whether it's important that both copies of the flag
appear to update atomically ... but that just begs the question "why in
heaven's name are there two copies?"

            regards, tom lane



Re: remove spurious CREATE INDEX CONCURRENTLY wait

From
Dmitry Dolgov
Date:
> On Mon, Nov 09, 2020 at 10:02:27PM -0500, Tom Lane wrote:
>
> Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> > Yeah ... it would be much better if we can make it use atomics instead.
>
> I was thinking more like "do we need any locking at all".
>
> Assuming that a proc's vacuumFlags can be set by only the process itself,
> there's no write conflicts to worry about.  On the read side, there's a
> hazard that onlookers will not see the PROC_IN_SAFE_IC flag set; but
> that's not any different from what the outcome would be if they looked
> just before this stanza executes.  And even if they don't see it, at worst
> we lose the optimization being proposed.
>
> There is a question of whether it's important that both copies of the flag
> appear to update atomically ... but that just begs the question "why in
> heaven's name are there two copies?"

Sounds right, but after reading the thread about GetSnapshotData
scalability more thoroughly, there seems to be an assumption that those
copies have to be updated at the same time under the same lock, and
claims that in some cases the justification for correctness around not
taking ProcArrayLock is too complicated, at least for now.

Interestingly enough, a similar discussion happened about vacuumFlags
before, with the same conclusion: theoretically it's fine to update
without holding the lock, but this assumption could change one day, and
it's better to avoid such risks. Having said that, I believe it makes
sense to continue with locking. Are there any other opinions? I'll try
to benchmark it in the meantime.



Re: remove spurious CREATE INDEX CONCURRENTLY wait

From
Michael Paquier
Date:
On Thu, Nov 12, 2020 at 04:36:32PM +0100, Dmitry Dolgov wrote:
> Interestingly enough, a similar discussion happened about vacuumFlags
> before, with the same conclusion: theoretically it's fine to update
> without holding the lock, but this assumption could change one day, and
> it's better to avoid such risks. Having said that, I believe it makes
> sense to continue with locking. Are there any other opinions? I'll try
> to benchmark it in the meantime.

Thanks for planning some benchmarking for this specific patch.  I have
to admit that the possibility of switching vacuumFlags to use atomics
is very appealing in the long term, with or without considering this
patch, even if we had better be sure that this patch has no actual
effect on concurrency first if atomics are not used in worst-case
scenarios.
--
Michael

Attachment

Re: remove spurious CREATE INDEX CONCURRENTLY wait

From
Dmitry Dolgov
Date:
> On Fri, Nov 13, 2020 at 09:25:40AM +0900, Michael Paquier wrote:
> On Thu, Nov 12, 2020 at 04:36:32PM +0100, Dmitry Dolgov wrote:
> > Interestingly enough, a similar discussion happened about vacuumFlags
> > before, with the same conclusion: theoretically it's fine to update
> > without holding the lock, but this assumption could change one day, and
> > it's better to avoid such risks. Having said that, I believe it makes
> > sense to continue with locking. Are there any other opinions? I'll try
> > to benchmark it in the meantime.
>
> Thanks for planning some benchmarking for this specific patch.  I have
> to admit that the possibility of switching vacuumFlags to use atomics
> is very appealing in the long term, with or without considering this
> patch, even if we had better be sure that this patch has no actual
> effect on concurrency first if atomics are not used in worst-case
> scenarios.

I first tried to test scenarios where GetSnapshotData produces
significant lock contention and the "reindex concurrently"
implementation with locks interferes with it. The idea I had is to
create a test function that constantly calls GetSnapshotData (perf
indeed shows a significant portion of time spent on the contended
lock), and clash it with a stream of "reindex concurrently" commands on
an empty relation (which still reaches the safe_index check). I guess
it could be considered an artificial extreme case. Measuring
GetSnapshotData latency (or rather that of a surrounding wrapper, to
distinguish calls from the test function from everything else) without
reindex, with reindex and locks, and with reindex without locks should
produce different "modes", and by comparing them we can draw some
conclusions.

Latency histograms without reindex (nanoseconds):

     nsecs               : count     distribution
       512 -> 1023       : 0        |                                        |
      1024 -> 2047       : 10001209 |****************************************|
      2048 -> 4095       : 76936    |                                        |
      4096 -> 8191       : 1468     |                                        |
      8192 -> 16383      : 98       |                                        |
     16384 -> 32767      : 39       |                                        |
     32768 -> 65535      : 6        |                                        |

The same with reindex without locks:

     nsecs               : count     distribution
       512 -> 1023       : 0        |                                        |
      1024 -> 2047       : 111345   |                                        |
      2048 -> 4095       : 6997627  |****************************************|
      4096 -> 8191       : 18575    |                                        |
      8192 -> 16383      : 586      |                                        |
     16384 -> 32767      : 312      |                                        |
     32768 -> 65535      : 18       |                                        |

The same with reindex with locks:

     nsecs               : count     distribution
       512 -> 1023       : 0        |                                        |
      1024 -> 2047       : 59438    |                                        |
      2048 -> 4095       : 6901187  |****************************************|
      4096 -> 8191       : 18584    |                                        |
      8192 -> 16383      : 581      |                                        |
     16384 -> 32767      : 280      |                                        |
     32768 -> 65535      : 84       |                                        |

Looks like reindex without locks is indeed faster (there are more
samples in the lower time sections), but not by an amount significant
to the whole distribution, especially taking into account the extremity
of the test.

I'll take a look at benchmarking the switch of vacuumFlags to atomics,
but as it's probably a bit off topic I'm going to attach another
version of the patch with locks and the suggested changes, about which
I have one question:

> Michael Paquier <michael@paquier.xyz> writes:

> I think that this should be in its own routine, and that we had better
> document that this should be called just after starting a transaction,
> with an assertion enforcing that.

I'm not sure exactly which assertion condition you mean?

Attachment

Re: remove spurious CREATE INDEX CONCURRENTLY wait

From
Simon Riggs
Date:
On Tue, 10 Nov 2020 at 03:02, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> > Yeah ... it would be much better if we can make it use atomics instead.
>
> I was thinking more like "do we need any locking at all".
>
> Assuming that a proc's vacuumFlags can be set by only the process itself,
> there's no write conflicts to worry about.  On the read side, there's a
> hazard that onlookers will not see the PROC_IN_SAFE_IC flag set; but
> that's not any different from what the outcome would be if they looked
> just before this stanza executes.  And even if they don't see it, at worst
> we lose the optimization being proposed.
>
> There is a question of whether it's important that both copies of the flag
> appear to update atomically ... but that just begs the question "why in
> heaven's name are there two copies?"

Agreed to all of the above, but I think the issue is minuscule because
ProcArrayLock is acquired in LW_EXCLUSIVE mode at transaction commit.
So it doesn't seem like there is much need to optimize this particular
aspect of the patch.

-- 
Simon Riggs                http://www.EnterpriseDB.com/



Re: remove spurious CREATE INDEX CONCURRENTLY wait

From
Alvaro Herrera
Date:
On 2020-Nov-16, Dmitry Dolgov wrote:

> The same with reindex without locks:
> 
>      nsecs               : count     distribution
>        512 -> 1023       : 0        |                                        |
>       1024 -> 2047       : 111345   |                                        |
>       2048 -> 4095       : 6997627  |****************************************|
>       4096 -> 8191       : 18575    |                                        |
>       8192 -> 16383      : 586      |                                        |
>      16384 -> 32767      : 312      |                                        |
>      32768 -> 65535      : 18       |                                        |
> 
> The same with reindex with locks:
> 
>      nsecs               : count     distribution
>        512 -> 1023       : 0        |                                        |
>       1024 -> 2047       : 59438    |                                        |
>       2048 -> 4095       : 6901187  |****************************************|
>       4096 -> 8191       : 18584    |                                        |
>       8192 -> 16383      : 581      |                                        |
>      16384 -> 32767      : 280      |                                        |
>      32768 -> 65535      : 84       |                                        |
> 
> Looks like reindex without locks is indeed faster (there are more
> samples in the lower time sections), but not by an amount significant
> to the whole distribution, especially taking into account the extremity
> of the test.

I didn't analyze these numbers super carefully, but yeah it doesn't look
significant.

I'm looking at these patches now, with intention to push.




Re: remove spurious CREATE INDEX CONCURRENTLY wait

From
Alvaro Herrera
Date:
On 2020-Nov-09, Tom Lane wrote:

> Michael Paquier <michael@paquier.xyz> writes:
> >> +        LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
> >> +        MyProc->vacuumFlags |= PROC_IN_SAFE_IC;
> >> +        ProcGlobal->vacuumFlags[MyProc->pgxactoff] = MyProc->vacuumFlags;
> >> +        LWLockRelease(ProcArrayLock);
> 
> > I can't help noticing that you are repeating the same code pattern
> > eight times.  I think that this should be in its own routine, and that
> > we had better document that this should be called just after starting
> > a transaction, with an assertion enforcing that.
> 
> Do we really need exclusive lock on the ProcArray to make this flag
> change?  That seems pretty bad from a concurrency standpoint.

BTW I now know that the reason for taking ProcArrayLock is not the
vacuumFlags itself, but rather MyProc->pgxactoff, which can move.

On the other hand, if we stopped mirroring the flags in ProcGlobal, it
would mean we would have to access all procs' PGPROC entries in
GetSnapshotData, which is undesirable for performance reason (per commit
5788e258bb26).



Re: remove spurious CREATE INDEX CONCURRENTLY wait

From
Alvaro Herrera
Date:
I am really unsure about the REINDEX CONCURRENTLY part of this, for two
reasons:

1. It is not as good when reindexing multiple indexes, because we can
only apply the flag if *all* indexes are "safe".  Any unsafe index means
we step down from it for the whole thing.  This is probably not worth
worrying much about, but still.

2. In some of the waiting transactions, we actually do more things than
what we do in CREATE INDEX CONCURRENTLY transactions --- some catalog
updates, but we also do the whole index validation phase.  Is that OK?
It's not as clear to me that it is safe to set the flag in all those
places.

I moved the comments to the new function and made it inline.  I also
changed the way we determine whether the index is safe; there's no
reason to build an IndexInfo if we can simply look at rel->rd_indexprs
and rel->rd_indpred.

I've been wondering if it would be sane/safe to do the WaitForFoo stuff
outside of any transaction.

I'll have a look again tomorrow.

Attachment

Re: remove spurious CREATE INDEX CONCURRENTLY wait

From
Michael Paquier
Date:
On Mon, Nov 16, 2020 at 09:23:41PM -0300, Alvaro Herrera wrote:
> I've been wondering if it would be sane/safe to do the WaitForFoo stuff
> outside of any transaction.

This needs to remain within a transaction as CIC holds a session lock
on the parent table, which would not be cleaned up without a
transaction context.
--
Michael

Attachment

Re: remove spurious CREATE INDEX CONCURRENTLY wait

From
Alvaro Herrera
Date:
On 2020-Nov-16, Alvaro Herrera wrote:

> On 2020-Nov-09, Tom Lane wrote:
> 
> > Michael Paquier <michael@paquier.xyz> writes:
> > >> +        LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
> > >> +        MyProc->vacuumFlags |= PROC_IN_SAFE_IC;
> > >> +        ProcGlobal->vacuumFlags[MyProc->pgxactoff] = MyProc->vacuumFlags;
> > >> +        LWLockRelease(ProcArrayLock);
> > 
> > > I can't help noticing that you are repeating the same code pattern
> > > eight times.  I think that this should be in its own routine, and that
> > > we had better document that this should be called just after starting
> > > a transaction, with an assertion enforcing that.
> > 
> > Do we really need exclusive lock on the ProcArray to make this flag
> > change?  That seems pretty bad from a concurrency standpoint.
> 
> BTW I now know that the reason for taking ProcArrayLock is not the
> vacuumFlags itself, but rather MyProc->pgxactoff, which can move.

... ah, but I realize now that this means that we can use shared lock
here, not exclusive, which is already an enormous improvement.  That's
because ->pgxactoff can only be changed with exclusive lock held; so as
long as we hold shared, the array item cannot move.



Re: remove spurious CREATE INDEX CONCURRENTLY wait

From
Alvaro Herrera
Date:
So I made the change to set the statusFlags with only LW_SHARED -- both
in existing uses (see 0002) and in the new CIC code (0003).  I don't
think 0002 is going to have a tremendous performance impact, because it
changes operations that are very infrequent.  But even so, it would be
weird to leave code around that uses exclusive lock when we're going to
add new code that uses shared lock for the same thing.

I still left the REINDEX CONCURRENTLY support split out in 0004; I
intend to get the first three patches pushed now, and look into 0004
again later.

Attachment

Re: remove spurious CREATE INDEX CONCURRENTLY wait

From
Michael Paquier
Date:
On Tue, Nov 17, 2020 at 02:14:53PM -0300, Alvaro Herrera wrote:
> diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
> index f1f4df7d70..4324e32656 100644
> --- a/src/backend/replication/logical/logical.c
> +++ b/src/backend/replication/logical/logical.c
> @@ -181,7 +181,7 @@ StartupDecodingContext(List *output_plugin_options,
>       */
>      if (!IsTransactionOrTransactionBlock())
>      {
> -        LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
> +        LWLockAcquire(ProcArrayLock, LW_SHARED);
>          MyProc->statusFlags |= PROC_IN_LOGICAL_DECODING;
>          ProcGlobal->statusFlags[MyProc->pgxactoff] = MyProc->statusFlags;
>          LWLockRelease(ProcArrayLock);

We don't really document that it is safe to update statusFlags while
holding a shared lock on ProcArrayLock, right?  Could this be
clarified at least in proc.h?

> +    /* Determine whether we can call set_safe_index_flag */
> +    safe_index = indexInfo->ii_Expressions == NIL &&
> +        indexInfo->ii_Predicate == NIL;

This should explain why we check for expressions and predicates; in
short, it should describe the snapshot issues with build and validation
when executing those expressions and/or predicates.

> + * Set the PROC_IN_SAFE_IC flag in my PGPROC entry.
> + *
> + * When doing concurrent index builds, we can set this flag
> + * to tell other processes concurrently running CREATE
> + * INDEX CONCURRENTLY to ignore us when
> + * doing their waits for concurrent snapshots.  On one hand it
> + * avoids pointlessly waiting for a process that's not interesting
> + * anyway, but more importantly it avoids deadlocks in some cases.
> + *
> + * This can only be done for indexes that don't execute any expressions.
> + * Caller is responsible for only calling this routine when that
> + * assumption holds true.
> + *
> + * (The flag is reset automatically at transaction end, so it must be
> + * set for each transaction.)

Would it be better to document that this function had better be called
after beginning a transaction?  I am wondering if we should not
enforce some conditions here, say this one or similar:
Assert(GetTopTransactionIdIfAny() == InvalidTransactionId);

> + */
> +static inline void
> +set_safe_index_flag(void)

This routine name is rather close to index_set_state_flags(); could we
come up with a better name?
--
Michael

Attachment

Re: remove spurious CREATE INDEX CONCURRENTLY wait

From
Alvaro Herrera
Date:
On 2020-Nov-18, Michael Paquier wrote:

> On Tue, Nov 17, 2020 at 02:14:53PM -0300, Alvaro Herrera wrote:
> > diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
> > index f1f4df7d70..4324e32656 100644
> > --- a/src/backend/replication/logical/logical.c
> > +++ b/src/backend/replication/logical/logical.c
> > @@ -181,7 +181,7 @@ StartupDecodingContext(List *output_plugin_options,
> >       */
> >      if (!IsTransactionOrTransactionBlock())
> >      {
> > -        LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
> > +        LWLockAcquire(ProcArrayLock, LW_SHARED);
> >          MyProc->statusFlags |= PROC_IN_LOGICAL_DECODING;
> >          ProcGlobal->statusFlags[MyProc->pgxactoff] = MyProc->statusFlags;
> >          LWLockRelease(ProcArrayLock);
> 
> We don't really document that it is safe to update statusFlags while
> holding a shared lock on ProcArrayLock, right?  Could this be
> clarified at least in proc.h?

Pushed that part with a comment addition.  This stuff is completely
undocumented ...

> > +    /* Determine whether we can call set_safe_index_flag */
> > +    safe_index = indexInfo->ii_Expressions == NIL &&
> > +        indexInfo->ii_Predicate == NIL;
> 
> This should explain why we check for expressions and predicates; in
> short, it should describe the snapshot issues with build and validation
> when executing those expressions and/or predicates.

Fair.  It seems good to add a comment to the new function, and reference
that in each place where we set the flag.


> > + * Set the PROC_IN_SAFE_IC flag in my PGPROC entry.

> Would it be better to document that this function had better be called
> after beginning a transaction?  I am wondering if we should not
> enforce some conditions here, say this one or similar:
> Assert(GetTopTransactionIdIfAny() == InvalidTransactionId);

I went with checking MyProc->xid and MyProc->xmin, which is the same in
practice but notionally closer to what we're doing.
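
That is, roughly:

    Assert(MyProc->xid == InvalidTransactionId &&
           MyProc->xmin == InvalidTransactionId);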

> > +static inline void
> > +set_safe_index_flag(void)
> 
> This routine name is rather close to index_set_state_flags(); could we
> come up with a better name?

I went with set_indexsafe_procflags().  Ugly ...

Attachment

Re: remove spurious CREATE INDEX CONCURRENTLY wait

From
Andres Freund
Date:
Hi,

On 2020-11-17 12:55:01 -0300, Alvaro Herrera wrote:
> ... ah, but I realize now that this means that we can use shared lock
> here, not exclusive, which is already an enormous improvement.  That's
> because ->pgxactoff can only be changed with exclusive lock held; so as
> long as we hold shared, the array item cannot move.

Uh, wait a second. The acquisition of this lock hasn't been affected by
the snapshot scalability changes, and is therefore unrelated to
->pgxactoff changing or not.

In 13 this is:
        LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
        MyPgXact->vacuumFlags |= PROC_IN_VACUUM;
        if (params->is_wraparound)
            MyPgXact->vacuumFlags |= PROC_VACUUM_FOR_WRAPAROUND;
        LWLockRelease(ProcArrayLock);

Lowering this to a shared lock doesn't seem right, at least without a
detailed comment explaining why it's safe. Because GetSnapshotData() etc
look at all procs with just an LW_SHARED ProcArrayLock, changing
vacuumFlags without a lock means that two concurrent horizon
computations could come to a different result.

I'm not saying it's definitely wrong to relax things here, but I'm not
sure we've evaluated it sufficiently.

Greetings,

Andres Freund



"as quickly as possible" (was: remove spurious CREATE INDEX CONCURRENTLY wait)

From
Alvaro Herrera
Date:
> On 2020-11-17 12:55:01 -0300, Alvaro Herrera wrote:
> > ... ah, but I realize now that this means that we can use shared lock
> > here, not exclusive, which is already an enormous improvement.  That's
> > because ->pgxactoff can only be changed with exclusive lock held; so as
> > long as we hold shared, the array item cannot move.
> 
> Uh, wait a second. The acquisition of this lock hasn't been affected by
> the snapshot scalability changes, and is therefore unrelated to
> ->pgxactoff changing or not.

I'm writing a response trying to thoroughly analyze this, but I also
wanted to report that ProcSleep is being a bit generous with what it
calls "as quickly as possible" here:

            LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);

            /*
             * Only do it if the worker is not working to protect against Xid
             * wraparound.
             */
            statusFlags = ProcGlobal->statusFlags[autovac->pgxactoff];
            if ((statusFlags & PROC_IS_AUTOVACUUM) &&
                !(statusFlags & PROC_VACUUM_FOR_WRAPAROUND))
            {
                int            pid = autovac->pid;
                StringInfoData locktagbuf;
                StringInfoData logbuf;    /* errdetail for server log */

                initStringInfo(&locktagbuf);
                initStringInfo(&logbuf);
                DescribeLockTag(&locktagbuf, &lock->tag);
                appendStringInfo(&logbuf,
                                 _("Process %d waits for %s on %s."),
                                 MyProcPid,
                                 GetLockmodeName(lock->tag.locktag_lockmethodid,
                                                 lockmode),
                                 locktagbuf.data);

                /* release lock as quickly as possible */
                LWLockRelease(ProcArrayLock);

The amount of stuff that this is doing with ProcArrayLock held
exclusively seems a bit excessive; it sounds like we could copy the
values we need first, release the lock, and *then* do all that memory
allocation and string printing -- it's a lot of code for such a
contended lock.  Anytime a process sees itself as blocked by autovacuum
and wants to signal it, there's a ProcArrayLock hiccup (I didn't
actually measure it, but it's at least five function calls).  We could
make this more concurrent by copying lock->tag to a local variable,
releasing the lock, then doing all the string formatting and printing.
See attached quickly.patch.
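
In sketch form, the reordering looks like this (the attached patch may
differ in details):

    int         pid = autovac->pid;     /* used further down for kill() */
    LOCKTAG     tag = lock->tag;        /* copy while holding the lock */

    /* release lock as quickly as possible */
    LWLockRelease(ProcArrayLock);

    /* only now do the allocations and string formatting */
    initStringInfo(&locktagbuf);
    initStringInfo(&logbuf);
    DescribeLockTag(&locktagbuf, &tag);
    appendStringInfo(&logbuf,
                     _("Process %d waits for %s on %s."),
                     MyProcPid,
                     GetLockmodeName(tag.locktag_lockmethodid, lockmode),
                     locktagbuf.data);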

Now, when this code was written (d7318d43d, 2012), this was a LOG
message; it was demoted to DEBUG1 later (d8f15c95bec, 2015).  I think it
would be fair to ... remove the message?  Or go back to Simon's original
formulation from commit acac68b2bca, which had this message as DEBUG2
without any string formatting.

Thoughts?

Attachment

Re: "as quickly as possible" (was: remove spurious CREATE INDEX CONCURRENTLY wait)

From
Andres Freund
Date:
Hi,

On 2020-11-18 18:41:27 -0300, Alvaro Herrera wrote:
> The amount of stuff that this is doing with ProcArrayLock held
> exclusively seems a bit excessive; it sounds like we could copy the
> values we need first, release the lock, and *then* do all that memory
> allocation and string printing -- it's a lot of code for such a
> contended lock.

Yea, that's a good point.


> Anytime a process sees itself as blocked by autovacuum
> and wants to signal it, there's a ProcArrayLock hiccup (I didn't
> actually measure it, but it's at least five function calls).

I'm a bit doubtful it's that important - it's limited in frequency
by deadlock_timeout. But worth improving anyway.


> We could make this more concurrent by copying lock->tag to a local
> variable, releasing the lock, then doing all the string formatting and
> printing.  See attached quickly.patch.

Sounds like a plan.


> Now, when this code was written (d7318d43d, 2012), this was a LOG
> message; it was demoted to DEBUG1 later (d8f15c95bec, 2015).  I think it
> would be fair to ... remove the message?  Or go back to Simon's original
> formulation from commit acac68b2bca, which had this message as DEBUG2
> without any string formatting.

I don't really have an opinion on this.

Greetings,

Andres Freund



Re: remove spurious CREATE INDEX CONCURRENTLY wait

From
Michael Paquier
Date:
On Wed, Nov 18, 2020 at 11:09:28AM -0800, Andres Freund wrote:
> Uh, wait a second. The acquisition of this lock hasn't been affected by
> the snapshot scalability changes, and is therefore unrelated to
> ->pgxactoff changing or not.
>
> In 13 this is:
>         LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
>         MyPgXact->vacuumFlags |= PROC_IN_VACUUM;
>         if (params->is_wraparound)
>             MyPgXact->vacuumFlags |= PROC_VACUUM_FOR_WRAPAROUND;
>         LWLockRelease(ProcArrayLock);
>
> Lowering this to a shared lock doesn't seem right, at least without a
> detailed comment explaining why it's safe. Because GetSnapshotData() etc
> look at all procs with just an LW_SHARED ProcArrayLock, changing
> vacuumFlags without a lock means that two concurrent horizon
> computations could come to a different result.
>
> I'm not saying it's definitely wrong to relax things here, but I'm not
> sure we've evaluated it sufficiently.

Yeah.  While I do like the new assertion that 27838981 has added in
ProcArrayEndTransactionInternal(), this commit feels a bit rushed to
me.  Echoing with my comment from upthread, I am not sure that we
still document enough why a shared lock would be completely fine in
the case of statusFlags.  We have no hints that this could be fine
before 2783898, and 2783898 does not make that look better.  FWIW, I
think that just using LW_EXCLUSIVE for the CIC change would have been
fine enough first, after evaluating the performance impact.  We could
evaluate if it is possible to lower the ProcArrayLock acquisition in
those code paths on a separate thread.
--
Michael

Attachment

Re: "as quickly as possible" (was: remove spurious CREATE INDEX CONCURRENTLY wait)

From
Michael Paquier
Date:
On Wed, Nov 18, 2020 at 02:48:40PM -0800, Andres Freund wrote:
> On 2020-11-18 18:41:27 -0300, Alvaro Herrera wrote:
>> We could make this more concurrent by copying lock->tag to a local
>> variable, releasing the lock, then doing all the string formatting and
>> printing.  See attached quickly.patch.
>
> Sounds like a plan.

+1.

>> Now, when this code was written (d7318d43d, 2012), this was a LOG
>> message; it was demoted to DEBUG1 later (d8f15c95bec, 2015).  I think it
>> would be fair to ... remove the message?  Or go back to Simon's original
>> formulation from commit acac68b2bca, which had this message as DEBUG2
>> without any string formatting.
>
> I don't really have an opinion on this.

That still looks useful for debugging, so DEBUG1 sounds fine to me.
--
Michael

Attachment

Re: "as quickly as possible" (was: remove spurious CREATE INDEX CONCURRENTLY wait)

From
Michael Paquier
Date:
On Thu, Nov 19, 2020 at 12:13:44PM +0900, Michael Paquier wrote:
> That still looks useful for debugging, so DEBUG1 sounds fine to me.

By the way, it strikes me that you could just do nothing as long as
(log_min_messages > DEBUG1), so you could encapsulate most of the
logic that plays with the lock tag using that.
--
Michael

Attachment

Re: remove spurious CREATE INDEX CONCURRENTLY wait

From
Alvaro Herrera
Date:
On 2020-Nov-18, Andres Freund wrote:

> In 13 this is:
>         LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
>         MyPgXact->vacuumFlags |= PROC_IN_VACUUM;
>         if (params->is_wraparound)
>             MyPgXact->vacuumFlags |= PROC_VACUUM_FOR_WRAPAROUND;
>         LWLockRelease(ProcArrayLock);
> 
> Lowering this to a shared lock doesn't seem right, at least without a
> detailed comment explaining why it's safe. Because GetSnapshotData() etc
> look at all procs with just an LW_SHARED ProcArrayLock, changing
> vacuumFlags without a lock means that two concurrent horizon
> computations could come to a different result.
> 
> I'm not saying it's definitely wrong to relax things here, but I'm not
> sure we've evaluated it sufficiently.

True.  Let's evaluate it.

I changed three spots where statusFlags bits are written:

a) StartupDecodingContext: sets PROC_IN_LOGICAL_DECODING
b) ReplicationSlotRelease: removes PROC_IN_LOGICAL_DECODING
c) vacuum_rel: sets PROC_IN_VACUUM and potentially
   PROC_VACUUM_FOR_WRAPAROUND

Who reads these flags?

PROC_IN_LOGICAL_DECODING is read by:
 * ComputeXidHorizons
 * GetSnapshotData

PROC_IN_VACUUM is read by:
 * GetCurrentVirtualXIDs
 * ComputeXidHorizons
 * GetSnapshotData
 * ProcArrayInstallImportedXmin

PROC_VACUUM_FOR_WRAPAROUND is read by:
 * ProcSleep


ProcSleep:  no bug here.
The flags are checked to see if we should kill() the process (used when
autovac blocks some other process).  The lock here appears to be
inconsequential, since we release it before we do kill(); so strictly
speaking, there is still a race condition where the autovac process
could reset the flag after we read it and before we get a chance to
signal it.  The lock level autovac uses to set the flag is not relevant
either.

ProcArrayInstallImportedXmin:
This one is just searching for a matching backend; not affected by the
flags.

PROC_IN_LOGICAL_DECODING:
Oddly enough, I think the reset of PROC_IN_LOGICAL_DECODING in
ReplicationSlotRelease might be the most problematic one of the lot.
That's because a proc's xmin that had been ignored all along by
ComputeXidHorizons, will now be included in the computation.  Adding
asserts that proc->xmin and proc->xid are InvalidXid by the time we
reset the flag, I got hits in pg_basebackup, test_decoding and
subscription tests.  I think it's OK for ComputeXidHorizons (since it
just means that a vacuum that reads it later will remove fewer rows.)  But
in GetSnapshotData it is just not correct to have the Xmin go backwards.

Therefore it seems to me that this code has a bug independently of the
lock level used.


GetCurrentVirtualXIDs, ComputeXidHorizons, GetSnapshotData:

In these cases, what we need is that the code computes some xmin (or
equivalent computation) based on a set of transactions that exclude
those marked with the flags.  The behavior we want is that if some
transaction is marked as vacuum, we ignore the Xid/Xmin *if there is
one*.  In other words, if there's no Xid/Xmin, then the flag is not
important.  So if we can ensure that the flag is set first, and the
Xid/xmin is installed later, that's sufficient correctness and we don't
need to hold exclusive lock.  But if we can't ensure that, then we must
use exclusive lock, because otherwise we risk another process seeing our
Xid first and not our flag, which would be bad.

In other words, my conclusion is that there definitely is a bug here and
I am going to restore the use of exclusive lock for setting the
statusFlags.


GetSnapshotData has an additional difficulty -- we do the
UINT32_ACCESS_ONCE(ProcGlobal->xid[i]) read *before* testing the bit. 
So it's not valid to set the bit without locking out GSD, regardless of
any barriers we had; if we want this to be safe, we'd have to change
this so that the flag is read first, and we read the xid only
afterwards, with a read barrier.

I *think* we could relax the lock if we had a write barrier in
between: set the flag first, issue a write barrier, set the Xid.
(I have to admit I'm confused about what needs to happen in the read
case: read the bit first, potentially skip the PGPROC entry; but can we
read the Xid ahead of reading the flag, and if we do read the xid
afterwards, do we need a read barrier in between?)
Given this uncertainty, I'm not proposing to relax the lock from
exclusive to shared.
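
For reference, the protocol described above would look roughly like
this in code (a sketch of the idea only, not a proposal):

    /* writer: set the flag before the xid can become visible */
    MyProc->statusFlags |= PROC_IN_SAFE_IC;
    pg_write_barrier();
    /* ... xid is advertised later, as usual ... */

    /* reader: test the flag first, only then read the xid */
    if ((ProcGlobal->statusFlags[i] & PROC_IN_SAFE_IC) != 0)
        continue;               /* ignore this backend entirely */
    pg_read_barrier();
    xid = UINT32_ACCESS_ONCE(ProcGlobal->xid[i]);

Whether the read side is actually sufficient is exactly the uncertainty
above.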


I didn't get around to reading ComputeXidHorizons, but it seems like
it'd have the same problem as GSD.



Re: remove spurious CREATE INDEX CONCURRENTLY wait

From
Tom Lane
Date:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> PROC_IN_LOGICAL_DECODING:
> Oddly enough, I think the reset of PROC_IN_LOGICAL_DECODING in
> ReplicationSlotRelease might be the most problematic one of the lot.
> That's because a proc's xmin that had been ignored all along by
> ComputeXidHorizons, will now be included in the computation.  Adding
> asserts that proc->xmin and proc->xid are InvalidXid by the time we
> reset the flag, I got hits in pg_basebackup, test_decoding and
> subscription tests.  I think it's OK for ComputeXidHorizons (since it
> just means that a vacuum that reads it later will remove fewer rows.)  But
> in GetSnapshotData it is just not correct to have the Xmin go backwards.

> Therefore it seems to me that this code has a bug independently of the
> lock level used.

That is only a bug if the flags are *cleared* in a way that's not
atomic with clearing the transaction's xid/xmin, no?  I agree that
once set, the flag had better stay set till transaction end, but
that's not what's at stake here.

> GetCurrentVirtualXIDs, ComputeXidHorizons, GetSnapshotData:

> In these cases, what we need is that the code computes some xmin (or
> equivalent computation) based on a set of transactions that exclude
> those marked with the flags.  The behavior we want is that if some
> transaction is marked as vacuum, we ignore the Xid/Xmin *if there is
> one*.  In other words, if there's no Xid/Xmin, then the flag is not
> important.  So if we can ensure that the flag is set first, and the
> Xid/xmin is installed later, that's sufficient correctness and we don't
> need to hold exclusive lock.  But if we can't ensure that, then we must
> use exclusive lock, because otherwise we risk another process seeing our
> Xid first and not our flag, which would be bad.

I don't buy this either.  You get the same result if someone looks just
before you take the ProcArrayLock to set the flag.  So if there's a
problem, it's inherent in the way that the flags are defined or used;
the strength of lock used in this stanza won't affect it.

            regards, tom lane



Re: "as quickly as possible" (was: remove spurious CREATE INDEX CONCURRENTLY wait)

From
Alvaro Herrera
Date:
On 2020-Nov-19, Michael Paquier wrote:

> On Thu, Nov 19, 2020 at 12:13:44PM +0900, Michael Paquier wrote:
> > That still looks useful for debugging, so DEBUG1 sounds fine to me.
> 
> By the way, it strikes me that you could just do nothing as long as
> (log_min_messages > DEBUG1), so you could encapsulate most of the
> logic that plays with the lock tag using that.

Good idea, done.

I also noticed that if we're going to accept a race (which BTW already
exists) we may as well simplify the code around it.

I think the attached is the final form of this.

Attachment
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> On 2020-Nov-19, Michael Paquier wrote:
>> By the way, it strikes me that you could just do nothing as long as
>> (log_min_messages > DEBUG1), so you could encapsulate most of the
>> logic that plays with the lock tag using that.

> Good idea, done.

I'm less sure that that's a good idea.  It embeds knowledge here that
should not exist outside elog.c; moreover, I'm not entirely sure that
it's even correct, given the nonlinear ranking of log_min_messages.

Maybe it'd be a good idea to have elog.c expose a new function
along the lines of "bool message_level_is_interesting(int elevel)"
to support this and similar future optimizations in a less fragile way.
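
For instance (sketch only; the helper shown is invented), a call site
could then look like

    if (message_level_is_interesting(DEBUG2))
    {
        /* only now pay for constructing the expensive message */
        char       *details = build_expensive_details();    /* hypothetical */

        elog(DEBUG2, "connection state: %s", details);
        pfree(details);
    }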

            regards, tom lane



Re: "as quickly as possible" (was: remove spurious CREATE INDEX CONCURRENTLY wait)

From
Alvaro Herrera
Date:
On 2020-Nov-23, Tom Lane wrote:

> Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> > On 2020-Nov-19, Michael Paquier wrote:
> >> By the way, it strikes me that you could just do nothing as long as
> >> (log_min_messages > DEBUG1), so you could encapsulate most of the
> >> logic that plays with the lock tag using that.
> 
> > Good idea, done.
> 
> I'm less sure that that's a good idea.  It embeds knowledge here that
> should not exist outside elog.c; moreover, I'm not entirely sure that
> it's even correct, given the nonlinear ranking of log_min_messages.

Well, we already do this in a number of places.  But I can get behind
this:

> Maybe it'd be a good idea to have elog.c expose a new function
> along the lines of "bool message_level_is_interesting(int elevel)"
> to support this and similar future optimizations in a less fragile way.



Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> Well, we already do this in a number of places.  But I can get behind
> this:

>> Maybe it'd be a good idea to have elog.c expose a new function
>> along the lines of "bool message_level_is_interesting(int elevel)"
>> to support this and similar future optimizations in a less fragile way.

I'll see about a patch for that.

            regards, tom lane



Re: "as quickly as possible" (was: remove spurious CREATE INDEX CONCURRENTLY wait)

From
Alvaro Herrera
Date:
On 2020-Nov-23, Tom Lane wrote:

> Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> > Well, we already do this in a number of places.  But I can get behind
> > this:
> 
> >> Maybe it'd be a good idea to have elog.c expose a new function
> >> along the lines of "bool message_level_is_interesting(int elevel)"
> >> to support this and similar future optimizations in a less fragile way.
> 
> I'll see about a patch for that.

Looking at that now ...



Here's a draft patch.

            regards, tom lane

diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 03c553e7ea..9cd0b7c11b 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -5344,7 +5344,7 @@ static void
 ShowTransactionState(const char *str)
 {
     /* skip work if message will definitely not be printed */
-    if (log_min_messages <= DEBUG5 || client_min_messages <= DEBUG5)
+    if (message_level_is_interesting(DEBUG5))
         ShowTransactionStateRec(str, CurrentTransactionState);
 }

@@ -5371,7 +5371,6 @@ ShowTransactionStateRec(const char *str, TransactionState s)
     if (s->parent)
         ShowTransactionStateRec(str, s->parent);

-    /* use ereport to suppress computation if msg will not be printed */
     ereport(DEBUG5,
             (errmsg_internal("%s(%d) name: %s; blockState: %s; state: %s, xid/subid/cid: %u/%u/%u%s%s",
                              str, s->nestingLevel,
diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
index 7e915bcadf..32a3099c1f 100644
--- a/src/backend/access/transam/xlogutils.c
+++ b/src/backend/access/transam/xlogutils.c
@@ -105,7 +105,7 @@ log_invalid_page(RelFileNode node, ForkNumber forkno, BlockNumber blkno,
      * tracing of the cause (note the elog context mechanism will tell us
      * something about the XLOG record that generated the reference).
      */
-    if (log_min_messages <= DEBUG1 || client_min_messages <= DEBUG1)
+    if (message_level_is_interesting(DEBUG1))
         report_invalid_page(DEBUG1, node, forkno, blkno, present);

     if (invalid_page_tab == NULL)
@@ -159,7 +159,7 @@ forget_invalid_pages(RelFileNode node, ForkNumber forkno, BlockNumber minblkno)
             hentry->key.forkno == forkno &&
             hentry->key.blkno >= minblkno)
         {
-            if (log_min_messages <= DEBUG2 || client_min_messages <= DEBUG2)
+            if (message_level_is_interesting(DEBUG2))
             {
                 char       *path = relpathperm(hentry->key.node, forkno);

@@ -192,7 +192,7 @@ forget_invalid_pages_db(Oid dbid)
     {
         if (hentry->key.node.dbNode == dbid)
         {
-            if (log_min_messages <= DEBUG2 || client_min_messages <= DEBUG2)
+            if (message_level_is_interesting(DEBUG2))
             {
                 char       *path = relpathperm(hentry->key.node, hentry->key.forkno);

diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index b0d037600e..245c2f4fc8 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -1146,15 +1146,9 @@ reportDependentObjects(const ObjectAddresses *targetObjects,
      * If no error is to be thrown, and the msglevel is too low to be shown to
      * either client or server log, there's no need to do any of the rest of
      * the work.
-     *
-     * Note: this code doesn't know all there is to be known about elog
-     * levels, but it works for NOTICE and DEBUG2, which are the only values
-     * msglevel can currently have.  We also assume we are running in a normal
-     * operating environment.
      */
     if (behavior == DROP_CASCADE &&
-        msglevel < client_min_messages &&
-        (msglevel < log_min_messages || log_min_messages == LOG))
+        !message_level_is_interesting(msglevel))
         return;

     /*
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index babee386c4..87c3ea450e 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -1215,7 +1215,7 @@ ProcessWalSndrMessage(XLogRecPtr walEnd, TimestampTz sendTime)
     walrcv->lastMsgReceiptTime = lastMsgReceiptTime;
     SpinLockRelease(&walrcv->mutex);

-    if (log_min_messages <= DEBUG2)
+    if (message_level_is_interesting(DEBUG2))
     {
         char       *sendtime;
         char       *receipttime;
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 5d1b1a16be..2eb19ad293 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -1900,7 +1900,7 @@ ProcessStandbyReplyMessage(void)
     replyTime = pq_getmsgint64(&reply_message);
     replyRequested = pq_getmsgbyte(&reply_message);

-    if (log_min_messages <= DEBUG2)
+    if (message_level_is_interesting(DEBUG2))
     {
         char       *replyTimeStr;

@@ -2082,7 +2082,7 @@ ProcessStandbyHSFeedbackMessage(void)
     feedbackCatalogXmin = pq_getmsgint(&reply_message, 4);
     feedbackCatalogEpoch = pq_getmsgint(&reply_message, 4);

-    if (log_min_messages <= DEBUG2)
+    if (message_level_is_interesting(DEBUG2))
     {
         char       *replyTimeStr;

diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 1ba47c194b..04c2245338 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -600,6 +600,42 @@ errfinish(const char *filename, int lineno, const char *funcname)
     CHECK_FOR_INTERRUPTS();
 }

+/*
+ * message_level_is_interesting --- would ereport/elog do anything?
+ *
+ * Returns true if ereport/elog with this elevel will not be a no-op.
+ * This is useful to short-circuit any expensive preparatory work that
+ * might be needed for a logging message.  There is no point in
+ * prepending this to a bare ereport/elog call, however.
+ */
+bool
+message_level_is_interesting(int elevel)
+{
+    bool        output_to_server;
+    bool        output_to_client;
+
+    /*
+     * Keep this in sync with the decision-making in errstart().
+     */
+    if (elevel >= ERROR)
+        return true;
+
+    output_to_server = is_log_level_output(elevel, log_min_messages);
+    if (output_to_server)
+        return true;
+
+    if (whereToSendOutput == DestRemote && elevel != LOG_SERVER_ONLY &&
+        !ClientAuthInProgress)
+    {
+        output_to_client = (elevel >= client_min_messages ||
+                            elevel == INFO);
+        if (output_to_client)
+            return true;
+    }
+
+    return false;
+}
+

 /*
  * errcode --- add SQLSTATE error code to the current error
diff --git a/src/include/utils/elog.h b/src/include/utils/elog.h
index 1e09ee0541..de5bba0a4a 100644
--- a/src/include/utils/elog.h
+++ b/src/include/utils/elog.h
@@ -149,6 +149,8 @@
 extern bool errstart(int elevel, const char *domain);
 extern void errfinish(const char *filename, int lineno, const char *funcname);

+extern bool message_level_is_interesting(int elevel);
+
 extern int    errcode(int sqlerrcode);

 extern int    errcode_for_file_access(void);

Re: "as quickly as possible" (was: remove spurious CREATE INDEX CONCURRENTLY wait)

From
Alvaro Herrera
Date:
On 2020-Nov-23, Tom Lane wrote:

> Here's a draft patch.

Here's another of my own.  Outside of elog.c it seems identical.


Attachment

Re: "as quickly as possible" (was: remove spurious CREATE INDEX CONCURRENTLY wait)

From
Alvaro Herrera
Date:
On 2020-Nov-23, Alvaro Herrera wrote:

> On 2020-Nov-23, Tom Lane wrote:
> 
> > Here's a draft patch.
> 
> Here's another of my own.  Outside of elog.c it seems identical.

Your version has the advantage that errstart() doesn't get a new
function call.  I'm +1 for going with that ... we could avoid the
duplicate code with some additional contortions but this changes so
rarely that it's probably not worth the trouble.




Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> On 2020-Nov-23, Tom Lane wrote:
>> Here's a draft patch.

> Here's another of my own.  Outside of elog.c it seems identical.

Ah, I see I didn't cover the case in ProcSleep that you were originally on
about ... I'd just looked for existing references to log_min_messages
and client_min_messages.

I think it's important to have the explicit check for elevel >= ERROR.
I'm not too fussed about whether we invent is_log_level_output_client,
although that name doesn't seem well-chosen compared to
is_log_level_output.

Shall I press forward with this, or do you want to?

            regards, tom lane



Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> Your version has the advantage that errstart() doesn't get a new
> function call.  I'm +1 for going with that ... we could avoid the
> duplicate code with some additional contortions but this changes so
> rarely that it's probably not worth the trouble.

I was considering adding that factorization, but marking the function
inline to avoid adding overhead.  Most of elog.c predates our use of
inline, so it wasn't considered when this code was written.

            regards, tom lane



Re: "as quickly as possible" (was: remove spurious CREATE INDEX CONCURRENTLY wait)

From
Alvaro Herrera
Date:
On 2020-Nov-23, Tom Lane wrote:

> Ah, I see I didn't cover the case in ProcSleep that you were originally on
> about ... I'd just looked for existing references to log_min_messages
> and client_min_messages.

Yeah, it seemed bad form to add that when you had just argued against it
:-)

> I think it's important to have the explicit check for elevel >= ERROR.
> I'm not too fussed about whether we invent is_log_level_output_client,
> although that name doesn't seem well-chosen compared to
> is_log_level_output.

Just replacing "log" with "client" in that name seemed strictly worse, and I
didn't (don't) have any other ideas.

> Shall I press forward with this, or do you want to?

Please feel free to go ahead, including the change to ProcSleep.



Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> On 2020-Nov-23, Tom Lane wrote:
>> I'm not too fussed about whether we invent is_log_level_output_client,
>> although that name doesn't seem well-chosen compared to
>> is_log_level_output.

> Just replacing "log" with "client" in that name seemed strictly worse, and I
> didn't (don't) have any other ideas.

I never cared that much for "is_log_level_output" either.  Thinking
about renaming it to "should_output_to_log()", and then the new function
would be "should_output_to_client()".

>> Shall I press forward with this, or do you want to?

> Please feel free to go ahead, including the change to ProcSleep.

Will do.

            regards, tom lane



Re: remove spurious CREATE INDEX CONCURRENTLY wait

From
Andres Freund
Date:
Hi,

On 2020-11-23 12:30:05 -0300, Alvaro Herrera wrote:
> ProcSleep:  no bug here.
> The flags are checked to see if we should kill() the process (used when
> autovac blocks some other process).  The lock here appears to be
> inconsequential, since we release it before we do kill(); so strictly
> speaking, there is still a race condition where the autovac process
> could reset the flag after we read it and before we get a chance to
> signal it.  The lock level autovac uses to set the flag is not relevant
> either.

Yea. Even before the recent changes, building the lock message under the
lock didn't make sense. Now that the flags are mirrored in pgproc, we
probably should just make this use READ_ONCE(autovac->statusFlags),
without any other use of ProcArrayLock.  As you say, the race condition
is between the flag test, the kill, and the signal being processed,
which wasn't previously protected either.
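
Roughly like this (sketch, not a tested patch -- we don't actually have
a READ_ONCE macro, so take the volatile cast as schematic):

    /* in ProcSleep(), sketch: read the flags once, without ProcArrayLock */
    uint8       statusFlags = ((volatile PGPROC *) autovac)->statusFlags;

    if ((statusFlags & PROC_IS_AUTOVACUUM) &&
        !(statusFlags & PROC_VACUUM_FOR_WRAPAROUND))
    {
        /* ... build the DEBUG1 message and kill() the autovacuum ... */
    }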


> PROC_IN_LOGICAL_DECODING:
> Oddly enough, I think the reset of PROC_IN_LOGICAL_DECODING in
> ReplicationSlotRelease might be the most problematic one of the lot.
> That's because a proc's xmin that had been ignored all along by
> ComputeXidHorizons, will now be included in the computation.  Adding
> asserts that proc->xmin and proc->xid are InvalidXid by the time we
> reset the flag, I got hits in pg_basebackup, test_decoding and
> subscription tests.  I think it's OK for ComputeXidHorizons (since it
> just means that a vacuum that reads it later will remove fewer rows).  But
> in GetSnapshotData it is just not correct to have the Xmin go backwards.

I don't think there's a problem. PROC_IN_LOGICAL_DECODING can only be
set when outside a transaction block, i.e. walsender. In which case
there shouldn't be an xid/xmin, I think? Or did you gate your assert on
PROC_IN_LOGICAL_DECODING being set?


> GetCurrentVirtualXIDs, ComputeXidHorizons, GetSnapshotData:
> 
> In these cases, what we need is that the code computes some xmin (or
> equivalent computation) based on a set of transactions that exclude
> those marked with the flags.  The behavior we want is that if some
> transaction is marked as vacuum, we ignore the Xid/Xmin *if there is
> one*. In other words, if there's no Xid/Xmin, then the flag is not
> important.  So if we can ensure that the flag is set first, and the
> Xid/xmin is installed later, that's sufficient for correctness and we don't
> need to hold exclusive lock.

That'd require at least memory barriers in GetSnapshotData()'s loop,
which I'd really like to avoid. Otherwise the order in which memory gets
written in one process doesn't guarantee the order of visibility in
another process...



> In other words, my conclusion is that there definitely is a bug here and
> I am going to restore the use of exclusive lock for setting the
> statusFlags.

Cool.


> GetSnapshotData has an additional difficulty -- we do the
> UINT32_ACCESS_ONCE(ProcGlobal->xids[i]) read *before* testing the bit. 
> So it's not valid to set the bit without locking out GSD, regardless of
> any barriers we had; if we want this to be safe, we'd have to change
> this so that the flag is read first, and we read the xid only
> afterwards, with a read barrier.

Which we don't want, because that'd mean slowing down the *extremely*
common case of the majority of backends neither having an xid assigned
nor doing logical decoding / vacuum. We'd be reading two cachelines
instead of one.

Greetings,

Andres Freund



Re: "as quickly as possible" (was: remove spurious CREATE INDEX CONCURRENTLY wait)

From
Alvaro Herrera
Date:
On 2020-Nov-23, Tom Lane wrote:

> I never cared that much for "is_log_level_output" either.  Thinking
> about renaming it to "should_output_to_log()", and then the new function
> would be "should_output_to_client()".

Ah, that sounds nicely symmetric and grammatical.

Thanks!



Re: "as quickly as possible" (was: remove spurious CREATE INDEX CONCURRENTLY wait)

From
Michael Paquier
Date:
On Mon, Nov 23, 2020 at 06:13:17PM -0500, Tom Lane wrote:
> Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
>> Please feel free to go ahead, including the change to ProcSleep.
>
> Will do.

Thank you both for 450c823 and 789b938.
--
Michael

Attachment

Re: remove spurious CREATE INDEX CONCURRENTLY wait

From
Alvaro Herrera
Date:
On 2020-Nov-23, Andres Freund wrote:

> On 2020-11-23 12:30:05 -0300, Alvaro Herrera wrote:

> > PROC_IN_LOGICAL_DECODING:
> > Oddly enough, I think the reset of PROC_IN_LOGICAL_DECODING in
> > ReplicationSlotRelease might be the most problematic one of the lot.
> > That's because a proc's xmin that had been ignored all along by
> > ComputeXidHorizons, will now be included in the computation.  Adding
> > asserts that proc->xmin and proc->xid are InvalidXid by the time we
> > reset the flag, I got hits in pg_basebackup, test_decoding and
> > subscription tests.  I think it's OK for ComputeXidHorizons (since it
> > just means that a vacuum that reads it later will remove fewer rows).  But
> > in GetSnapshotData it is just not correct to have the Xmin go backwards.
> 
> I don't think there's a problem. PROC_IN_LOGICAL_DECODING can only be
> set when outside a transaction block, i.e. walsender. In which case
> there shouldn't be an xid/xmin, I think? Or did you gate your assert on
> PROC_IN_LOGICAL_DECODING being set?

Ah, you're right about this one --  I missed the significance of setting
the flag only "when outside of a transaction block" at the time we call
StartupDecodingContext.




Re: remove spurious CREATE INDEX CONCURRENTLY wait

From
Alvaro Herrera
Date:
On 2020-Nov-23, Tom Lane wrote:

> Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

> > GetCurrentVirtualXIDs, ComputeXidHorizons, GetSnapshotData:
> 
> > In these cases, what we need is that the code computes some xmin (or
> > equivalent computation) based on a set of transactions that exclude
> > those marked with the flags.  The behavior we want is that if some
> > transaction is marked as vacuum, we ignore the Xid/Xmin *if there is
> > one*.  In other words, if there's no Xid/Xmin, then the flag is not
> > important.  So if we can ensure that the flag is set first, and the
> > Xid/xmin is installed later, that's sufficient for correctness and we don't
> > need to hold exclusive lock.  But if we can't ensure that, then we must
> > use exclusive lock, because otherwise we risk another process seeing our
> > Xid first and not our flag, which would be bad.
> 
> I don't buy this either.  You get the same result if someone looks just
> before you take the ProcArrayLock to set the flag.  So if there's a
> problem, it's inherent in the way that the flags are defined or used;
> the strength of lock used in this stanza won't affect it.

The problem is that the writes could be reordered in a way that makes
the Xid appear set to an onlooker before PROC_IN_VACUUM appears set.
Vacuum always sets the bit first, and *then* the xid.  If the reader
always reads them in that same order (the bit first, then the xid),
it's not a problem.  But in order to guarantee that ordering, we would
have to have a read barrier for each pass through the loop.

With the LW_EXCLUSIVE lock, we block the readers so that the bit is
known set by the time they examine it.  As I understand it, getting the
lock is itself a barrier, so there's no danger that we'll set the bit
and they won't see it.


... at least, that's how I *imagine* the argument to be.  In practice,
vacuum_rel() calls GetSnapshotData() before installing the
PROC_IN_VACUUM bit, and therefore there *is* a risk that reader 1 will
get MyProc->xmin included in their snapshot (because bit not yet set),
and reader 2 won't.  If my understanding is correct, then we should move
the PushActiveSnapshot(GetTransactionSnapshot()) call to after we have
the PROC_IN_VACUUM bit set.
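
Concretely, something like this in vacuum_rel() (sketch; the
statusFlags mirroring follows the pattern we use elsewhere):

    /* set the flag first, under exclusive ProcArrayLock */
    LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
    MyProc->statusFlags |= PROC_IN_VACUUM;
    if (params->is_wraparound)
        MyProc->statusFlags |= PROC_VACUUM_FOR_WRAPAROUND;
    ProcGlobal->statusFlags[MyProc->pgxactoff] = MyProc->statusFlags;
    LWLockRelease(ProcArrayLock);

    /*
     * Only now compute our snapshot, so that no reader can see our xmin
     * while missing the flag.
     */
    PushActiveSnapshot(GetTransactionSnapshot());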



Re: remove spurious CREATE INDEX CONCURRENTLY wait

From
Alvaro Herrera
Date:
On 2020-Nov-23, Andres Freund wrote:

> On 2020-11-23 12:30:05 -0300, Alvaro Herrera wrote:

> > In other words, my conclusion is that there definitely is a bug here and
> > I am going to restore the use of exclusive lock for setting the
> > statusFlags.
> 
> Cool.

Here's a patch.

Note it also moves the computation of vacuum's Xmin (per
GetTransactionSnapshot) to *after* the bit has been set in statusFlags.

Attachment

Re: remove spurious CREATE INDEX CONCURRENTLY wait

From
Alvaro Herrera
Date:
On 2020-Nov-25, Alvaro Herrera wrote:

> On 2020-Nov-23, Andres Freund wrote:
> 
> > On 2020-11-23 12:30:05 -0300, Alvaro Herrera wrote:
> 
> > > In other words, my conclusion is that there definitely is a bug here and
> > > I am going to restore the use of exclusive lock for setting the
> > > statusFlags.
> > 
> > Cool.
> 
> Here's a patch.

Pushed, thanks.




Re: remove spurious CREATE INDEX CONCURRENTLY wait

From
Alvaro Herrera
Date:
So let's discuss the next step in this series: what to do about REINDEX
CONCURRENTLY.

I started with Dmitry's patch (an updated version of which I already
posted in [1]).  However, Dmitry missed (and I hadn't noticed) that some
of the per-index loops are starting transactions also, and that we need
to set the flag in those.  And what's more, in a couple of the
function-scope transactions we do set the flag pointlessly: the
transactions there do not acquire a snapshot, so there's no reason to
set the flag at all, because WaitForOlderSnapshots ignores sessions
whose Xmin is 0.

There are also transactions that wait first, without taking a snapshot,
and then do some catalog manipulations.  I think it's pretty much useless
to set the flag for those, because they're going to be very short
anyway.  (There's also one case of this in CREATE INDEX CONCURRENTLY.)

But there's a more interesting point also.  In Dmitry's patch, we
determine safety for *all* indexes being processed as a set, and then
apply the flag only if they're all deemed safe.  But we can optimize
this, and set the flag for each index' transaction individually, and
only skip it for those specific indexes that are unsafe.  So I propose
to change the data structure used in ReindexRelationConcurrently from
the current list of OIDs to a list of (oid,boolean) pairs, to be able to
track setting the flag individually.

There's one more useful observation: in the function-scope transactions
(outside the per-index loops), we don't touch the contents of any
indexes; we just wait or do some catalog manipulation.  So we can set
the flag *regardless of the safety of any indexes*.  We only need to
care about the safety of the indexes in the phases where we build the
indexes and when we validate them.
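
In code terms, each such transaction would do something like the
following right after starting (sketch; I'm assuming the flag ends up
being called PROC_IN_SAFE_IC, and nothing here is final):

    /* tell concurrent CICs that they may ignore us; only valid while
     * we have no xid and no xmin yet */
    Assert(MyProc->xid == InvalidTransactionId &&
           MyProc->xmin == InvalidTransactionId);

    LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
    MyProc->statusFlags |= PROC_IN_SAFE_IC;
    ProcGlobal->statusFlags[MyProc->pgxactoff] = MyProc->statusFlags;
    LWLockRelease(ProcArrayLock);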


[1] https://postgr.es/m/20201118175804.GA23027@alvherre.pgsql



Re: remove spurious CREATE INDEX CONCURRENTLY wait

From
Alvaro Herrera
Date:
On 2020-Nov-26, Alvaro Herrera wrote:

> So let's discuss the next step in this series: what to do about REINDEX
> CONCURRENTLY.

> [...]

... as in the attached.


Attachment

Re: remove spurious CREATE INDEX CONCURRENTLY wait

From
Alvaro Herrera
Date:
Actually, I noticed two things.  The first of them, addressed in this
new version of the patch, is that REINDEX CONCURRENTLY is doing a lot of
repetitive work by reopening each index and table in the build/validate
loops, so that they can report progress.  This is easy to remedy by
adding a couple more members to the new struct (which I also renamed to
ReindexIndexInfo), for tableId and amId.  The code seems a bit simpler
this way.
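
For reference, the struct now looks roughly like this (sketch; the
attached patch is authoritative on the exact layout):

    typedef struct ReindexIndexInfo
    {
        Oid         indexId;    /* OID of the index to rebuild */
        Oid         tableId;    /* OID of the table, to avoid reopening it */
        Oid         amId;       /* index AM, for progress reporting */
        bool        safe;       /* may we set the "safe" flag bit? */
    } ReindexIndexInfo;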

The other thing is that ReindexRelationConcurrently seems to always be
called with the relations already locked by RangeVarGetRelidExtended.
So claiming to acquire locks on the relations over and over is
pointless.  (I only noticed this because there was an obvious deadlock
hazard in one of the loops, which locked the index before the table.)  I think we
should reduce all those to NoLock.  My patch does not do that.

Attachment

Re: remove spurious CREATE INDEX CONCURRENTLY wait

From
Alvaro Herrera
Date:
In the interest of showing progress, I'm going to mark this CF item as
committed, and I'll submit the remaining pieces in a new thread.

Thanks!



Re: remove spurious CREATE INDEX CONCURRENTLY wait

From
Michael Paquier
Date:
On Mon, Nov 30, 2020 at 04:15:27PM -0300, Alvaro Herrera wrote:
> In the interest of showing progress, I'm going to mark this CF item as
> committed, and I'll submit the remaining pieces in a new thread.

Thanks for splitting!
--
Michael

Attachment

Re: remove spurious CREATE INDEX CONCURRENTLY wait

From
Andres Freund
Date:
Hi,

On 2020-11-25 17:03:58 -0300, Alvaro Herrera wrote:
> On 2020-Nov-23, Andres Freund wrote:
> 
> > On 2020-11-23 12:30:05 -0300, Alvaro Herrera wrote:
> 
> > > In other words, my conclusion is that there definitely is a bug here and
> > > I am going to restore the use of exclusive lock for setting the
> > > statusFlags.
> > 
> > Cool.
> 
> Here's a patch.
> 
> Note it also moves the computation of vacuum's Xmin (per
> GetTransactionSnapshot) to *after* the bit has been set in statusFlags.

> From b813c67a4abe2127b8bd13db7e920f958db15d59 Mon Sep 17 00:00:00 2001
> From: Alvaro Herrera <alvherre@alvh.no-ip.org>
> Date: Tue, 24 Nov 2020 18:10:42 -0300
> Subject: [PATCH] Restore lock level to update statusFlags
> 
> Reverts 27838981be9d (some comments are kept).  Per discussion, it does
> not seem safe to relax the lock level used for this; in order for it to
> be safe, there would have to be memory barriers between the point we set
> the flag and the point we set the transaction Xid, which perhaps would
> not be so bad; but there would also have to be barriers at the readers'
> side, which from a performance perspective might be bad.
> 
> Now maybe this analysis is wrong and it *is* safe for some reason, but
> proof of that is not trivial.

I just noticed that this commit (dcfff74fb16) didn't revert the change of lock
level in ReplicationSlotRelease(). Was that intentional?

Greetings,

Andres Freund



Re: remove spurious CREATE INDEX CONCURRENTLY wait

From
Alvaro Herrera
Date:
On 2021-Nov-10, Andres Freund wrote:

> > Reverts 27838981be9d (some comments are kept).  Per discussion, it does
> > not seem safe to relax the lock level used for this; in order for it to
> > be safe, there would have to be memory barriers between the point we set
> > the flag and the point we set the transaction Xid, which perhaps would
> > not be so bad; but there would also have to be barriers at the readers'
> > side, which from a performance perspective might be bad.
> > 
> > Now maybe this analysis is wrong and it *is* safe for some reason, but
> > proof of that is not trivial.
> 
> I just noticed that this commit (dcfff74fb16) didn't revert the change of lock
> level in ReplicationSlotRelease(). Was that intentional?

Hmm, no, that seems to have been a mistake.  I'll restore it.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"El miedo atento y previsor es la madre de la seguridad" (E. Burke)



Re: remove spurious CREATE INDEX CONCURRENTLY wait

From
Andres Freund
Date:
On 2021-11-11 09:38:07 -0300, Alvaro Herrera wrote:
> > I just noticed that this commit (dcfff74fb16) didn't revert the change of lock
> > level in ReplicationSlotRelease(). Was that intentional?
> 
> Hmm, no, that seems to have been a mistake.  I'll restore it.

Thanks