Thread: remove spurious CREATE INDEX CONCURRENTLY wait
I previously[1] posted a patch to have multiple CREATE INDEX CONCURRENTLY not wait for the slowest of them. This is an update of that, with minor conflicts fixed and a fresh thread. To recap: currently, any CREATE INDEX CONCURRENTLY will wait for all other CICs running concurrently to finish, because they can't be distinguished amidst other old snapshots. We can change things by having CIC set a special flag in PGPROC (like PROC_IN_VACUUM) indicating that it's doing CIC; other CICs will see that flag and will know that they don't need to wait for those processes. With this, CIC on small tables don't have to wait for CIC on large tables to complete. [1] https://postgr.es/m/20200805021109.GA9079@alvherre.pgsql -- Álvaro Herrera http://www.linkedin.com/in/alvherre "Escucha y olvidarás; ve y recordarás; haz y entenderás" (Confucio)
Attachment
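As a rough sketch of the mechanism being proposed (based on the code quoted later in this thread, not on the attached patch itself; PROC_IN_SAFE_IC is the flag name eventually settled on): the CIC backend marks itself in its PGPROC entry, and WaitForOlderSnapshots() adds that flag to the set of processes it ignores.

    /* in the CIC backend, just after starting each transaction */
    LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
    MyProc->vacuumFlags |= PROC_IN_SAFE_IC;
    ProcGlobal->vacuumFlags[MyProc->pgxactoff] = MyProc->vacuumFlags;
    LWLockRelease(ProcArrayLock);

    /* in WaitForOlderSnapshots(), skip backends carrying the flag */
    old_snapshots = GetCurrentVirtualXIDs(limitXmin, true, false,
                                          PROC_IS_AUTOVACUUM | PROC_IN_VACUUM |
                                          PROC_IN_SAFE_IC,
                                          &n_old_snapshots);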
+ James Coleman -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > To recap: currently, any CREATE INDEX CONCURRENTLY will wait for all > other CICs running concurrently to finish, because they can't be > distinguished amidst other old snapshots. We can change things by > having CIC set a special flag in PGPROC (like PROC_IN_VACUUM) indicating > that it's doing CIC; other CICs will see that flag and will know that > they don't need to wait for those processes. With this, CIC on small > tables don't have to wait for CIC on large tables to complete. Hm. +1 for improving this, if we can, but ... It seems clearly unsafe to ignore a CIC that is in active index-building; a snapshot held for that purpose is just as real as any other. It *might* be all right to ignore a CIC that is just waiting, but you haven't made any argument in the patch comments as to why that's safe either. (Moreover, at the points where we're just waiting, I don't think we have a snapshot, so another CIC's WaitForOlderSnapshots shouldn't wait for us anyway.) Actually, it doesn't look like you've touched the comments at all. WaitForOlderSnapshots' header comment has a long explanation of why it's safe to ignore certain processes. That certainly needs to be updated by any patch that's going to change the rules. BTW, what about REINDEX CONCURRENTLY? regards, tom lane
On Mon, Aug 10, 2020 at 8:37 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Alvaro Herrera <alvherre@2ndquadrant.com> writes: > > To recap: currently, any CREATE INDEX CONCURRENTLY will wait for all > > other CICs running concurrently to finish, because they can't be > > distinguished amidst other old snapshots. We can change things by > > having CIC set a special flag in PGPROC (like PROC_IN_VACUUM) indicating > > that it's doing CIC; other CICs will see that flag and will know that > > they don't need to wait for those processes. With this, CIC on small > > tables don't have to wait for CIC on large tables to complete. > > Hm. +1 for improving this, if we can, but ... > > It seems clearly unsafe to ignore a CIC that is in active index-building; > a snapshot held for that purpose is just as real as any other. It *might* > be all right to ignore a CIC that is just waiting, but you haven't made > any argument in the patch comments as to why that's safe either. > (Moreover, at the points where we're just waiting, I don't think we have > a snapshot, so another CIC's WaitForOlderSnapshots shouldn't wait for us > anyway.) Why is a CIC in active index-building something we need to wait for? Wouldn't it fall under a similar kind of logic to the other snapshot types we can explicitly ignore? CIC can't be run in a manual transaction, so the snapshot it holds won't be used to perform arbitrary operations (i.e., the reason why a manual ANALYZE can't be ignored). > Actually, it doesn't look like you've touched the comments at all. > WaitForOlderSnapshots' header comment has a long explanation of why > it's safe to ignore certain processes. That certainly needs to be > updated by any patch that's going to change the rules. Agreed that the comment needs to be updated to discuss the (im)possibility of arbitrary operations within a snapshot held by CIC. James
James Coleman <jtc331@gmail.com> writes: > Why is a CIC in active index-building something we need to wait for? > Wouldn't it fall under a similar kind of logic to the other snapshot > types we can explicitly ignore? CIC can't be run in a manual > transaction, so the snapshot it holds won't be used to perform > arbitrary operations (i.e., the reason why a manual ANALYZE can't be > ignored). Expression indexes that call user-defined functions seem like a pretty serious risk factor for that argument. Those are exactly the same expressions that ANALYZE will evaluate, as a result of which we judge it unsafe to ignore. Why would CIC be different? regards, tom lane
On Tue, Aug 11, 2020 at 11:14 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > James Coleman <jtc331@gmail.com> writes: > > Why is a CIC in active index-building something we need to wait for? > > Wouldn't it fall under a similar kind of logic to the other snapshot > > types we can explicitly ignore? CIC can't be run in a manual > > transaction, so the snapshot it holds won't be used to perform > > arbitrary operations (i.e., the reason why a manual ANALYZE can't be > > ignored). > > Expression indexes that call user-defined functions seem like a > pretty serious risk factor for that argument. Those are exactly > the same expressions that ANALYZE will evaluate, as a result of > which we judge it unsafe to ignore. Why would CIC be different? The comments for WaitForOlderSnapshots() don't discuss that problem at all; as far as ANALYZE goes they only say: * Manual ANALYZEs, however, can't be excluded because they * might be within transactions that are going to do arbitrary operations * later. But nonetheless over in the thread Álvaro linked to we'd discussed these issues already. Andres in [1] and [2] believed that: > For the snapshot waits we can add a procarray flag > (alongside PROCARRAY_VACUUM_FLAG) indicating that the backend is > doing. Which WaitForOlderSnapshots() can then use to ignore those CICs, > which is safe, because those transactions definitely don't insert into > relations targeted by CIC. The change to WaitForOlderSnapshots() would > just be to pass the new flag to GetCurrentVirtualXIDs, I think. and > What I was thinking of was a new flag, with a distinct value from > PROC_IN_VACUUM. It'd currently just be specified in the > GetCurrentVirtualXIDs() calls in WaitForOlderSnapshots(). That'd avoid > needing to wait for other CICs on different relations. Since CIC is not > permitted on system tables, and CIC doesn't do DML on normal tables, it > seems fairly obviously correct to exclude other CICs. In [3] I'd brought up that a function index can do arbitrary operations (including, terribly, a query of another table), and Andres (in [4]) noted that: > Well, even if we consider this an actual problem, we could still use > PROC_IN_CIC for non-expression non-partial indexes (index operator > themselves better ensure this isn't a problem, or they're ridiculously > broken already - they can get called during vacuum). But went on to describe how this is a problem with all expression indexes (even if those expressions don't do dangerous things) because of syscache lookups, but that ideally for expression indexes we'd have a way to use a different (or more frequently taken) snapshot for the purpose of computing those expressions. That's a significantly more involved patch though. So from what I understand, everything that I'd claimed in my previous message still holds true for non-expression/non-partial indexes. Is there something else I'm missing? James 1: https://www.postgresql.org/message-id/20200325191935.jjhdg2zy5k7ath5v%40alap3.anarazel.de 2: https://www.postgresql.org/message-id/20200325195841.gq4hpl25t6pxv3gl%40alap3.anarazel.de 3: https://www.postgresql.org/message-id/CAAaqYe_fveT_tvKonVt1imujOURUUMrydGeaBGahqLLy9D-REw%40mail.gmail.com 4: https://www.postgresql.org/message-id/20200416221207.wmnzbxvjqqpazeob%40alap3.anarazel.de
On 2020-Aug-11, James Coleman wrote: > In [3] I'd brought up that a function index can do arbitrary > operations (including, terribly, a query of another table), and Andres > (in [4]) noted that: > > > Well, even if we consider this an actual problem, we could still use > > PROC_IN_CIC for non-expression non-partial indexes (index operator > > themselves better ensure this isn't a problem, or they're ridiculously > > broken already - they can get called during vacuum). > > But went on to describe how this is a problem with all expression > indexes (even if those expressions don't do dangerous things) because > of syscache lookups, but that ideally for expression indexes we'd have > a way to use a different (or more frequently taken) snapshot for the > purpose of computing those expressions. That's a significantly more > involved patch though. So the easy first patch here is to add the flag as PROC_IN_SAFE_CIC, which is set only for CIC when it's used for indexes that are neither on expressions nor partial. Then ignore those in WaitForOlderSnapshots. The attached patch does it that way. (Also updated per recent conflicts). I did not set the flag in REINDEX CONCURRENTLY, but as I understand it can be done too, since in essence it's the same thing as a CIC from a snapshot management point of view. Also, per [1], ISTM this flag could be used to tell lazy VACUUM to ignore the Xmin of this process too, which the previous formulation (where all CICs were so marked) could not. This patch doesn't do that yet, but it seems the natural next step to take. [1] https://postgr.es/m/20191101203310.GA12239@alvherre.pgsql -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
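A minimal sketch of the "safe" test being described (the attached patch is authoritative; ii_Expressions and ii_Predicate come from the IndexInfo built for the new index):

    /* the flag may only be used for indexes with no expression columns
     * and no WHERE predicate */
    safe_index = indexInfo->ii_Expressions == NIL &&
                 indexInfo->ii_Predicate == NIL;

Only when safe_index is true would the snapshot-holding transactions advertise the new flag.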
On Wed, Aug 19, 2020 at 02:16:46PM -0400, Alvaro Herrera wrote: > I did not set the flag in REINDEX CONCURRENTLY, but as I understand it > can be done too, since in essence it's the same thing as a CIC from a > snapshot management point of view. Yes, I see no problems for REINDEX CONCURRENTLY as well as long as there are no predicates and expressions involved. The transactions that should be patched are all started in ReindexRelationConcurrently. The transaction of index_concurrently_swap() cannot set up that though. Only thing to be careful is to make sure that safe_flag is correct depending on the list of indexes worked on. > Also, per [1], ISTM this flag could be used to tell lazy VACUUM to > ignore the Xmin of this process too, which the previous formulation > (where all CICs were so marked) could not. This patch doesn't do that > yet, but it seems the natural next step to take. > > [1] https://postgr.es/m/20191101203310.GA12239@alvherre.pgsql Could we consider renaming vacuumFlags? With more flags associated to a PGPROC entry that are not related to vacuum, the current naming makes things confusing. Something like statusFlags could fit better in the picture? -- Michael
Attachment
> On Thu, Aug 20, 2020 at 03:11:19PM +0900, Michael Paquier wrote: > On Wed, Aug 19, 2020 at 02:16:46PM -0400, Alvaro Herrera wrote: > > I did not set the flag in REINDEX CONCURRENTLY, but as I understand it > > can be done too, since in essence it's the same thing as a CIC from a > > snapshot management point of view. > > Yes, I see no problems for REINDEX CONCURRENTLY as well as long as > there are no predicates and expressions involved. The transactions > that should be patched are all started in ReindexRelationConcurrently. > The transaction of index_concurrently_swap() cannot set up that > though. Only thing to be careful is to make sure that safe_flag is > correct depending on the list of indexes worked on. Hi, After looking through the thread and reading the patch it seems good, and there are only few minor questions: * Doing the same for REINDEX CONCURRENTLY, which does make sense. In fact it's already mentioned in the commentaries as done, which a bit confusing. * Naming, to be more precise what suggested Michael: > Could we consider renaming vacuumFlags? With more flags associated to > a PGPROC entry that are not related to vacuum, the current naming > makes things confusing. Something like statusFlags could fit better > in the picture? which sounds reasonable, and similar one about flag name PROC_IN_SAFE_CIC - if it covers both CREATE INDEX/REINDEX CONCURRENTLY maybe just PROC_IN_SAFE_IC? Any plans about those questions? I can imagine that are the only missing parts.
> On Tue, Nov 03, 2020 at 07:14:47PM +0100, Dmitry Dolgov wrote: > > On Thu, Aug 20, 2020 at 03:11:19PM +0900, Michael Paquier wrote: > > On Wed, Aug 19, 2020 at 02:16:46PM -0400, Alvaro Herrera wrote: > > > I did not set the flag in REINDEX CONCURRENTLY, but as I understand it > > > can be done too, since in essence it's the same thing as a CIC from a > > > snapshot management point of view. > > > > Yes, I see no problems for REINDEX CONCURRENTLY as well as long as > > there are no predicates and expressions involved. The transactions > > that should be patched are all started in ReindexRelationConcurrently. > > The transaction of index_concurrently_swap() cannot set up that > > though. Only thing to be careful is to make sure that safe_flag is > > correct depending on the list of indexes worked on. > > Hi, > > After looking through the thread and reading the patch it seems good, > and there are only few minor questions: > > * Doing the same for REINDEX CONCURRENTLY, which does make sense. In > fact it's already mentioned in the commentaries as done, which a bit > confusing. Just to give it a shot, would the attached change be enough?
Attachment
On Mon, Nov 09, 2020 at 04:47:43PM +0100, Dmitry Dolgov wrote: > > On Tue, Nov 03, 2020 at 07:14:47PM +0100, Dmitry Dolgov wrote: > > > On Thu, Aug 20, 2020 at 03:11:19PM +0900, Michael Paquier wrote: > > > On Wed, Aug 19, 2020 at 02:16:46PM -0400, Alvaro Herrera wrote: > > > > I did not set the flag in REINDEX CONCURRENTLY, but as I understand it > > > > can be done too, since in essence it's the same thing as a CIC from a > > > > snapshot management point of view. > > > > > > Yes, I see no problems for REINDEX CONCURRENTLY as well as long as > > > there are no predicates and expressions involved. The transactions > > > that should be patched are all started in ReindexRelationConcurrently. > > > The transaction of index_concurrently_swap() cannot set up that > > > though. Only thing to be careful is to make sure that safe_flag is > > > correct depending on the list of indexes worked on. > > > > Hi, > > > > After looking through the thread and reading the patch it seems good, > > and there are only few minor questions: > > > > * Doing the same for REINDEX CONCURRENTLY, which does make sense. In > > fact it's already mentioned in the commentaries as done, which a bit > > confusing. > > Just to give it a shot, would the attached change be enough? Could it be possible to rename vacuumFlags to statusFlags first? I did not see any objection to do this suggestion. > + LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE); > + MyProc->vacuumFlags |= PROC_IN_SAFE_IC; > + ProcGlobal->vacuumFlags[MyProc->pgxactoff] = MyProc->vacuumFlags; > + LWLockRelease(ProcArrayLock); I can't help noticing that you are repeating the same code pattern eight times. I think that this should be in its own routine, and that we had better document that this should be called just after starting a transaction, with an assertion enforcing that. -- Michael
Attachment
Michael Paquier <michael@paquier.xyz> writes: >> + LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE); >> + MyProc->vacuumFlags |= PROC_IN_SAFE_IC; >> + ProcGlobal->vacuumFlags[MyProc->pgxactoff] = MyProc->vacuumFlags; >> + LWLockRelease(ProcArrayLock); > I can't help noticing that you are repeating the same code pattern > eight times. I think that this should be in its own routine, and that > we had better document that this should be called just after starting > a transaction, with an assertion enforcing that. Do we really need exclusive lock on the ProcArray to make this flag change? That seems pretty bad from a concurrency standpoint. regards, tom lane
On Mon, Nov 09, 2020 at 08:32:13PM -0500, Tom Lane wrote: > Do we really need exclusive lock on the ProcArray to make this flag > change? That seems pretty bad from a concurrency standpoint. Any place where we update vacuumFlags acquires an exclusive LWLock on ProcArrayLock. That's held for a very short time, so IMO it won't matter much in practice, particularly if you compare that with the potential gains related to the existing wait phases. -- Michael
Attachment
Michael Paquier <michael@paquier.xyz> writes: > On Mon, Nov 09, 2020 at 08:32:13PM -0500, Tom Lane wrote: >> Do we really need exclusive lock on the ProcArray to make this flag >> change? That seems pretty bad from a concurrency standpoint. > Any place where we update vacuumFlags acquires an exclusive LWLock on > ProcArrayLock. That's held for a very short time, so IMO it won't > matter much in practice, particularly if you compare that with the > potential gains related to the existing wait phases. Not sure I believe that it doesn't matter much in practice. If there's a steady stream of shared ProcArrayLock acquisitions (for snapshot acquisition) then somebody wanting exclusive lock will create a big hiccup, whether they hold it for a short time or not. regards, tom lane
On 2020-Nov-09, Tom Lane wrote: > Michael Paquier <michael@paquier.xyz> writes: > > On Mon, Nov 09, 2020 at 08:32:13PM -0500, Tom Lane wrote: > >> Do we really need exclusive lock on the ProcArray to make this flag > >> change? That seems pretty bad from a concurrency standpoint. > > > Any place where we update vacuumFlags acquires an exclusive LWLock on > > ProcArrayLock. That's held for a very short time, so IMO it won't > > matter much in practice, particularly if you compare that with the > > potential gains related to the existing wait phases. > > Not sure I believe that it doesn't matter much in practice. If there's > a steady stream of shared ProcArrayLock acquisitions (for snapshot > acquisition) then somebody wanting exclusive lock will create a big > hiccup, whether they hold it for a short time or not. Yeah ... it would be much better if we can make it use atomics instead. Currently it's an uint8, and in PGPROC itself it's probably not a big deal to enlarge that, but I fear that quadrupling the size of the mirroring array in PROC_HDR might be bad for performance. However, maybe if we use atomics to access it, then we don't need to mirror it anymore? That would need some benchmarking of GetSnapshotData.
On Mon, Nov 09, 2020 at 11:31:15PM -0300, Alvaro Herrera wrote: > Yeah ... it would be much better if we can make it use atomics instead. > Currently it's an uint8, and in PGPROC itself it's probably not a big > deal to enlarge that, but I fear that quadrupling the size of the > mirroring array in PROC_HDR might be bad for performance. However, > maybe if we use atomics to access it, then we don't need to mirror it > anymore? That would need some benchmarking of GetSnapshotData. Hmm. If you worry about the performance impact here, it is possible to do a small performance test without this patch. vacuum_rel() sets the flag for a non-full VACUUM, so with one backend running a manual VACUUM in loop on an empty relation could apply some pressure on any benchmark, even a simple pgbench. -- Michael
Attachment
Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > Yeah ... it would be much better if we can make it use atomics instead. I was thinking more like "do we need any locking at all". Assuming that a proc's vacuumFlags can be set by only the process itself, there's no write conflicts to worry about. On the read side, there's a hazard that onlookers will not see the PROC_IN_SAFE_IC flag set; but that's not any different from what the outcome would be if they looked just before this stanza executes. And even if they don't see it, at worst we lose the optimization being proposed. There is a question of whether it's important that both copies of the flag appear to update atomically ... but that just begs the question "why in heaven's name are there two copies?" regards, tom lane
> On Mon, Nov 09, 2020 at 10:02:27PM -0500, Tom Lane wrote: > > Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > > Yeah ... it would be much better if we can make it use atomics instead. > > I was thinking more like "do we need any locking at all". > > Assuming that a proc's vacuumFlags can be set by only the process itself, > there's no write conflicts to worry about. On the read side, there's a > hazard that onlookers will not see the PROC_IN_SAFE_IC flag set; but > that's not any different from what the outcome would be if they looked > just before this stanza executes. And even if they don't see it, at worst > we lose the optimization being proposed. > > There is a question of whether it's important that both copies of the flag > appear to update atomically ... but that just begs the question "why in > heaven's name are there two copies?" Sounds right, but after reading the thread about GetSnapshotData scalability more thoroughly there seem to be an assumption that those copies have to be updated at the same time under the same lock, and claims that in some cases justification for correctness around not taking ProcArrayLock is too complicated, at least for now. Interesting enough, similar discussion happened about vaccumFlags before with the same conclusion that theoretically it's fine to update without holding the lock, but this assumption could change one day and it's better to avoid such risks. Having said that I believe it makes sense to continue with locking. Are there any other opinions? I'll try to benchmark it in the meantime.
On Thu, Nov 12, 2020 at 04:36:32PM +0100, Dmitry Dolgov wrote: > Interesting enough, similar discussion happened about vaccumFlags before > with the same conclusion that theoretically it's fine to update without > holding the lock, but this assumption could change one day and it's > better to avoid such risks. Having said that I believe it makes sense to > continue with locking. Are there any other opinions? I'll try to > benchmark it in the meantime. Thanks for planning some benchmarking for this specific patch. I have to admit that the possibility of switching vacuumFlags to use atomics is very appealing in the long term, with or without considering this patch, even if we had better be sure that this patch has no actual effect on concurrency first if atomics are not used in worst-case scenarios. -- Michael
Attachment
> On Fri, Nov 13, 2020 at 09:25:40AM +0900, Michael Paquier wrote: > On Thu, Nov 12, 2020 at 04:36:32PM +0100, Dmitry Dolgov wrote: > > Interesting enough, similar discussion happened about vaccumFlags before > > with the same conclusion that theoretically it's fine to update without > > holding the lock, but this assumption could change one day and it's > > better to avoid such risks. Having said that I believe it makes sense to > > continue with locking. Are there any other opinions? I'll try to > > benchmark it in the meantime. > > Thanks for planning some benchmarking for this specific patch. I have > to admit that the possibility of switching vacuumFlags to use atomics > is very appealing in the long term, with or without considering this > patch, even if we had better be sure that this patch has no actual > effect on concurrency first if atomics are not used in worst-case > scenarios. I've tried first to test scenarios where GetSnapshotData produces significant lock contention and "reindex concurrently" implementation with locks interferes with it. The idea I had is to create a test function that constantly calls GetSnapshotData (perf indeed shows significant portion of time spent on contended lock), and clash it with a stream of "reindex concurrently" of an empty relation (which still reaches safe_index check). I guess it could be considered as an artificial extreme case. Measuring GetSnapshotData (or rather the surrounding wrapper, to distinguish calls from the test function from everything else) latency without reindex, with reindex and locks, with reindex without locks should produce different "modes" and comparing them we can make some conclusions. Latency histograms without reindex (nanoseconds): nsecs : count distribution 512 -> 1023 : 0 | | 1024 -> 2047 : 10001209 |****************************************| 2048 -> 4095 : 76936 | | 4096 -> 8191 : 1468 | | 8192 -> 16383 : 98 | | 16384 -> 32767 : 39 | | 32768 -> 65535 : 6 | | The same with reindex without locks: nsecs : count distribution 512 -> 1023 : 0 | | 1024 -> 2047 : 111345 | | 2048 -> 4095 : 6997627 |****************************************| 4096 -> 8191 : 18575 | | 8192 -> 16383 : 586 | | 16384 -> 32767 : 312 | | 32768 -> 65535 : 18 | | The same with reindex with locks: nsecs : count distribution 512 -> 1023 : 0 | | 1024 -> 2047 : 59438 | | 2048 -> 4095 : 6901187 |****************************************| 4096 -> 8191 : 18584 | | 8192 -> 16383 : 581 | | 16384 -> 32767 : 280 | | 32768 -> 65535 : 84 | | Looks like with reindex without locks is indeed faster (there are mode samples in lower time section), but not particularly significant to the whole distribution, especially taking into account extremity of the test. I'll take a look at benchmarking of switching vacuumFlags to use atomics, but as it's probably a bit off topic I'm going to attach another version of the patch with locks and suggested changes. To which I have one question: > Michael Paquier <michael@paquier.xyz> writes: > I think that this should be in its own routine, and that we had better > document that this should be called just after starting a transaction, > with an assertion enforcing that. I'm not sure which exactly assertion condition do you mean?
Attachment
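A rough idea of the kind of stress function described above (entirely hypothetical names and shape; the actual test harness was not posted). GetLatestSnapshot() goes through GetSnapshotData(), so a tight loop over it keeps ProcArrayLock contended while REINDEX CONCURRENTLY runs alongside:

    #include "postgres.h"
    #include "fmgr.h"
    #include "miscadmin.h"
    #include "utils/snapmgr.h"

    PG_MODULE_MAGIC;

    PG_FUNCTION_INFO_V1(snapshot_stress);

    /* SQL declaration (hypothetical):
     *   CREATE FUNCTION snapshot_stress(bigint) RETURNS void
     *   AS 'MODULE_PATHNAME' LANGUAGE C STRICT; */
    Datum
    snapshot_stress(PG_FUNCTION_ARGS)
    {
        int64       iterations = PG_GETARG_INT64(0);

        for (int64 i = 0; i < iterations; i++)
        {
            /* take and release a fresh snapshot; each call contends on
             * ProcArrayLock inside GetSnapshotData() */
            Snapshot    snap = RegisterSnapshot(GetLatestSnapshot());

            UnregisterSnapshot(snap);
            CHECK_FOR_INTERRUPTS();
        }

        PG_RETURN_VOID();
    }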
On Tue, 10 Nov 2020 at 03:02, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > > Yeah ... it would be much better if we can make it use atomics instead. > > I was thinking more like "do we need any locking at all". > > Assuming that a proc's vacuumFlags can be set by only the process itself, > there's no write conflicts to worry about. On the read side, there's a > hazard that onlookers will not see the PROC_IN_SAFE_IC flag set; but > that's not any different from what the outcome would be if they looked > just before this stanza executes. And even if they don't see it, at worst > we lose the optimization being proposed. > > There is a question of whether it's important that both copies of the flag > appear to update atomically ... but that just begs the question "why in > heaven's name are there two copies?" Agreed to all of the above, but I think the issue is miniscule because ProcArrayLock is acquired in LW_EXCLUSIVE mode at transaction commit. So it doesn't seem like there is much need to optimize this particular aspect of the patch. -- Simon Riggs http://www.EnterpriseDB.com/
On 2020-Nov-16, Dmitry Dolgov wrote: > The same with reindex without locks: > > nsecs : count distribution > 512 -> 1023 : 0 | | > 1024 -> 2047 : 111345 | | > 2048 -> 4095 : 6997627 |****************************************| > 4096 -> 8191 : 18575 | | > 8192 -> 16383 : 586 | | > 16384 -> 32767 : 312 | | > 32768 -> 65535 : 18 | | > > The same with reindex with locks: > > nsecs : count distribution > 512 -> 1023 : 0 | | > 1024 -> 2047 : 59438 | | > 2048 -> 4095 : 6901187 |****************************************| > 4096 -> 8191 : 18584 | | > 8192 -> 16383 : 581 | | > 16384 -> 32767 : 280 | | > 32768 -> 65535 : 84 | | > > Looks like with reindex without locks is indeed faster (there are mode > samples in lower time section), but not particularly significant to the > whole distribution, especially taking into account extremity of the > test. I didn't analyze these numbers super carefully, but yeah it doesn't look significant. I'm looking at these patches now, with intention to push.
On 2020-Nov-09, Tom Lane wrote: > Michael Paquier <michael@paquier.xyz> writes: > >> + LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE); > >> + MyProc->vacuumFlags |= PROC_IN_SAFE_IC; > >> + ProcGlobal->vacuumFlags[MyProc->pgxactoff] = MyProc->vacuumFlags; > >> + LWLockRelease(ProcArrayLock); > > > I can't help noticing that you are repeating the same code pattern > > eight times. I think that this should be in its own routine, and that > > we had better document that this should be called just after starting > > a transaction, with an assertion enforcing that. > > Do we really need exclusive lock on the ProcArray to make this flag > change? That seems pretty bad from a concurrency standpoint. BTW I now know that the reason for taking ProcArrayLock is not the vacuumFlags itself, but rather MyProc->pgxactoff, which can move. On the other hand, if we stopped mirroring the flags in ProcGlobal, it would mean we would have to access all procs' PGPROC entries in GetSnapshotData, which is undesirable for performance reason (per commit 5788e258bb26).
I am really unsure about the REINDEX CONCURRENTLY part of this, for two reasons: 1. It is not as good when reindexing multiple indexes, because we can only apply the flag if *all* indexes are "safe". Any unsafe index means we step down from it for the whole thing. This is probably not worth worrying much about, but still. 2. In some of the waiting transactions, we actually do more things than what we do in CREATE INDEX CONCURRENTLY transactions --- some catalog updates, but we also do the whole index validation phase. Is that OK? It's not as clear to me that it is safe to set the flag in all those places. I moved the comments to the new function and made it inline. I also changed the way we determine whether the index is safe; there's no reason to build an IndexInfo if we can simply look at rel->rd_indexprs and rel->rd_indpred. I've been wondering if it would be sane/safe to do the WaitForFoo stuff outside of any transaction. I'll have a look again tomorrow.
Attachment
On Mon, Nov 16, 2020 at 09:23:41PM -0300, Alvaro Herrera wrote: > I've been wondering if it would be sane/safe to do the WaitForFoo stuff > outside of any transaction. This needs to remain within a transaction as CIC holds a session lock on the parent table, which would not be cleaned up without a transaction context. -- Michael
Attachment
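To make the session-lock point concrete, this is roughly what the concurrent path in DefineIndex() already does (paraphrased from existing sources, not from the patch): the session-level lock is taken once and survives across the several transactions CIC uses, which is why the waits cannot simply be moved outside a transaction.

    LockRelId   heaprelid = rel->rd_lockInfo.lockRelId;

    /* session-level lock: kept across the transaction boundaries below */
    LockRelationIdForSession(&heaprelid, ShareUpdateExclusiveLock);
    PopActiveSnapshot();
    CommitTransactionCommand();
    StartTransactionCommand();

    /* ... the build, wait, and validation transactions happen here ... */

    UnlockRelationIdForSession(&heaprelid, ShareUpdateExclusiveLock);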
On 2020-Nov-16, Alvaro Herrera wrote: > On 2020-Nov-09, Tom Lane wrote: > > > Michael Paquier <michael@paquier.xyz> writes: > > >> + LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE); > > >> + MyProc->vacuumFlags |= PROC_IN_SAFE_IC; > > >> + ProcGlobal->vacuumFlags[MyProc->pgxactoff] = MyProc->vacuumFlags; > > >> + LWLockRelease(ProcArrayLock); > > > > > I can't help noticing that you are repeating the same code pattern > > > eight times. I think that this should be in its own routine, and that > > > we had better document that this should be called just after starting > > > a transaction, with an assertion enforcing that. > > > > Do we really need exclusive lock on the ProcArray to make this flag > > change? That seems pretty bad from a concurrency standpoint. > > BTW I now know that the reason for taking ProcArrayLock is not the > vacuumFlags itself, but rather MyProc->pgxactoff, which can move. ... ah, but I realize now that this means that we can use shared lock here, not exclusive, which is already an enormous improvement. That's because ->pgxactoff can only be changed with exclusive lock held; so as long as we hold shared, the array item cannot move.
So I made the change to set the statusFlags with only LW_SHARED -- both in existing uses (see 0002) and in the new CIC code (0003). I don't think 0002 is going to have a tremendous performance impact, because it changes operations that are very infrequent. But even so, it would be weird to leave code around that uses exclusive lock when we're going to add new code that uses shared lock for the same thing. I still left the REINDEX CONCURRENTLY support in split out in 0004; I intend to get the first three patches pushed now, and look into 0004 again later.
Attachment
On Tue, Nov 17, 2020 at 02:14:53PM -0300, Alvaro Herrera wrote: > diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c > index f1f4df7d70..4324e32656 100644 > --- a/src/backend/replication/logical/logical.c > +++ b/src/backend/replication/logical/logical.c > @@ -181,7 +181,7 @@ StartupDecodingContext(List *output_plugin_options, > */ > if (!IsTransactionOrTransactionBlock()) > { > - LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE); > + LWLockAcquire(ProcArrayLock, LW_SHARED); > MyProc->statusFlags |= PROC_IN_LOGICAL_DECODING; > ProcGlobal->statusFlags[MyProc->pgxactoff] = MyProc->statusFlags; > LWLockRelease(ProcArrayLock); We don't really document that it is safe to update statusFlags while holding a shared lock on ProcArrayLock, right? Could this be clarified at least in proc.h? > + /* Determine whether we can call set_safe_index_flag */ > + safe_index = indexInfo->ii_Expressions == NIL && > + indexInfo->ii_Predicate == NIL; This should tell why we check after expressions and predicates, in short giving an explanation about the snapshot issues with build and validation when executing those expressions and/or predicates. > + * Set the PROC_IN_SAFE_IC flag in my PGPROC entry. > + * > + * When doing concurrent index builds, we can set this flag > + * to tell other processes concurrently running CREATE > + * INDEX CONCURRENTLY to ignore us when > + * doing their waits for concurrent snapshots. On one hand it > + * avoids pointlessly waiting for a process that's not interesting > + * anyway, but more importantly it avoids deadlocks in some cases. > + * > + * This can only be done for indexes that don't execute any expressions. > + * Caller is responsible for only calling this routine when that > + * assumption holds true. > + * > + * (The flag is reset automatically at transaction end, so it must be > + * set for each transaction.) Would it be better to document that this function had better be called after beginning a transaction? I am wondering if we should not enforce some conditions here, say this one or similar: Assert(GetTopTransactionIdIfAny() == InvalidTransactionId); > + */ > +static inline void > +set_safe_index_flag(void) This routine name is rather close to index_set_state_flags(), could it be possible to finish with a better name? -- Michael
Attachment
On 2020-Nov-18, Michael Paquier wrote: > On Tue, Nov 17, 2020 at 02:14:53PM -0300, Alvaro Herrera wrote: > > diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c > > index f1f4df7d70..4324e32656 100644 > > --- a/src/backend/replication/logical/logical.c > > +++ b/src/backend/replication/logical/logical.c > > @@ -181,7 +181,7 @@ StartupDecodingContext(List *output_plugin_options, > > */ > > if (!IsTransactionOrTransactionBlock()) > > { > > - LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE); > > + LWLockAcquire(ProcArrayLock, LW_SHARED); > > MyProc->statusFlags |= PROC_IN_LOGICAL_DECODING; > > ProcGlobal->statusFlags[MyProc->pgxactoff] = MyProc->statusFlags; > > LWLockRelease(ProcArrayLock); > > We don't really document that it is safe to update statusFlags while > holding a shared lock on ProcArrayLock, right? Could this be > clarified at least in proc.h? Pushed that part with a comment addition. This stuff is completely undocumented ... > > + /* Determine whether we can call set_safe_index_flag */ > > + safe_index = indexInfo->ii_Expressions == NIL && > > + indexInfo->ii_Predicate == NIL; > > This should tell why we check after expressions and predicates, in > short giving an explanation about the snapshot issues with build and > validation when executing those expressions and/or predicates. Fair. It seems good to add a comment to the new function, and reference that in each place where we set the flag. > > + * Set the PROC_IN_SAFE_IC flag in my PGPROC entry. > Would it be better to document that this function had better be called > after beginning a transaction? I am wondering if we should not > enforce some conditions here, say this one or similar: > Assert(GetTopTransactionIdIfAny() == InvalidTransactionId); I went with checking MyProc->xid and MyProc->xmin, which is the same in practice but notionally closer to what we're doing. > > +static inline void > > +set_safe_index_flag(void) > > This routine name is rather close to index_set_state_flags(), could it > be possible to finish with a better name? I went with set_indexsafe_procflags(). Ugly ...
Attachment
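Based on the description above, the new helper presumably ends up looking roughly like this (a sketch; the attached patch is authoritative, and the lock level chosen here is revisited further down the thread):

    /*
     * Set the PROC_IN_SAFE_IC flag in MyProc->statusFlags.
     *
     * Should only be called right after starting a transaction, before we
     * have acquired an xid or advertised an xmin; hence the assertion.
     */
    static inline void
    set_indexsafe_procflags(void)
    {
        Assert(MyProc->xid == InvalidTransactionId &&
               MyProc->xmin == InvalidTransactionId);

        LWLockAcquire(ProcArrayLock, LW_SHARED);
        MyProc->statusFlags |= PROC_IN_SAFE_IC;
        ProcGlobal->statusFlags[MyProc->pgxactoff] = MyProc->statusFlags;
        LWLockRelease(ProcArrayLock);
    }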
Hi, On 2020-11-17 12:55:01 -0300, Alvaro Herrera wrote: > ... ah, but I realize now that this means that we can use shared lock > here, not exclusive, which is already an enormous improvement. That's > because ->pgxactoff can only be changed with exclusive lock held; so as > long as we hold shared, the array item cannot move. Uh, wait a second. The acquisition of this lock hasn't been affected by the snapshot scalability changes, and therefore are unrelated to ->pgxactoff changing or not. In 13 this is: LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE); MyPgXact->vacuumFlags |= PROC_IN_VACUUM; if (params->is_wraparound) MyPgXact->vacuumFlags |= PROC_VACUUM_FOR_WRAPAROUND; LWLockRelease(ProcArrayLock); Lowering this to a shared lock doesn't seem right, at least without a detailed comment explaining why it's safe. Because GetSnapshotData() etc look at all procs with just an LW_SHARED ProcArrayLock, changing vacuumFlags without a lock means that two concurrent horizon computations could come to a different result. I'm not saying it's definitely wrong to relax things here, but I'm not sure we've evaluated it sufficiently. Greetings, Andres Freund
"as quickly as possible" (was: remove spurious CREATE INDEX CONCURRENTLY wait)
From
Alvaro Herrera
Date:
> On 2020-11-17 12:55:01 -0300, Alvaro Herrera wrote: > > ... ah, but I realize now that this means that we can use shared lock > > here, not exclusive, which is already an enormous improvement. That's > > because ->pgxactoff can only be changed with exclusive lock held; so as > > long as we hold shared, the array item cannot move. > > Uh, wait a second. The acquisition of this lock hasn't been affected by > the snapshot scalability changes, and therefore are unrelated to > ->pgxactoff changing or not. I'm writing a response trying to thoroughly analyze this, but I also wanted to report that ProcSleep is being a bit generous with what it calls "as quickly as possible" here: LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE); /* * Only do it if the worker is not working to protect against Xid * wraparound. */ statusFlags = ProcGlobal->statusFlags[autovac->pgxactoff]; if ((statusFlags & PROC_IS_AUTOVACUUM) && !(statusFlags & PROC_VACUUM_FOR_WRAPAROUND)) { int pid = autovac->pid; StringInfoData locktagbuf; StringInfoData logbuf; /* errdetail for server log */ initStringInfo(&locktagbuf); initStringInfo(&logbuf); DescribeLockTag(&locktagbuf, &lock->tag); appendStringInfo(&logbuf, _("Process %d waits for %s on %s."), MyProcPid, GetLockmodeName(lock->tag.locktag_lockmethodid, lockmode), locktagbuf.data); /* release lock as quickly as possible */ LWLockRelease(ProcArrayLock); The amount of stuff that this is doing with ProcArrayLock held exclusively seems a bit excessive; it sounds like we could copy the values we need first, release the lock, and *then* do all that memory allocation and string printing -- it's a lot of code for such a contended lock. Anytime a process sees itself as blocked by autovacuum and wants to signal it, there's a ProcArrayLock hiccup (I didn't actually measure it, but it's at least five function calls). We could make this more concurrent by copying lock->tag to a local variable, releasing the lock, then doing all the string formatting and printing. See attached quickly.patch. Now, when this code was written (d7318d43d, 2012), this was a LOG message; it was demoted to DEBUG1 later (d8f15c95bec, 2015). I think it would be fair to ... remove the message? Or go back to Simon's original formulation from commit acac68b2bca, which had this message as DEBUG2 without any string formatting. Thoughts?
Attachment
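The restructuring described above would look roughly like this (a sketch only; quickly.patch as attached is the authoritative version): copy the few fields the message needs, drop ProcArrayLock, and only then do the allocation and string formatting.

    LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
    statusFlags = ProcGlobal->statusFlags[autovac->pgxactoff];
    if ((statusFlags & PROC_IS_AUTOVACUUM) &&
        !(statusFlags & PROC_VACUUM_FOR_WRAPAROUND))
    {
        int         pid = autovac->pid;
        LOCKTAG     locktag = lock->tag;    /* local copy of the lock tag */
        StringInfoData locktagbuf;
        StringInfoData logbuf;  /* errdetail for server log */

        /* release lock before doing any palloc or string formatting */
        LWLockRelease(ProcArrayLock);

        initStringInfo(&locktagbuf);
        initStringInfo(&logbuf);
        DescribeLockTag(&locktagbuf, &locktag);
        appendStringInfo(&logbuf,
                         _("Process %d waits for %s on %s."),
                         MyProcPid,
                         GetLockmodeName(locktag.locktag_lockmethodid, lockmode),
                         locktagbuf.data);

        /* ... emit the DEBUG message and signal the autovacuum worker (pid) ... */
    }
    else
        LWLockRelease(ProcArrayLock);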
Re: "as quickly as possible" (was: remove spurious CREATE INDEX CONCURRENTLY wait)
From
Andres Freund
Date:
Hi, On 2020-11-18 18:41:27 -0300, Alvaro Herrera wrote: > The amount of stuff that this is doing with ProcArrayLock held > exclusively seems a bit excessive; it sounds like we could copy the > values we need first, release the lock, and *then* do all that memory > allocation and string printing -- it's a lock of code for such a > contended lock. Yea, that's a good point. > Anytime a process sees itself as blocked by autovacuum > and wants to signal it, there's a ProcArrayLock hiccup (I didn't > actually measure it, but it's at least five function calls). I'm a bit doubtful it's that important - it's limited in frequency by deadlock_timeout. But worth improving anyway. > We could make this more concurrent by copying lock->tag to a local > variable, releasing the lock, then doing all the string formatting and > printing. See attached quickly.patch. Sounds like a plan. > Now, when this code was written (d7318d43d, 2012), this was a LOG > message; it was demoted to DEBUG1 later (d8f15c95bec, 2015). I think it > would be fair to ... remove the message? Or go back to Simon's original > formulation from commit acac68b2bca, which had this message as DEBUG2 > without any string formatting. I don't really have an opinion on this. Greetings, Andres Freund
On Wed, Nov 18, 2020 at 11:09:28AM -0800, Andres Freund wrote: > Uh, wait a second. The acquisition of this lock hasn't been affected by > the snapshot scalability changes, and therefore are unrelated to > ->pgxactoff changing or not. > > In 13 this is: > LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE); > MyPgXact->vacuumFlags |= PROC_IN_VACUUM; > if (params->is_wraparound) > MyPgXact->vacuumFlags |= PROC_VACUUM_FOR_WRAPAROUND; > LWLockRelease(ProcArrayLock); > > Lowering this to a shared lock doesn't seem right, at least without a > detailed comment explaining why it's safe. Because GetSnapshotData() etc > look at all procs with just an LW_SHARED ProcArrayLock, changing > vacuumFlags without a lock means that two concurrent horizon > computations could come to a different result. > > I'm not saying it's definitely wrong to relax things here, but I'm not > sure we've evaluated it sufficiently. Yeah. While I do like the new assertion that 27838981 has added in ProcArrayEndTransactionInternal(), this commit feels a bit rushed to me. Echoing with my comment from upthread, I am not sure that we still document enough why a shared lock would be completely fine in the case of statusFlags. We have no hints that this could be fine before 2783898, and 2783898 does not make that look better. FWIW, I think that just using LW_EXCLUSIVE for the CIC change would have been fine enough first, after evaluating the performance impact. We could evaluate if it is possible to lower the ProcArrayLock acquisition in those code paths on a separate thread. -- Michael
Attachment
Re: "as quickly as possible" (was: remove spurious CREATE INDEX CONCURRENTLY wait)
From
Michael Paquier
Date:
On Wed, Nov 18, 2020 at 02:48:40PM -0800, Andres Freund wrote: > On 2020-11-18 18:41:27 -0300, Alvaro Herrera wrote: >> We could make this more concurrent by copying lock->tag to a local >> variable, releasing the lock, then doing all the string formatting and >> printing. See attached quickly.patch. > > Sounds like a plan. +1. >> Now, when this code was written (d7318d43d, 2012), this was a LOG >> message; it was demoted to DEBUG1 later (d8f15c95bec, 2015). I think it >> would be fair to ... remove the message? Or go back to Simon's original >> formulation from commit acac68b2bca, which had this message as DEBUG2 >> without any string formatting. > > I don't really have an opinion on this. That still looks useful for debugging, so DEBUG1 sounds fine to me. -- Michael
Attachment
Re: "as quickly as possible" (was: remove spurious CREATE INDEX CONCURRENTLY wait)
From
Michael Paquier
Date:
On Thu, Nov 19, 2020 at 12:13:44PM +0900, Michael Paquier wrote: > That still looks useful for debugging, so DEBUG1 sounds fine to me. By the way, it strikes me that you could just do nothing as long as (log_min_messages > DEBUG1), so you could encapsulate most of the logic that plays with the lock tag using that. -- Michael
Attachment
On 2020-Nov-18, Andres Freund wrote: > In 13 this is: > LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE); > MyPgXact->vacuumFlags |= PROC_IN_VACUUM; > if (params->is_wraparound) > MyPgXact->vacuumFlags |= PROC_VACUUM_FOR_WRAPAROUND; > LWLockRelease(ProcArrayLock); > > Lowering this to a shared lock doesn't seem right, at least without a > detailed comment explaining why it's safe. Because GetSnapshotData() etc > look at all procs with just an LW_SHARED ProcArrayLock, changing > vacuumFlags without a lock means that two concurrent horizon > computations could come to a different result. > > I'm not saying it's definitely wrong to relax things here, but I'm not > sure we've evaluated it sufficiently. True. Let's evaluate it. I changed three spots where statusFlags bits are written: a) StartupDecodingContext: sets PROC_IN_LOGICAL_DECODING b) ReplicationSlotRelease: removes PROC_IN_LOGICAL_DECODING c) vacuum_rel: sets PROC_IN_VACUUM and potentially PROC_VACUUM_FOR_WRAPAROUND Who reads these flags? PROC_IN_LOGICAL_DECODING is read by: * ComputeXidHorizons * GetSnapshotData PROC_IN_VACUUM is read by: * GetCurrentVirtualXIDs * ComputeXidHorizons * GetSnapshotData * ProcArrayInstallImportedXmin PROC_VACUUM_FOR_WRAPAROUND is read by: * ProcSleep ProcSleep: no bug here. The flags are checked to see if we should kill() the process (used when autovac blocks some other process). The lock here appears to be inconsequential, since we release it before we do kill(); so strictly speaking, there is still a race condition where the autovac process could reset the flag after we read it and before we get a chance to signal it. The lock level autovac uses to set the flag is not relevant either. ProcArrayInstallImportedXmin: This one is just searching for a matching backend; not affected by the flags. PROC_IN_LOGICAL_DECODING: Oddly enough, I think the reset of PROC_IN_LOGICAL_DECODING in ReplicationSlotRelease might be the most problematic one of the lot. That's because a proc's xmin that had been ignored all along by ComputeXidHorizons, will now be included in the computation. Adding asserts that proc->xmin and proc->xid are InvalidXid by the time we reset the flag, I got hits in pg_basebackup, test_decoding and subscription tests. I think it's OK for ComputeXidHorizons (since it just means that a vacuum that reads a later will remove less rows.) But in GetSnapshotData it is just not correct to have the Xmin go backwards. Therefore it seems to me that this code has a bug independently of the lock level used. GetCurrentVirtualXIDs, ComputeXidHorizons, GetSnapshotData: In these cases, what we need is that the code computes some xmin (or equivalent computation) based on a set of transactions that exclude those marked with the flags. The behavior we want is that if some transaction is marked as vacuum, we ignore the Xid/Xmin *if there is one*. In other words, if there's no Xid/Xmin, then the flag is not important. So if we can ensure that the flag is set first, and the Xid/xmin is installed later, that's sufficient correctness and we don't need to hold exclusive lock. But if we can't ensure that, then we must use exclusive lock, because otherwise we risk another process seeing our Xid first and not our flag, which would be bad. In other words, my conclusion is that there definitely is a bug here and I am going to restore the use of exclusive lock for setting the statusFlags. GetSnapshotData has an additional difficulty -- we do the UINT32_ACCESS_ONCE(ProcGlobal->xid[i]) read *before* testing the bit. 
So it's not valid to set the bit without locking out GSD, regardless of any barriers we had; if we want this to be safe, we'd have to change this so that the flag is read first, and we read the xid only afterwards, with a read barrier. I *think* we could relax the lock if we had a write barrier in between: set the flag first, issue a write barrier, set the Xid. (I have to admit I'm confused about what needs to happen in the read case: read the bit first, potentially skip the PGPROC entry; but can we read the Xid ahead of reading the flag, and if we do read the xid afterwards, do we need a read barrier in between?) Given this uncertainty, I'm not proposing to relax the lock from exclusive to shared. I didn't get around to reading ComputeXidHorizons, but it seems like it'd have the same problem as GSD.
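For reference, the ordering being described would look roughly like this (illustrative only, not from any posted patch, and field names are approximate; the conclusion above is to keep the exclusive lock rather than rely on this):

    /* writer: publish the flag before any xid becomes visible */
    MyProc->statusFlags |= PROC_IN_SAFE_IC;
    ProcGlobal->statusFlags[MyProc->pgxactoff] = MyProc->statusFlags;
    pg_write_barrier();
    /* ... the xid is only installed later, once one is assigned ... */

    /* reader (GetSnapshotData-like code) would have to mirror the order */
    uint8       statusFlags = ProcGlobal->statusFlags[pgxactoff];

    pg_read_barrier();          /* flag read must not be reordered after xid */
    TransactionId xid = UINT32_ACCESS_ONCE(ProcGlobal->xids[pgxactoff]);
    /* only now decide, based on statusFlags, whether this xid can be ignored */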
Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > PROC_IN_LOGICAL_DECODING: > Oddly enough, I think the reset of PROC_IN_LOGICAL_DECODING in > ReplicationSlotRelease might be the most problematic one of the lot. > That's because a proc's xmin that had been ignored all along by > ComputeXidHorizons, will now be included in the computation. Adding > asserts that proc->xmin and proc->xid are InvalidXid by the time we > reset the flag, I got hits in pg_basebackup, test_decoding and > subscription tests. I think it's OK for ComputeXidHorizons (since it > just means that a vacuum that reads a later will remove less rows.) But > in GetSnapshotData it is just not correct to have the Xmin go backwards. > Therefore it seems to me that this code has a bug independently of the > lock level used. That is only a bug if the flags are *cleared* in a way that's not atomic with clearing the transaction's xid/xmin, no? I agree that once set, the flag had better stay set till transaction end, but that's not what's at stake here. > GetCurrentVirtualXIDs, ComputeXidHorizons, GetSnapshotData: > In these cases, what we need is that the code computes some xmin (or > equivalent computation) based on a set of transactions that exclude > those marked with the flags. The behavior we want is that if some > transaction is marked as vacuum, we ignore the Xid/Xmin *if there is > one*. In other words, if there's no Xid/Xmin, then the flag is not > important. So if we can ensure that the flag is set first, and the > Xid/xmin is installed later, that's sufficient correctness and we don't > need to hold exclusive lock. But if we can't ensure that, then we must > use exclusive lock, because otherwise we risk another process seeing our > Xid first and not our flag, which would be bad. I don't buy this either. You get the same result if someone looks just before you take the ProcArrayLock to set the flag. So if there's a problem, it's inherent in the way that the flags are defined or used; the strength of lock used in this stanza won't affect it. regards, tom lane
Re: "as quickly as possible" (was: remove spurious CREATE INDEX CONCURRENTLY wait)
From
Alvaro Herrera
Date:
On 2020-Nov-19, Michael Paquier wrote: > On Thu, Nov 19, 2020 at 12:13:44PM +0900, Michael Paquier wrote: > > That still looks useful for debugging, so DEBUG1 sounds fine to me. > > By the way, it strikes me that you could just do nothing as long as > (log_min_messages > DEBUG1), so you could encapsulate most of the > logic that plays with the lock tag using that. Good idea, done. I also noticed that if we're going to accept a race (which BTW already exists) we may as well simplify the code about it. I think the attached is the final form of this.
Attachment
Re: "as quickly as possible" (was: remove spurious CREATE INDEX CONCURRENTLY wait)
From
Tom Lane
Date:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > On 2020-Nov-19, Michael Paquier wrote: >> By the way, it strikes me that you could just do nothing as long as >> (log_min_messages > DEBUG1), so you could encapsulate most of the >> logic that plays with the lock tag using that. > Good idea, done. I'm less sure that that's a good idea. It embeds knowledge here that should not exist outside elog.c; moreover, I'm not entirely sure that it's even correct, given the nonlinear ranking of log_min_messages. Maybe it'd be a good idea to have elog.c expose a new function along the lines of "bool message_level_is_interesting(int elevel)" to support this and similar future optimizations in a less fragile way. regards, tom lane
Re: "as quickly as possible" (was: remove spurious CREATE INDEX CONCURRENTLY wait)
From
Alvaro Herrera
Date:
On 2020-Nov-23, Tom Lane wrote: > Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > > On 2020-Nov-19, Michael Paquier wrote: > >> By the way, it strikes me that you could just do nothing as long as > >> (log_min_messages > DEBUG1), so you could encapsulate most of the > >> logic that plays with the lock tag using that. > > > Good idea, done. > > I'm less sure that that's a good idea. It embeds knowledge here that > should not exist outside elog.c; moreover, I'm not entirely sure that > it's even correct, given the nonlinear ranking of log_min_messages. Well, we already do this in a number of places. But I can get behind this: > Maybe it'd be a good idea to have elog.c expose a new function > along the lines of "bool message_level_is_interesting(int elevel)" > to support this and similar future optimizations in a less fragile way.
Re: "as quickly as possible" (was: remove spurious CREATE INDEX CONCURRENTLY wait)
From
Tom Lane
Date:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > Well, we already do this in a number of places. But I can get behind > this: >> Maybe it'd be a good idea to have elog.c expose a new function >> along the lines of "bool message_level_is_interesting(int elevel)" >> to support this and similar future optimizations in a less fragile way. I'll see about a patch for that. regards, tom lane
Re: "as quickly as possible" (was: remove spurious CREATE INDEX CONCURRENTLY wait)
From
Alvaro Herrera
Date:
On 2020-Nov-23, Tom Lane wrote: > Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > > Well, we already do this in a number of places. But I can get behind > > this: > > >> Maybe it'd be a good idea to have elog.c expose a new function > >> along the lines of "bool message_level_is_interesting(int elevel)" > >> to support this and similar future optimizations in a less fragile way. > > I'll see about a patch for that. Looking at that now ...
Re: "as quickly as possible" (was: remove spurious CREATE INDEX CONCURRENTLY wait)
From
Tom Lane
Date:
Here's a draft patch.

			regards, tom lane

diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 03c553e7ea..9cd0b7c11b 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -5344,7 +5344,7 @@
 static void
 ShowTransactionState(const char *str)
 {
 	/* skip work if message will definitely not be printed */
-	if (log_min_messages <= DEBUG5 || client_min_messages <= DEBUG5)
+	if (message_level_is_interesting(DEBUG5))
 		ShowTransactionStateRec(str, CurrentTransactionState);
 }
@@ -5371,7 +5371,6 @@ ShowTransactionStateRec(const char *str, TransactionState s)
 	if (s->parent)
 		ShowTransactionStateRec(str, s->parent);
 
-	/* use ereport to suppress computation if msg will not be printed */
 	ereport(DEBUG5,
 			(errmsg_internal("%s(%d) name: %s; blockState: %s; state: %s, xid/subid/cid: %u/%u/%u%s%s",
 							 str, s->nestingLevel,
diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
index 7e915bcadf..32a3099c1f 100644
--- a/src/backend/access/transam/xlogutils.c
+++ b/src/backend/access/transam/xlogutils.c
@@ -105,7 +105,7 @@ log_invalid_page(RelFileNode node, ForkNumber forkno, BlockNumber blkno,
 	 * tracing of the cause (note the elog context mechanism will tell us
 	 * something about the XLOG record that generated the reference).
 	 */
-	if (log_min_messages <= DEBUG1 || client_min_messages <= DEBUG1)
+	if (message_level_is_interesting(DEBUG1))
 		report_invalid_page(DEBUG1, node, forkno, blkno, present);
 
 	if (invalid_page_tab == NULL)
@@ -159,7 +159,7 @@ forget_invalid_pages(RelFileNode node, ForkNumber forkno, BlockNumber minblkno)
 			hentry->key.forkno == forkno &&
 			hentry->key.blkno >= minblkno)
 		{
-			if (log_min_messages <= DEBUG2 || client_min_messages <= DEBUG2)
+			if (message_level_is_interesting(DEBUG2))
 			{
 				char	   *path = relpathperm(hentry->key.node, forkno);
 
@@ -192,7 +192,7 @@ forget_invalid_pages_db(Oid dbid)
 	{
 		if (hentry->key.node.dbNode == dbid)
 		{
-			if (log_min_messages <= DEBUG2 || client_min_messages <= DEBUG2)
+			if (message_level_is_interesting(DEBUG2))
 			{
 				char	   *path = relpathperm(hentry->key.node,
 											   hentry->key.forkno);
diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index b0d037600e..245c2f4fc8 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -1146,15 +1146,9 @@ reportDependentObjects(const ObjectAddresses *targetObjects,
 	 * If no error is to be thrown, and the msglevel is too low to be shown to
 	 * either client or server log, there's no need to do any of the rest of
 	 * the work.
-	 *
-	 * Note: this code doesn't know all there is to be known about elog
-	 * levels, but it works for NOTICE and DEBUG2, which are the only values
-	 * msglevel can currently have.  We also assume we are running in a normal
-	 * operating environment.
 	 */
 	if (behavior == DROP_CASCADE &&
-		msglevel < client_min_messages &&
-		(msglevel < log_min_messages || log_min_messages == LOG))
+		!message_level_is_interesting(msglevel))
 		return;
 
 	/*
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index babee386c4..87c3ea450e 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -1215,7 +1215,7 @@ ProcessWalSndrMessage(XLogRecPtr walEnd, TimestampTz sendTime)
 	walrcv->lastMsgReceiptTime = lastMsgReceiptTime;
 	SpinLockRelease(&walrcv->mutex);
 
-	if (log_min_messages <= DEBUG2)
+	if (message_level_is_interesting(DEBUG2))
 	{
 		char	   *sendtime;
 		char	   *receipttime;
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 5d1b1a16be..2eb19ad293 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -1900,7 +1900,7 @@ ProcessStandbyReplyMessage(void)
 	replyTime = pq_getmsgint64(&reply_message);
 	replyRequested = pq_getmsgbyte(&reply_message);
 
-	if (log_min_messages <= DEBUG2)
+	if (message_level_is_interesting(DEBUG2))
 	{
 		char	   *replyTimeStr;
 
@@ -2082,7 +2082,7 @@ ProcessStandbyHSFeedbackMessage(void)
 	feedbackCatalogXmin = pq_getmsgint(&reply_message, 4);
 	feedbackCatalogEpoch = pq_getmsgint(&reply_message, 4);
 
-	if (log_min_messages <= DEBUG2)
+	if (message_level_is_interesting(DEBUG2))
 	{
 		char	   *replyTimeStr;
 
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 1ba47c194b..04c2245338 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -600,6 +600,42 @@ errfinish(const char *filename, int lineno, const char *funcname)
 		CHECK_FOR_INTERRUPTS();
 }
 
+/*
+ * message_level_is_interesting --- would ereport/elog do anything?
+ *
+ * Returns true if ereport/elog with this elevel will not be a no-op.
+ * This is useful to short-circuit any expensive preparatory work that
+ * might be needed for a logging message.  There is no point in
+ * prepending this to a bare ereport/elog call, however.
+ */
+bool
+message_level_is_interesting(int elevel)
+{
+	bool		output_to_server;
+	bool		output_to_client;
+
+	/*
+	 * Keep this in sync with the decision-making in errstart().
+	 */
+	if (elevel >= ERROR)
+		return true;
+
+	output_to_server = is_log_level_output(elevel, log_min_messages);
+	if (output_to_server)
+		return true;
+
+	if (whereToSendOutput == DestRemote && elevel != LOG_SERVER_ONLY &&
+		!ClientAuthInProgress)
+	{
+		output_to_client = (elevel >= client_min_messages ||
+							elevel == INFO);
+		if (output_to_client)
+			return true;
+	}
+
+	return false;
+}
+
 
 /*
  * errcode --- add SQLSTATE error code to the current error
diff --git a/src/include/utils/elog.h b/src/include/utils/elog.h
index 1e09ee0541..de5bba0a4a 100644
--- a/src/include/utils/elog.h
+++ b/src/include/utils/elog.h
@@ -149,6 +149,8 @@
 extern bool errstart(int elevel, const char *domain);
 extern void errfinish(const char *filename, int lineno,
 					  const char *funcname);
 
+extern bool message_level_is_interesting(int elevel);
+
 extern int	errcode(int sqlerrcode);
 extern int	errcode_for_file_access(void);
Re: "as quickly as possible" (was: remove spurious CREATE INDEX CONCURRENTLY wait)
From
Alvaro Herrera
Date:
On 2020-Nov-23, Tom Lane wrote: > Here's a draft patch. Here's another of my own. Outside of elog.c it seems identical.
Attachment
Re: "as quickly as possible" (was: remove spurious CREATE INDEX CONCURRENTLY wait)
From
Alvaro Herrera
Date:
On 2020-Nov-23, Alvaro Herrera wrote: > On 2020-Nov-23, Tom Lane wrote: > > > Here's a draft patch. > > Here's another of my own. Outside of elog.c it seems identical. Your version has the advantage that errstart() doesn't get a new function call. I'm +1 for going with that ... we could avoid the duplicate code with some additional contortions but this changes so rarely that it's probably not worth the trouble.
Re: "as quickly as possible" (was: remove spurious CREATE INDEX CONCURRENTLY wait)
From
Tom Lane
Date:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > On 2020-Nov-23, Tom Lane wrote: >> Here's a draft patch. > Here's another of my own. Outside of elog.c it seems identical. Ah, I see I didn't cover the case in ProcSleep that you were originally on about ... I'd just looked for existing references to log_min_messages and client_min_messages. I think it's important to have the explicit check for elevel >= ERROR. I'm not too fussed about whether we invent is_log_level_output_client, although that name doesn't seem well-chosen compared to is_log_level_output. Shall I press forward with this, or do you want to? regards, tom lane
Re: "as quickly as possible" (was: remove spurious CREATE INDEX CONCURRENTLY wait)
From
Tom Lane
Date:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > Your version has the advantage that errstart() doesn't get a new > function call. I'm +1 for going with that ... we could avoid the > duplicate code with some additional contortions but this changes so > rarely that it's probably not worth the trouble. I was considering adding that factorization, but marking the function inline to avoid adding overhead. Most of elog.c predates our use of inline, so it wasn't considered when this code was written. regards, tom lane
Re: "as quickly as possible" (was: remove spurious CREATE INDEX CONCURRENTLY wait)
From
Alvaro Herrera
Date:
On 2020-Nov-23, Tom Lane wrote: > Ah, I see I didn't cover the case in ProcSleep that you were originally on > about ... I'd just looked for existing references to log_min_messages > and client_min_messages. Yeah, it seemed bad form to add that when you had just argued against it :-) > I think it's important to have the explicit check for elevel >= ERROR. > I'm not too fussed about whether we invent is_log_level_output_client, > although that name doesn't seem well-chosen compared to > is_log_level_output. Just replacing "log" for "client" in that seemed strictly worse, and I didn't (don't) have any other ideas. > Shall I press forward with this, or do you want to? Please feel free to go ahead, including the change to ProcSleep.
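The ProcSleep guard under discussion has roughly this shape (a sketch only, not the committed hunk; the message text and local variables here only approximate the real code):

	if (message_level_is_interesting(DEBUG1))
	{
		StringInfoData locktagbuf;

		/* build the potentially expensive description only if someone will see it */
		initStringInfo(&locktagbuf);
		DescribeLockTag(&locktagbuf, &lock->tag);
		ereport(DEBUG1,
				(errmsg_internal("process %d still waiting for %s on %s",
								 MyProcPid,
								 GetLockmodeName(lock->tag.locktag_lockmethodid,
												 lockmode),
								 locktagbuf.data)));
		pfree(locktagbuf.data);
	}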
Re: "as quickly as possible" (was: remove spurious CREATE INDEX CONCURRENTLY wait)
From
Tom Lane
Date:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > On 2020-Nov-23, Tom Lane wrote: >> I'm not too fussed about whether we invent is_log_level_output_client, >> although that name doesn't seem well-chosen compared to >> is_log_level_output. > Just replacing "log" for "client" in that seemed strictly worse, and I > didn't (don't) have any other ideas. I never cared that much for "is_log_level_output" either. Thinking about renaming it to "should_output_to_log()", and then the new function would be "should_output_to_client()". >> Shall I press forward with this, or do you want to? > Please feel free to go ahead, including the change to ProcSleep. Will do. regards, tom lane
Hi, On 2020-11-23 12:30:05 -0300, Alvaro Herrera wrote: > ProcSleep: no bug here. > The flags are checked to see if we should kill() the process (used when > autovac blocks some other process). The lock here appears to be > inconsequential, since we release it before we do kill(); so strictly > speaking, there is still a race condition where the autovac process > could reset the flag after we read it and before we get a chance to > signal it. The lock level autovac uses to set the flag is not relevant > either. Yea. Even before the recent changes building the lock message under the lock didn't make sense. Now that the flags are mirrored in pgproc, we probably should just make this use READ_ONCE(autovac->statusFlags), without any other use of ProcArrayLock. As you say the race condition is between the flag test, the kill, and the signal being processed, which wasn't previously protected either. > PROC_IN_LOGICAL_DECODING: > Oddly enough, I think the reset of PROC_IN_LOGICAL_DECODING in > ReplicationSlotRelease might be the most problematic one of the lot. > That's because a proc's xmin that had been ignored all along by > ComputeXidHorizons, will now be included in the computation. Adding > asserts that proc->xmin and proc->xid are InvalidXid by the time we > reset the flag, I got hits in pg_basebackup, test_decoding and > subscription tests. I think it's OK for ComputeXidHorizons (since it > just means that a vacuum that reads a later will remove less rows.) But > in GetSnapshotData it is just not correct to have the Xmin go backwards. I don't think there's a problem. PROC_IN_LOGICAL_DECODING can only be set when outside a transaction block, i.e. walsender. In which case there shouldn't be an xid/xmin, I think? Or did you gate your assert on PROC_IN_LOGICAL_DECODING being set? > GetCurrentVirtualXIDs, ComputeXidHorizons, GetSnapshotData: > > In these cases, what we need is that the code computes some xmin (or > equivalent computation) based on a set of transactions that exclude > those marked with the flags. The behavior we want is that if some > transaction is marked as vacuum, we ignore the Xid/Xmin *if there is > one*. In other words, if there's no Xid/Xmin, then the flag is not > important. So if we can ensure that the flag is set first, and the > Xid/xmin is installed later, that's sufficient correctness and we don't > need to hold exclusive lock. That'd require at least memory barriers in GetSnapshotData()'s loop, which I'd really like to avoid. Otherwise the order in which memory gets written in one process doesn't guarantee the order of visibility in another process... > In other words, my conclusion is that there definitely is a bug here and > I am going to restore the use of exclusive lock for setting the > statusFlags. Cool. > GetSnapshotData has an additional difficulty -- we do the > UINT32_ACCESS_ONCE(ProcGlobal->xid[i]) read *before* testing the bit. > So it's not valid to set the bit without locking out GSD, regardless of > any barriers we had; if we want this to be safe, we'd have to change > this so that the flag is read first, and we read the xid only > afterwards, with a read barrier. Which we don't want, because that'd mean slowing down the *extremely* common case of the majority of backends neither having an xid assigned nor doing logical decoding / vacuum. We'd be reading two cachelines instead of one. Greetings, Andres Freund
Re: "as quickly as possible" (was: remove spurious CREATE INDEX CONCURRENTLY wait)
From
Alvaro Herrera
Date:
On 2020-Nov-23, Tom Lane wrote: > I never cared that much for "is_log_level_output" either. Thinking > about renaming it to "should_output_to_log()", and then the new function > would be "should_output_to_client()". Ah, that sounds nicely symmetric and grammatical. Thanks!
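Spelled out, the client-side half of that pair would look roughly like this (a sketch under the names proposed above; should_output_to_log() would simply be the renamed is_log_level_output(), and the committed code may differ):

static inline bool
should_output_to_client(int elevel)
{
	if (whereToSendOutput == DestRemote && elevel != LOG_SERVER_ONLY &&
		!ClientAuthInProgress)
	{
		/* INFO is always sent to the client, regardless of client_min_messages */
		return (elevel >= client_min_messages || elevel == INFO);
	}
	return false;
}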
Re: "as quickly as possible" (was: remove spurious CREATE INDEX CONCURRENTLY wait)
From
Michael Paquier
Date:
On Mon, Nov 23, 2020 at 06:13:17PM -0500, Tom Lane wrote: > Alvaro Herrera <alvherre@alvh.no-ip.org> writes: >> Please feel free to go ahead, including the change to ProcSleep. > > Will do. Thank you both for 450c823 and 789b938. -- Michael
Attachment
On 2020-Nov-23, Andres Freund wrote: > On 2020-11-23 12:30:05 -0300, Alvaro Herrera wrote: > > PROC_IN_LOGICAL_DECODING: > > Oddly enough, I think the reset of PROC_IN_LOGICAL_DECODING in > > ReplicationSlotRelease might be the most problematic one of the lot. > > That's because a proc's xmin that had been ignored all along by > > ComputeXidHorizons, will now be included in the computation. Adding > > asserts that proc->xmin and proc->xid are InvalidXid by the time we > > reset the flag, I got hits in pg_basebackup, test_decoding and > > subscription tests. I think it's OK for ComputeXidHorizons (since it > > just means that a vacuum that reads a later will remove less rows.) But > > in GetSnapshotData it is just not correct to have the Xmin go backwards. > > I don't think there's a problem. PROC_IN_LOGICAL_DECODING can only be > set when outside a transaction block, i.e. walsender. In which case > there shouldn't be an xid/xmin, I think? Or did you gate your assert on > PROC_IN_LOGICAL_DECODING being set? Ah, you're right about this one -- I missed the significance of setting the flag only "when outside of a transaction block" at the time we call StartupDecodingContext.
On 2020-Nov-23, Tom Lane wrote: > Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > > GetCurrentVirtualXIDs, ComputeXidHorizons, GetSnapshotData: > > > In these cases, what we need is that the code computes some xmin (or > > equivalent computation) based on a set of transactions that exclude > > those marked with the flags. The behavior we want is that if some > > transaction is marked as vacuum, we ignore the Xid/Xmin *if there is > > one*. In other words, if there's no Xid/Xmin, then the flag is not > > important. So if we can ensure that the flag is set first, and the > > Xid/xmin is installed later, that's sufficient correctness and we don't > > need to hold exclusive lock. But if we can't ensure that, then we must > > use exclusive lock, because otherwise we risk another process seeing our > > Xid first and not our flag, which would be bad. > > I don't buy this either. You get the same result if someone looks just > before you take the ProcArrayLock to set the flag. So if there's a > problem, it's inherent in the way that the flags are defined or used; > the strength of lock used in this stanza won't affect it. The problem is that the writes could be reordered in a way that makes the Xid appear set to an onlooker before PROC_IN_VACUUM appears set. Vacuum always sets the bit first, and *then* the xid. If the reader always reads it like that then it's not a problem. But in order to guarantee that, we would have to have a read barrier for each pass through the loop. With the LW_EXCLUSIVE lock, we block the readers so that the bit is known set by the time they examine it. As I understand, getting the lock is itself a barrier, so there's no danger that we'll set the bit and they won't see it. ... at least, that's how I *imagine* the argument to be. In practice, vacuum_rel() calls GetSnapshotData() before installing the PROC_IN_VACUUM bit, and therefore there *is* a risk that reader 1 will get MyProc->xmin included in their snapshot (because bit not yet set), and reader 2 won't. If my understanding is correct, then we should move the PushActiveSnapshot(GetTransactionSnapshot()) call to after we have the PROC_IN_VACUUM bit set.
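Spelled out as pseudocode, the interleaving at issue is roughly the following (illustrative only, not actual backend code; the array and field names are approximations and assigned_xid is a stand-in):

	/* writer (a vacuum backend), no lock and no barriers: */
	MyProc->statusFlags |= PROC_IN_VACUUM;	/* store A: the flag */
	MyProc->xid = assigned_xid;				/* store B: the xid  */

	/* reader (a GetSnapshotData-style loop), no barriers: */
	TransactionId xid = UINT32_ACCESS_ONCE(ProcGlobal->xid[i]);	/* may observe store B ... */
	uint8		flags = ProcGlobal->statusFlags[i];				/* ... without seeing store A */

	if (!(flags & PROC_IN_VACUUM) && TransactionIdIsValid(xid))
	{
		/* the vacuum's xid/xmin ends up being considered for the snapshot */
	}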
On 2020-Nov-23, Andres Freund wrote: > On 2020-11-23 12:30:05 -0300, Alvaro Herrera wrote: > > In other words, my conclusion is that there definitely is a bug here and > > I am going to restore the use of exclusive lock for setting the > > statusFlags. > > Cool. Here's a patch. Note it also moves the computation of vacuum's Xmin (per GetTransactionSnapshot) to *after* the bit has been set in statusFlags.
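In outline, the ordering the patch aims for in vacuum_rel() is roughly this (a sketch; the real hunk differs in detail):

	/* advertise the flag first, under exclusive ProcArrayLock ... */
	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
	MyProc->statusFlags |= PROC_IN_VACUUM;
	if (params->is_wraparound)
		MyProc->statusFlags |= PROC_VACUUM_FOR_WRAPAROUND;
	ProcGlobal->statusFlags[MyProc->pgxactoff] = MyProc->statusFlags;
	LWLockRelease(ProcArrayLock);

	/* ... and only then acquire the snapshot that installs our xmin */
	PushActiveSnapshot(GetTransactionSnapshot());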
Attachment
On 2020-Nov-25, Alvaro Herrera wrote: > On 2020-Nov-23, Andres Freund wrote: > > > On 2020-11-23 12:30:05 -0300, Alvaro Herrera wrote: > > > > In other words, my conclusion is that there definitely is a bug here and > > > I am going to restore the use of exclusive lock for setting the > > > statusFlags. > > > > Cool. > > Here's a patch. Pushed, thanks.
So let's discuss the next step in this series: what to do about REINDEX CONCURRENTLY. I started with Dmitry's patch (an updated version of which I already posted in [1]). However, Dmitry missed (and I hadn't noticed) that some of the per-index loops are starting transactions also, and that we need to set the flag in those. And what's more, in a couple of the function-scope transactions we do set the flag pointlessly: the transactions there do not acquire a snapshot, so there's no reason to set the flag at all, because WaitForOlderSnapshots ignores sessions whose Xmin is 0. There are also transactions that wait first, without setting a snapshot, and then do some catalog manipulations. I think it's pretty much useless to set the flag for those, because they're going to be very short anyway. (There's also one case of this in CREATE INDEX CONCURRENTLY.) But there's a more interesting point. In Dmitry's patch, we determine safety for *all* indexes being processed as a set, and then apply the flag only if they're all deemed safe. But we can optimize this, and set the flag for each index's transaction individually, and only skip it for those specific indexes that are unsafe. So I propose to change the data structure used in ReindexRelationConcurrently from the current list of OIDs to a list of (oid,boolean) pairs, to be able to track setting the flag individually. There's one more useful observation: in the function-scope transactions (outside the per-index loops), we don't touch the contents of any indexes; we just wait or do some catalog manipulation. So we can set the flag *regardless of the safety of any indexes*. We only need to care about the safety of the indexes in the phases where we build the indexes and when we validate them. [1] https://postgr.es/m/20201118175804.GA23027@alvherre.pgsql
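In sketch form, the per-index handling proposed here would look something like the following (illustrative only; the element type and the flag-setting helper are hypothetical names for what the patch would add):

	foreach(lc, newIndexIds)
	{
		/* hypothetical element type: the proposed (oid, safe) pair */
		ReindexIndexPair *idx = lfirst(lc);

		StartTransactionCommand();

		/* advertise the "safe index operation" flag only if this index qualifies */
		if (idx->safe)
			set_safe_index_flag();		/* hypothetical helper */

		/* ... build or validate idx->indexOid here ... */

		CommitTransactionCommand();
	}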
On 2020-Nov-26, Alvaro Herrera wrote: > So let's discuss the next step in this series: what to do about REINDEX > CONCURRENTLY. > [...] ... as in the attached.
Attachment
Actually, I noticed two things. The first of them, addressed in this new version of the patch, is that REINDEX CONCURRENTLY is doing a lot of repetitive work by reopening each index and table in the build/validate loops, so that they can report progress. This is easy to remedy by adding a couple more members to the new struct (which I also renamed to ReindexIndexInfo), for tableId and amId. The code seems a bit simpler this way. The other thing is that ReindexRelationConcurrently seems to always be called with the relations already locked by RangeVarGetRelidExtended. So claiming to acquire locks on the relations over and over is pointless. (I only noticed this because there was an obvious deadlock hazard in one of the loops, which locked the index before the table.) I think we should reduce all those to NoLock. My patch does not do that.
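A sketch of the struct as described (field comments added for illustration; the actual patch may differ):

typedef struct ReindexIndexInfo
{
	Oid			indexId;	/* OID of the index being rebuilt */
	Oid			tableId;	/* parent table, kept to avoid reopening it */
	Oid			amId;		/* access method, for progress reporting */
	bool		safe;		/* may concurrent CIC/RIC ignore this one? */
} ReindexIndexInfo;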
Attachment
In the interest of showing progress, I'm going to mark this CF item as committed, and I'll submit the remaining pieces in a new thread. Thanks!
On Mon, Nov 30, 2020 at 04:15:27PM -0300, Alvaro Herrera wrote: > In the interest of showing progress, I'm going to mark this CF item as > committed, and I'll submit the remaining pieces in a new thread. Thanks for splitting! -- Michael
Attachment
Hi, On 2020-11-25 17:03:58 -0300, Alvaro Herrera wrote: > On 2020-Nov-23, Andres Freund wrote: > > > On 2020-11-23 12:30:05 -0300, Alvaro Herrera wrote: > > > > In other words, my conclusion is that there definitely is a bug here and > > > I am going to restore the use of exclusive lock for setting the > > > statusFlags. > > > > Cool. > > Here's a patch. > > Note it also moves the computation of vacuum's Xmin (per > GetTransactionSnapshot) to *after* the bit has been set in statusFlags. > From b813c67a4abe2127b8bd13db7e920f958db15d59 Mon Sep 17 00:00:00 2001 > From: Alvaro Herrera <alvherre@alvh.no-ip.org> > Date: Tue, 24 Nov 2020 18:10:42 -0300 > Subject: [PATCH] Restore lock level to update statusFlags > > Reverts 27838981be9d (some comments are kept). Per discussion, it does > not seem safe to relax the lock level used for this; in order for it to > be safe, there would have to be memory barriers between the point we set > the flag and the point we set the trasaction Xid, which perhaps would > not be so bad; but there would also have to be barriers at the readers' > side, which from a performance perspective might be bad. > > Now maybe this analysis is wrong and it *is* safe for some reason, but > proof of that is not trivial. I just noticed that this commit (dcfff74fb16) didn't revert the change of lock level in ReplicationSlotRelease(). Was that intentional? Greetings, Andres Freund
On 2021-Nov-10, Andres Freund wrote: > > Reverts 27838981be9d (some comments are kept). Per discussion, it does > > not seem safe to relax the lock level used for this; in order for it to > > be safe, there would have to be memory barriers between the point we set > > the flag and the point we set the trasaction Xid, which perhaps would > > not be so bad; but there would also have to be barriers at the readers' > > side, which from a performance perspective might be bad. > > > > Now maybe this analysis is wrong and it *is* safe for some reason, but > > proof of that is not trivial. > > I just noticed that this commit (dcfff74fb16) didn't revert the change of lock > level in ReplicationSlotRelease(). Was that intentional? Hmm, no, that seems to have been a mistake. I'll restore it. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "Early and provident fear is the mother of safety" (E. Burke)
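Restoring it would presumably look something like this (a sketch; the actual follow-up commit may differ):

	/* back to taking ProcArrayLock exclusively when clearing the flag */
	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
	MyProc->statusFlags &= ~PROC_IN_LOGICAL_DECODING;
	ProcGlobal->statusFlags[MyProc->pgxactoff] = MyProc->statusFlags;
	LWLockRelease(ProcArrayLock);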
On 2021-11-11 09:38:07 -0300, Alvaro Herrera wrote: > > I just noticed that this commit (dcfff74fb16) didn't revert the change of lock > > level in ReplicationSlotRelease(). Was that intentional? > > Hmm, no, that seems to have been a mistake. I'll restore it. Thanks