Thread: concurrent index builds unneeded lock?
We just ran into a case where we were performing two concurrent index builds on two different tables in two different schemas in the same database (no relational constraints between them). One of the index builds blocked on the other:

  - The first index build started...
  - The second index build started...
  - The first one blocked on the second one...
  - The second one finished...
  - The first one was allowed to continue and finish.

quux=# select * from pg_locks where pid IN (25264, 20108);
  locktype  | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction |  pid  |           mode           | granted
------------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+-------+--------------------------+---------
 relation   |    16385 |    25852 |      |       |            |               |         |       |          | 9/3041             | 20108 | RowExclusiveLock         | t
 relation   |    16385 |    25861 |      |       |            |               |         |       |          | 1/15735            | 25264 | RowExclusiveLock         | t
 relation   |    16385 |    16421 |      |       |            |               |         |       |          | 9/3041             | 20108 | ShareUpdateExclusiveLock | t
 virtualxid |          |          |      |       | 9/3041     |               |         |       |          | 9/3041             | 20108 | ExclusiveLock            | t
 virtualxid |          |          |      |       | 1/15735    |               |         |       |          | 1/15735            | 25264 | ExclusiveLock            | t
 virtualxid |          |          |      |       | 9/3041     |               |         |       |          | 1/15735            | 25264 | ShareLock                | f
 relation   |    16385 |    16528 |      |       |            |               |         |       |          | 1/15735            | 25264 | ShareUpdateExclusiveLock | t
(7 rows)

Reading the comments in the concurrent index build code, it seems that in preparation for phase 3 of the index build it looks for any open txns that could feasibly see deleted tuples prior to the snapshot. I would think that would be limited to txns reading that table, but it looks like it is a bit too aggressive. Am I reading the code wrong there? I'm thinking it should be more selective about the vxids it chooses to block on. I'd expect it to block on vxids touching the same table only.

Thoughts?

-- Theo Schlossnagle http://omniti.com/is/theo-schlossnagle p: +1.443.325.1357 x201 f: +1.410.872.4911
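For readers following along, the blocking behavior described above can be modeled roughly like this (a simplified sketch with illustrative names, not PostgreSQL's actual identifiers or backend code): the build's phase-3 wait collects every other active vxid and waits on each, with no check of which tables those transactions have actually touched.

```python
# Rough model of the phase-3 wait in a concurrent index build.
# Names and structure are illustrative, not the real backend logic.

def vxids_to_wait_on(active_vxids, my_vxid):
    """Return the vxids a concurrent index build blocks on.

    Note there is no filtering by which tables each vxid has
    touched -- which is why two builds on unrelated tables
    still end up waiting on each other."""
    return [v for v in active_vxids if v != my_vxid]

# The two builds from the pg_locks output above:
print(vxids_to_wait_on(["9/3041", "1/15735"], "1/15735"))  # ['9/3041']
```

Build 1/15735 waits on 9/3041 even though the two builds touch disjoint tables, matching the ungranted ShareLock row in the pg_locks output.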
Theo Schlossnagle <jesus@omniti.com> writes:
> I would think it would be txns that would be reading that table, but
> I'm thinking it is a bit too aggressive. Am I reading the code wrong
> there? I'm thinking it should be more selective about vxids it
> chooses to block on. I'd expect it to block on vxids touching the
> same table only.

There is no way to know whether a currently active vxid will try to look at the other table later. We cannot just ignore this case...

regards, tom lane
On Jul 11, 2009, at 12:12 AM, Tom Lane wrote:
> Theo Schlossnagle <jesus@omniti.com> writes:
>> I would think it would be txns that would be reading that table, but
>> I'm thinking it is a bit too aggressive. Am I reading the code wrong
>> there? I'm thinking it should be more selective about vxids it
>> chooses to block on. I'd expect it to block on vxids touching the
>> same table only.
>
> There is no way to know whether a currently active vxid will try to look
> at the other table later. We cannot just ignore this case...
>
> regards, tom lane

Can't you know "that" if the other active query in question is a concurrent index build? Concurrent index builds by their current implementation cannot exist within another transaction, so you know everything there is to know about that transaction by looking at it (no risk of prior or future work). While very much unlike a vacuum (which gets a special exclusion in concurrent index builds), they still seem to constitute a "special case" for exclusion. Happy to be wrong here; I really haven't completely digested the code.

-- Theo Schlossnagle http://omniti.com/is/theo-schlossnagle p: +1.443.325.1357 x201 f: +1.410.872.4911
On Sat, Jul 11, 2009 at 6:17 AM, Theo Schlossnagle <jesus@omniti.com> wrote:
> On Jul 11, 2009, at 12:12 AM, Tom Lane wrote:
>> Theo Schlossnagle <jesus@omniti.com> writes:
>>> I would think it would be txns that would be reading that table, but
>>> I'm thinking it is a bit too aggressive. Am I reading the code wrong
>>> there? I'm thinking it should be more selective about vxids it
>>> chooses to block on. I'd expect it to block on vxids touching the
>>> same table only.
>>
>> There is no way to know whether a currently active vxid will try to look
>> at the other table later. We cannot just ignore this case...
>>
>> regards, tom lane
>
> Can't you know "that" if the other active query in question is a concurrent
> index build?

I think so.

Hm. Actually maybe not. What if the index is an expression index and the expression includes a function which does an SQL operation? I'm not sure how realistic that is, since to be a danger that SQL operation would have to be an insert, update, or delete, which is not just bending the rules.

The real problem is that we only added the ability to distinguish vacuums relatively recently, and adding more special cases of transactions which can be ignored for one purpose or another seems like a lot of corner cases to worry about.

I wonder whether an earlier, more general proposal could have some leverage here though: some way to indicate that the transaction has taken all the locks it plans to take already. This was originally proposed as a way for vacuum to know it can ignore the snapshots of a transaction since it isn't going to access the table being vacuumed.

In this case the concurrent index build could mark itself as having taken all the locks it plans to take, and other concurrent index builds could ignore it. They could also ignore any transactions which have that flag set through any other mechanisms we might add, such as a manual SQL command.

-- greg http://mit.edu/~gsstark/resume.pdf
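The "taken all its locks already" proposal above could be sketched like this (all field names are hypothetical; this is the shape of the idea, not an implementation): a build skips waiting on any transaction that has finalized its lock set and holds no lock on the table being indexed, since such a transaction can never touch that table later.

```python
# Sketch of the "no more locks" flag proposal. Field names
# ("locks_finalized", "tables_locked") are invented for illustration.

def vxids_to_wait_on(active_txns, my_vxid, table):
    """Wait only on txns that might still touch `table`."""
    waits = []
    for t in active_txns:
        if t["vxid"] == my_vxid:
            continue
        if t["locks_finalized"] and table not in t["tables_locked"]:
            continue  # can never acquire a lock on our table later
        waits.append(t["vxid"])
    return waits

txns = [
    {"vxid": "1/15735", "locks_finalized": True, "tables_locked": {"a"}},
    {"vxid": "9/3041",  "locks_finalized": True, "tables_locked": {"b"}},
]
# Each build indexes a different table, so neither waits on the other:
print(vxids_to_wait_on(txns, "1/15735", "a"))  # []
```

A transaction that has not set the flag (or already holds a lock on the indexed table) would still be waited on, preserving the current conservative behavior.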
On Jul 11, 2009, at 6:50 AM, Greg Stark wrote:
> On Sat, Jul 11, 2009 at 6:17 AM, Theo Schlossnagle <jesus@omniti.com> wrote:
>> Can't you know "that" if the other active query in question is a concurrent
>> index build?
>
> I think so.
>
> Hm. Actually maybe not. What if the index is an expression index and
> the expression includes a function which does an SQL operation? I'm
> not sure how realistic that is since to be a danger that SQL operation
> would have to be an insert, update, or delete which is not just
> bending the rules.
>
> The real problem is that we only added the ability to distinguish
> vacuums relatively recently and adding more special cases of
> transactions which can be ignored for one purpose or another seems
> like a lot of corner cases to worry about.
>
> I wonder whether an earlier more general proposal could have some
> leverage here though: some way to indicate that the transaction has
> taken all the locks it plans to take already. This was originally
> proposed as a way for vacuum to know it can ignore the snapshots of a
> transaction since it isn't going to access the table being vacuumed.
>
> In this case the concurrent index build could mark itself as having
> taken all the locks it plans to take and other concurrent index builds
> could ignore it. They could also ignore any transactions which have
> that flag set through any other mechanisms we might add such as a
> manual SQL command.

While I see that feature being extremely hard for users to leverage via SQL, it seems like a very simple and useful internal control. When acquiring locks, we would simply check whether the future-locking bit is flipped and abort the transaction if it is. It seems really safe. I'd imagine that in the various places we try to use this, we'll be reminded of the incidental locks we need to grab. But after looking at the concurrent index code, it seems that it would solve my issue at least. It's also much more elegant than adding more exceptions.

-- Theo Schlossnagle http://omniti.com/is/theo-schlossnagle p: +1.443.325.1357 x201 f: +1.410.872.4911
Greg Stark <gsstark@mit.edu> writes:
> I wonder whether an earlier more general proposal could have some
> leverage here though: some way to indicate that the transaction has
> taken all the locks it plans to take already. This was originally
> proposed as a way for vacuum to know it can ignore the snapshots of a
> transaction since it isn't going to access the table being vacuumed.

Again, this doesn't work for any interesting cases. You can't, for example, assume that a user-defined datatype won't choose to look into tables that hadn't been accessed as of the start of the index build. (This isn't a hypothetical example --- I believe PostGIS does some such things already, because it keeps spatial reference definitions in a central catalog table.)

regards, tom lane
On 7/11/09 3:50 AM, Greg Stark wrote:
> Hm. Actually maybe not. What if the index is an expression index and
> the expression includes a function which does an SQL operation? I'm
> not sure how realistic that is since to be a danger that SQL operation
> would have to be an insert, update, or delete which is not just
> bending the rules.

It's not realistic at all. People are only supposed to use IMMUTABLE functions for expression indexes; if they declare a volatile function as immutable, then it's their own lookout if they corrupt their data.

-- Josh Berkus PostgreSQL Experts Inc. www.pgexperts.com
Josh Berkus <josh@agliodbs.com> writes:
> On 7/11/09 3:50 AM, Greg Stark wrote:
>> Hm. Actually maybe not. What if the index is an expression index and
>> the expression includes a function which does an SQL operation? I'm
>> not sure how realistic that is since to be a danger that SQL operation
>> would have to be an insert, update, or delete which is not just
>> bending the rules.
>
> It's not realistic at all. People are only supposed to use IMMUTABLE
> functions for expression indexes; if they declare a volatile function
> as immutable, then it's their own lookout if they corrupt their data.

The requirement wasn't just on not changing SQL data though. To make use of this you'd also have to forbid indexed functions from *reading* other tables. Which is something we discourage because of the risk that the results aren't really immutable, but we don't forbid it; and there are obvious use-cases.

regards, tom lane
On Sun, Jul 12, 2009 at 4:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The requirement wasn't just on not changing SQL data though. To make
> use of this you'd also have to forbid indexed functions from *reading*
> other tables. Which is something we discourage because of the risk that
> the results aren't really immutable, but we don't forbid it; and there
> are obvious use-cases.

Well, it's my fault, but the discussion kind of mutated in the middle.

For the original use case I think only changing SQL data would actually be a threat. The concurrent index build only has to wait out any transactions which might update the table without updating the index. Even if there are volatile functions in the index expression, index WHERE clause, or index operators, they aren't really likely to do that. And the worst case if they do is that you end up with a corrupted index which is missing entries or has duplicate entries. That's the same risk you always have if you have mismarked volatile functions used in your index definition.

For the mutated discussion, where I was trying to find a mechanism that would be more generally useful, that's not the case. Vacuum needs to know whether you ever plan to *read* from a table in the future. But that's not what concurrent index builds need to know.

So I think we're back to looking at a special case for concurrent index builds to not wait on other concurrent index builds.

-- greg http://mit.edu/~gsstark/resume.pdf
Greg Stark <gsstark@mit.edu> writes:
> So I think we're back to looking at a special case for concurrent
> index builds to not wait on other concurrent index builds.

I'm kind of wondering how big the use case for that really is. AFAICT the point of a concurrent build is to (re)build an index without incurring too much performance penalty for foreground query processing. So how often are you really going to want to fire off several of them in parallel? If you can afford to saturate your machine with indexing work, you could use plain index builds.

regards, tom lane
Actually ... why do we have to have that third wait step at all? Doesn't the indcheckxmin mechanism render it unnecessary, or couldn't we adjust the comparison xmin to make it unnecessary?

regards, tom lane
On Sun, Jul 12, 2009 at 4:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I'm kind of wondering how big the use case for that really is.
> AFAICT the point of a concurrent build is to (re)build an index
> without incurring too much performance penalty for foreground
> query processing. So how often are you really going to want
> to fire off several of them in parallel? If you can afford to
> saturate your machine with indexing work, you could use plain
> index builds.

I don't really see those as comparable cases. Firing off multiple concurrent index builds only requires lots of available I/O throughput; using plain index builds requires a maintenance window where any updates to the table are shut down. Being able to run multiple concurrent index builds just means being able to roll out a schema change more quickly. It doesn't let you do anything that was impossible before.

Another thing that's annoyed me about our current support for concurrent index builds is that you can't run multiple concurrent builds on the same table. Since they all take the strangely named ShareUpdateExclusiveLock, you can only run one at a time. Fixing that would require introducing a new, uh, ShareUpdateSharedLock(?) which conflicts with the vacuum lock but not itself. It didn't seem worth introducing a new lock type at the time, but with syncscanning and the evidence that people are actually doing this, I'm starting to wonder.

-- greg http://mit.edu/~gsstark/resume.pdf
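The self-conflicting lock mode Greg describes can be illustrated with a toy conflict table (illustrative only; PostgreSQL's real conflict matrix lives in the lock manager). The hypothetical ShareUpdateSharedLock would conflict with ShareUpdateExclusiveLock, so vacuum is still excluded, but not with itself, so multiple builds on one table could proceed.

```python
# Toy conflict matrix for the two modes discussed above.
# "ShareUpdateSharedLock" is Greg's hypothetical mode, not a real
# PostgreSQL lock type. True means the two modes conflict.

CONFLICTS = {
    ("ShareUpdateExclusiveLock", "ShareUpdateExclusiveLock"): True,
    ("ShareUpdateSharedLock",    "ShareUpdateExclusiveLock"): True,
    ("ShareUpdateExclusiveLock", "ShareUpdateSharedLock"):    True,
    ("ShareUpdateSharedLock",    "ShareUpdateSharedLock"):    False,
}

def conflicts(a, b):
    return CONFLICTS[(a, b)]

# Today: two concurrent builds on the same table block each other.
print(conflicts("ShareUpdateExclusiveLock", "ShareUpdateExclusiveLock"))  # True
# With the hypothetical mode, they could run in parallel while still
# excluding VACUUM (which takes ShareUpdateExclusiveLock).
print(conflicts("ShareUpdateSharedLock", "ShareUpdateSharedLock"))  # False
```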