Thread: VACUUM ANALYZE blocking both reads and writes to a table
Hello, my understanding, and generally my experience, has been that VACUUM and VACUUM ANALYZE (but not VACUUM FULL) are never supposed to block neither SELECT:s nor UPDATE:s/INSERT:s/DELETE:s to a table. This is seemingly confirmed by reading the "explicit locking" documentation, in terms of the locks acquired by various forms of vacuuming, and with which other lock modes they conflict. I have now seen it happen twice that a VACUMM ANALYZE has seemingly been the triggering factor to blocking queries. In the first instance, we had two particularly interesting things going on: VACUUM ANALYZE thetable LOCK TABLE thetable IN ACCESS SHARE MODE In addition there was one SELECT from the table, and a bunch of INSERT:s (this is based on pg_stat_activity). While I am unsure of why there is an explicit LOCK going on with ACCESS SHARE MODE (no explicit locking is ever done on this table by the application), it is supposed to be the locking used for selects. I suspect it may be a referential integrity related acquisition generated by PG. The second time it happned, there was again a single SELECT, a bunch of INSERT:s, and then: VACUUM ANALYZE thetable This time there was no explicit LOCK visible. In both cases, actitivy was completely blocked until the VACUUM ANALYZE completed. Does anyone have input on why this could be happening? The PostgreSQL version is 8.2.4[1]. Am I correct in that it *should* not be possible for this to happen? For the next time this happens I will try to have a query prepared that will dump as much relevant information as possible regarding acquired locks. If it makes a difference the SELECT does have a subselect that also selcts from the same table - a MAX(colum) on an indexed column. [1] I did check the ChangeLog for 8.2.x releases above .4, and the 8.3 releases, but did not see anything that indicated locking/conflict related fixes in relation to vacuums. -- / Peter Schuller PGP userID: 0xE9758B7D or 'Peter Schuller <peter.schuller@infidyne.com>' Key retrieval: Send an E-Mail to getpgpkey@scode.org E-Mail: peter.schuller@infidyne.com Web: http://www.scode.org
Attachment
Peter Schuller wrote: > Does anyone have input on why this could be happening? The PostgreSQL > version is 8.2.4[1]. Am I correct in that it *should* not be possible > for this to happen? No. VACUUM takes an exclusive lock at the end of the operation to truncate empty pages. (If it cannot get the lock then it'll just skip this step.) In 8.2.4 there was a bug that caused it to sleep according to vacuum_delay during the scan to identify possibly empty pages. This was fixed in 8.2.5: revision 1.81.2.1 date: 2007-09-10 13:58:50 -0400; author: alvherre; state: Exp; lines: +6 -2; Remove the vacuum_delay_point call in count_nondeletable_pages, because we hold an exclusive lock on the table at this point, which we want to release as soon as possible. This is called in the phase of lazy vacuum where we truncate the empty pages at the end of the table. An alternative solution would be to lower the vacuum delay settings before starting the truncating phase, but this doesn't work very well in autovacuum due to the autobalancing code (which can cause other processes to change our cost delay settings). This case could be considered in the balancing code, but it is simpler this way. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Hello, > No. VACUUM takes an exclusive lock at the end of the operation to > truncate empty pages. (If it cannot get the lock then it'll just skip > this step.) In 8.2.4 there was a bug that caused it to sleep > according to vacuum_delay during the scan to identify possibly empty > pages. This was fixed in 8.2.5: [snip revision log] Thank you very much! This does indeed seem to be the likely culprit. Will try to either upgrade, or if not possible in time for the next occurance, confirm that this is what is happening based on pg_locks. Thanks again for the very informative response. -- / Peter Schuller PGP userID: 0xE9758B7D or 'Peter Schuller <peter.schuller@infidyne.com>' Key retrieval: Send an E-Mail to getpgpkey@scode.org E-Mail: peter.schuller@infidyne.com Web: http://www.scode.org
Attachment
Actually, while on the topic: > date: 2007-09-10 13:58:50 -0400; author: alvherre; state: Exp; lines: +6 -2; > Remove the vacuum_delay_point call in count_nondeletable_pages, because we hold > an exclusive lock on the table at this point, which we want to release as soon > as possible. This is called in the phase of lazy vacuum where we truncate the > empty pages at the end of the table. Even with the fix the lock is held. Is the operation expected to be "fast" (for some definition of "fast") and in-memory, or is this something that causes significant disk I/O and/or scales badly with table size or similar? I.e., is this enough that, even without the .4 bug, one should not really consider VACUUM ANALYZE non-blocking with respect to other transactions? (I realize various exclusive locks are taken for short periods of time even for things that are officially declared non-blocking; the question is whether this falls into this category.) -- / Peter Schuller PGP userID: 0xE9758B7D or 'Peter Schuller <peter.schuller@infidyne.com>' Key retrieval: Send an E-Mail to getpgpkey@scode.org E-Mail: peter.schuller@infidyne.com Web: http://www.scode.org
Attachment
Peter Schuller wrote: > Actually, while on the topic: > > > date: 2007-09-10 13:58:50 -0400; author: alvherre; state: Exp; lines: +6 -2; > > Remove the vacuum_delay_point call in count_nondeletable_pages, because we hold > > an exclusive lock on the table at this point, which we want to release as soon > > as possible. This is called in the phase of lazy vacuum where we truncate the > > empty pages at the end of the table. > > Even with the fix the lock is held. Is the operation expected to be > "fast" (for some definition of "fast") and in-memory, or is this > something that causes significant disk I/O and/or scales badly with > table size or similar? It is fast. > I.e., is this enough that, even without the .4 bug, one should not > really consider VACUUM ANALYZE non-blocking with respect to other > transactions? You should consider it non-blocking. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Alvaro Herrera <alvherre@commandprompt.com> writes: > Peter Schuller wrote: >> Even with the fix the lock is held. Is the operation expected to be >> "fast" (for some definition of "fast") and in-memory, or is this >> something that causes significant disk I/O and/or scales badly with >> table size or similar? > It is fast. Well, it's *normally* fast. In a situation where there are a whole lot of empty pages at the end of the table, it could be slow. That's probably not very likely on a heavily used table. One should also note that (1) The only way vacuum will be able to obtain an exclusive lock in the first place is if there are *no* other transactions using the table at the time. (2) If it's autovacuum we're talking about, it will get kicked off the table if anyone else comes along and wants a conflicting lock. regards, tom lane
Alvaro Herrera wrote: > Peter Schuller wrote: > > Actually, while on the topic: > > > > > date: 2007-09-10 13:58:50 -0400; author: alvherre; state: Exp; lines: +6 -2; > > > Remove the vacuum_delay_point call in count_nondeletable_pages, because we hold > > > an exclusive lock on the table at this point, which we want to release as soon > > > as possible. This is called in the phase of lazy vacuum where we truncate the > > > empty pages at the end of the table. > > > > Even with the fix the lock is held. Is the operation expected to be > > "fast" (for some definition of "fast") and in-memory, or is this > > something that causes significant disk I/O and/or scales badly with > > table size or similar? > > It is fast. To elaborate: it scans the relation backwards and makes note of how many are unused. As soon as it finds a non-empty one, it stops scanning. Typically this should be quick. It is not impossible that there are a lot of empty blocks at the end though, but I have never heard a problem report about this. It could definitely cause I/O though. > > I.e., is this enough that, even without the .4 bug, one should not > > really consider VACUUM ANALYZE non-blocking with respect to other > > transactions? > > You should consider it non-blocking. The lock in conditionally acquired: as I said earlier, the code would rather skip this part than block. So if there's some other operation going on, there's no lock held at all. If this grabs the lock, then other operations are going to block behind it, but the time holding the lock should be short. Note, however, that sleeping for 20ms or more because of vacuum_delay (the bug fixed above) clearly falls out of this category, and easily explains the behavior you're seeing with 8.2.4. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Tom Lane wrote: > (2) If it's autovacuum we're talking about, it will get kicked off the > table if anyone else comes along and wants a conflicting lock. Not on 8.2 though. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
> > (2) If it's autovacuum we're talking about, it will get kicked off the > > table if anyone else comes along and wants a conflicting lock. > > Not on 8.2 though. That is also nice to know. One more reason to upgrade to 8.3. Thank you very much, both Alvaro and Tom, for the very insightful discussion! -- / Peter Schuller PGP userID: 0xE9758B7D or 'Peter Schuller <peter.schuller@infidyne.com>' Key retrieval: Send an E-Mail to getpgpkey@scode.org E-Mail: peter.schuller@infidyne.com Web: http://www.scode.org