Thread: Vacuum as "easily obtained" locks
Hi all, I have an application that is reading from a queue table, as part of my testing I stressed the table to check performance, but after the test was completed I have the a very large empty table: SELECT relname, pg_size_pretty(pg_relation_size(oid)) AS size, reltuples::bigint FROM pg_class; relname | size | reltuples ----------------------------+----------+----------- logdata5queue | 142 GB | 0 From reading the documentation I see that postgres would return this space to that system after a normal vacuum if "one or more pages at the end of a table become entirely free and an exclusive table lock can be easily obtained". What does "easily obtained" mean in this context? Would my applications constant polling of the queue mean that the lock could not be easily obtained? Cheers, -- Michael Graham <mgraham@bloxx.com>
On 8/3/2011 4:47 AM, Michael Graham wrote: > Hi all, > > I have an application that is reading from a queue table, as part of my > testing I stressed the table to check performance, but after the test > was completed I have the a very large empty table: > > SELECT relname, pg_size_pretty(pg_relation_size(oid)) AS size, > reltuples::bigint FROM pg_class; > > relname | size | reltuples > ----------------------------+----------+----------- > logdata5queue | 142 GB | 0 > > From reading the documentation I see that postgres would return this > space to that system after a normal vacuum if "one or more pages at the > end of a table become entirely free and an exclusive table lock can be > easily obtained". > > What does "easily obtained" mean in this context? Would my applications > constant polling of the queue mean that the lock could not be easily > obtained? > > Cheers, I'm not sure what "easily obtained" means, sorry. But here is what I can offer. Depending on how long you ran your test, and the conf settings, and the size of your database, autovacuum may never have even tried. If you have lots and lots of tables, autovacuum only checks one at a time, then wait's a bit. Did you run your test for several days? You can take a look on pg_stat_all_tables, under the *vacuum columns, to see if it ever even ran. (a date in last_autovacuum would indicate a successful run, it wont show failures). It should, however, write something to the system log. I recall something like "autovacuum canceled because...something or other" type a message. While you are running a test, you could keep an eye on the log to see if you get any of those messages. I think that would indicate autovacuum could not get a lock. You can also watch pg_stat_activity during the test, current_query will show you what table is being vacuumed. At worst case, if your usage does prevent autovacuum from running, you can still run it yourself. Either at a scheduled downtime, or as part of your regular client sql. -Andy
On Wed, Aug 3, 2011 at 8:03 AM, Andy Colson <andy@squeakycode.net> wrote: > If you have lots and lots of tables, autovacuum only checks one at a time, > then wait's a bit. Did you run your test for several days? Not true. autovac naps by default 1 minute between each db. i.e. if you have 5 dbs it takes 5 minutes by default to check each database. So if autovac is on it should have run within a few minutes on his database, unless he has hundreds of dbs.
Michael Graham <mgraham@bloxx.com> writes: > From reading the documentation I see that postgres would return this > space to that system after a normal vacuum if "one or more pages at the > end of a table become entirely free and an exclusive table lock can be > easily obtained". > What does "easily obtained" mean in this context? It means that AccessExclusiveLock can be obtained without waiting, ie, there is nothing else accessing the table at the instant VACUUM decides to try to truncate the file. > Would my applications > constant polling of the queue mean that the lock could not be easily > obtained? Very possible, depending on what duty cycle is involved there. regards, tom lane
On 3 August 2011 18:17, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Would my applications >> constant polling of the queue mean that the lock could not be easily >> obtained? > > Very possible, depending on what duty cycle is involved there. Is there any ways of guaranteed concurrent obtaining it? > > regards, tom lane > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general > -- Sergey Konoplev Blog: http://gray-hemp.blogspot.com / Linkedin: http://ru.linkedin.com/in/grayhemp / JID/GTalk: gray.ru@gmail.com / Skype: gray-hemp
On Wed, 2011-08-03 at 09:03 -0500, Andy Colson wrote: > Depending on how long you ran your test, and the conf settings, and > the size of your database, autovacuum may never have even tried. I know that the vacuum is definitely running (in fact isn't it the vacuum that set the reltuples to 0?), the test was running for a number of weeks but this is the first time it has emptied the queue. > You can take a look on pg_stat_all_tables, under the *vacuum columns, > to see if it ever even ran. (a date in last_autovacuum would indicate > a successful run, it wont show failures). It should, however, write > something to the system log. I recall something like "autovacuum > canceled because...something or other" type a message. Yeah it said it last ran yesterday (and is currently running now), but I did I notice in the log: 2011-08-02 19:43:35 BST ERROR: canceling autovacuum task 2011-08-02 19:43:35 BST CONTEXT: automatic vacuum of table "traffic.public.logdata5queue" Which is interesting if not particularly useful. > While you are running a test, you could keep an eye on the log to see > if you get any of those messages. I think that would indicate > autovacuum could not get a lock. You can also watch pg_stat_activity > during the test, current_query will show you what table is being > vacuumed. I'm pretty certain the autovacuumer is running while the test is on going what I can't work out is under what circumstances it will be able to return unused space to the OS in when it can't. Cheers, -- Michael Graham <mgraham@bloxx.com>
On Wed, 2011-08-03 at 10:17 -0400, Tom Lane wrote: > Michael Graham <mgraham@bloxx.com> writes: > > Would my applications > > constant polling of the queue mean that the lock could not be easily > > obtained? > > Very possible, depending on what duty cycle is involved there. Hmm. The clients aren't that aggressive, especially when they failed to find data on a previous select, there are 4 clients, they each poll every 10 seconds and the select runs in <1ms. It might be worth noting that they don't ever disconnect from the server, but I assume that is not an issue for getting the AccessExclusiveLock on the table? My worry at the moment is that because the table is so large the vacuum takes a very long time to run (one has been running for 5hrs) and I assume it will continue to run until it is able to get the AccessExclusiveLock is so desperately wants. Cheers, -- Michael Graham <mgraham@bloxx.com>
On Wed, Aug 3, 2011 at 10:57 AM, Michael Graham <mgraham@bloxx.com> wrote: > On Wed, 2011-08-03 at 10:17 -0400, Tom Lane wrote: >> Michael Graham <mgraham@bloxx.com> writes: >> > Would my applications >> > constant polling of the queue mean that the lock could not be easily >> > obtained? >> >> Very possible, depending on what duty cycle is involved there. > > Hmm. The clients aren't that aggressive, especially when they failed to > find data on a previous select, there are 4 clients, they each poll > every 10 seconds and the select runs in <1ms. > > It might be worth noting that they don't ever disconnect from the > server, but I assume that is not an issue for getting the > AccessExclusiveLock on the table? > I don't think so, unless they keep the transaction open. > My worry at the moment is that because the table is so large the vacuum > takes a very long time to run (one has been running for 5hrs) and I > assume it will continue to run until it is able to get the > AccessExclusiveLock is so desperately wants. > You can run a CLUSTER command on the table which would guarantee that an exclusive lock is taken and the table is compacted, but remember that it will block out your select queries until the command completes. Thanks, Pavan -- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com
On Wed, Aug 3, 2011 at 8:57 AM, Michael Graham <mgraham@bloxx.com> wrote: > On Wed, 2011-08-03 at 10:17 -0400, Tom Lane wrote: >> Michael Graham <mgraham@bloxx.com> writes: >> > Would my applications >> > constant polling of the queue mean that the lock could not be easily >> > obtained? >> >> Very possible, depending on what duty cycle is involved there. > > Hmm. The clients aren't that aggressive, especially when they failed to > find data on a previous select, there are 4 clients, they each poll > every 10 seconds and the select runs in <1ms. > > It might be worth noting that they don't ever disconnect from the > server, but I assume that is not an issue for getting the > AccessExclusiveLock on the table? > > My worry at the moment is that because the table is so large the vacuum > takes a very long time to run (one has been running for 5hrs) and I > assume it will continue to run until it is able to get the > AccessExclusiveLock is so desperately wants. Assuming you have the spare IO look at making autovacuum more aggressive. Reduce naptime and increase cost
In response to Michael Graham <mgraham@bloxx.com>: > On Wed, 2011-08-03 at 10:17 -0400, Tom Lane wrote: > > Michael Graham <mgraham@bloxx.com> writes: > > > Would my applications > > > constant polling of the queue mean that the lock could not be easily > > > obtained? > > > > Very possible, depending on what duty cycle is involved there. > > Hmm. The clients aren't that aggressive, especially when they failed to > find data on a previous select, there are 4 clients, they each poll > every 10 seconds and the select runs in <1ms. So, under optimal conditions, the table is queried about every 1s. What about table inserts? Really, there are lots of situations that can cause a 1ms query to occasionally take a few seconds, so it's possible that table is locked more often than you realize. > It might be worth noting that they don't ever disconnect from the > server, but I assume that is not an issue for getting the > AccessExclusiveLock on the table? Unless those clients are starting transactions and leaving them running for long periods. Some client software is known to do that unless you specifically tell it not to. The definitive way to determine this is to monitor the pg_locks table. > My worry at the moment is that because the table is so large the vacuum > takes a very long time to run (one has been running for 5hrs) and I > assume it will continue to run until it is able to get the > AccessExclusiveLock is so desperately wants. If it's gotten very bad, you may have to explicitly VACUUM FULL it manually to get things back under control. -- Bill Moran http://www.potentialtech.com http://people.collaborativefusion.com/~wmoran/
Pavan Deolasee <pavan.deolasee@gmail.com> writes: > On Wed, Aug 3, 2011 at 10:57 AM, Michael Graham <mgraham@bloxx.com> wrote: >> On Wed, 2011-08-03 at 10:17 -0400, Tom Lane wrote: > Michael Graham <mgraham@bloxx.com> writes: >>>> Would my applications >>>> constant polling of the queue mean that the lock could not be easily >>>> obtained? >>> Very possible, depending on what duty cycle is involved there. >> Hmm. �The clients aren't that aggressive, especially when they failed to >> find data on a previous select, there are 4 clients, they each poll >> every 10 seconds and the select runs in <1ms. >> >> It might be worth noting that they don't ever disconnect from the >> server, but I assume that is not an issue for getting the >> AccessExclusiveLock on the table? > I don't think so, unless they keep the transaction open. The other problem is that once autovacuum has gotten the lock, it has to keep it for long enough to re-scan the truncatable pages (to make sure they're still empty). And it is set up so that any access to the table will kick autovacuum off the lock. An access pattern like that would very likely prevent it from ever truncating, if there are a lot of pages that need to be truncated. (There's been some discussion of modifying this behavior, but nothing's been done about it yet.) regards, tom lane
On Wed, 2011-08-03 at 11:40 -0400, Tom Lane wrote: > The other problem is that once autovacuum has gotten the lock, it has > to keep it for long enough to re-scan the truncatable pages (to make > sure they're still empty). And it is set up so that any access to the > table will kick autovacuum off the lock. An access pattern like that > would very likely prevent it from ever truncating, if there are a lot > of pages that need to be truncated. (There's been some discussion of > modifying this behavior, but nothing's been done about it yet.) Ah! This looks like it is very much the issue. Since I've got around 150GB of data that should be truncatable and a select every ~2s. Just to confirm would postgres write: 2011-08-03 16:09:55 BST ERROR: canceling autovacuum task 2011-08-03 16:09:55 BST CONTEXT: automatic vacuum of table "traffic.public.logdata5queue" Under those circumstances? Cheers, -- Michael Graham <mgraham@bloxx.com>
Michael Graham <mgraham@bloxx.com> writes: > Ah! This looks like it is very much the issue. Since I've got around > 150GB of data that should be truncatable and a select every ~2s. > Just to confirm would postgres write: > 2011-08-03 16:09:55 BST ERROR: canceling autovacuum task > 2011-08-03 16:09:55 BST CONTEXT: automatic vacuum of table > "traffic.public.logdata5queue" > Under those circumstances? Yup ... If you do a manual VACUUM, it won't allow itself to get kicked off the lock ... but as noted upthread, that will mean your other queries get blocked till it's done. Not sure there's any simple fix for this that doesn't involve some downtime. regards, tom lane
Michael Graham <mgraham@bloxx.com> writes: > On Wed, 2011-08-03 at 10:17 -0400, Tom Lane wrote: > >> Michael Graham <mgraham@bloxx.com> writes: >> > Would my applications >> > constant polling of the queue mean that the lock could not be easily >> > obtained? >> >> Very possible, depending on what duty cycle is involved there. > > Hmm. The clients aren't that aggressive, especially when they failed to > find data on a previous select, there are 4 clients, they each poll > every 10 seconds and the select runs in <1ms. > > It might be worth noting that they don't ever disconnect from the > server, but I assume that is not an issue for getting the > AccessExclusiveLock on the table? You are certain that those clients do these quick select as auto-commit? What does select current_query from pg_stat_activity say? -- Jerry Sievers Postgres DBA/Development Consulting e: postgres.consulting@comcast.net p: 305.321.1144
At 16:35 03/08/2011, Michael Graham wrote: >Yeah it said it last ran yesterday (and is currently running now), but I >did I notice in the log: > >2011-08-02 19:43:35 BST ERROR: canceling autovacuum task >2011-08-02 19:43:35 BST CONTEXT: automatic vacuum of table >"traffic.public.logdata5queue" > >Which is interesting if not particularly useful. > > > While you are running a test, you could keep an eye on the log to see > > if you get any of those messages. I think that would indicate > > autovacuum could not get a lock. You can also watch pg_stat_activity > > during the test, current_query will show you what table is being > > vacuumed. > >I'm pretty certain the autovacuumer is running while the test is on >going what I can't work out is under what circumstances it will be able >to return unused space to the OS in when it can't. One question, while you run your tests, does "IDLE IN TRANSACTION" messages happen? If you run your tests with a permanent connection to database, the tables are locked and autovacuum cannot work. I saw this with hibernate a queue... don't remember now, that stores data in postgres without closing connection. HTH
On 08/03/11 10:21 AM, Eduardo Morras wrote: > One question, while you run your tests, does "IDLE IN TRANSACTION" > messages happen? If you run your tests with a permanent connection to > database, the tables are locked and autovacuum cannot work. its not that tables are locked, its that vacuum (auto or otherwise) can't clear any tuples newer than the oldest active transaction. -- john r pierce N 37, W 122 santa cruz ca mid-left coast
At 19:32 03/08/2011, you wrote: >On 08/03/11 10:21 AM, Eduardo Morras wrote: >>One question, while you run your tests, does "IDLE IN TRANSACTION" >>messages happen? If you run your tests with a permanent connection >>to database, the tables are locked and autovacuum cannot work. > >its not that tables are locked, its that vacuum (auto or otherwise) >can't clear any tuples newer than the oldest active transaction. Yes, this is what happened to me 2-3 years ago. Table had only 1-3 rows but its size is 80-90 Gb. Once the app quits, autovacuum cleaned the table and became 8KB
At 16:35 03/08/2011, Michael Graham wrote: >Yeah it said it last ran yesterday (and is currently running now), but I >did I notice in the log: > >2011-08-02 19:43:35 BST ERROR: canceling autovacuum task >2011-08-02 19:43:35 BST CONTEXT: automatic vacuum of table >"traffic.public.logdata5queue" > >Which is interesting if not particularly useful. > > > While you are running a test, you could keep an eye on the log to see > > if you get any of those messages. I think that would indicate > > autovacuum could not get a lock. You can also watch pg_stat_activity > > during the test, current_query will show you what table is being > > vacuumed. > >I'm pretty certain the autovacuumer is running while the test is on >going what I can't work out is under what circumstances it will be able >to return unused space to the OS in when it can't. One question, while you run your tests, does "IDLE IN TRANSACTION" messages happen? If you run your tests with a permanent connection to database, the tables are locked and autovacuum cannot work. I saw this with hibernate a queue... don't remember now, that stores data in postgres without closing connection. HTH
At 19:32 03/08/2011, you wrote: >On 08/03/11 10:21 AM, Eduardo Morras wrote: >>One question, while you run your tests, does "IDLE IN TRANSACTION" >>messages happen? If you run your tests with a permanent connection >>to database, the tables are locked and autovacuum cannot work. > >its not that tables are locked, its that vacuum (auto or otherwise) >can't clear any tuples newer than the oldest active transaction. Yes, this is what happened to me 2-3 years ago. Table had only 1-3 rows but its size is 80-90 Gb. Once the app quits, autovacuum cleaned the table and became 8KB