Thread: Autovacuum help
Hi,

I need your help/suggestions with a problem I am facing related to autovacuum.

I am using PostgreSQL 8.1.2 through a JDBC connection. The connection is long lived (established when the application starts up and closed only when the application is shut down).

I have enabled the autovacuum daemon and set up the additional parameters (for instance, stats_row_level = on) as specified in the PostgreSQL documentation.

In the database, I have a table with a fairly high rate of inserts and deletes (~10 rows a second). The table is basically a FIFO queue that can have a maximum of 800 entries: as new rows are added, the oldest rows are deleted, so the table always holds about 800 rows.

The problem I see is that the database size (disk usage) is continually increasing, even though the autovacuum daemon is enabled and the PostgreSQL log file indicates that it is processing the databases every minute, as expected.

Digging in further, I noticed that reltuples (in pg_catalog.pg_class) for the relevant table keeps increasing continually. I also noticed a large number of dead, unremovable rows when I ran the VACUUM ANALYZE command.

After shutting down my application, reltuples stays high until I run the analyze command (ANALYZE VERBOSE <table_name>), after which it drops to about 800 immediately. The analyze command's output also indicates that the dead rows have been removed, and I notice that the space is reused for future inserts when I restart the application.

I am pretty sure that I don't have any transaction that is held open forever (the workflow is insert, commit, insert, commit, etc.).

My question is: is autovacuum expected to work in situations like this, where I have a long lived connection to the database?
After I shut down my application, why am I required to run the "analyze" command before the dead rows are removed (autovacuum is not able to remove the dead rows until I run "analyze")?

I'll appreciate your thoughts, since I seem to be running out of things to try.

Thanks
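The symptoms described above can be inspected directly. A sketch of the checks, with "my_queue" standing in for the actual table name:

```sql
-- Planner's row estimate and on-disk page count for the table:
SELECT relname, reltuples, relpages
FROM pg_catalog.pg_class
WHERE relname = 'my_queue';

-- VACUUM VERBOSE reports how many dead row versions it found and,
-- crucially, how many it could NOT reclaim yet, e.g.
--   DETAIL:  N dead row versions cannot be removed yet.
VACUUM VERBOSE my_queue;
```

If that DETAIL count keeps growing, something is holding a snapshot that can still see the deleted rows.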
On Tue, Aug 01, 2006 at 08:02:59AM -0700, Sundar Narayanaswamy wrote:
> I need your help/suggestions with a problem I am facing related to autovacuum.
>
> I am using PostgreSQL 8.1.2 through a JDBC connection. The connection is
> long lived (established when the application starts up and is closed only when
> the application is shutdown).

<snip>

> On digging in further, I noticed that the reltuples (in pg_catalog.pg_class)
> for the relevant table keeps increasing continually. I also noticed a large
> number of dead unremovable rows when I ran the vacuum analyze command.

Well, you need to work out why they are unremovable. For example, do you have any really long lived open transactions? These are generally a bad idea, for all sorts of reasons. If you don't commit occasionally, none of your changes will be visible to anyone else.

> My question is, is autovacuum expected to work in situations like this where I
> have a long lived connection to the database ?

It's got nothing to do with connections and everything to do with open transactions. I'd suggest you run:

  select * from pg_stat_activity;

And look for entries like: IDLE in transaction.

> why am I required to run the "analyze" command before the dead rows are removed
> (autovacuum is not able to remove the dead rows until I run the "analyze"
> command) ?

The stats aren't updated until the rows are actually removed. Eventually autovacuum would have done the analyze for you...

Hope this helps,
Have a nice day,
-- 
Martijn van Oosterhout <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.
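A sketch of the suggested check, written against the 8.1-era catalogs. Note that pg_stat_activity of that vintage has no separate "state" column; the marker appears in current_query itself, and only when stats_command_string is enabled:

```sql
-- Sessions stuck inside an open transaction (8.1 column names):
SELECT datname, procpid, usename, query_start, current_query
FROM pg_stat_activity
WHERE current_query LIKE '<IDLE> in transaction%';
```

Any row here whose query_start is hours old is a session whose snapshot is pinning dead tuples.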
On Tue, 1 Aug 2006 08:02:59 -0700 (PDT) Sundar Narayanaswamy <sundar007@yahoo.com> wrote:
> I need your help/suggestions with a problem I am facing related to
> autovacuum.

<snip>

> My question is, is autovacuum expected to work in situations like
> this where I have a long lived connection to the database ?

Sundar,

Take a look at the documentation at:

  http://www.postgresql.org/docs/8.1/static/maintenance.html#AUTOVACUUM

There are a lot of configuration options that affect the autovacuum daemon.

John Purser

-- 
You will pay for your sins. If you have already paid, please disregard this message.
> Well, you need to work out why they are unremovable. For example, do
> you have any really long lived open transactions. These are generally a
> bad idea, for all sorts of reasons. If you don't commit occasionally,
> none of your changes will be visible to anyone else.

I have multiple databases on the server and a JDBC connection to each of them from my app. Only a couple of the databases have continuous insert/delete activity, and those connections usually stay in the "idle" state. The other databases don't see many updates, mostly selects. As I understand it, executing the first select after a commit or rollback moves the connection to "idle in transaction", and it goes back to "idle" when a commit/rollback happens. So the connections that mostly run selects sit in "idle in transaction" most of the time.

Now, in this situation, if some connections stay in "idle in transaction", would dead rows be unremovable in the other databases (whose connections are "idle") as well? In other words, must ALL of the databases/connections on the server be "idle" for autovacuum to be able to remove dead rows in any database?

> It's got nothing to do with connections and everything to do with open
> transactions. I'd suggest you run:
>
>   select * from pg_stat_activity;
>
> And look for entries like: IDLE in transaction.

I tried this, but I see <command string not enabled> as the current query. I searched for docs on this view (pg_stat_activity) but couldn't find much. Could you help me enable it so that I can see the current query in this view? I did find that some connections are "idle in transaction" from the ps -afe output.
> > why am I required to run the "analyze" command before the dead rows are
> > removed (autovacuum is not able to remove the dead rows until I run the
> > "analyze" command) ?
>
> The stats aren't updated until the rows are actually removed. Eventually
> autovacuum would have done the analyze for you...

Thanks again. I am wondering why the state changes to "idle in transaction" when a select query is executed. It would be nice if that happened only when a real change is made to the database (a transaction that writes), not when a select occurs.
On Tue, Aug 01, 2006 at 11:48:04AM -0700, Sundar Narayanaswamy wrote:
> Now, in this situation, if some databases stay in "Idle in transaction",
> would the dead rows be unremovable from other databases (that are in "idle"
> state) as well ? In other words, should ALL of the databases/connections
> in the server be in "idle" state for the autovacuum to be able to remove dead
> rows in any database ?

You'll have to check the docs, but it basically comes down to this: VACUUM can only remove rows that are older than the oldest transaction. Whether that is tracked per database or per cluster, I'm not sure...

It's not that "IDLE in transaction" is bad in itself. It's that if you started a transaction three hours ago, no tuples deleted in the last three hours can be truly removed, because that transaction can still see them.

> I tried this, but I see <command string not enabled> as the current query.
> I searched for docs on this view (pg_stat_activity), but couldn't find
> much. Could you help me enable it so that I can see the current query
> in this view?

You have to set "stats_command_string = on" in the server config. But the output from "ps" is good also.

> Thanks again. I am wondering why the state changes to "idle in
> transaction" when a query is executed. It would be nice if that happened
> only when a real change is made to the database, not when a select occurs.

This makes no sense. A select query is also affected by transactions. In the example above, if you're in a transaction started three hours ago, a SELECT will be looking at a version of the database as it was three hours ago. Select queries can also change the database; consider nextval(), for example.

The real question is: why are you keeping the transactions open? If they don't need to be, just commit them when you go idle and everything can be cleaned up normally.
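For reference, the settings mentioned live in postgresql.conf (8.1-era names; the row-level setting is the one the original post already enabled for autovacuum):

```
# postgresql.conf -- statistics collector settings (PostgreSQL 8.1)
stats_start_collector = on   # default; prerequisite for the two below
stats_command_string  = on   # make current_query visible in pg_stat_activity
stats_row_level       = on   # per-row stats; required by autovacuum
```

Changing these requires reloading or restarting the server.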
Hope this helps,
-- 
Martijn van Oosterhout <kleptog@svana.org>
> This makes no sense. A select query is also affected by transactions.
> In the example above, if you're in a transaction started three hours ago,
> a SELECT will be looking at a version of the database as it was three
> hours ago. Select queries can also change the database; consider
> nextval(), for example.
>
> The real question is: why are you keeping the transactions open? If
> they don't need to be, just commit them when you go idle and everything
> can be cleaned up normally.

I am not keeping transactions open. Any time an insert/delete/update is performed, the change is immediately committed (or rolled back). It is the selects that cause the problem. The flow looks like this:

  insert into table ...; commit;
  <idle>                        (autovacuum can remove dead rows)
  <some time elapses>
  delete from table ...; commit;
  <idle>                        (autovacuum can remove dead rows)
  select * from ...;
  read rows from result set
  <idle in transaction>         (autovacuum cannot remove dead rows)
  <LONG time elapses>           (autovacuum cannot remove dead rows)

The last select is the one of concern. I was raising the point that a select by itself (like the one here) probably shouldn't put the connection in "idle in transaction" mode.

Since my app does not commit (or roll back) after every select, and selects in my app don't modify the database, the connection is left in the "idle in transaction" state for several hours, until the next insert/update/delete followed by a commit takes it back to "idle". And autovacuum is unable to remove the dead rows until the connection goes "idle".

Perhaps the solution is that I should modify my app to do a rollback after every select. But that is a little awkward, because selects don't really modify the database in my case.

Thanks for your suggestions,
sundar.
On Wed, Aug 02, 2006 at 07:36:09AM -0700, Sundar Narayanaswamy wrote:
> The last select is the one of concern. I was raising the point that a
> select by itself (like the one here) probably shouldn't put the
> connection in "idle in transaction" mode.
>
> Since my app does not commit (or roll back) after every select (and
> selects in my app don't modify the database), the connection is left
> in the "idle in transaction" state for several hours, until the next
> insert/update/delete followed by a commit takes it back to "idle".
> And autovacuum is unable to remove the dead rows until the connection
> goes "idle".

Sorry, selects still advance the transaction counter, create a snapshot, hold locks, can fire triggers, update stats, call external functions, etc. Maybe in your case they don't, but maybe someday you'll make a change to the database that will.

Maybe modify your app so selects aren't run inside an explicit transaction. Then you don't need to commit or roll back anything.

> Perhaps the solution is that I should modify my app to do a rollback
> after every select. But that is a little awkward, because selects
> don't really modify the database in my case.

I imagine commit is cheaper and safer than a rollback...

Have a nice day,
-- 
Martijn van Oosterhout <kleptog@svana.org>
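In SQL terms, what a JDBC driver effectively does with autocommit turned off looks roughly like this (table name hypothetical); the usual fix on the Java side is connection.setAutoCommit(true), or an explicit commit() after each batch of reads:

```sql
BEGIN;                   -- issued implicitly before the first statement
SELECT * FROM my_queue;  -- a snapshot is taken; the backend now shows
                         -- as "<IDLE> in transaction" once the query ends
-- ...hours pass; rows deleted anywhere after the snapshot cannot be
-- reclaimed by VACUUM, because this transaction could still see them...
COMMIT;                  -- or ROLLBACK; either one releases the snapshot
```

With autocommit on, each SELECT runs in its own short transaction, so the session returns to plain "idle" immediately and autovacuum can do its job.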
Martijn van Oosterhout writes:
>> My question is, is autovacuum expected to work in situations like this where I
>> have a long lived connection to the database ?

I am having a similar situation, but I am just starting to debug it.

Using 8.1.4. I have autovacuum running, and it shows in the logs as running. Every couple of days over the last week, when I tried to run VACUUM ANALYZE, PostgreSQL complained that I needed to increase my fsm_pages.

I am going to start scheduling VACUUM ANALYZE in crontab... but if autovacuum is running, shouldn't I not need to be running VACUUM ANALYZE? And more importantly, shouldn't I not be running out of fsm_pages?

The first time I ran out, I bumped it 100,000 over what it said it needed. The manual VACUUM ANALYZE ran fine after that. The same has happened 3 times so far, 1 day apart (i.e. I have checked by manually running VACUUM ANALYZE with a 1 day gap), and I have increased it higher and higher: when I was off by 20K pages, I increased to 100K over... then 200K over... then 500K over.

Also, at what log level does autovacuum show more info? Going over the archives, I got the impression that it was debug2, so I'm trying that. Does it only show more info when it does work?
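The free-space-map numbers in question appear at the tail of a database-wide VACUUM VERBOSE, so the growth can be tracked directly. A sketch of the check:

```sql
-- Run as superuser. The last lines of output summarize FSM usage,
-- roughly of the form:
--   INFO:  free space map contains ... pages in ... relations
-- followed by a NOTICE/HINT when the needed page slots exceed
-- max_fsm_pages, suggesting a larger value.
VACUUM VERBOSE;

SHOW max_fsm_pages;   -- current setting, for comparison
```

If the needed number keeps rising run after run, dead space is being created faster than vacuuming reclaims it, which points back at vacuum frequency rather than the FSM setting itself.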
Francisco Reyes wrote:
> Using 8.1.4. I have autovacuum running, and it shows in the logs as
> running. Every couple of days over the last week, when I tried to run
> VACUUM ANALYZE, PostgreSQL complained that I needed to increase my
> fsm_pages.
>
> I am going to start scheduling VACUUM ANALYZE in crontab... but if
> autovacuum is running, shouldn't I not need to be running VACUUM
> ANALYZE? And more importantly, shouldn't I not be running out of
> fsm_pages?

In many instances the default thresholds for autovacuum are too conservative. You can try making it more aggressive and see what happens; you can do this generally, or on a table-specific basis.
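On 8.1, the table-specific knobs live in the pg_autovacuum system catalog (per-table storage options in CREATE/ALTER TABLE came in later releases). A hedged sketch: "my_queue" and the threshold numbers are purely illustrative, and -1 means "inherit the server-wide default":

```sql
-- Make autovacuum fire much sooner for one hot table (8.1 catalog):
INSERT INTO pg_autovacuum
    (vacrelid, enabled,
     vac_base_thresh, vac_scale_factor,
     anl_base_thresh, anl_scale_factor,
     vac_cost_delay, vac_cost_limit)
VALUES
    ('my_queue'::regclass, true,
     200, 0.1,     -- vacuum after ~200 + 10% of reltuples dead rows
     100, 0.05,    -- analyze on a lower threshold as well
     -1, -1);      -- keep the server-wide cost-based delay settings
```

Lowering the thresholds server-wide instead (autovacuum_vacuum_threshold, autovacuum_vacuum_scale_factor in postgresql.conf) achieves the same effect for every table.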
Matthew T. O'Connor writes:
> In many instances the default thresholds for autovacuum are too
> conservative. You can try making it more aggressive and see what
> happens; you can do this generally, or on a table-specific basis.

I think something is very wrong with the estimates I am getting back. After increasing max_fsm_pages 3 times, every time I get the same error about needing to increase max_fsm_pages, and the error always recommends exactly the same number of additional fsm_pages.

I see other threads with other people having similar problems. Going to see if I can find a resolution in the archives.
I haven't had issues with max_fsm_pages. The problem I had was caused by not committing (or rolling back) after JDBC select calls. I didn't think that was necessary, because the select calls didn't modify the database. But once I modified my selects to roll back after the data is read from the result set, autovacuum worked fine.

I was hoping that with autovacuuming working well, I wouldn't need the cron tasks etc. to do a VACUUM FULL periodically. But I read a few days back that reindexdb is recommended if there are a lot of inserts/deletes. I have yet to experiment with that.

I'll check what I have set for max_fsm_pages tomorrow.

--- Francisco Reyes <lists@stringsutils.com> wrote:
> I think something is very wrong with the estimates I am getting back.
> After 3 times increasing the max_fsm_pages, every time I get the same
> error about needing to increase max_fsm_pages, and always the error is
> recommending exactly the same number of additional fsm_pages.
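On the reindexdb point: a heavily churned queue table tends to accumulate index bloat that plain VACUUM reuses but never shrinks, so an occasional rebuild can help. A minimal sketch, with a hypothetical table name; note that REINDEX takes an exclusive lock on the table, so it belongs in a quiet maintenance window:

```sql
-- Rebuild all indexes on the hot table in one statement:
REINDEX TABLE my_queue;
```

The reindexdb command-line tool wraps the same operation for a whole database.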