Thread: Vacuum best practice: cronjob or autovacuum?
On our database of about 5GB we vacuum all of our 12 tables (only one is huge, all the others have about 100,000 rows or so) every hour or so. But we also have autovacuum enabled. Is this okay? Do the two vacuum processes contradict each other, or add unnecessary load to the system? The reason we introduced the cronjob was that we felt the autovacuum was not really doing its job.

I wonder if anyone can share some insight on whether these settings are good for a DB that is basically 24x7:

autovacuum = on
autovacuum_vacuum_cost_delay = 20
vacuum_cost_delay = 20
autovacuum_naptime = 10
stats_start_collector = on
stats_row_level = on
autovacuum_vacuum_threshold = 75
autovacuum_analyze_threshold = 25
autovacuum_analyze_scale_factor = 0.02
autovacuum_vacuum_scale_factor = 0.01

Thanks! I've read up some websites for this kind of tuning, but often the info is for older versions of DBs and some of the tweaks have changed since. I am on 8.2.9.
On Thu, Aug 28, 2008 at 07:53:17PM +0800, Phoenix Kiula wrote:
> On our database of about 5GB we vacuum all of our 12 tables (only one
> is huge, all others have about 100,000 rows or so) every hour or so.
>
> But we also have autovacuum enabled. Is this okay? Do the two vacuum
> processes contradict each other, or add unnecessary load to the
> system?

They do the latter. Especially vacuuming every hour, you're foiling the autovacuum and just introducing completely unnecessary load.

> The reason we introduced the cronjob was that we felt the autovacuum
> was not really doing its job.

What gave you this feeling? What's the "churn rate" on these tables (i.e. how fast do dead rows get created)?

My suspicion is that, using the 8.2 autovacuum, your "huge" table needs to be vacuumed manually, and the others just left to autovacuum. But that's just a guess at the moment.

A

--
Andrew Sullivan
ajs@commandprompt.com
+1 503 667 4564 x104
http://www.commandprompt.com/
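PS: since you already have stats_row_level turned on, you can put a number on the churn rate. The row-level counters are cumulative, so sample them twice, an hour apart, and diff the results to get an hourly rate; roughly:

-- dead rows come from UPDATEs and DELETEs (every UPDATE leaves one behind);
-- run this twice and subtract to see dead-row creation per table per hour
SELECT relname, n_tup_upd + n_tup_del AS dead_row_sources
FROM pg_stat_user_tables
ORDER BY dead_row_sources DESC;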
On Thu, 2008-08-28 at 19:53 +0800, Phoenix Kiula wrote:
> On our database of about 5GB we vacuum all of our 12 tables (only one
> is huge, all others have about 100,000 rows or so) every hour or so.

If you mean manual VACUUM or VACUUM FULL, every hour is probably too much. You should aim your VACUUM FULL at about once per week.

> But we also have autovacuum enabled. Is this okay? Do the two vacuum
> processes contradict each other, or add unnecessary load to the
> system?

Read the manuals at www.postgresql.org, specifically this section:

http://www.postgresql.org/docs/8.3/static/maintenance.html

You'll find that once in a while (start at once per week and adjust up or down from there) you can/should:

- vacuum full
- reindex your tables
- reindex your indexes

> The reason we introduced the cronjob was that we felt the autovacuum
> was not really doing its job.

How did you realise that? Turn off the cron job and wait a few days. In the meanwhile, monitor your disk space occupation (du -sh /var/lib?/pgsql....????/base/). If you see that size growing while the total row count (select count(*) from whatever) isn't growing, you need external vacuums.

> I wonder if anyone can share some
> insight on whether these settings are good for a DB that is basically
> 24x7:

Like someone said: it's not the 24x7. It's the "how many tuples get DELETEd or UPDATEd (for the case of autovacuum)" in one day, for example. If you find that your db updates/deletes many tuples per hour, then autovacuum's thresholds have to be tuned for that rate.

> autovacuum = on
> autovacuum_vacuum_cost_delay = 20
> vacuum_cost_delay = 20
> autovacuum_naptime = 10
> stats_start_collector = on
> stats_row_level = on
> autovacuum_vacuum_threshold = 75
> autovacuum_analyze_threshold = 25
> autovacuum_analyze_scale_factor = 0.02
> autovacuum_vacuum_scale_factor = 0.01

The two scale factors can be tricky: if your database is very big, then 2% of "very big" is "a lot of changes" before autovacuum even tries to vacuum. Read the documentation about these two. You might want to consider using scale factors of 0 and increasing both thresholds a bit, e.g.:

autovacuum_vacuum_threshold = 20000
autovacuum_analyze_threshold = 10000

Autovacuum is something that you adjust to your needs.

Another tip: edit your postgresql.conf and bring the debug levels up to:

log_min_messages = debug3
log_min_error_statement = debug3

then just follow the logfile (the location depends on the distro: /var/log/postgresql, /var/pgsql/data/, whatever).

IMPORTANT: don't leave debug3 on forever: it is very verbose and _will_ eat up your disk in no time. You might want to do something like this:

tail -f logfile | grep vac
or
tail -f logfile | grep threshold

Take your time :) Autovacuum requires a bit of dedication but, in the end, it works fine (preventing disk space growth).

Note also that VACUUM has nothing to do with REINDEXing, and REINDEXing also frees a considerable amount of disk space in certain cases.

Very important: read the docs:

http://www.postgresql.org/docs/8.3/static/maintenance.html

In my case I have autovacuum with scale factors of 0 and naptime 600; also a cron job for VACUUM FULL and a reindex of everything once a week (during the night). It's working fine on a db of about 2 Giga with an average of 10,000 DELETEs a day and well above 200,000 INSERTs/UPDATEs per day.

cheers
joao

PS: I'm saying all this because I'm just going through this process myself and I think I'm on the right track. Things are starting to work the way I want them to :)

> Thanks! I've read up some websites for this kind of tuning, but often
> the info is for older versions of DBs and some of the tweaks have
> changed since. I am on 8.2.9.
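PS2: instead of (or as well as) du, you can watch sizes from inside the database. A quick sketch using the size functions (they exist since 8.1, so your 8.2.9 is fine):

-- total on-disk size (heap + indexes + toast) of each user table
SELECT relname, pg_size_pretty(pg_total_relation_size(oid)) AS total_size
FROM pg_class
WHERE relkind = 'r' AND relname NOT LIKE 'pg_%'
ORDER BY pg_total_relation_size(oid) DESC;

If those totals keep growing while your row counts stay flat, the vacuuming isn't keeping up.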
On Thu, Aug 28, 2008 at 9:47 AM, Joao Ferreira gmail <joao.miguel.c.ferreira@gmail.com> wrote:
>
> http://www.postgresql.org/docs/8.3/static/maintenance.html
>
> You'll find that once in a while (start at once per week and adjust up or
> down from there) you can/should:
>
> - vacuum full
> - reindex your tables
> - reindex your indexes

Actually, the manual doesn't recommend VACUUM FULL for routine maintenance anymore:

"""
For those not using autovacuum, one approach is to schedule a database-wide VACUUM once a day during a low-usage period, supplemented by more frequent vacuuming of heavily-updated tables if necessary. (Some installations with extremely high update rates vacuum their busiest tables as often as once every few minutes.) If you have multiple databases in a cluster, don't forget to VACUUM each one; the program vacuumdb might be helpful.

VACUUM FULL is recommended for cases where you know you have deleted the majority of rows in a table, so that the steady-state size of the table can be shrunk substantially with VACUUM FULL's more aggressive approach. Use plain VACUUM, not VACUUM FULL, for routine vacuuming for space recovery.
"""

--
regards,
Jaime Casanova
PostgreSQL support and training
Systems consulting and development
Guayaquil - Ecuador
Cel. (593) 87171157
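PS: in cron terms, the manual's suggestion for the non-autovacuum case boils down to something like this (an illustrative schedule; adjust the time and connection options to your installation):

# nightly database-wide plain VACUUM plus ANALYZE, per the manual's advice
30 3 * * *  vacuumdb --all --analyze --quiet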
In response to Joao Ferreira gmail <joao.miguel.c.ferreira@gmail.com>:
>
> On Thu, 2008-08-28 at 19:53 +0800, Phoenix Kiula wrote:
> > On our database of about 5GB we vacuum all of our 12 tables (only one
> > is huge, all others have about 100,000 rows or so) every hour or so.
>
> If you mean manual VACUUM or VACUUM FULL, every hour is probably too
> much. You should aim your VACUUM FULL at about once per week.

This statement is ambiguous. Advice about VACUUM is _very_ different from advice about VACUUM FULL.

Running a normal VACUUM once an hour may be exactly what you need, but you didn't give enough evidence one way or the other. Look into the various system catalog tables to see how much bloat your tables and indexes have, and track that over time to see if autovac and/or your cron vacuum is keeping things in check.

> > But we also have autovacuum enabled. Is this okay? Do the two vacuum
> > processes contradict each other, or add unnecessary load to the
> > system?

If your cron jobs are vacuuming enough, then autovac will simply take a few CPU cycles to realize that nothing needs to be done.

> you'll find that once in a while (start at once per week and adjust up or
> down from there) you can/should:
>
> - vacuum full
> - reindex your tables
> - reindex your indexes

Whoa there. How on earth did you derive that from those documents? That's a pretty lousy maintenance plan with lots of unneeded overhead, unless your database has a very edge-case access pattern.

VACUUM FULL really only needs to be done in extreme cases where massive data bloat has occurred and is not expected to happen again. If massive data bloat is routine, you're probably better off letting ordinary VACUUM maintain a consistent level of free space on tables.

REINDEXing seems nice at times, but I've yet to see any evidence that it's necessary. There are probably some corner cases, but I've not seen them documented. I tried to document what I thought was a corner case once, and was unable to come up with anything conclusive.

I don't even know what you mean by "reindex your indexes".

> > The reason we introduced the cronjob was that we felt the autovacuum
> > was not really doing its job.
>
> How did you realise that? Turn off the cron job and wait a few days. In
> the meanwhile, monitor your disk space occupation (du
> -sh /var/lib?/pgsql....????/base/)

I agree with most of this, although it'll probably be easier to look into the PostgreSQL system tables to watch the actual table and index sizes:

select relname, relpages from pg_class
where relkind in ('r', 'i') and relname not like 'pg_%' order by relname;

You may find that autovacuum handles everything well except for one or two tables, in which case you can optimize your cron job to vacuum just those tables.

> > I wonder if anyone can share some
> > insight on whether these settings are good for a DB that is basically
> > 24x7:
>
> Like someone said: it's not the 24x7. It's the "how many tuples get
> DELETEd or UPDATEd (for the case of autovacuum)" in one day, for
> example.

The 24/7 aspect _is_ part of the equation. VACUUM FULL takes out an exclusive lock while doing its work, which is not appropriate for 24/7 operation. Routine VACUUM FULL is _never_ appropriate for a 24/7 operation (added to the fact that it's seldom appropriate at all).

[snip lots of information that is good and I have nothing to add to]

> In my case I have autovacuum with scale factors of 0 and naptime 600; also
> a cron job for VACUUM FULL and a reindex of everything once a week (during
> the night). It's working fine on a db of about 2 Giga with an average of
> 10,000 DELETEs a day and well above 200,000 INSERTs/UPDATEs per day.

You're implying, by this comment, that the DB has down time at night, which means you have different options for what you can do than the original poster, who does _not_ want to run VACUUM FULL at night if he wants to maintain 24/7 operation.

That being said, I still doubt that VACUUM FULL is necessary. Have you _tried_ running with normal VACUUM alone? Have you benchmarked the system to demonstrate that VACUUM FULL is really helping anything?

In any case, if you have a big outage window every week where you can afford to run whatever you want without interfering with other users, feel free to continue as you are. It won't cause you any problems.

The naptime at 600 is probably a bad idea. If you only have one user database on this system, then it only gets investigated by autovac once every 40 minutes (template0 ... template1 ... postgres ... yourdb). Consider that autovac uses very little in the way of resources when it determines that it has no work to do.

--
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/
wmoran@collaborativefusion.com
Phone: 412-422-3463x4023
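PS: to put a rough number on "bloat", compare page counts to row counts right after an ANALYZE. reltuples is only an estimate, so watch the trend rather than the absolute value; a crude sketch:

-- if pages-per-row climbs steadily on a table whose row count is stable,
-- vacuuming isn't keeping up with the churn on that table
SELECT relname, relpages, reltuples,
       relpages::float8 / (reltuples + 1) AS pages_per_row
FROM pg_class
WHERE relkind = 'r' AND relname NOT LIKE 'pg_%'
ORDER BY relpages DESC;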
For the record:

Bill Moran wrote:
> The naptime at 600 is probably a bad idea. If you only have one user
> database on this system, then it only gets investigated by autovac once
> every 40 minutes (template0 ... template1 ... postgres ... yourdb).
> Consider that autovac uses very little in the way of resources when it
> determines that it has no work to do.

Note that on 8.3, the meaning of naptime has changed -- it would be effectively "check each database once every 10 minutes" if set at 600. This was changed precisely because the previous semantics were difficult to explain/use.

The OP is using 8.2, though, so it doesn't apply here.

--
Alvaro Herrera
http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
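Spelled out as a config comment (the 40-minute figure assumes the usual four databases: template0, template1, postgres, and one user database):

autovacuum_naptime = 600
# 8.2: the single autovacuum process wakes every 600s and considers ONE
#      database per cycle, so each of 4 databases is visited every ~40 min
# 8.3: naptime is per-database, so each database is checked every ~10 min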
In response to Alvaro Herrera <alvherre@commandprompt.com>:

> For the record:
>
> Bill Moran wrote:
> > The naptime at 600 is probably a bad idea. If you only have one user
> > database on this system, then it only gets investigated by autovac once
> > every 40 minutes (template0 ... template1 ... postgres ... yourdb).
> > Consider that autovac uses very little in the way of resources when it
> > determines that it has no work to do.
>
> Note that on 8.3, the meaning of naptime has changed -- it would be
> effectively "check each database once every 10 minutes" if set at 600.
> This was changed precisely because the previous semantics were difficult
> to explain/use.

Ooo ... that's an exciting "gotcha"!

--
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/
wmoran@collaborativefusion.com
Phone: 412-422-3463x4023
Hello,

I'd like to apologise for my short knowledge of VACUUM FULL and REINDEX. I'm just stating what I do in my case; I do not know if it is a corner case or not. I've been dealing with this specific application, which is very demanding on Postgres, for about 2 years.

When autovacuum was introduced, I kept the weekly VACUUM FULL because it effectively brings disk occupation down, even though it grows back after a few hours. It's just reassuring to me to make sure that at least one of the vacuums is working when I see the weekly drop of disk occupation.

On Thu, 2008-08-28 at 11:27 -0400, Bill Moran wrote:
> In response to Joao Ferreira gmail <joao.miguel.c.ferreira@gmail.com>:
> >
> > On Thu, 2008-08-28 at 19:53 +0800, Phoenix Kiula wrote:
> > > On our database of about 5GB we vacuum all of our 12 tables (only one
> > > is huge, all others have about 100,000 rows or so) every hour or so.
> >
> > If you mean manual VACUUM or VACUUM FULL, every hour is probably too
> > much. You should aim your VACUUM FULL at about once per week.
>
> This statement is ambiguous. Advice about VACUUM is _very_ different
> from advice about VACUUM FULL.
>
> Running a normal VACUUM once an hour may be exactly what you need, but
> you didn't give enough evidence one way or the other. Look into the
> various system catalog tables to see how much bloat your tables and
> indexes have, and track that over time to see if autovac and/or your
> cron vacuum is keeping things in check.
>
> > > But we also have autovacuum enabled. Is this okay? Do the two vacuum
> > > processes contradict each other, or add unnecessary load to the
> > > system?
>
> If your cron jobs are vacuuming enough, then autovac will simply
> take a few CPU cycles to realize that nothing needs to be done.
>
> > you'll find that once in a while (start at once per week and adjust up or
> > down from there) you can/should:
> >
> > - vacuum full
> > - reindex your tables
> > - reindex your indexes
>
> Whoa there. How on earth did you derive that from those documents?

I did not say I did. I just advised the original poster to read the docs. I'm not trying to say 'this is the way'. I'm trying to say 'read the docs; the docs are good'.

But, BTW, quoting http://www.postgresql.org/docs/8.3/static/routine-reindex.html:

---------------------------------------------
The potential for bloat in non-B-tree indexes has not been well
characterized.
---------------------------------------------

---------------------------------------------
It is a good idea to keep an eye on the index's physical size when using
any non-B-tree index type.
---------------------------------------------

---------------------------------------------
Also, for B-tree indexes a freshly-constructed index is somewhat faster
to access than one that has been updated many times, because logically
adjacent pages are usually also physically adjacent in a newly built
index. (This consideration does not currently apply to non-B-tree
indexes.) It might be worthwhile to reindex periodically just to improve
access speed.
---------------------------------------------

I did some tests (let's say about 50 INSERTs/UPDATEs per second and some thousands of DELETEs once in a while). Query execution time drops 10 times after REINDEXING and VACUUM.

I've seen a 600 Mega drop in disk size occupation just by reindexing, and an additional drop of about 500 Megas by vacuuming; this was on about 100 Megas of useful data.

> That's a pretty lousy maintenance plan with lots of unneeded overhead,
> unless your database has a very edge-case access pattern.

!? Many people mention these 'edge-case access patterns' and 'corner cases'. I don't know if mine is such. My application executes patterns of varying INSERTing and UPDATing, but it must survive stress tests with about 100 UPDATEs mixed with INSERTs per second. Additionally, it DELETEs up to 10,000 records once every night.

It's a 'near-realtime' log store which stores log information for 'you name it' logs in Linux-based systems. It also removes old data once a night.

> VACUUM FULL really only needs to be done in extreme cases where massive data
> bloat has occurred and is not expected to happen again. If massive data
> bloat is routine, you're probably better off letting ordinary VACUUM
> maintain a consistent level of free space on tables.

Well, I've seen massive data and index bloating in my application. In some cases REINDEXing and VACUUM FULLing were the only way to clear things up. But you're probably right. I'll give it a try. thx

> REINDEXing seems nice at times, but I've yet to see any evidence that
> it's necessary. There are probably some corner cases, but I've not seen
> them documented.

Neither have I. REINDEXING helps me keep query execution time low; otherwise it increases to levels I cannot accept. We use 'modest' processors and 'modest' storage, in highly dependable embedded systems.

> I tried to document what I thought was a corner case
> once, and was unable to come up with anything conclusive.
>
> I don't even know what you mean by "reindex your indexes"

I mean this: http://www.postgresql.org/docs/8.3/static/sql-reindex.html

This operation, on my table and my indexes, is helping me effectively keep query execution times down.

> > > The reason we introduced the cronjob was that we felt the autovacuum
> > > was not really doing its job.
> >
> > How did you realise that? Turn off the cron job and wait a few days. In
> > the meanwhile, monitor your disk space occupation (du
> > -sh /var/lib?/pgsql....????/base/)
>
> I agree with most of this, although it'll probably be easier to look into
> the PostgreSQL system tables to watch the actual table and index sizes:
>
> select relname, relpages from pg_class
> where relkind in ('r', 'i') and relname not like 'pg_%' order by relname;
>
> You may find that autovacuum handles everything well except for one or
> two tables, in which case you can optimize your cron job to vacuum just
> those tables.
>
> > > I wonder if anyone can share some
> > > insight on whether these settings are good for a DB that is basically
> > > 24x7:
> >
> > Like someone said: it's not the 24x7. It's the "how many tuples get
> > DELETEd or UPDATEd (for the case of autovacuum)" in one day, for
> > example.
>
> The 24/7 aspect _is_ part of the equation.

Yes, I agree. Sorry for the confusion. I did not mean that 24x7 was not important. I meant that it is not relevant to the effectiveness of any of the vacuum types. And yes, I agree that VACUUM FULL takes locks which will render the system unusable for a certain amount of time.
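To be concrete, my weekly night job amounts to something like this ('eventlog' here is just a stand-in for my real table name):

-- weekly, inside the night window; VACUUM FULL takes an exclusive lock,
-- so this is only viable because nothing else runs at that hour
VACUUM FULL ANALYZE eventlog;
-- rebuild all indexes on the table to undo index bloat
REINDEX TABLE eventlog;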
On Thu, Aug 28, 2008 at 05:15:06PM +0100, Joao Ferreira gmail wrote:

> When autovacuum was introduced, I kept the weekly VACUUM FULL because it
> effectively brings disk occupation down, even though it grows back after a
> few hours. It's just reassuring to me to make sure that at least one of the
> vacuums is working when I see the weekly drop of disk occupation.

It actually bloats your indexes and hurts you. Don't do that.

If the number of tuples in your various tables is staying about the same, your size on disk should stabilise. That's the thing you want to see.

A

--
Andrew Sullivan
ajs@commandprompt.com
+1 503 667 4564 x104
http://www.commandprompt.com/
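PS: you can see this for yourself by measuring one index on its own just before and just after the weekly VACUUM FULL (substitute one of your real index names):

SELECT pg_size_pretty(pg_relation_size('some_index_name'));

If the index comes out bigger after the VACUUM FULL, that's the bloat I mean.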
In response to Joao Ferreira gmail <joao.miguel.c.ferreira@gmail.com>:
>
> When autovacuum was introduced, I kept the weekly VACUUM FULL because it
> effectively brings disk occupation down, even though it grows back after a
> few hours. It's just reassuring to me to make sure that at least one of the
> vacuums is working when I see the weekly drop of disk occupation.

Note that the "it grows back after a few hours" is unneeded load on your system. You just spent a lot of I/O shrinking a file that PG is now going to spend a lot of I/O re-enlarging. Thus you made PG work harder for a while after the VACUUM FULL in order to get the table back to a stable size.

> quoting:
>
> http://www.postgresql.org/docs/8.3/static/routine-reindex.html
>
> ---------------------------------------------
> The potential for bloat in non-B-tree indexes has not been well
> characterized.
> ---------------------------------------------
>
> ---------------------------------------------
> It is a good idea to keep an eye on the index's physical size when using
> any non-B-tree index type.
> ---------------------------------------------

Both are special cases; B-trees are the default.

> ---------------------------------------------
> Also, for B-tree indexes a freshly-constructed index is somewhat faster
> to access than one that has been updated many times, because logically
> adjacent pages are usually also physically adjacent in a newly built
> index. (This consideration does not currently apply to non-B-tree
> indexes.) It might be worthwhile to reindex periodically just to improve
> access speed.
> ---------------------------------------------
>
> I did some tests (let's say about 50 INSERTs/UPDATEs per second and some
> thousands of DELETEs once in a while). Query execution time drops 10
> times after REINDEXING and VACUUM.

This would be interesting to this conversation if you had _only_ done the REINDEX. The addition of the VACUUM makes it ambiguous as to which command actually made the improvement.

> I've seen a 600 Mega drop in disk size occupation just by reindexing,

Is that consistent, or was it an extraordinary case? It's pretty amazing to imagine 600M of index bloat on a 100M database ... how many indexes do you have? Are you sure you don't have duplicate indexes or some other issue? (A quick way to eyeball that is at the end of this message.)

> and an additional drop of about 500 Megas by vacuuming; this was on about
> 100 Megas of useful data.

It's probably because you're only vacuuming once a week.

> > That's a pretty lousy maintenance plan with lots of unneeded overhead,
> > unless your database has a very edge-case access pattern.
>
> !? Many people mention these 'edge-case access patterns' and 'corner
> cases'. I don't know if mine is such. My application executes patterns
> of varying INSERTing and UPDATing, but it must survive stress tests with
> about 100 UPDATEs mixed with INSERTs per second. Additionally, it DELETEs
> up to 10,000 records once every night.
>
> It's a 'near-realtime' log store which stores log information for 'you
> name it' logs in Linux-based systems. It also removes old data once a
> night.

That is a fairly edge-case use, and one that's actually documented, in the docs you reference, as likely to cause abnormal amounts of bloat. However, it's just one index on one table (unless you have a bunch of indexes on that table that are all continually increasing?). Unless there's something I'm missing, it hardly justifies reindexing the entire database.

> > VACUUM FULL really only needs to be done in extreme cases where massive
> > data bloat has occurred and is not expected to happen again. If massive
> > data bloat is routine, you're probably better off letting ordinary VACUUM
> > maintain a consistent level of free space on tables.
>
> Well, I've seen massive data and index bloating in my application. In
> some cases REINDEXing and VACUUM FULLing were the only way to clear
> things up.

It's possible that the stable size for your tables has more free space than actual data. While that seems crazy, it's really no different from having a file server with 4x as much disk space as people normally use.

> > REINDEXing seems nice at times, but I've yet to see any evidence that
> > it's necessary. There are probably some corner cases, but I've not seen
> > them documented.
>
> Neither have I. REINDEXING helps me keep query execution time low;
> otherwise it increases to levels I cannot accept. We use 'modest'
> processors and 'modest' storage, in highly dependable embedded systems.

Oh ... well, this might change a lot. Depending on what you mean by "embedded", you may be using experience that's _very_ specialized to make generalizations from. The RAM/CPU/storage/etc constraints on anything I would call "embedded" are different from what just about anyone else would be dealing with when tuning a DB server.

If you're looking to tune PG to run well on an embedded platform, my advice would be to throw out everything you've heard and test extensively for yourself -- which sounds like what you've done.

--
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/
wmoran@collaborativefusion.com
Phone: 412-422-3463x4023
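PS: regarding the duplicate-index question above, a cheap first check is to list index definitions per table and look for near-identical entries (assumes your tables live in the public schema; adjust the filter otherwise):

-- duplicate or redundant indexes tend to show up as near-identical indexdef
SELECT tablename, indexname, indexdef
FROM pg_indexes
WHERE schemaname = 'public'
ORDER BY tablename, indexdef;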