Thread: REINDEXing database-wide daily
Hi,
We're using PostgreSQL 8.2.
I have a question in connection to this question posted by me earlier:
http://archives.postgresql.org/pgsql-performance/2010-03/msg00343.php
In our application, DML operations (INSERT/UPDATE/DELETE) are heavily performed in a day.
I also read about pg_autovacuum & REINDEX at:
http://www.postgresql.org/docs/8.2/interactive/routine-vacuuming.html
http://www.postgresql.org/docs/8.2/static/sql-reindex.html
I do not want to run pg_autovacuum daemon on a busy hour.
In case, if I can afford to take my database offline at low-usage time and perform REINDEX database-wide manually/linux cron, to boost up index performance, what is the community answer/suggestion on the following:
1. Is it a good idea to perform this on a daily basis?
2. Any implications of doing this on a daily basis?
3. Is there a way to find out bloated indexes?
4. Any other maintenance command, like ANALYZE, that has to be executed before/after REINDEX?
5. Is there a way to find out when REINDEX was last run on an INDEX/TABLE/DATABASE?
NOTE: I've also seen from my past experience that REINDEX database-wide greatly improves performance of the application.
On 3/30/2010 4:32 AM, Gnanakumar wrote: > Hi, > > We're using PostgreSQL 8.2. > > I have a question in connection to this question posted by me earlier: > > http://archives.postgresql.org/pgsql-performance/2010-03/msg00343.php > > In our application, DML operations (INSERT/UPDATE/DELETE) are heavily > performed in a day. > > I also read about pg_autovacuum & REINDEX at: > > http://www.postgresql.org/docs/8.2/interactive/routine-vacuuming.html > > http://www.postgresql.org/docs/8.2/static/sql-reindex.html > > I do not want to run pg_autovacuum daemon on a busy hour. > > In case, if I can afford to take my database offline at low-usage time > and perform REINDEX database-wide manually/linux cron, to boost up index > performance, what is the community answer/suggestion on the following: > > 1. Is it a good idea to perform this on a daily basis? > > 2. Any implications of doing this on a daily basis? > > 3. Is there a way to find out bloated indexes? > > 4. Any other maintenance command, like ANALYZE, that has to be executed > before/after REINDEX? > > 5. Is there a way to find out when REINDEX was last run on an > INDEX/TABLE/DATABASE? > > NOTE: I've also seen from my past experience that REINDEX database-wide > greatly improves performance of the application. > I could be way off base here, so I hope others will confirm/deny this: I think the more often you run vacuum, the less you notice it. If you wait for too long then vacuum will have to work harder and you'll notice a speed decrease. But many small vacuums which dont have as much work to do, you wont notice. It could be, and I'm guessing again, because your database grew from 3 to 30 gig (if I recall the numbers right), REINDEX had lots of affect. But if vacuum can keep up with space reuse, REINDEX may not be needed. (maybe a few weeks or once a month). -Andy
"Gnanakumar" <gnanam@zoniac.com> wrote: > We're using PostgreSQL 8.2. Newer versions have much improved the VACUUM and CLUSTER features. You might want to consider upgrading to a later major version. > I have a question in connection to this question posted by me > earlier: > > http://archives.postgresql.org/pgsql-performance/2010-03/msg00343.php I hope that you have stopped using VACUUM FULL on a regular basis, based on the responses to that post. The FULL option is only intended as a means to recover from extreme heap bloat when there is not room for a CLUSTER. Any other use is going to cause problems. If you continue to use it for other purposes, you may not get a lot of sympathy when you inevitably experience those problems. > I do not want to run pg_autovacuum daemon on a busy hour. You would probably be surprised to see how much of a performance boost you can get during your busy times by having a properly configured autovacuum running. My initial reaction to seeing performance degradation during autovacuum was to make it less aggressive, which lead to increasing bloat between autovacuum runs, which degraded performance between runs and made things that much worse when autovacuum finally kicked in. It was only by using aggressive maintenance to clean up the bloat and then configuring autovacuum to be much more aggressive that I saw performance during peak periods improve; although on some systems I had to introduce a 10 ms vacuum cost delay. This is one of those areas where your initial intuitions can be totally counter-productive. > In case, if I can afford to take my database offline at low-usage > time and perform REINDEX database-wide manually/linux cron, to > boost up index performance, what is the community > answer/suggestion on the following: > > 1. Is it a good idea to perform this on a daily basis? No. It is generally not something to run on a routine basis, and if you're not careful you could make performance worse, by making the indexes so "tight" that most of your inserts or updates will cause index page splits. > 2. Any implications of doing this on a daily basis? We haven't found it necessary or useful, but if you have an appropriate fill factor, I suppose it might not actually do any damage. There is some chance, based on your usage pattern, that a daily CLUSTER of some tables might boost performance by reducing random access, but daily REINDEX is unlikely to be a win. > 3. Is there a way to find out bloated indexes? I don't have anything offhand, but you might poke around pg_class looking at reltuples and relpages. > 4. Any other maintenance command, like ANALYZE, that has to be > executed before/after REINDEX? Not generally, but I seem to remember that there can be exceptions. Indexes on expressions? GIN? > 5. Is there a way to find out when REINDEX was last run on an > INDEX/TABLE/DATABASE? I don't think so. > NOTE: I've also seen from my past experience that REINDEX > database-wide greatly improves performance of the application. I don't doubt that; if you've been shooting yourself in the foot by running VACUUM FULL, then REINDEX would be a good bandage to alleviate the pain. My suggestion is to clean up your existing bloat by running CLUSTER on all tables, configure autovacuum to aggressive values similar to what you see in 8.3 or 8.4 and turn it on, run a nightly VACUUM ANALYZE VERBOSE of the database and review the output to make sure your fsm settings are adequate and to monitor bloat, and eliminate all use of VACUUM FULL or REINDEX unless you've somehow slipped up and allowed extreme bloat. This will allow tables and indexes to "settle in" to an efficient size where they are not constantly giving up disk space to the OS at night and then having to reacquire it from the OS when under heavy load during the day. -Kevin