Thread: REINDEXing database-wide daily

REINDEXing database-wide daily

From
"Gnanakumar"
Date:

Hi,

 

We're using PostgreSQL 8.2.

 

I have a question in connection to this question posted by me earlier:

http://archives.postgresql.org/pgsql-performance/2010-03/msg00343.php

 

In our application, DML operations (INSERT/UPDATE/DELETE) are heavily performed in a day.

 

I also read about pg_autovacuum & REINDEX at:

http://www.postgresql.org/docs/8.2/interactive/routine-vacuuming.html

http://www.postgresql.org/docs/8.2/static/sql-reindex.html

 

I do not want to run pg_autovacuum daemon on a busy hour.

 

In case, if I can afford to take my database offline at low-usage time and perform REINDEX database-wide manually/linux cron, to boost up index  performance, what is the community answer/suggestion on the following:

1.     Is it a good idea to perform this on a daily basis?

2.     Any implications of doing this on a daily basis?

3.     Is there a way to find out bloated indexes?

4.     Any other maintenance command, like ANALYZE, that has to be executed before/after REINDEX?

5.     Is there a way to find out when REINDEX was last run on an INDEX/TABLE/DATABASE?

 

NOTE: I've also seen from my past experience that REINDEX database-wide greatly improves performance of the application.

 

Re: REINDEXing database-wide daily

From
Andy Colson
Date:
On 3/30/2010 4:32 AM, Gnanakumar wrote:
> Hi,
>
> We're using PostgreSQL 8.2.
>
> I have a question in connection to this question posted by me earlier:
>
> http://archives.postgresql.org/pgsql-performance/2010-03/msg00343.php
>
> In our application, DML operations (INSERT/UPDATE/DELETE) are heavily
> performed in a day.
>
> I also read about pg_autovacuum & REINDEX at:
>
> http://www.postgresql.org/docs/8.2/interactive/routine-vacuuming.html
>
> http://www.postgresql.org/docs/8.2/static/sql-reindex.html
>
> I do not want to run pg_autovacuum daemon on a busy hour.
>
> In case, if I can afford to take my database offline at low-usage time
> and perform REINDEX database-wide manually/linux cron, to boost up index
> performance, what is the community answer/suggestion on the following:
>
> 1. Is it a good idea to perform this on a daily basis?
>
> 2. Any implications of doing this on a daily basis?
>
> 3. Is there a way to find out bloated indexes?
>
> 4. Any other maintenance command, like ANALYZE, that has to be executed
> before/after REINDEX?
>
> 5. Is there a way to find out when REINDEX was last run on an
> INDEX/TABLE/DATABASE?
>
> NOTE: I've also seen from my past experience that REINDEX database-wide
> greatly improves performance of the application.
>


I could be way off base here, so I hope others will confirm/deny this:
I think the more often you run vacuum, the less you notice it.  If you
wait for too long then vacuum will have to work harder and you'll notice
a speed decrease.  But many small vacuums which dont have as much work
to do, you wont notice.

It could be, and I'm guessing again, because your database grew from 3
to 30 gig (if I recall the numbers right), REINDEX had lots of affect.
But if vacuum can keep up with space reuse, REINDEX may not be needed.
(maybe a few weeks or once a month).

-Andy



Re: REINDEXing database-wide daily

From
"Kevin Grittner"
Date:
"Gnanakumar" <gnanam@zoniac.com> wrote:

> We're using PostgreSQL 8.2.

Newer versions have much improved the VACUUM and CLUSTER features.
You might want to consider upgrading to a later major version.

> I have a question in connection to this question posted by me
> earlier:
>
>
http://archives.postgresql.org/pgsql-performance/2010-03/msg00343.php

I hope that you have stopped using VACUUM FULL on a regular basis,
based on the responses to that post.  The FULL option is only
intended as a means to recover from extreme heap bloat when there is
not room for a CLUSTER.  Any other use is going to cause problems.
If you continue to use it for other purposes, you may not get a lot
of sympathy when you inevitably experience those problems.

> I do not want to run pg_autovacuum daemon on a busy hour.

You would probably be surprised to see how much of a performance
boost you can get during your busy times by having a properly
configured autovacuum running.  My initial reaction to seeing
performance degradation during autovacuum was to make it less
aggressive, which lead to increasing bloat between autovacuum runs,
which degraded performance between runs and made things that much
worse when autovacuum finally kicked in.  It was only by using
aggressive maintenance to clean up the bloat and then configuring
autovacuum to be much more aggressive that I saw performance during
peak periods improve; although on some systems I had to introduce a
10 ms vacuum cost delay.

This is one of those areas where your initial intuitions can be
totally counter-productive.

> In case, if I can afford to take my database offline at low-usage
> time and perform REINDEX database-wide manually/linux cron, to
> boost up index performance, what is the community
> answer/suggestion on the following:
>
> 1.     Is it a good idea to perform this on a daily basis?

No.  It is generally not something to run on a routine basis, and if
you're not careful you could make performance worse, by making the
indexes so "tight" that most of your inserts or updates will cause
index page splits.

> 2.     Any implications of doing this on a daily basis?

We haven't found it necessary or useful, but if you have an
appropriate fill factor, I suppose it might not actually do any
damage.  There is some chance, based on your usage pattern, that a
daily CLUSTER of some tables might boost performance by reducing
random access, but daily REINDEX is unlikely to be a win.

> 3.     Is there a way to find out bloated indexes?

I don't have anything offhand, but you might poke around pg_class
looking at reltuples and relpages.

> 4.     Any other maintenance command, like ANALYZE, that has to be
> executed before/after REINDEX?

Not generally, but I seem to remember that there can be exceptions.
Indexes on expressions?  GIN?

> 5.     Is there a way to find out when REINDEX was last run on an
> INDEX/TABLE/DATABASE?

I don't think so.

> NOTE: I've also seen from my past experience that REINDEX
> database-wide greatly improves performance of the application.

I don't doubt that; if you've been shooting yourself in the foot by
running VACUUM FULL, then REINDEX would be a good bandage to
alleviate the pain.

My suggestion is to clean up your existing bloat by running CLUSTER
on all tables, configure autovacuum to aggressive values similar to
what you see in 8.3 or 8.4 and turn it on, run a nightly VACUUM
ANALYZE VERBOSE of the database and review the output to make sure
your fsm settings are adequate and to monitor bloat, and eliminate
all use of VACUUM FULL or REINDEX unless you've somehow slipped up
and allowed extreme bloat.  This will allow tables and indexes to
"settle in" to an efficient size where they are not constantly
giving up disk space to the OS at night and then having to reacquire
it from the OS when under heavy load during the day.

-Kevin