Re: REINDEXing database-wide daily - Mailing list pgsql-performance

From Kevin Grittner
Subject Re: REINDEXing database-wide daily
Date
Msg-id 4BB1CCE302000025000301F5@gw.wicourts.gov
Whole thread Raw
In response to REINDEXing database-wide daily  ("Gnanakumar" <gnanam@zoniac.com>)
List pgsql-performance
"Gnanakumar" <gnanam@zoniac.com> wrote:

> We're using PostgreSQL 8.2.

Newer versions have much improved the VACUUM and CLUSTER features.
You might want to consider upgrading to a later major version.

> I have a question in connection to this question posted by me
> earlier:
>
>
http://archives.postgresql.org/pgsql-performance/2010-03/msg00343.php

I hope that you have stopped using VACUUM FULL on a regular basis,
based on the responses to that post.  The FULL option is only
intended as a means to recover from extreme heap bloat when there is
not room for a CLUSTER.  Any other use is going to cause problems.
If you continue to use it for other purposes, you may not get a lot
of sympathy when you inevitably experience those problems.

> I do not want to run pg_autovacuum daemon on a busy hour.

You would probably be surprised to see how much of a performance
boost you can get during your busy times by having a properly
configured autovacuum running.  My initial reaction to seeing
performance degradation during autovacuum was to make it less
aggressive, which lead to increasing bloat between autovacuum runs,
which degraded performance between runs and made things that much
worse when autovacuum finally kicked in.  It was only by using
aggressive maintenance to clean up the bloat and then configuring
autovacuum to be much more aggressive that I saw performance during
peak periods improve; although on some systems I had to introduce a
10 ms vacuum cost delay.

This is one of those areas where your initial intuitions can be
totally counter-productive.

> In case, if I can afford to take my database offline at low-usage
> time and perform REINDEX database-wide manually/linux cron, to
> boost up index performance, what is the community
> answer/suggestion on the following:
>
> 1.     Is it a good idea to perform this on a daily basis?

No.  It is generally not something to run on a routine basis, and if
you're not careful you could make performance worse, by making the
indexes so "tight" that most of your inserts or updates will cause
index page splits.

> 2.     Any implications of doing this on a daily basis?

We haven't found it necessary or useful, but if you have an
appropriate fill factor, I suppose it might not actually do any
damage.  There is some chance, based on your usage pattern, that a
daily CLUSTER of some tables might boost performance by reducing
random access, but daily REINDEX is unlikely to be a win.

> 3.     Is there a way to find out bloated indexes?

I don't have anything offhand, but you might poke around pg_class
looking at reltuples and relpages.

> 4.     Any other maintenance command, like ANALYZE, that has to be
> executed before/after REINDEX?

Not generally, but I seem to remember that there can be exceptions.
Indexes on expressions?  GIN?

> 5.     Is there a way to find out when REINDEX was last run on an
> INDEX/TABLE/DATABASE?

I don't think so.

> NOTE: I've also seen from my past experience that REINDEX
> database-wide greatly improves performance of the application.

I don't doubt that; if you've been shooting yourself in the foot by
running VACUUM FULL, then REINDEX would be a good bandage to
alleviate the pain.

My suggestion is to clean up your existing bloat by running CLUSTER
on all tables, configure autovacuum to aggressive values similar to
what you see in 8.3 or 8.4 and turn it on, run a nightly VACUUM
ANALYZE VERBOSE of the database and review the output to make sure
your fsm settings are adequate and to monitor bloat, and eliminate
all use of VACUUM FULL or REINDEX unless you've somehow slipped up
and allowed extreme bloat.  This will allow tables and indexes to
"settle in" to an efficient size where they are not constantly
giving up disk space to the OS at night and then having to reacquire
it from the OS when under heavy load during the day.

-Kevin

pgsql-performance by date:

Previous
From: Robert Haas
Date:
Subject: Re: why does swap not recover?
Next
From: "Kevin Grittner"
Date:
Subject: Re: experiments in query optimization