estimating the need for VACUUM FULL and REINDEX - Mailing list pgsql-performance

From Guillaume Cottenceau
Subject estimating the need for VACUUM FULL and REINDEX
Date
Msg-id 87odkvmsav.fsf@meuh.mnc.lan
Whole thread Raw
Responses Re: estimating the need for VACUUM FULL and REINDEX  (Heikki Linnakangas <heikki@enterprisedb.com>)
Re: estimating the need for VACUUM FULL and REINDEX  (Bill Moran <wmoran@collaborativefusion.com>)
Re: estimating the need for VACUUM FULL and REINDEX  (Guillaume Cottenceau <gc@mnc.ch>)
List pgsql-performance
I'm trying to come up with a way to estimate the need for a
VACUUM FULL and/or a REINDEX on some tables.


According to documentation[1], VACUUM FULL's only benefit is
returning unused disk space to the operating system; am I correct
in assuming there's also the benefit of optimizing the
performance of scans, because rows are physically compacted on
the disk?

With that in mind, I've tried to estimate how much benefit would
be brought by running VACUUM FULL, with the output of VACUUM
VERBOSE. However, it seems that for example the "removable rows"
reported by each VACUUM VERBOSE run is actually reused by VACUUM,
so is not what I'm looking for.


Then according to documentation[2], REINDEX has some benefit when
all but a few index keys on a page have been deleted, because the
page remains allocated (thus, I assume it improves index scan
performance, am I correct?). However, again I'm unable to
estimate the expected benefit. With a slightly modified version
of a query found in documentation[3] to see the pages used by a
relation[4], I'm able to see that the index data from a given
table...

            relname         | relpages | reltuples
    ------------------------+----------+-----------
     idx_sessions_owner_key |       38 |      2166
     pk_sessions            |       25 |      2166

...is duly optimized after a REINDEX:

            relname         | relpages | reltuples
    ------------------------+----------+-----------
     idx_sessions_owner_key |       13 |      2166
     pk_sessions            |        7 |      2166

but what I'd need is really these 38-13 and 25-7 figures (or
estimates) prior to running REINDEX.


Thanks for any insight.


Ref:
[1] http://www.postgresql.org/docs/8.2/interactive/routine-vacuuming.html

[2] http://www.postgresql.org/docs/8.2/interactive/routine-reindex.html

[3] http://www.postgresql.org/docs/8.2/interactive/disk-usage.html

[4] SELECT c2.relname, c2.relpages, c2.reltuples
      FROM pg_class c, pg_class c2, pg_index i
     WHERE c.relname = 'sessions'
       AND c.oid = i.indrelid
       AND c2.oid = i.indexrelid
     ORDER BY c2.relname;

--
Guillaume Cottenceau, MNC Mobile News Channel SA, an Alcatel-Lucent Company
Av. de la Gare 10, 1003 Lausanne, Switzerland - direct +41 21 317 50 36

pgsql-performance by date:

Previous
From: "Luke Lonergan"
Date:
Subject: Re: Best OS for Postgres 8.2
Next
From: Adam Tauno Williams
Date:
Subject: Re: [OT] Best OS for Postgres 8.2