Thread: Identifying bloated tables

Identifying bloated tables

From
Michal Taborsky - Internet Mall
Date:
I just put together a view, which helps us in indentifying which
database tables are suffering from space bloat, ie. they take up much
more space than they actually should. I though this might be useful for
some folk here, because the questions about bloat-related performance
degradation are quite common.

When using this view, you are interested in tables, which have the
"bloat" column higher that say 2.0 (in freshly dump/restored/analyzed
database they should all be around 1.0).

The bloat problem can be one-time fixed either by VACUUM FULL or
CLUSTER, but if the problem is coming back after while, you should
consider doing VACUUM more often or increasing you FSM settings in
postgresql.conf.

I hope I did the view right, it is more or less accurate, for our
purposes (for tables of just few pages the numbers may be off, but then
again, you are usually not much concerned about these tiny 5-page tables
performance-wise).

Hope this helps someone.

Here comes the view.


CREATE OR REPLACE VIEW "public"."relbloat" (
     nspname,
     relname,
     reltuples,
     relpages,
     avgwidth,
     expectedpages,
     bloat,
     wastedspace)
AS
SELECT pg_namespace.nspname, pg_class.relname, pg_class.reltuples,
     pg_class.relpages, rowwidths.avgwidth, ceil(((pg_class.reltuples *
     (rowwidths.avgwidth)::double precision) /
     (current_setting('block_size'::text))::double precision)) AS
expectedpages,
     ((pg_class.relpages)::double precision / ceil(((pg_class.reltuples *
     (rowwidths.avgwidth)::double precision) /
     (current_setting('block_size'::text))::double precision))) AS bloat,
     ceil(((((pg_class.relpages)::double precision *
     (current_setting('block_size'::text))::double precision) -
     ceil((pg_class.reltuples * (rowwidths.avgwidth)::double precision))) /
     (1024)::double precision)) AS wastedspace
FROM (((
     SELECT pg_statistic.starelid, sum(pg_statistic.stawidth) AS avgwidth
     FROM pg_statistic
     GROUP BY pg_statistic.starelid
     ) rowwidths JOIN pg_class ON ((rowwidths.starelid = pg_class.oid)))
JOIN
         pg_namespace ON ((pg_namespace.oid = pg_class.relnamespace)))
WHERE (pg_class.relpages > 1);


Bye.

--
Michal Táborský
IT operations chief
Internet Mall, a.s.
<http://www.MALL.cz>

Re: Identifying bloated tables

From
Brad Nicholson
Date:
On Mon, 2006-08-28 at 16:39 +0200, Michal Taborsky - Internet Mall
wrote:
> I just put together a view, which helps us in indentifying which
> database tables are suffering from space bloat, ie. they take up much
> more space than they actually should. I though this might be useful for
> some folk here, because the questions about bloat-related performance
> degradation are quite common.

Are you sure you haven't reinvented the wheel?  Have you checked out
contrib/pgstattuple ?

Brad.


Re: Identifying bloated tables

From
Michal Taborsky - Internet Mall
Date:
Brad Nicholson napsal(a):
>> I just put together a view, which helps us in indentifying which
>> database tables are suffering from space bloat, ie. they take up much

> Are you sure you haven't reinvented the wheel?  Have you checked out
> contrib/pgstattuple ?

Well, I wasn't aware of it, so I guess I did reinvent the wheel. I
Googled for a solution to this problem, but Googled poorly I suppose.

On the other hand, pgstattuple might be a bit difficult to use for
not-so-experienced users in answering the question "Which table should I
shrink?", as you have to first install it from contrib and then come up
with a select to pick the "worst" relations.

Anyway, if someone finds this view useful, good. If not, ignore it.

Bye.

--
Michal Táborský
IT operations chief
Internet Mall, a.s.
<http://www.MALL.cz>

Re: Identifying bloated tables

From
Alvaro Herrera
Date:
Brad Nicholson wrote:
> On Mon, 2006-08-28 at 16:39 +0200, Michal Taborsky - Internet Mall
> wrote:
> > I just put together a view, which helps us in indentifying which
> > database tables are suffering from space bloat, ie. they take up much
> > more space than they actually should. I though this might be useful for
> > some folk here, because the questions about bloat-related performance
> > degradation are quite common.
>
> Are you sure you haven't reinvented the wheel?  Have you checked out
> contrib/pgstattuple ?

Actually, pgstattuple needs to scan the whole table, so I think having a
cheap workaround that gives approximate figures is a good idea anyway.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: Identifying bloated tables

From
Markus Schaber
Date:
Hi, Michal,

Michal Taborsky - Internet Mall wrote:

> When using this view, you are interested in tables, which have the
> "bloat" column higher that say 2.0 (in freshly dump/restored/analyzed
> database they should all be around 1.0).

I just noticed some columns in pg_catalog with a bloat value <1 and a
negative "wasted space" - is this due to the pseudo nature of them?

Markus


--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

Re: Identifying bloated tables

From
Michal Taborsky - Internet Mall
Date:
Markus Schaber napsal(a):
> Hi, Michal,
>
> Michal Taborsky - Internet Mall wrote:
>
>> When using this view, you are interested in tables, which have the
>> "bloat" column higher that say 2.0 (in freshly dump/restored/analyzed
>> database they should all be around 1.0).
>
> I just noticed some columns in pg_catalog with a bloat value <1 and a
> negative "wasted space" - is this due to the pseudo nature of them?

It is more likely due to the fact, that these numbers are just
estimates, based on collected table statistics, so for small or
non-standard tables the statistical error is greater that the actual
value. You are usually not interested in tables, which have wasted space
of 1000kB or -1000kB. Also the database must be ANALYZEd properly for
these numbers to carry any significance.

--
Michal Táborský
IT operations chief
Internet Mall, a.s.

Internet Mall - obchody, které si oblíbíte
<http://www.MALL.cz>

Re: Identifying bloated tables

From
"Peter Childs"
Date:
On 28/08/06, Michal Taborsky - Internet Mall <michal.taborsky@mall.cz> wrote:
> Markus Schaber napsal(a):
> > Hi, Michal,
> >
> > Michal Taborsky - Internet Mall wrote:
> >
> >> When using this view, you are interested in tables, which have the
> >> "bloat" column higher that say 2.0 (in freshly dump/restored/analyzed
> >> database they should all be around 1.0).
> >
> > I just noticed some columns in pg_catalog with a bloat value <1 and a
> > negative "wasted space" - is this due to the pseudo nature of them?
>
> It is more likely due to the fact, that these numbers are just
> estimates, based on collected table statistics, so for small or
> non-standard tables the statistical error is greater that the actual
> value. You are usually not interested in tables, which have wasted space
> of 1000kB or -1000kB. Also the database must be ANALYZEd properly for
> these numbers to carry any significance.
>

I was just playing around with this table and noticed it preforms the
badly in tables with very small record sizes. This seams to be because
it ignores the system overhead (oid, xmin ctid etc) which seams to be
about 28 bytes per a record this can be quite significate in small
record tables and can cause trouble even with a smal numbers of
record.  Hence I've got a table thats static and fresly "vacuum full"
which reads with a bloat of 4.

Easy to recreate problem to

Create table regionpostcode (area varchar(4), regionid int);

then insert 120000 records.

Peter.