Thread: Bloated Table

Bloated Table

From
Alexander Schöcke
Date:
Hello everybody.

I'm using a view (http://pgsql.tapoueh.org/site/html/news/20080131.bloat.html) to display the bloat (unused disk space)
ofthe tables in a PostgreSQL database. Using this data I want to implement a database maintenance script automatically
exectutinga VACUUM FULL on these tables. 
Unfortunately I am finding a table to have bloat which can't be reclaimed. I have tried VACUUM, REINDEX, VACUUM FULL
ANALYZEwith REINDEX, and even dump and restore. The view always shows 375MB of bloat for the table. 

Is this normal? Here's the table structure:

                                    Table "public.foobar_log"
   Column   |           Type           |                         Modifiers
------------+--------------------------+--------------------------------
------------+--------------------------+---------------------------
 foorbarid  | integer                  | not null default nextval('foobar_log_id_seq'::regclass)
 created_at | timestamp with time zone | not null
 foo        | character varying(50)    | not null
 bar        | character varying(16)    | not null
 chit       | integer                  | not null
 chat       | boolean                  | not null default false
Indexes:
    "bar_index" btree (bar)
    "foobarid_foobar_log_key" btree (foobarid)
    "chit_foobar_log_key" btree (chit)


The table consists of approximately 2.4 million entries.


Any help is appreciated.
Kind regards, Alex

Re: Bloated Table

From
Grzegorz Jaśkiewicz
Date:
On Wed, May 27, 2009 at 3:54 PM, Alexander Schöcke
<asc@turtle-entertainment.de> wrote:
> Hello everybody.
>
> I'm using a view (http://pgsql.tapoueh.org/site/html/news/20080131.bloat.html) to display the bloat (unused disk
space)of the tables in a PostgreSQL database. Using this data I want to implement a database maintenance script
automaticallyexectuting a VACUUM FULL on these tables. 
> Unfortunately I am finding a table to have bloat which can't be reclaimed. I have tried VACUUM, REINDEX, VACUUM FULL
ANALYZEwith REINDEX, and even dump and restore. The view always shows 375MB of bloat for the table. 
>
> Is this normal?
yes it is normal, if you do updates on it, and it doesn't use HOT (for
instance if you update indexed column),

now, in real life you shouldn't really ever need to vacuum full.
Reindex - yes, but not vacuum full.
Make sure, that you adjust fsm_* settings in your config, best thing
is tu run vacuum analyze verbose, and see if it warns you about it
being too small.




--
GJ

Re: Bloated Table

From
Bill Moran
Date:
In response to Alexander Schöcke <asc@turtle-entertainment.de>:

> Hello everybody.
>
> I'm using a view (http://pgsql.tapoueh.org/site/html/news/20080131.bloat.html) to display the bloat (unused disk
space)of the tables in a PostgreSQL database. Using this data I want to implement a database maintenance script
automaticallyexectuting a VACUUM FULL on these tables. 
> Unfortunately I am finding a table to have bloat which can't be reclaimed. I have tried VACUUM, REINDEX, VACUUM FULL
ANALYZEwith REINDEX, and even dump and restore. The view always shows 375MB of bloat for the table. 
>
> Is this normal? Here's the table structure:
>
>                                     Table "public.foobar_log"
>    Column   |           Type           |                         Modifiers
> ------------+--------------------------+--------------------------------
> ------------+--------------------------+---------------------------
>  foorbarid  | integer                  | not null default nextval('foobar_log_id_seq'::regclass)
>  created_at | timestamp with time zone | not null
>  foo        | character varying(50)    | not null
>  bar        | character varying(16)    | not null
>  chit       | integer                  | not null
>  chat       | boolean                  | not null default false
> Indexes:
>     "bar_index" btree (bar)
>     "foobarid_foobar_log_key" btree (foobarid)
>     "chit_foobar_log_key" btree (chit)
>
>
> The table consists of approximately 2.4 million entries.
>
> Any help is appreciated.

What is the output of VACUUM VERBOSE foobar_log?

--
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/

Re: Bloated Table

From
Tom Lane
Date:
=?iso-8859-1?Q?Alexander_Sch=F6cke?= <asc@turtle-entertainment.de> writes:
> I'm using a view
> (http://pgsql.tapoueh.org/site/html/news/20080131.bloat.html) to
> display the bloat (unused disk space) of the tables in a PostgreSQL
> database.

I wouldn't trust the calculations that view does in the least.
You might look at contrib/pgstattuple if you want numbers that
have some relationship to reality (and are correspondingly more
expensive to get :-()

            regards, tom lane

Re: Bloated Table

From
Alexander Schöcke
Date:
>What is the output of VACUUM VERBOSE foobar_log?

INFO:  vacuuming "public.foobar_log"
INFO:  index "bar_index" now contains 23832007 row versions in 118151 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.02s/0.00u sec elapsed 64.10 sec.
INFO:  index "foobarid_foobar_log_key" now contains 23832007 row versions in 65347 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.02s/0.00u sec elapsed 32.60 sec.
INFO:  index "foo_foobar_log_key" now contains 23832007 row versions in 65347 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.03s/0.00u sec elapsed 25.14 sec.
INFO:  "foobar_log": found 0 removable, 23832007 nonremovable row versions in 269835 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
1 pages contain useful free space.
0 pages are entirely empty.
CPU 0.16s/0.09u sec elapsed 299.55 sec.
VACUUM

Re: Bloated Table

From
Brad Nicholson
Date:
On Wed, 2009-05-27 at 11:15 -0400, Tom Lane wrote:
> =?iso-8859-1?Q?Alexander_Sch=F6cke?= <asc@turtle-entertainment.de> writes:
> > I'm using a view
> > (http://pgsql.tapoueh.org/site/html/news/20080131.bloat.html) to
> > display the bloat (unused disk space) of the tables in a PostgreSQL
> > database.
>
> I wouldn't trust the calculations that view does in the least.
> You might look at contrib/pgstattuple if you want numbers that
> have some relationship to reality (and are correspondingly more
> expensive to get :-()
>
>             regards, tom lane

Is the referenced query reliable for even estimating, or is it flat our
wrong?

Co-workers that were PGCon are saying that this is becoming a
popular/accepted way to check for bloated tables.

--
Brad Nicholson  416-673-4106
Database Administrator, Afilias Canada Corp.



Re: Bloated Table

From
Alvaro Herrera
Date:
Brad Nicholson wrote:
> On Wed, 2009-05-27 at 11:15 -0400, Tom Lane wrote:
> > =?iso-8859-1?Q?Alexander_Sch=F6cke?= <asc@turtle-entertainment.de> writes:
> > > I'm using a view
> > > (http://pgsql.tapoueh.org/site/html/news/20080131.bloat.html) to
> > > display the bloat (unused disk space) of the tables in a PostgreSQL
> > > database.
> >
> > I wouldn't trust the calculations that view does in the least.
> > You might look at contrib/pgstattuple if you want numbers that
> > have some relationship to reality (and are correspondingly more
> > expensive to get :-()
>
> Is the referenced query reliable for even estimating, or is it flat our
> wrong?
>
> Co-workers that were PGCon are saying that this is becoming a
> popular/accepted way to check for bloated tables.

If "ma" is supposed to be "maxalign", then this code is broken because
it only reports mingw32 as 8, all others as 4, which is wrong.

However I think the big problem is that it relies on pg_class.relpages
and reltuples which are only accurate just after VACUUM, only a
sample-based estimate just after ANALYZE, and wrong at any other time
(assuming the table has any movement).

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: Bloated Table

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Brad Nicholson wrote:
>> On Wed, 2009-05-27 at 11:15 -0400, Tom Lane wrote:
>>> I wouldn't trust the calculations that view does in the least.

> If "ma" is supposed to be "maxalign", then this code is broken because
> it only reports mingw32 as 8, all others as 4, which is wrong.

I didn't bother to go through every detail of the calculations, but
there are multiple small errors there.  (Handling alignment for the
tuple header and not anyplace else is pretty pointless, for instance,
even if you had the correct alignment number for the machine.)
However ...

> However I think the big problem is that it relies on pg_class.relpages
> and reltuples which are only accurate just after VACUUM, only a
> sample-based estimate just after ANALYZE, and wrong at any other time
> (assuming the table has any movement).

That's big problem number one, and big problem number two is that it has
no good idea of the width of variable-width fields.  (Should I even
mention TOAST?)

It's an interesting exercise in trying to estimate bloat without
groveling through the whole relation, but I seriously doubt you could
ever get numbers this way that are trustworthy enough to drive
maintenance decisions.

            regards, tom lane

Re: Bloated Table

From
Alvaro Herrera
Date:
Tom Lane wrote:

> It's an interesting exercise in trying to estimate bloat without
> groveling through the whole relation, but I seriously doubt you could
> ever get numbers this way that are trustworthy enough to drive
> maintenance decisions.

Well, pg_stattuple is way too expensive to be running every minute ...

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: Bloated Table

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Tom Lane wrote:
>> It's an interesting exercise in trying to estimate bloat without
>> groveling through the whole relation, but I seriously doubt you could
>> ever get numbers this way that are trustworthy enough to drive
>> maintenance decisions.

> Well, pg_stattuple is way too expensive to be running every minute ...

Agreed.  I was idly wondering whether it'd be useful to have an option
in pg_stattuple to sample some small fraction of the pages in a
relation, rather than exhaustively examining each one.  It'd be subject
to all the normal ills of sampling, of course, but I'd still trust it
way more than this view.  (Especially noting that the view is already
dependent on sampling anyway, since it's looking at pg_stats...)

            regards, tom lane

Re: Bloated Table

From
Greg Smith
Date:
On Wed, 27 May 2009, Tom Lane wrote:

> It's an interesting exercise in trying to estimate bloat without
> groveling through the whole relation

Interesting and an extremely common request.  I just added an item to the
Vacuum section of the TODO list while you were listing issues and
potential solutions here:  "Provide more information in order to improve
user-side estimates of dead space bloat in relations" links to this
thread.

As pointed out upthread, there's a fair number of people who have picked
up on this particular query for bloat estimation.  It seemed accurate
enough for gross maintenance use when I did a quick check of its results
before, but those were tables without variable widths, TOAST, etc.  This
is one of those boring tasks that DBAs really want more
monitoring-friendly visibility into.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: Bloated Table

From
Christophe
Date:
On May 27, 2009, at 3:25 PM, Greg Smith wrote:
> Interesting and an extremely common request.  I just added an item
> to the Vacuum section of the TODO list while you were listing issues
> and potential solutions here:  "Provide more information in order to
> improve user-side estimates of dead space bloat in relations" links
> to this thread.

Given this, is there is a recommended way of detecting bloat in
indices automatically, rather than just looking at them once in a
while and saying, "Hm, that looks pretty big"?

Re: Bloated Table

From
Tom Lane
Date:
Christophe <xof@thebuild.com> writes:
> Given this, is there is a recommended way of detecting bloat in
> indices automatically, rather than just looking at them once in a
> while and saying, "Hm, that looks pretty big"?

contrib/pgstattuple works; it's just too expensive to run every
few minutes ...

            regards, tom lane