Re: Massive table bloat - Mailing list pgsql-admin

From Steve Crawford
Subject Re: Massive table bloat
Date
Msg-id 50C78B06.9000606@pinpointresearch.com
Whole thread Raw
In response to Massive table bloat  (Michael Sawyers <msawyers@iii.com>)
Responses Re: Massive table bloat  (Michael Sawyers <msawyers@iii.com>)
List pgsql-admin
On 12/11/2012 08:11 AM, Michael Sawyers wrote:
> Our dba quit last week leaving me with an interesting problem.
>
> We have a table currently using 33gb worth of space for only 152mb worth of
> data because of bad processes or autovacuum not being aggressive enough. I
> was able to confirm the size difference by doing a create table as select
> along with a second test of restoring the table from the dump file to a dev
> machine.
>
>
>
> There is a very large list of foreign key relationships that I'm not
> including for the sake of brevity.
>
> The database version is 8.4.1
>
> The old DBA had said that vacuum full would take days to complete, and we
> don't have that much of a window.  So I was considering using the  to force
> a full table rewrite.  In testing on a dev machine it only took about five
> minutes.
>
> I do not have as much hands on experience with postgres so I wanted to get
> thoughts on what is considered the proper way to deal with this kind of
> situation.
>
> Any comments would be welcome.
It's true that vacuum-full has historically been very slow as it
rewrites data in-place but you can use CLUSTER which is *WAY* faster
since it rewrites the data to a new file which it then drops back in
place. CLUSTER also gives you new clean indexes. In fact the current
versions of PostgreSQL basically use the CLUSTER process to handle
VACUUM FULL. (I just don't recall which version applied that change.)

How long did your test copy/dump/restore steps take? This will give you
a *rough* idea of the time that CLUSTER will take.

Really the only downside of CLUSTER is the requirement that you have
enough available space to write the new copy of the table/indexes.

Another option if you can't tolerate downtime is pg_reorg
(http://pgfoundry.org/projects/reorg/) or its fork pg_repack
(https://github.com/reorg/pg_repack) which essentially do the copy to a
new table in the background then briefly lock the tables to apply any
recent changes from the master before dropping the copy into place.

Cheers,
Steve



pgsql-admin by date:

Previous
From: Craig James
Date:
Subject: Re: Massive table bloat
Next
From: Michael Sawyers
Date:
Subject: Re: Massive table bloat