Re: reclaiming diskspace bloat w/near-zero downtime - Mailing list pgsql-general

From Joshua D. Drake
Subject Re: reclaiming diskspace bloat w/near-zero downtime
Date
Msg-id 41B07AD4.1000903@commandprompt.com
Whole thread Raw
In response to reclaiming diskspace bloat w/near-zero downtime  ("Ed L." <pgsql@bluepolka.net>)
List pgsql-general
Ed L. wrote:

>I need to reclaim the diskspace from a heavily updated 7.3.4 table which has
>grown 95% bloated to 20gb of disk (indices and toast included), and I need
>to reclaim it while the table continues to get updates, and without
>interrupting/delaying access more than a few seconds.  This is an
>explanation of what I've tried so far along with some observations and a
>request for other ideas to try.
>
>Some constraints on my efforts:  Stopping the cluster for any reason is
>pretty undesirable, as is vacuum full.  Upgrading to 7.4.6 via slony is an
>option, but I'm curious if there is an easier way.
>
>

You need to run a vacuum. If you are lucky a normal vacuum should
suffice and you will not take down the machine.

Sincerely,

Joshua D. Drake


>I assume this bloat is basically due to the garbage generation outpacing the
>garbage collection.  The bloat does not appear to be in indices, so the
>index bloat issue does not seem to be in play much here.  The table gets
>about 5% INSERTs, 95% UPDATEs, and 0% DELETEs, and UPDATEs are not changing
>indexed values, so maybe that makes sense that indices are not bloated.
>The UPDATEs to the table consist of appending text to a text colum (UPDATE
>foo SET value = 'new text' || value).  I had max_fsm_pages at 4M (for upto
>32GB of disk?) and 8K (8000 tables, etc), both far more than I needed
>initially, but it still bloated. [SIDEBAR:  Vacuum + fsm is not working as
>I expected; it is clearly not reclaiming space resulting from the UPDATEs.
>If I UPDATE 10000 rows and then run vacuum, I was expecting/hoping that a
>subsequent UPDATE of 9000 rows would largely reuse the space reclaimed from
>the 10000-row UPDATE.]
>
>Anyway, I need to reclaim the space without stopping the cluster and without
>blocking access to the table for more than a few seconds.  I seem to have
>found a way to do that for any table ('foo') that doesn't have fkey/func
>dependencies:
>
>    create table foo_slim as exact schema of foo
>    begin
>        alter table foo rename foo_bloated
>        create view foo as select from foo_slim union all foo_bloated
>        create rules:
>            on insert to foo:  insert into foo_slim
>            on delete to foo:  delete from foo_bloated or foo_slim
>            on update to foo:  if in foo_bloated,
>                insert new values into foo_slim
>                delete from foo_bloated
>    commit;
>
>Once that's done, then the data begins to trickle into the new, slim table,
>and the xfer can be speeded up by doing no-op updates on the foo view or a
>select-for-update function on the foo_bloated table.  Once all the data has
>migrated over to foo_slim, the initial state can be restored with:
>
>    begin
>        drop view foo
>        alter table foo_slim rename to foo
>    commit;
>
>and then foo_bloated can be dropped.  That seems to work well enough.
>
>But how to do it on a table with foreign keys and triggers, etc?  I was
>wondering if I could use the same basic approach but manually reconstitute
>the oid linkages so that the triggers and functions stayed intact even
>while renaming/dropping/etc, but sounds a little dicey.  Any other ideas?
>
>Thanks,
>Ed
>
>
>---------------------------(end of broadcast)---------------------------
>TIP 9: the planner will ignore your desire to choose an index scan if your
>      joining column's datatypes do not match
>
>


--
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - jd@commandprompt.com - http://www.commandprompt.com
PostgreSQL Replicator -- production quality replication for PostgreSQL


Attachment

pgsql-general by date:

Previous
From: "Ian Harding"
Date:
Subject: Re: relation does not exist error
Next
From: Alvaro Herrera
Date:
Subject: Re: table inheritance and DB design