Re: reclaiming diskspace bloat w/near-zero downtime - Mailing list pgsql-general
From | Joshua D. Drake |
---|---|
Subject | Re: reclaiming diskspace bloat w/near-zero downtime |
Date | |
Msg-id | 41B07AD4.1000903@commandprompt.com Whole thread Raw |
In response to | reclaiming diskspace bloat w/near-zero downtime ("Ed L." <pgsql@bluepolka.net>) |
List | pgsql-general |
Ed L. wrote: >I need to reclaim the diskspace from a heavily updated 7.3.4 table which has >grown 95% bloated to 20gb of disk (indices and toast included), and I need >to reclaim it while the table continues to get updates, and without >interrupting/delaying access more than a few seconds. This is an >explanation of what I've tried so far along with some observations and a >request for other ideas to try. > >Some constraints on my efforts: Stopping the cluster for any reason is >pretty undesirable, as is vacuum full. Upgrading to 7.4.6 via slony is an >option, but I'm curious if there is an easier way. > > You need to run a vacuum. If you are lucky a normal vacuum should suffice and you will not take down the machine. Sincerely, Joshua D. Drake >I assume this bloat is basically due to the garbage generation outpacing the >garbage collection. The bloat does not appear to be in indices, so the >index bloat issue does not seem to be in play much here. The table gets >about 5% INSERTs, 95% UPDATEs, and 0% DELETEs, and UPDATEs are not changing >indexed values, so maybe that makes sense that indices are not bloated. >The UPDATEs to the table consist of appending text to a text colum (UPDATE >foo SET value = 'new text' || value). I had max_fsm_pages at 4M (for upto >32GB of disk?) and 8K (8000 tables, etc), both far more than I needed >initially, but it still bloated. [SIDEBAR: Vacuum + fsm is not working as >I expected; it is clearly not reclaiming space resulting from the UPDATEs. >If I UPDATE 10000 rows and then run vacuum, I was expecting/hoping that a >subsequent UPDATE of 9000 rows would largely reuse the space reclaimed from >the 10000-row UPDATE.] > >Anyway, I need to reclaim the space without stopping the cluster and without >blocking access to the table for more than a few seconds. I seem to have >found a way to do that for any table ('foo') that doesn't have fkey/func >dependencies: > > create table foo_slim as exact schema of foo > begin > alter table foo rename foo_bloated > create view foo as select from foo_slim union all foo_bloated > create rules: > on insert to foo: insert into foo_slim > on delete to foo: delete from foo_bloated or foo_slim > on update to foo: if in foo_bloated, > insert new values into foo_slim > delete from foo_bloated > commit; > >Once that's done, then the data begins to trickle into the new, slim table, >and the xfer can be speeded up by doing no-op updates on the foo view or a >select-for-update function on the foo_bloated table. Once all the data has >migrated over to foo_slim, the initial state can be restored with: > > begin > drop view foo > alter table foo_slim rename to foo > commit; > >and then foo_bloated can be dropped. That seems to work well enough. > >But how to do it on a table with foreign keys and triggers, etc? I was >wondering if I could use the same basic approach but manually reconstitute >the oid linkages so that the triggers and functions stayed intact even >while renaming/dropping/etc, but sounds a little dicey. Any other ideas? > >Thanks, >Ed > > >---------------------------(end of broadcast)--------------------------- >TIP 9: the planner will ignore your desire to choose an index scan if your > joining column's datatypes do not match > > -- Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC Postgresql support, programming shared hosting and dedicated hosting. +1-503-667-4564 - jd@commandprompt.com - http://www.commandprompt.com PostgreSQL Replicator -- production quality replication for PostgreSQL
Attachment
pgsql-general by date: