reclaiming diskspace bloat w/near-zero downtime

From
"Ed L."
Date:
I need to reclaim the disk space from a heavily updated 7.3.4 table which
has grown 95% bloated to 20GB of disk (indices and TOAST included), and I
need to reclaim it while the table continues to get updates, and without
interrupting/delaying access for more than a few seconds.  This is an
explanation of what I've tried so far, along with some observations and a
request for other ideas to try.

Some constraints on my efforts:  Stopping the cluster for any reason is
pretty undesirable, as is vacuum full.  Upgrading to 7.4.6 via slony is an
option, but I'm curious if there is an easier way.

I assume this bloat is basically due to the garbage generation outpacing
the garbage collection.  The bloat does not appear to be in the indices, so
the index bloat issue does not seem to be much in play here.  The table
gets about 5% INSERTs, 95% UPDATEs, and 0% DELETEs, and the UPDATEs are not
changing indexed values, so maybe it makes sense that the indices are not
bloated.  The UPDATEs to the table consist of prepending text to a text
column (UPDATE foo SET value = 'new text' || value).  I had max_fsm_pages
at 4M (for up to 32GB of disk?) and max_fsm_relations at 8K (8000 tables,
etc.), both far more than I needed initially, but the table still bloated.
[SIDEBAR:  Vacuum + FSM is not working as I expected; it is clearly not
reclaiming space resulting from the UPDATEs.  If I UPDATE 10000 rows and
then run vacuum, I was expecting/hoping that a subsequent UPDATE of 9000
rows would largely reuse the space reclaimed from the 10000-row UPDATE.]
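
Concretely, the sequence I have in mind looks roughly like this (just a
sketch; the id column and row counts are illustrative):

    -- update a batch of rows, leaving ~10000 dead tuples behind
    update foo set value = 'x' || value where id between 1 and 10000;
    -- a plain vacuum should record the freed space in the FSM
    vacuum foo;
    -- expectation: this mostly reuses the pages vacuum just freed,
    -- rather than extending the table
    update foo set value = 'y' || value where id between 1 and 9000;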

Anyway, I need to reclaim the space without stopping the cluster and without
blocking access to the table for more than a few seconds.  I seem to have
found a way to do that for any table ('foo') that doesn't have fkey/func
dependencies:

    -- assuming, for illustration, foo(id integer primary key, value text)
    create table foo_slim (like foo);
    begin;
        alter table foo rename to foo_bloated;
        create view foo as
            select * from foo_slim union all select * from foo_bloated;
        create rule foo_ins as on insert to foo do instead
            insert into foo_slim values (new.id, new.value);
        create rule foo_del as on delete to foo do instead (
            delete from foo_bloated where id = old.id;
            delete from foo_slim where id = old.id;
        );
        -- on update: rows already in foo_slim are updated in place;
        -- rows still in foo_bloated are moved over with their new values
        create rule foo_upd as on update to foo do instead (
            update foo_slim set value = new.value where id = old.id;
            insert into foo_slim
                select new.id, new.value from foo_bloated where id = old.id;
            delete from foo_bloated where id = old.id;
        );
    commit;

Once that's done, the data begins to trickle into the new, slim table, and
the transfer can be sped up by doing no-op updates through the foo view
(see the sketch below) or with a select-for-update function on the
foo_bloated table.  Once all the data has migrated over to foo_slim, the
initial state can be restored with:

    begin;
        drop view foo;
        alter table foo_slim rename to foo;
    commit;

and then foo_bloated can be dropped.  That seems to work well enough.
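
To hurry the migration along rather than waiting for organic updates, a
batched no-op update through the view works (again a sketch, reusing the
illustrative id column); each pass pushes up to 1000 rows out of
foo_bloated via the update rule:

    update foo set value = value
        where id in (select id from foo_bloated limit 1000);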

But how to do it on a table with foreign keys and triggers, etc.?  I was
wondering if I could use the same basic approach but manually reconstitute
the OID linkages so that the triggers and functions stayed intact even
while renaming/dropping/etc., but that sounds a little dicey.  Any other
ideas?
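
For reference, the linkages in question can be inspected with something
like this (a sketch; it lists triggers owned by foo plus FK triggers on
other tables that point back at foo):

    select tgname, tgrelid, tgconstrrelid
    from pg_trigger
    where tgrelid = 'foo'::regclass
       or tgconstrrelid = 'foo'::regclass;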

Thanks,
Ed


Re: reclaiming diskspace bloat w/near-zero downtime

From
Martijn van Oosterhout
Date:
I can't help you with your general enquiry, but some comments:

On Thu, Dec 02, 2004 at 11:06:08PM -0700, Ed L. wrote:
> If I UPDATE 10000 rows and then run vacuum, I was expecting/hoping that a
> subsequent UPDATE of 9000 rows would largely reuse the space reclaimed from
> the 10000-row UPDATE.]

For this, check the FSM settings. It can only keep track of a limited
number of pages. So if your tuples are large it may not be able to
track it all...

The FSM should be set big enough to cover all the space that might be
updated/deleted between two VACUUMs.  So it might make sense to set the
FSM quite large and to vacuum (not FULL) often.  That should bring the
growth to a halt if the table really is mostly empty.
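
Illustratively (placeholder values, not recommendations), that means
something like this in postgresql.conf:

    max_fsm_pages = 4000000       # one slot per page with free space
    max_fsm_relations = 8000      # one slot per table/index tracked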

As for the foreign key issue, the basic problem is that the functions
reference the OIDs of the tables.  So they will keep tracking the tables,
even through renames.  I don't have an answer for you here...

Hope this helps,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.


Re: reclaiming diskspace bloat w/near-zero downtime

From
"Joshua D. Drake"
Date:
Ed L. wrote:

>I need to reclaim the diskspace from a heavily updated 7.3.4 table which has
>grown 95% bloated to 20gb of disk (indices and toast included), and I need
>to reclaim it while the table continues to get updates, and without
>interrupting/delaying access more than a few seconds.  This is an
>explanation of what I've tried so far along with some observations and a
>request for other ideas to try.
>
>Some constraints on my efforts:  Stopping the cluster for any reason is
>pretty undesirable, as is vacuum full.  Upgrading to 7.4.6 via slony is an
>option, but I'm curious if there is an easier way.
>
>

You need to run a vacuum.  If you are lucky, a normal vacuum should
suffice and you will not have to take down the machine.
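
That is, a plain (non-FULL) vacuum of just the bloated table, which does
not lock out readers or writers, e.g.:

    vacuum verbose analyze foo;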

Sincerely,

Joshua D. Drake


--
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - jd@commandprompt.com - http://www.commandprompt.com
PostgreSQL Replicator -- production quality replication for PostgreSQL



Re: reclaiming diskspace bloat w/near-zero downtime

From
"Ed L."
Date:
On Friday December 3 2004 1:54, Martijn van Oosterhout wrote:
>
> For this, check the FSM settings. It can only keep track of a limited
> number of pages. So if your tuples are large it may not be able to
> track it all...
>
> The FSM should be set big enough to cover all the space that might be
> updated/deleted between two VACUUMs. So it might make to set the FSM to
> be quite large and vacuum (not FULL) often. These should bring the
> growth to a halt if the table really is quite empty.

Our standard approach is to set max_fsm_pages to about 2x the largest
on-disk footprint we expect from the PGDATA directory, and to set
max_fsm_relations to 4x the sum of tables + indices across all databases
(to allow for a few duplicate ad hoc databases).  This bloated cluster had
max_fsm_pages = 4M.  I assume the relevant page size is the default
postgresql page size of 8K, so 4M pages would handle 32GB of data?  We
have autovac running and logging its actions to file, and it has been
doing what I would have thought was overkill vacuuming/analyzing.
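
For reference, I've been eyeballing per-table footprints with something
like this (approximate, since relpages is only refreshed by vacuum and
analyze; the 8 is the page size in KB):

    select relname, relpages, reltuples,
           relpages * 8 / 1024 as approx_mb
    from pg_class
    order by relpages desc limit 10;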

Ed


Re: reclaiming diskspace bloat w/near-zero downtime

From
Tom Lane
Date:
"Ed L." <pgsql@bluepolka.net> writes:
> [SIDEBAR:  Vacuum + fsm is not working as
> I expected; it is clearly not reclaiming space resulting from the UPDATEs.
> If I UPDATE 10000 rows and then run vacuum, I was expecting/hoping that a
> subsequent UPDATE of 9000 rows would largely reuse the space reclaimed from
> the 10000-row UPDATE.]

VACUUM can't reclaim rows until the last transaction that could
potentially see those rows is gone.  I'd venture that your above
disappointment occurs because there is some other transaction staying
open across the update/vacuum sequence.

As for the problem at hand, I think it's insoluble under the constraints
you've set yourself.  Bite the bullet and schedule some downtime.  It'd
probably be worth trying CLUSTER rather than VACUUM FULL to compact the
table (I *think* CLUSTER is okay in 7.3.* but don't recall for sure...)
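
For example (a sketch; assumes foo has an index named foo_pkey, and note
that CLUSTER holds an exclusive lock while it rewrites the table):

    cluster foo_pkey on foo;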

            regards, tom lane

Re: reclaiming diskspace bloat w/near-zero downtime

From
"Ed L."
Date:
On Friday December 3 2004 9:09, Tom Lane wrote:
> "Ed L." <pgsql@bluepolka.net> writes:
> > [SIDEBAR:  Vacuum + fsm is not working as
> > I expected; it is clearly not reclaiming space resulting from the
> > UPDATEs. If I UPDATE 10000 rows and then run vacuum, I was
> > expecting/hoping that a subsequent UPDATE of 9000 rows would largely
> > reuse the space reclaimed from the 10000-row UPDATE.]
>
> VACUUM can't reclaim rows until the last transaction that could
> potentially see those rows is gone.  I'd venture that your above
> disappointment occurs because there is some other transaction staying
> open across the update/vacuum sequence.

That sounds very possible; these systems are under heavy load almost all
the time.  Anytime vacuum starts, there are almost certainly other
transactions in progress.  I'm wondering how to confirm/report this?  I
know how to grab a list of backends and their current queries, but is
there a query to grab a snapshot of open transactions for before-and-after
comparisons?
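
Perhaps something like this, assuming stats_command_string is enabled so
that current_query is populated?  Long-open transactions would presumably
show up as idle-in-transaction backends:

    select procpid, usename, current_query
    from pg_stat_activity
    where current_query like '<IDLE> in transaction%';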

I assume upgrading to 7.4.6 would not help this particular issue if that's
what's going on?

Sounds like 24x7x365 operations on a single cluster is maybe too tall of an
order under these loads.  Maybe time for slony & friends for maintenance
switchovers.

Ed


Re: reclaiming diskspace bloat w/near-zero downtime

From
"Ed L."
Date:
On Saturday December 4 2004 1:01, Ed L. wrote:
>
> ...I have a
> shell game using rules that goes like this for bloated table foo:
> ...
> begin
>     drop view foo
>     alter table slim_foo rename to foo
> commit
>
> Some basic testing for my particulars suggests this works, but anyone see
> any issues with this?

I guess this is not safe in 7.3.4. :( Dropping the view within a transaction
causes the following error if a triggered function is concurrently
accessing that view:

WARNING:  Error occurred while executing PL/pgSQL function f
WARNING:  line 16 at SQL statement
ERROR:  RelationClearRelation: relation 7372083 deleted while still in use

Ed


Re: reclaiming diskspace bloat w/near-zero downtime

From
Richard Ellis
Date:
On Fri, Dec 03, 2004 at 09:24:48AM -0700, Ed L. wrote:
> Sounds like 24x7x365 operations on a single cluster is maybe too
> tall of an order under these loads.  Maybe time for slony & friends
> for maintenance switchovers.

24x7x365 was never the true operating point.  All systems have/need
regularly scheduled downtime for service/maintenance.  Even systems
that claimed "five nines" reliability did not include their regularly
scheduled downtime in the "five nines" calculation.

http://www.bcr.com/bcrmag/2002/05/p22.php
See the "Myth vs. Reality" section.