tool for incrementally shrinking bloated tables - Mailing list pgsql-hackers

From Paul Tillotson
Subject tool for incrementally shrinking bloated tables
Date
Msg-id 41C7502F.7020905@shentel.net
Whole thread Raw
List pgsql-hackers
Goal: on a prduction server, to gradually shrink a table (no matter how 
large) back to < 10% free space without "noticeably" interrupting write 
access to it.  ("noticeably" = without taking any exclusive locks for 
more than a few seconds at a time.)

I am thinking about making this if it proves to be not to difficult.

To accomplish this, tuples need to be moved into free space in the 
beginning of the table, and the table must be shrunk using ftruncate().

It seems that I could make these two changes:

(a) Modifying the VACUUM command to take an option that means "pack the 
free space map with the pages that occur earliest in
the table rather than the pages with the most free space."

(b) Create a command that will take an exclusive lock, scan a table 
backwards until it comes to a tuple that cannot be removed (i.e., a 
tuple that is not HEAPTUPLE_DEAD (see scan_heap() in 
src/backend/commands/vacuum.c)) or until some preset amount of time has 
elapsed, and then ftruncate() the table.

To use this system one would do this:

(1) VACUUM KEEP_EARLY_FREE_PAGES mybloatedtable;               -- use 
item (a) discussed above
(2) UPDATE mybloatedtable SET foo = foo WHERE ctid > '(nnnnn, 0)';  -- 
move tuples in end of the table to the front.
(3) SHRINK TABLE 
mybloatedtable;                                                          
-- use item (b) discussed above

Then repeat as many times as necessary to accomplish the desired shrinking.

In defense of the need for this tool:  Although this is usually 
preventable by proper vacuuming and FSM configuration, often on the list 
I see people say that they have a "huge" multi-gigabyte table that is 
using up all their drive space, but they cannot afford the interruption 
that VACUUM FULL would entail.  Also, certain maintenance operations 
(e.g., adding a column and populating it within a transaction) can 
double the on-disk size of a table, not to mention user error such as 
running an unconstrained UPDATE command inside a transaction and then 
rolling it back.

Comments?  Am I missing some obvious way of accomplishing this goal?  Is 
anyone working on something like this?

Paul Tillotson


pgsql-hackers by date:

Previous
From: "Andrew Dunstan"
Date:
Subject: Re: Heads up: RC2 this evening
Next
From: Tom Lane
Date:
Subject: Re: Heads up: RC2 this evening