tool for incrementally shrinking bloated tables - Mailing list pgsql-general

From Paul Tillotson
Subject tool for incrementally shrinking bloated tables
Date
Msg-id 41C8BB43.5020501@shentel.net
Whole thread Raw
Responses Re: tool for incrementally shrinking bloated tables
Re: tool for incrementally shrinking bloated tables
List pgsql-general
(Originally posted on -hackers, but they were too busy or not interested.)

I am thinking about attempting to code this if it is not too difficult
for a newbie, but I wanted to see if anyone has any input or ideas first.

Goal: on a prduction server, to gradually shrink a table (no matter how
large) back to < 10% free space without "noticeably" interrupting write
access to it.  ("noticeably" = without taking any exclusive locks for
more than a few seconds at a time.)

To accomplish this, tuples need to be moved into free space in the
beginning of the table, and the table must be shrunk using ftruncate().

It seems that I could make these two changes:

(a) Modifying the VACUUM command to take an option that means "pack the
free space map with the pages that occur earliest in
the table rather than the pages with the most free space."

(b) Create a command that will take an exclusive lock, scan a table
backwards until it comes to a tuple that cannot be removed (i.e., a
tuple that is not HEAPTUPLE_DEAD (see scan_heap() in
src/backend/commands/vacuum.c)) or until some preset amount of time has
elapsed, and then ftruncate() the table.

To use this system one would do this:

(1) VACUUM KEEP_EARLY_FREE_PAGES mybloatedtable; -- item (a)

(2) UPDATE mybloatedtable SET foo = foo WHERE ctid > '(nnnnn, 0)';
--move tuples in end of the table to the front.

(3) SHRINK TABLE mybloatedtable; -- item (b)

Then repeat as many times as necessary to accomplish the desired shrinking.

Comments?  Am I missing some obvious way of accomplishing this goal?  Is
anyone working on something like this?

Paul Tillotson



P. S. Possible snags that I have thought of include:

- I don't know for sure that UPDATE will use the free space map (will it
put the new tuple right back in the same page if there is room?)

- There is currently no < or > operator for tid's, so WHERE ctid >
'(nnnn, 0)' doesn't work as mentioned.

pgsql-general by date:

Previous
From: "Frank D. Engel, Jr."
Date:
Subject: Re: Need help to organize database
Next
From: "Vladimir S. Petukhov"
Date:
Subject: Re: Need help to organize database