John, et al.,

We too have an interest in reclustering large tables, but in our
case most of the transactions are spread throughout the table
(though in some cases not uniformly) rather than appended at
the end.
I have been pondering a program that reads the rows of the table
in cluster order and then, one block-full at a time, deletes
those rows and re-inserts them within a single transaction. The
program would then move on to the next block-full and repeat the
operation. Note that if there are triggers on the table, each
delete and re-insert will fire them, which may have unintended
side effects.
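
A minimal sketch of that loop in Python. It uses the standard-library
sqlite3 module purely as a stand-in so that it runs anywhere; it is
not PostgreSQL-specific and says nothing about how PG would place the
re-inserted rows. The function name recluster_in_batches, the
batch_rows parameter (a row count standing in for "a block-full"),
and the assumption that the cluster key is unique are all
hypothetical choices made for illustration.

```python
import sqlite3


def recluster_in_batches(conn, table, key, batch_rows):
    """Delete and re-insert rows in `key` order, one batch per transaction.

    Assumes `key` is unique and that `table` and `key` are trusted
    identifiers (they are interpolated into the SQL text).
    Returns the number of rows processed.
    """
    cur = conn.cursor()
    # Snapshot the keys in cluster order before touching anything.
    keys = [r[0] for r in cur.execute(f"SELECT {key} FROM {table} ORDER BY {key}")]

    for start in range(0, len(keys), batch_rows):
        lo = keys[start]
        hi = keys[min(start + batch_rows, len(keys)) - 1]
        with conn:  # commits on success, rolls back if anything raises
            rows = cur.execute(
                f"SELECT * FROM {table} WHERE {key} BETWEEN ? AND ? ORDER BY {key}",
                (lo, hi),
            ).fetchall()
            if not rows:
                continue  # rows vanished since the snapshot; nothing to move
            cur.execute(f"DELETE FROM {table} WHERE {key} BETWEEN ? AND ?", (lo, hi))
            placeholders = ",".join("?" * len(rows[0]))
            cur.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    return len(keys)
```

Each batch is its own transaction, so a failure part-way through
leaves the table complete, just only partly reordered. Any triggers
on the table would fire once per deleted and per re-inserted row.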
This would require a fairly clear idea of the space each row
occupies, and would probably require frequent vacuums during the
process so that the space freed by the deletes can be reused. If
there were a way to tell the block address of each row, I suppose
you could leave some rows where they are. In the end, you might
need as much space as a full CLUSTER does (a full copy of the
table as workspace), but I'm not sure.
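
On the block-address question: PostgreSQL exposes a ctid system
column whose text form is '(block,offset)', so something like
SELECT key, ctid FROM tbl ORDER BY key would give each row's
physical position. The sketch below is one possible reading of
"leave some rows where they are": given (key, ctid) pairs already
sorted by key, it picks the longest run of rows whose physical
positions are already in ascending order and treats those as rows
that could stay put. The function names parse_ctid and
rows_to_leave are hypothetical, and this is only an illustration
of the idea, not a tested reclustering strategy.

```python
from bisect import bisect_right


def parse_ctid(text):
    """Turn a ctid string such as '(12,3)' into a (block, offset) tuple."""
    block, offset = text.strip("()").split(",")
    return int(block), int(offset)


def rows_to_leave(rows):
    """rows: list of (key, ctid_text) pairs, already sorted by key.

    Returns the set of keys forming the longest subsequence whose
    physical positions are non-decreasing, i.e. rows that are already
    in cluster order relative to one another.
    """
    positions = [parse_ctid(ctid) for _, ctid in rows]
    tails = []      # tails[k]: index of the row ending the best run of length k+1
    tail_pos = []   # positions of those rows; stays sorted, so bisect works
    prev = [None] * len(rows)

    for i, pos in enumerate(positions):
        k = bisect_right(tail_pos, pos)
        prev[i] = tails[k - 1] if k else None
        if k == len(tails):
            tails.append(i)
            tail_pos.append(pos)
        else:
            tails[k] = i
            tail_pos[k] = pos

    # Walk back from the end of the longest run to collect its keys.
    keep = set()
    i = tails[-1] if tails else None
    while i is not None:
        keep.add(rows[i][0])
        i = prev[i]
    return keep
```

Rows outside the returned set are the ones that would need to be
deleted and re-inserted; whether the re-inserted copies actually
land between the rows that stayed depends on how free space is
allocated, which this sketch does not model.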
Depending on the data, it may be possible for other processes to
add new rows to the table while this is going on. Those rows will
certainly not be in cluster order, and they may keep the
re-inserted rows from being placed contiguously (I'm not sure of
the allocation algorithm used by PG).
Thoughts? Comments?
Ray
On Mon, Jan 05, 2004 at 01:54:16PM -0500, John Siracusa wrote:
> On 1/4/04 6:24 PM, Christopher Browne wrote:
> > The cluster operation potentially has to reorder all the tuples, and
> > the fact that the table is already _partially_ organized only
> > diminishes the potential. If the new data, generally added "at the
> > end," has values that are fairly uniformly distributed across the
> > index, then the operation really will have to reorder all of the
> > tuples...
>
> What about the special case of a table that is clustered on a column and all
> subsequent inserts will add rows with ever-increasing values of that column?
> This would be the case for creation dates or even a column created from a
> sequence. Basically, after clustering, it would be nice if you could tell
> the system to "only add to the end" and to "add in clustered order."
>
> Programming for special cases is annoying, but sometimes it really helps.
>
> -John
----------------------------------------------------------------------
Ray Ontko rayo@ontko.com Phone 1.765.935.4283 Fax 1.765.962.9788
Ray Ontko & Co. Software Consulting Services http://www.ontko.com/