On Wednesday 19 Feb 2003 9:05 pm, you wrote:
> Interestingly, this could be used to create a speedy vacuum - that is,
> create a new table with a temporary name that is invisible to the
> database (similar to dropped columns), then, taking into account the
> disk space left on the device, pick the last X pages from the old table
> and write them to the new table. Then truncate the file containing the
> table at point X and repeat until finished. Finally, kill the old table
> and make the new one visible. I appreciate there may be a couple of
> issues with oids/foreign keys, but it sounds like a great solution to
> me! Why does vacuum bother with reordering rows? I thought that was what
> the CLUSTER command was for? Any developers care to comment on this? I
> imagine there must be a technical reason (probably to do with MVCC) why
> this hasn't been done before?
Well, one thing I can think of is the extra space required. The algorithm
looks good, but it would be very difficult to make sure it works all the
time, especially given that PostgreSQL does not have sophisticated and/or
tunable storage handling (think of tablespaces).
It is always a space-time trade-off. On one hand we have vacuum, which uses
a constant, and maybe negligible, amount of space but takes time
proportional to the amount of work. On the other hand we have
drop/recreate table, which takes double the space but is extremely fast,
i.e. proportional to the data size at the maximum I/O bandwidth available.
It would be good if there were something in between. Of course it would
not be easy to do, but it has to start somewhere, doesn't it? :-)
> In fact, my colleague has just done a test with SELECT..INTO on our dev
> version and it compacted 600MB -> 400MB in just 40s(!). We then did a
> vacuum full on the same original 600MB table, which is still going after
> 20 mins. Difficult choice! So even in a worst-case scenario we could
> have a fully vacuumed table within a day.... we're looking at dropping
> some indexes in the db to reclaim enough space to be able to fit another
> copy of the table on the disk... this is looking very tempting at the
> moment....
I recommend that this strategy of "vacuuming" be documented in the
standard documentation and FAQ. Given that PostgreSQL is routinely
deployed for databases >10GB, which is larger than small/medium by any
definition today, I think this would be a good move.
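To judge whether a table is bloated enough to be worth the exercise, one
could check its page count in pg_class (a sketch; 'bigtable' is a
placeholder, and the figures are only as fresh as the last
VACUUM/ANALYZE):

    -- relpages * 8kB (the default block size) approximates on-disk size
    SELECT relname, relpages, reltuples
    FROM pg_class
    WHERE relname = 'bigtable';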
Furthermore, this strategy drastically reduces the downtime caused by
vacuum full's locks. I would say it is worth buying an 80GB IDE disk for
this purpose if you have a database that big.
Nice to see that my idea helped somebody. :-)
Shridhar