Thread: 7.3.1 takes long time to vacuum table?

7.3.1 takes long time to vacuum table?

From
"Mark Cave-Ayland"
Date:
Hi everyone,

Does anyone know of any issues with 7.3.1 which would cause it to take a
long time to do a vacuum full? We have a resource table consisting of
about 70M records, and we recently performed an update query on the
entire table; now I'm trying to do a vacuum full to reclaim the
disk space. So far, on a dual PIII 1.4GHz machine with hardware RAID5 and
2GB RAM, the vacuum full verbose is still running after 30 hours!

The reason I am concerned is that before performing the update, the
vacuum full would normally take about 12 hours and that was with a
trigger and 5 indexes on the table. This time, before initiating the
vacuum, all the indexes were dropped, and a single update performed on 1
field over the entire table. I understand that postgres has to compact
the valid tuples down to the front of the file after removing the
previous ones, but should it really take this long on such a powerful
machine? Or have I made a gross error somewhere in the configuration?

When I first configured the database, vacuuming the table took nearly 20
hours, but thanks to a post in the archives, I found some suggestions to
increase vacuum mem & FSM size and reduce the number of buffers which
got it down to its normal time of about 12 hours :) Here are the
settings I changed from the defaults based on that post:

max_fsm_relations = 50000
max_fsm_pages = 5000000
vacuum_mem = 65535
fsync = false
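
In case it helps anyone reproduce this, the running values can be confirmed
from psql with SHOW (a quick sketch, using the standard lowercase parameter
names from postgresql.conf):

    SHOW max_fsm_relations;
    SHOW max_fsm_pages;
    SHOW vacuum_mem;
    SHOW fsync;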

I have also set shmmax to 800Mb just to give things some breathing
space. One thing I have noticed is that the postmaster process running
the vacuum has now reached 1Gb of memory and looks like it is beginning
to touch swap(!) which is going to slow things even more. Can anyone
help me out and reduce the time it takes to do this vacuum?


Cheers,

Mark.


---

Mark Cave-Ayland
Webbased Ltd.
Tamar Science Park
Derriford
Plymouth
PL6 8BX
England

Tel: +44 (0)1752 764445
Fax: +44 (0)1752 764446




Re: 7.3.1 takes long time to vacuum table?

From
"Mark Cave-Ayland"
Date:
> -----Original Message-----
> From: Martijn van Oosterhout [mailto:kleptog@svana.org]
> Sent: 19 February 2003 02:14
> To: Mark Cave-Ayland
> Cc: PostgreSQL General
> Subject: Re: [GENERAL] 7.3.1 takes long time to vacuum table?

Hi Martijn,

Thanks for taking the time to look at this one.

> Ok, this is a hard one. Doing a vacuum full is very hard on the disk cache.
> It basically copies a lot of tuples around. In your case it's going to be
> copying every tuple from somewhere near the end of the table to somewhere
> near the beginning. This makes the pattern of disk access something like:
>
> seek, read, seek, write, seek, read, seek, write, ...
>
> Which, not surprisingly, sucks (especially from RAID5 I think, but I'm not
> sure about that). Note this varies a bit between versions of Unix,
> postgresql and your C library.

Hmmm.... so the news is not good then :( We timed an insert of about
800,000 records into the DB under 7.2.1 and that was just over 10mins...
so extrapolating that out, it would take at least 15 hours just to write
the 70M records directly to disk, without the seeking.... *sigh*

> My only advice is that you can use strace to work out approximately where
> it's up to. Use /proc/ to work out which file descriptor is the table you're
> working with and then strace the backend (the -p option) to work out which
> part it is reading from. It'll look like:
>
> seek(<fd>,<offset>,SEEK_SET)    = <offset>
> read(<fd>,"lots of rubbish", 8192) = 8192
>
> It's the offset you want, it may jump around a bit but it should be
> increasing on the whole. If your table is split into multiple files because
> it's over 1GB, take this into account when working out how far it's in.
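
For anyone else wanting to try this, the table's data files can be matched up
against the /proc/<pid>/fd entries via pg_class - a rough sketch, using the
table name from this thread:

    -- OID of the current database: its files live under $PGDATA/base/<oid>/
    SELECT oid FROM pg_database WHERE datname = current_database();

    -- filenode of the table: its files are named <relfilenode>, <relfilenode>.1, ...
    SELECT relfilenode FROM pg_class WHERE relname = 'res';

The symlinks in /proc/<backend pid>/fd/ can then be compared against those
file names to find the right descriptor to watch.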

Ok, I've managed to do that.... offset just seems to keep jumping around
though, not much of a specific pattern... although it does seem to be
reading from 2 separate files :( But what IS interesting is that between
each seek(), postgres does 2 reads of 8k (which I guess is a page) and
then 4 writes! This I don't understand? Surely given the memory
parameters then it should read as many pages into memory as possible,
sort them, then seek back and write them? What appears to be happening
is that only one or two pages are being moved at a time, which
seems really inefficient.

Or is the assumption here that by limiting the pages being moved around,
more memory can be given to the OS so it can cache aggressively? I've
uploaded the strace file to
http://www.infomapper.com/strace/strace.log.txt if you (or anyone else)
would be interested in taking a look - I logged about 700ks worth.

> This is probably a good argument to have VACUUM emit a notice every 10
> minutes or so giving some indication of its progress. I don't know how
> hard this would be.

The vacuum has emitted a line of statistics within the first few hours
which reads the following:

INFO:  --Relation public.res--
INFO:  Pages 9167290: Changed 5366493, reaped 6000857, Empty 0, New 0;
Tup 72475389: Vac 105298234, Keep/VTL 0/0, UnUsed 221144, MinLen 72,
MaxLen 727; Re-using: Free/Avail. Space 48478844644/48455035400;
EndEmpty/Avail. Pages 0/8458521.
        CPU 733.62s/151.22u sec elapsed 4656.51 sec.

But now it's just sat there churning away.... I guess the above was the
result of marking which tuples were to be kept, and now I'm guessing it's
in the process of moving data around. So yes, some additional
notification during this phase would be very useful for large tables
like this.

> > I have also set shmmax to 800Mb just to give things some breathing
> > space. One thing I have noticed is that the postmaster process running
> > the vacuum has now reached 1Gb of memory and looks like it is beginning
> > to touch swap(!) which is going to slow things even more. Can anyone
> > help me out and reduce the time it takes to do this vacuum?
>
> The only thing I can suggest is that SELECT * INTO newtables FROM table;
> may have been faster, though it shouldn't be.

My instinct suggests this would be faster, given that the writing would
be guaranteed sequential and the reading a single sequential scan of the
table. The problem with this is that we don't have enough disk space to
maintain 2 copies of the table on one device at the moment, as the table
takes about 40Gb of space.

BTW the vacuum is of course still going as we enter the 50hr stage....


Cheers,

Mark.



Re: 7.3.1 takes long time to vacuum table?

From
"Mark Cave-Ayland"
Date:
> -----Original Message-----
> From: Peter Childs [mailto:Blue.Dragon@blueyonder.co.uk]
> Sent: 19 February 2003 06:05
> To: PostgreSQL General
> Subject: Re: [GENERAL] 7.3.1 takes long time to vacuum table?
>


> > This is probably a good argument to have VACUUM emit a notice every 10
> > minutes or so giving some indication of its progress. I don't know how
> > hard this would be.
> >


>     Try Vacuum Verbose; It does not print a message out every 10 minutes,
> but it should tell you that something is happening, oh and what's
> happening, if you can make any sense of the messages.....
>
> Peter Childs

Hi Peter,

I am running vacuum verbose; however, after displaying an information
line a few hours in, it won't output anything more until the vacuum is
complete, as I guess it's now just moving pages around.... :(


Cheers,

Mark.


Re: 7.3.1 takes long time to vacuum table?

From
Martijn van Oosterhout
Date:
On Wed, Feb 19, 2003 at 11:03:12AM -0000, Mark Cave-Ayland wrote:
> Ok, I've managed to do that.... offset just seems to keep jumping around
> though, not much of a specific pattern... although it does seem to be
> reading from 2 separate files :( But what IS interesting is that between
> each seek(), postgres does 2 reads of 8k (which I guess is a page) and
> then 4 writes! This I don't understand? Surely given the memory
> parameters then it should read as many pages into memory as possible,
> sort them, then seek back and write them? What appears to be happening
> is that it is only one or two pages are being moved at a time which
> seems really inefficient.

Hmm, odd. The offset is going *backwards*. It started at 981934080 and
has made it down to 969719808. It looks like it's reading each page. I don't
think I need to tell you that from a caching point of view, it's probably
not as good as going forward.

> Or is the assumption here that by limiting the pages being moved around,
> more memory can be given to the OS so it can cache aggressively? I've
> uploaded the strace file to
> http://www.infomapper.com/strace/strace.log.txt if you (or anyone else)
> would be interested in taking a look - I logged about 700ks worth.

How long did it take to get that trace? Also, what are file descriptors 58,
97 and 114?

> The vacuum has emitted a line of statistics within the first few hours
> which reads the following:
>
> INFO:  --Relation public.res--
> INFO:  Pages 9167290: Changed 5366493, reaped 6000857, Empty 0, New 0;
> Tup 72475389: Vac 105298234, Keep/VTL 0/0, UnUsed 221144, MinLen 72,
> MaxLen 727; Re-using: Free/Avail. Space 48478844644/48455035400;
> EndEmpty/Avail. Pages 0/8458521.
>         CPU 733.62s/151.22u sec elapsed 4656.51 sec.

Ok, that means it has finished the vacuum stage (since that's printed at the
end of scan_heap). It's now going through the heap compacting. From that
output it appears that your table is around 75Gig (9 million 8K pages)!
You're into the repair_frag stage where indeed it scans through the table
backwards.

While I can see the argument for going backwards, from a caching perspective
I think it's terrible. Especially in the case where the entire table has
been replaced, the entire exercise becomes a very expensive copy operation.

> But now it's just sat there churning away.... I guess the above was the
> result of marking which tuples were to be kept and now I'm guessing its
> in the process of moving data around. So yes, some additional
> notification during this phase would be very useful for large tables
> like this.

Let me know what those file descriptors point to and we can probably work
out how far along it is.

--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Support bacteria! They're the only culture some people have.


Re: 7.3.1 takes long time to vacuum table?

From
"Mark Cave-Ayland"
Date:
Hi Martijn,

Thanks again for your reply.

> -----Original Message-----
> From: Martijn van Oosterhout [mailto:kleptog@svana.org]
> Sent: 19 February 2003 12:11
> To: Mark Cave-Ayland
> Cc: PostgreSQL General
> Subject: Re: [GENERAL] 7.3.1 takes long time to vacuum table?
>
> On Wed, Feb 19, 2003 at 11:03:12AM -0000, Mark Cave-Ayland wrote:
> > Ok, I've managed to do that.... offset just seems to keep jumping around
> > though, not much of a specific pattern... although it does seem to be
> > reading from 2 separate files :( But what IS interesting is that between
> > each seek(), postgres does 2 reads of 8k (which I guess is a page) and
> > then 4 writes! This I don't understand? Surely given the memory
> > parameters then it should read as many pages into memory as possible,
> > sort them, then seek back and write them? What appears to be happening
> > is that only one or two pages are being moved at a time, which
> > seems really inefficient.
>
> Hmm, odd. The offset is going *backwards*. It started at 981934080 and
> has made it down to 969719808. It looks like it's reading each page. I don't
> think I need to tell you that from a caching point of view, it's probably
> not as good as going forward.

Backwards? Ouch I guess that will break most caching strategies! But as
I said before, do you know why it only appears to be moving one or two
pages at a time before seeking again???

> How long did it take to get that trace? Also, what are file descriptors
> 58, 97 and 114?

The trace lasted about a couple of minutes. I've listed the file
descriptors from /proc and uploaded them to
http://www.infomapper.com/strace/fd.log.txt for you to look at - I have a
feeling you would find it useful to see the total number of files open
in terms of their size and quantity....

> > The vacuum has emitted a line of statistics within the first few hours
> > which reads the following:
> >
> > INFO:  --Relation public.res--
> > INFO:  Pages 9167290: Changed 5366493, reaped 6000857, Empty 0, New 0;
> > Tup 72475389: Vac 105298234, Keep/VTL 0/0, UnUsed 221144, MinLen 72,
> > MaxLen 727; Re-using: Free/Avail. Space 48478844644/48455035400;
> > EndEmpty/Avail. Pages 0/8458521.
> >         CPU 733.62s/151.22u sec elapsed 4656.51 sec.
>
> Ok, that means it has finished the vacuum stage (since that's printed at the
> end of scan_heap). It's now going through the heap compacting. From that
> output it appears that your table is around 75Gig (9 million 8K pages)!
> You're into the repair_frag stage where indeed it scans through the table
> backwards.

Hmmm, a little bit bigger than I was expecting, although I guess that
contains old and new pages. Here is the select result:

SELECT relname, relpages FROM pg_class ORDER BY relpages DESC;
             relname             | relpages
---------------------------------+----------
 res                             |  5586167

...so at 8Kb/page then that's about 42Gb.
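
The same arithmetic can be done directly in SQL - a quick sketch, where the
cast to numeric avoids integer overflow when multiplying relpages by 8192
(for res, 5586167 pages * 8KB/page comes out at roughly 42.6 GB):

    SELECT relname, relpages::numeric * 8192 / (1024 * 1024 * 1024) AS size_gb
    FROM pg_class
    WHERE relname = 'res';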


> While I can see the argument for going backwards, from a caching perspective
> I think it's terrible. Especially in the case where the entire table has
> been replaced, the entire exercise becomes a very expensive copy operation.

Agreed! If it ruins any caching then in my view it's something that has
to change in order to keep performance. While there may be a penalty to
pay on smaller tables, the benefits of caching would more than make up
for the cost of going forwards - imagine how slow CPUs would be if
everything was a cache miss....

> Let me know what those file descriptors point to and we can probably work
> out how far along it is.

Yes please, that would be really useful for us to know.


Cheers,

Mark.



Re: 7.3.1 takes long time to vacuum table?

From
"Shridhar Daithankar"
Date:
On Wednesday 19 Feb 2003 7:25 pm, you wrote:
> Agreed! If it ruins any caching then in my view it's something that has
> to change in order to keep performance. While there may be a penalty to
> pay on smaller tables, the benefits of caching would more than make up
> for the cost of going forwards - imagine how slow CPUs would be if
> everything was a cache miss....

I have an idea. Don't know what to call it, better or stupid.

If you are doing vacuum full, it means you are ready to lock the database. I
am just wondering if it would be fast to dump the database, drop it and
recreate it.

Good old defragmentation of ext2fs..;-) Of course, having a spare 80Gigs might
turn out to be a problem but I can almost bet that it would finish before 12
hours..

 Shridhar

Re: 7.3.1 takes long time to vacuum table?

From
Martijn van Oosterhout
Date:
On Wed, Feb 19, 2003 at 01:55:01PM -0000, Mark Cave-Ayland wrote:
> > How long did it take to get that trace? Also, what are file descriptors
> > 58, 97 and 114?
>
> The trace lasted about a couple of minutes. I've listed the file descriptors
> from /proc and uploaded them to
> http://www.infomapper.com/strace/fd.log.txt for you to look at - I have a
> feeling you would find it useful to see the total number of files open
> in terms of their size and quantity....

58 = 55792/58153.15
97 = 55792/58153.54
114 = pg_xlog/000000A50000000E

So it's writing to WAL. I'm not sure about this, but if it's currently at
file no 54 and you need to end up with 45 (each file 1GB) and you started
on file 69, that means you're over halfway. But I'm really not sure.

> Agreed! If it ruins any caching then in my view it's something that has
> to change in order to keep performance. While there may be a penalty to
> pay on smaller tables, the benefits of caching would more than make up
> for the cost of going forwards - imagine how slow CPUs would be if
> everything was a cache miss....

I'd like to get one of the developers' views on this.

--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Support bacteria! They're the only culture some people have.


Re: 7.3.1 takes long time to vacuum table?

From
"Mark Cave-Ayland"
Date:
Hi Martijn, Shridhar,

> > Agreed! If it ruins any caching then in my view it's something that has
> > to change in order to keep performance. While there may be a penalty to
> > pay on smaller tables, the benefits of caching would more than make up
> > for the cost of going forwards - imagine how slow CPUs would be if
> > everything was a cache miss....
>
> I'd like to get one of the developers' views on this.

Just looking at Shridhar's email, and Martijn's earlier comment about
SELECT * INTO newtable FROM oldtable, I'm rapidly seeing that this would
be the better way to go. Instead of seeking around millions of pages with
the vacuum, it would be less work because it is just a sequential read
and a sequential write.

Interestingly this could be used to create a speedy vacuum - that is,
create a new table with a temporary name that is invisible to the
database (similar to dropped columns), then taking into account the disk
space left on the device, pick the last X pages from the old table and
write to the new table. Then truncate the file containing the table at
point X and repeat until finished. Finally kill the old table and make
the new one visible. I appreciate there may be a couple of issues with
oids/foreign keys but it sounds like a great solution to me! Why does
vacuum bother with reordering rows? I thought that was what the CLUSTER
command was for? Any developers care to comment on this? I imagine there
must be a technical reason (probably to do with MVCC) why this hasn't
been done before?

In fact, my colleague has just done a test with SELECT..INTO on our dev
version and it compacted 600Mb -> 400Mb in just 40s(!). We then did a
vacuum full on the same original 600Mb table which is still going after
20mins. Difficult choice! So even in a worst-case scenario we could have
a fully vacuumed table within a day.... we're looking at dropping some
indexes in the db to reclaim enough space to be able to fit another copy
of the table on the disk... this is looking very tempting at the
moment....


Cheers,

Mark.


Re: 7.3.1 takes long time to vacuum table?

From
Tom Lane
Date:
"Mark Cave-Ayland" <m.cave-ayland@webbased.co.uk> writes:
> Interestingly this could be used to create a speedy vacuum - that is,
> create a new table with a temporary name that is invisible to the
> database (similar to dropped columns), then taking into account the disk
> space left on the device, pick the last X pages from the old table and
> write to the new table. Then truncate the file containing the table at
> point X and repeat until finished. Finally kill the old table and make
> the new one visible.

And if you crash midway through?

> Why does vacuum bother with reordering rows?

It's designed to be fast when there's not very much data motion required
(ie, you only need to pull a relatively small number of rows off the end
to fill in the holes elsewhere).

I have not seen any actual evidence that doing it any other way would be
faster.  Yes, it's reading the source tuples backwards instead of
forwards, but that's at most a third of the total I/O load (you've also
got tuple output and WAL writes to think about).  It's not clear that
any kernel read-ahead optimization could get a chance to work anyhow.

> In fact, my colleague has just done a test with SELECT..INTO on our dev
> version and it compacted 600Mb -> 400Mb in just 40s(!). We then did a
> vacuum full on the same original 600Mb table which is still going after
> 20mins.

Are there indexes on the original table?  If so, this isn't a fair
comparison.

            regards, tom lane

Re: 7.3.1 takes long time to vacuum table?

From
"Mark Cave-Ayland"
Date:
Hi Tom,

Thanks for your reply.

> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: 19 February 2003 16:28
> To: Mark Cave-Ayland
> Cc: Martijn van Oosterhout; shridhar_daithankar@persistent.co.in;
> PostgreSQL General
> Subject: Re: [GENERAL] 7.3.1 takes long time to vacuum table?
>
> "Mark Cave-Ayland" <m.cave-ayland@webbased.co.uk> writes:
> > Interestingly this could be used to create a speedy vacuum - that is,
> > create a new table with a temporary name that is invisible to the
> > database (similar to dropped columns), then taking into account the disk
> > space left on the device, pick the last X pages from the old table and
> > write to the new table. Then truncate the file containing the table at
> > point X and repeat until finished. Finally kill the old table and make
> > the new one visible.
>
> And if you crash midway through?

I don't know if I'm looking at this too simplistically but....

Each table could have an attribute to indicate that it is being vacuumed
(if one does not already exist). I imagine the rest could be based on
the existing transaction code, i.e. mark the last X rows of the source
table as deleted and insert them (still invisible) into the destination
table. On transaction commit, the destination copies become visible and
the source table is truncated at the file level. Or is truncation not a
transaction-safe activity? If a crash occurs during the vacuum then
postgres can tell from the attribute that the table was in the process
of being vacuumed and can then use the WAL to carry on from where it
left off....

> > Why does vacuum bother with reordering rows?
>
> It's designed to be fast when there's not very much data motion required
> (ie, you only need to pull a relatively small number of rows off the end
> to fill in the holes elsewhere).
>
> I have not seen any actual evidence that doing it any other way would be
> faster.  Yes, it's reading the source tuples backwards instead of
> forwards, but that's at most a third of the total I/O load (you've also
> got tuple output and WAL writes to think about).  It's not clear that
> any kernel read-ahead optimization could get a chance to work anyhow.

I see, maybe I was a little premature in my 'vacuum bashing' :) So it's
optimized for the 'few holes' case while we are using it for a 'many
holes' case..... things make a bit more sense now.

> > In fact, my colleague has just done a test with SELECT..INTO on our dev
> > version and it compacted 600Mb -> 400Mb in just 40s(!). We then did a
> > vacuum full on the same original 600Mb table which is still going after
> > 20mins.
>
> Are there indexes on the original table?  If so, this isn't a fair
> comparison.

Fair point actually, I should have made it a better comparison. The
source table has 5 btree indexes, each on a bigint field. However, it
has taken just under a minute to recreate the first! The vacuum full on
the original 600Mb table has finished after 100mins, so it looks as if,
using the SELECT..INTO method, we could be up and done in 10mins! I can
continue recreating the other indexes to get a proper final time
comparison if you are interested?


Cheers,

Mark.



Re: 7.3.1 takes long time to vacuum table?

From
Tom Lane
Date:
"Mark Cave-Ayland" <m.cave-ayland@webbased.co.uk> writes:
> In fact, my colleague has just done a test with SELECT..INTO on our dev
> version and it compacted 600Mb -> 400Mb in just 40s(!). We then did a
> vacuum full on the same original 600Mb table which is still going after
> 20mins.
>>
>> Are there indexes on the original table?  If so, this isn't a fair
>> comparison.

> Fair point actually, I should have made it a better comparison. The
> source table has 5 btree indexes, each on a bigint field. However, it
> has taken just under a minute to recreate the first! The vacuum full on
> the original 600Mb table has finished after 100mins, so it looks as if,
> using the SELECT..INTO method, we could be up and done in 10mins! I can
> continue recreating the other indexes to get a proper final time
> comparison if you are interested?

Yeah.  Also, I don't suppose you made that a VACUUM VERBOSE and kept the
output?  It'd be interesting to see which stages took the most time.

            regards, tom lane

Re: 7.3.1 takes long time to vacuum table?

From
Martijn van Oosterhout
Date:
On Wed, Feb 19, 2003 at 11:28:13AM -0500, Tom Lane wrote:
> "Mark Cave-Ayland" <m.cave-ayland@webbased.co.uk> writes:
> > Interestingly this could be used to create a speedy vacuum - that is,
> > create a new table with a temporary name that is invisible to the
> > database (similar to dropped columns), then taking into account the disk
> > space left on the device, pick the last X pages from the old table and
> > write to the new table. Then truncate the file containing the table at
> > point X and repeat until finished. Finally kill the old table and make
> > the new one visible.
>
> And if you crash midway through?

Messy, definitely.

> > Why does vacuum bother with reordering rows?
>
> It's designed to be fast when there's not very much data motion required
> (ie, you only need to pull a relatively small number of rows off the end
> to fill in the holes elsewhere).
>
> I have not seen any actual evidence that doing it any other way would be
> faster.  Yes, it's reading the source tuples backwards instead of
> forwards, but that's at most a third of the total I/O load (you've also
> got tuple output and WAL writes to think about).  It's not clear that
> any kernel read-ahead optimization could get a chance to work anyhow.

Well, consider that it's reading every single page in the table from the end
down to halfway (since every tuple was updated). If you went back in chunks
of 128K then the kernel may get a chance to cache the following blocks. On a
disk, once you are reading a sector, reading the next 128 sectors is
essentially free. The marginal cost of more sectors is almost zero.

It could be argued that the kernel should be noticing that you're scanning
backward and start its read a meg or two before where you asked. But there
are so many levels of cache (both hardware and software) that may have to
play along.

--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Support bacteria! They're the only culture some people have.


Re: 7.3.1 takes long time to vacuum table?

From
Tom Lane
Date:
Martijn van Oosterhout <kleptog@svana.org> writes:
> Well, consider that it's reading every single page in the table from the end
> down to halfway (since every tuple was updated). If you went back in chunks
> of 128K then the kernel may get a chance to cache the following
> blocks.

I fear this would be optimization with blinkers on :-(.  The big reason
that VACUUM FULL scans backwards is that at the very first (last?) page
where it cannot push all the tuples down to lower-numbered pages, it
can abandon any attempt to move more tuples.  The file can't be made
any shorter by internal shuffling, so we should stop.  If you back up
multiple pages and then scan forward, you would usually find yourself
moving the wrong tuples, ie ones that cannot help you shrink the file.

I suspect that what we really want here is a completely different
algorithm (viz copy into a new file, like CLUSTER) when the initial scan
reveals that there's more than X percent of free space in the file.

            regards, tom lane

Re: 7.3.1 takes long time to vacuum table?

From
Tom Lane
Date:
Martijn van Oosterhout <kleptog@svana.org> writes:
> You could do the jump-back-in-blocks only if more than 30% of the table is
> empty and table is over 1GB. For the example here, a simple defragging
> algorithm would suffice; start at beginning and pack each tuple into the
> beginning of the file. It will move *every* tuple but it's more cache
> friendly. It's pretty extreme though.

And your evidence that it will actually be faster is ... ?

            regards, tom lane

Re: 7.3.1 takes long time to vacuum table?

From
"Shridhar Daithankar"
Date:
On Wednesday 19 Feb 2003 9:05 pm, you wrote:
> Interestingly this could be used to create a speedy vacuum - that is,
> create a new table with a temporary name that is invisible to the
> database (similar to dropped columns), then taking into account the disk
> space left on the device, pick the last X pages from the old table and
> write to the new table. Then truncate the file containing the table at
> point X and repeat until finished. Finally kill the old table and make
> the new one visible. I appreciate there may be a couple of issues with
> oids/foreign keys but it sounds like a great solution to me! Why does
> vacuum bother with reordering rows? I thought that was what the CLUSTER
> command was for? Any developers care to comment on this? I imagine there
> must be a technical reason (prob to do with MVCC) as to why this hasn't
> been done before?

Well, one thing I can think of is the extra space required. The algorithm looks
good, but it would be very difficult to make sure that it works all the time,
especially given that postgresql does not have sophisticated and/or tunable
storage handling (think of tablespaces).

It is always a space-time trade-off. On one hand we have vacuum, which uses a
constant and maybe negligible amount of space but takes time proportional to the
amount of work. On the other hand we have drop/recreate table, which takes double
the space but is extremely fast, i.e. proportional to data size at the maximum
I/O bandwidth available..

It would be good if there were something in between. Of course it would not be
easy to do. But it has to start somewhere, doesn't it?..:-)

> In fact, my colleague has just done a test with SELECT..INTO on our dev
> version and it compacted 600Mb -> 400Mb in just 40s(!). We then did a
> vacuum full on the same original 600Mb table which is still going after
> 20mins. Difficult choice! So even in a worse case scenario we could have
> a fully vacuumed table within a day.... we're looking at dropping some
> indexes in the db to reclaim enough space to be able to fit another copy
> of the table on the disk... this is looking very tempting at the
> moment....

I recommend this strategy of "vacuuming" be documented in the standard
documentation and FAQ. Given that postgresql is routinely deployed for
databases >10GB, which is greater than small/medium by any definition today, I
think this would be a good move.

Furthermore this strategy reduces the down time due to vacuum full locks
drastically. I would say it is worth buying an 80GB IDE disk for this purpose
if you have a database this big..

Nice to see that my idea helped somebody..:-)

 Shridhar

Re: 7.3.1 takes long time to vacuum table?

From
"Mark Cave-Ayland"
Date:
Hi Tom, Martijn, Shridhar,

> Yeah.  Also, I don't suppose you made that a VACUUM VERBOSE and kept the
> output?  It'd be interesting to see which stages took the most time.
>
>             regards, tom lane

I've got the complete set of timings for the table on our dev box. I'm
afraid the vacuum wasn't run in verbose mode so I can't give you any
more information :(. This information is based on the 600Mb table on our
dev system.

Time to vacuum full:
    Total:                                    100m

Time to restore by copying to another table:
    SELECT INTO                               40s
    CREATE INDEX 1 (bigint) using btree       59s
    CREATE INDEX 2 (int) using btree          27s
    CREATE INDEX 3 (bigint) using btree       32s
    CREATE INDEX 4 (bigint) using btree       24s
    CREATE INDEX 5 (varchar) using btree      3m 2s
    CREATE INDEX 6 (txtidx) using gist        12m 58s
    CREATE INDEX 7 (txtidx) using gist        31m 20s

    Total:                                    51m 22s


OK, so my initial estimate of rebuilding in 10mins was way out because I
forgot about the 2 massive gist fti indexes I had on the table - doh.
However, I feel that the times are still meaningful in that I now have a
replica of the table at 400Mb (down by 200Mb) in just over half the time
that the vacuum full took to do the same job.

We stopped our vacuum on our live system at the 55hr stage whilst it was
still going - it took about another hour from sending the cancel request
until the vacuum stopped. We then deleted several indexes to claim back
enough Gb to hold a second copy of the table and we've just completed
the SELECT INTO into a new table.

And the result? It has taken a total of 1h 45m to generate a copy! Given
that we are rebuilding the table *WITHOUT* the large gist indexes on our
dev version, I guess that it would only be a matter of several hours
before we can rebuild the indexes back up on the table and be using it
again.

I hope that this goes some way to showing that implementing a new type
of vacuum feature, perhaps similar in working to the one suggested
earlier in the thread, would be greatly appreciated by people with
databases upwards of 0.5GB in size. I would gladly support/help out
anyone who felt they could implement such a feature in this way.


Cheers,

Sparks.



Re: 7.3.1 takes long time to vacuum table?

From
"Shridhar Daithankar"
Date:
On 20 Feb 2003 at 13:03, Mark Cave-Ayland wrote:
> And the result? It has taken a total of 1h 45m to generate a copy! Given
> that we are rebuilding the table *WITHOUT* the large gist indexes on our
> dev version, I guess that it would only be a matter of several hours
> before we can rebuild the indexes back up on the table and be using it
> again.

OK. From the last thread, there was one more note of caution: foreign key
constraints.

What I would suggest you do is as follows.

begin

create new table as select into..
create any necessary indexes on new table.
rename old table as something else.
rename new table as original table

commit

drop old table.

It should take care of most practical problems that I can think of, right now.
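
A rough SQL sketch of the above, assuming the table from this thread is called
res; res_new, res_old and the indexed column are only placeholder names. Since
DDL is transactional, nothing becomes visible until the COMMIT:

    BEGIN;

    -- copy the live rows into a fresh, compact table
    SELECT * INTO res_new FROM res;

    -- recreate whatever indexes are needed (name/column are just examples)
    CREATE INDEX res_new_field1_idx ON res_new (field1);

    -- swap the tables over
    ALTER TABLE res RENAME TO res_old;
    ALTER TABLE res_new RENAME TO res;

    COMMIT;

    DROP TABLE res_old;

As comes up later in the thread, this does not carry over constraints, default
values or foreign keys; those have to be recreated by hand.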

Bye
 Shridhar

--
Acceptance testing:    An unsuccessful attempt to find bugs.


Re: 7.3.1 takes long time to vacuum table?

From
"Mark Cave-Ayland"
Date:
Hi Shridhar,

> -----Original Message-----
> From: Shridhar Daithankar<shridhar_daithankar@persistent.co.in>
> [mailto:shridhar_daithankar@persistent.co.in]
> Sent: 20 February 2003 06:32
> To: PostgreSQL General
> Subject: Re: [GENERAL] 7.3.1 takes long time to vacuum table?
>
> On Wednesday 19 Feb 2003 9:05 pm, you wrote:
> > Interestingly this could be used to create a speedy vacuum - that is,
> > create a new table with a temporary name that is invisible to the
> > database (similar to dropped columns), then taking into account the disk
> > space left on the device, pick the last X pages from the old table and
> > write to the new table. Then truncate the file containing the table at
> > point X and repeat until finished. Finally kill the old table and make
> > the new one visible. I appreciate there may be a couple of issues with
> > oids/foreign keys but it sounds like a great solution to me! Why does
> > vacuum bother with reordering rows? I thought that was what the CLUSTER
> > command was for? Any developers care to comment on this? I imagine there
> > must be a technical reason (probably to do with MVCC) why this hasn't
> > been done before?
>
> Well, one thing I can think of is the extra space required. The algorithm
> looks good, but it would be very difficult to make sure that it works all
> the time, especially given that postgresql does not have sophisticated
> and/or tunable storage handling (think of tablespaces).

In some ways, extra space isn't a problem as long as you know about it.
On our dev system, we've had several occasions where the disk has filled
and we've had to get in and recover it. When the system was designed, it
was planned to have a couple of 10s of GB spare to store additional
data, but we did not plan on needing to keep a second copy of our largest
table at 40Gb a copy! So given that postgres falls over anyway when the
disk is full, I would not see this as a reason NOT to develop additional
functionality which would make use of more disk space, as long as users
can be made aware of this need....

> It is always a space-time trade-off. On one hand we have vacuum, which
> uses a constant and maybe negligible amount of space but takes time
> proportional to the amount of work. On the other hand we have
> drop/recreate table, which takes double the space but is extremely fast,
> i.e. proportional to data size at the maximum I/O bandwidth available..
>
> It would be good if there were something in between. Of course it would
> not be easy to do. But it has to start somewhere, doesn't it?..:-)

The situation here is that a vacuum full requires locking this
particular table, so our system becomes practically unusable anyway. So
having the process take a day, as opposed to 3-4 days, has been a great
benefit to us.

> I recommend this strategy of "vacuuming" be documented in the standard
> documentation and FAQ. Given that postgresql is routinely deployed for
> databases >10GB, which is greater than small/medium by any definition
> today, I think this would be a good move.

Yes, it would be very useful if it could be documented in which
situations a SELECT INTO would be dramatically more efficient than a
vacuum.

> Furthermore this strategy reduces the down time due to vacuum full locks
> drastically. I would say it is worth buying an 80GB IDE disk for this
> purpose if you have a database this big..
>
> Nice to see that my idea helped somebody..:-)

:) Well, thank YOU very much! As I stated in the previous email, I hope
this information goes some way to showing that the facility should be
considered more important as people move towards larger postgresql
databases.


Cheers,

Mark.

P.S. Have just received your other email while writing this, and the
procedure you described is pretty close to what we're doing. However, we
need to manually add back various constraints/default values into the
table columns which is a bit of a pain... wish it could be a little more
automatic.


Re: 7.3.1 takes long time to vacuum table?

From
"Shridhar Daithankar"
Date:
On 20 Feb 2003 at 13:23, Mark Cave-Ayland wrote:

> P.S. Have just received your other email while writing this, and the
> procedure you described is pretty close to what we're doing. However, we
> need to manually add back various constraints/default values into the
> table columns which is a bit of a pain... wish it could be a little more
> automatic.

Well, postgresql DDLs are transaction-safe. So if you rename a table in a
transaction, the rename won't be visible until you commit the transaction.

That is the reason I described the transaction in my last mail. If you
encapsulate it that way, it should not affect any referential constraints
anyway. At least that is the theory. Try it and let us know..

Bye
 Shridhar

--
intoxicated, adj.:    When you feel sophisticated without being able to pronounce
it.


Re: 7.3.1 takes long time to vacuum table?

From
Stephan Szabo
Date:
On Thu, 20 Feb 2003, Shridhar Daithankar wrote:

> On 20 Feb 2003 at 13:03, Mark Cave-Ayland wrote:
> > And the result? It has taken a total of 1h 45m to generate a copy! Given
> > that we are rebuilding the table *WITHOUT* the large gist indexes on our
> > dev version, I guess that it would only be a matter of several hours
> > before we can rebuild the indexes back up on the table and be using it
> > again.
>
> OK. From the last thread, there was one more note of caution: foreign key
> constraints.
>
> What I would suggest you do is as follows.
>
> begin
>
> create new table as select into..
> create any necessary indexes on new table.
> rename old table as something else.
> rename new table as original table
>
> commit
>
> drop old table.
>
> It should take care of most practical problems that I can think of, right now.

That won't copy foreign key constraints, unfortunately.  Foreign keys
aren't to a name, they're to an object, so a constraint to the old table
is still to the old table no matter what you rename it to, or what else
gets renamed to the old table's old name.
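
For what it's worth, a minimal sketch of how to spot the foreign keys that
would be left pointing at the renamed original, using the pg_constraint
catalog ('res_old' being a placeholder for whatever the old table was
renamed to):

    SELECT con.conname,
           src.relname AS constrained_table,
           tgt.relname AS still_references
    FROM pg_constraint con
    JOIN pg_class src ON src.oid = con.conrelid
    JOIN pg_class tgt ON tgt.oid = con.confrelid
    WHERE con.contype = 'f'              -- foreign key constraints only
      AND tgt.relname = 'res_old';       -- the renamed original table

Any constraints listed would need to be dropped and recreated against the
new table by hand.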


Re: 7.3.1 takes long time to vacuum table?

From
Jean-Luc Lachance
Date:
Well, here is another case where partitioning would be useful.

Let's all agree that vacuuming a small table is better done by
copying to a new one.
Now, if a larger table were partitioned, it would allow vacuuming
one partition at a time.


JLL

P.S. Is there really a need to reorder the vacuumed table???


"Shridhar Daithankar" wrote:
>
[...]
> Well, one thing I can think of is the extra space required. The algorithm looks
> good, but it would be very difficult to make sure that it works all the time,
> especially given that postgresql does not have sophisticated and/or tunable
> storage handling (think of tablespaces).
>
> It is always a space-time trade-off. On one hand we have vacuum, which uses a
> constant and maybe negligible amount of space but takes time proportional to the
> amount of work. On the other hand we have drop/recreate table, which takes double
> the space but is extremely fast, i.e. proportional to data size at the maximum
> I/O bandwidth available..
>
> It would be good if there were something in between. Of course it would not be
> easy to do. But it has to start somewhere, doesn't it?..:-)
>
> > In fact, my colleague has just done a test with SELECT..INTO on our dev
> > version and it compacted 600Mb -> 400Mb in just 40s(!). We then did a
> > vacuum full on the same original 600Mb table which is still going after
> > 20mins. Difficult choice! So even in a worst-case scenario we could have
> > a fully vacuumed table within a day.... we're looking at dropping some
> > indexes in the db to reclaim enough space to be able to fit another copy
> > of the table on the disk... this is looking very tempting at the
> > moment....
>
> I recommend this strategy of "vacuuming" be documented in the standard
> documentation and FAQ. Given that postgresql is routinely deployed for
> databases >10GB, which is greater than small/medium by any definition today, I
> think this would be a good move.
>
> Furthermore this strategy reduces the down time due to vacuum full locks
> drastically. I would say it is worth buying an 80GB IDE disk for this purpose
> if you have a database this big..
>
> Nice to see that my idea helped somebody..:-)
>
>  Shridhar

Re: 7.3.1 takes long time to vacuum table?

From
"Mark Cave-Ayland"
Date:
Hi Stephan, Shridhar,

> > OK. From the last thread, there was one more note of caution: foreign key
> > constraints.
> >
> > What I would suggest you do is as follows.
> >
> > begin
> >
> > create new table as select into..
> > create any necessary indexes on new table.
> > rename old table as something else.
> > rename new table as original table
> >
> > commit
> >
> > drop old table.
> >
> > It should take care of most practical problems that I can think of, right now.
>
> That won't copy foreign key constraints, unfortunately.  Foreign keys
> aren't to a name, they're to an object, so a constraint to the old table
> is still to the old table no matter what you rename it to, or what else
> gets renamed to the old table's old name.

Aha, I didn't realize the significance of the transaction block in
Shridhar's previous email. In this case we should be ok since we're not
using inheritance or foreign key constraints - I was just thinking about
the more generic case when trying to come up with a better way to
vacuum. But it would have been an interesting hack if it had worked :)
Progress on clearing up the database is going well, we're getting real
close now....


Cheers,

Mark.


Re: 7.3.1 takes long time to vacuum table?

From
"Mark Cave-Ayland"
Date:
Hi Tom,

> I suspect that what we really want here is a completely different
> algorithm (viz copy into a new file, like CLUSTER) when the initial scan
> reveals that there's more than X percent of free space in the file.

Yes! Sorry Tom, I missed the detail of this the first time around.
However, from the figures here it would need to be a very small
percentage of free space, perhaps only up to 10-15%, below which the
existing vacuum would be the better method. Hope they were useful to you.


Cheers,

Mark.


Re: 7.3.1 takes long time to vacuum table?

From
"Mark Cave-Ayland"
Date:
Yes! That sounds like a good idea that could be used to implement 'live'
table vacuuming.....

Would it be possible to partition each postgres table across 2 files?
You define a stripe size, say 1000, which is the number of pages that
are written in turn to each file. I guess this would be chosen so as to
not penalise sequential table scans too much. So the first 1000 pages
would get written to file A, the next 1000 to file B, the next 1000 to
file A again.... and so on.

Any queries against the table must search both files to return the rows
required. The immediate downside is that I guess indexes would have to
be updated so they were aware that data was stored across two different
files.... but let's carry on for a moment....

So now if you want to vacuum the whole table, you first lock file A and
begin vacuuming it using an appropriate method. While this is running,
if someone tries to delete a row from file A, the row is simply marked
as deleted and moved to the end of the file during the vacuum. A
deletion from file B is simply marked as deleted as normal. If someone
does an insert or an update on any row in the table then the changed
rows are written to file B while file A is locked and vice-versa.

Assuming that each file contains roughly the same number of rows then
any new updates/inserts to the table should be distributed evenly across
both files since the vacuum time of each file should be roughly equal.
(If this is not the case then it may be necessary to perform some form
of 'balancing' by taking some rows from the larger file and moving them
to the smaller one... I guess some details are missing here.).

Another downside of this would be that the last < 1000 pages of each
file wouldn't be vacuumed - but would people really mind if a table
wasn't fully vacuumed? I don't think they would. However, while there
are probably several flaws in the example I gave above, I think
Jean-Luc's idea of relating partitioning to this thread could be used to
eliminate many of the problems of vacuuming that currently exist....


Food for thought,


Mark.


> -----Original Message-----
> From: Jean-Luc Lachance [mailto:jllachan@nsd.ca]
> Sent: 20 February 2003 16:43
> To: Shridhar Daithankar<shridhar_daithankar@persistent.co.in>
> Cc: PostgreSQL General
> Subject: Re: [GENERAL] 7.3.1 takes long time to vacuum table?
>
> Well, here is another case where partitioning would be useful.
>
> Let's all agree that vacuuming a small table is better done by
> copying to a new one.
> Now, if a larger table were partitioned, it would allow vacuuming
> one partition at a time.
>
>
> JLL
>
> P.S. Is there really a need to reorder the vacuumed table???


Re: 7.3.1 takes long time to vacuum table?

From
Martijn van Oosterhout
Date:
On Wed, Feb 19, 2003 at 08:53:42PM -0500, Tom Lane wrote:
> Martijn van Oosterhout <kleptog@svana.org> writes:
> > Well, consider that it's reading every single page in the table from the end
> > down to halfway (since every tuple was updated). If you went back in chunks
> > of 128K then the kernel may get a chance to cache the following
> > blocks.
>
> I fear this would be optimization with blinkers on :-(.  The big reason
> that VACUUM FULL scans backwards is that at the very first (last?) page
> where it cannot push all the tuples down to lower-numbered pages, it
> can abandon any attempt to move more tuples.  The file can't be made
> any shorter by internal shuffling, so we should stop.  If you back up
> multiple pages and then scan forward, you would usually find yourself
> moving the wrong tuples, ie ones that cannot help you shrink the file.

I agree with the general idea. However, in this case there are 40GB+ of tuples
to move; if you moved backwards in steps of 2MB it would make no significant
difference on the resulting table. It would only be a problem near the end
of the compacting. Then you can stop; the remaining pages can surely be kept
track of in the FSM.

Next time you do a vacuum you can go back and do the compacting properly. On
tables of the size that matter here, I don't think anyone will care if the
last 2MB (=0.0044% of table) isn't optimally packed the first time round.

Does vacuum full have to produce the optimum result the first time?

> I suspect that what we really want here is a completely different
> algorithm (viz copy into a new file, like CLUSTER) when the initial scan
> reveals that there's more than X percent of free space in the file.

You could do the jump-back-in-blocks only if more than 30% of the table is
empty and table is over 1GB. For the example here, a simple defragging
algorithm would suffice; start at beginning and pack each tuple into the
beginning of the file. It will move *every* tuple but it's more cache
friendly. It's pretty extreme though.

It does preserve table order though whereas the current algorithm will
reverse the order of all the tuples in the table, possibly causing similar
backward-scan problems later with your index-scans.
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Support bacteria! They're the only culture some people have.


Re: 7.3.1 takes long time to vacuum table?

From
Martijn van Oosterhout
Date:
On Wed, Feb 19, 2003 at 10:37:45PM -0500, Tom Lane wrote:
> Martijn van Oosterhout <kleptog@svana.org> writes:
> > You could do the jump-back-in-blocks only if more than 30% of the table is
> > empty and table is over 1GB. For the example here, a simple defragging
> > algorithm would suffice; start at beginning and pack each tuple into the
> > beginning of the file. It will move *every* tuple but it's more cache
> > friendly. It's pretty extreme though.
>
> And your evidence that it will actually be faster is ... ?

Well, in this guy's example, the first stage of the vacuum (scan_heap)
finished in a few hours but the second stage is still going after 50 hours,
estimated at a bit over half done. The major difference between the two is
that the latter goes backwards through the table and the other forwards.

Sure, it's anecdotal and a bit hand-wavy but I think there's something to
it. Come to think of it, if that strace had used -tt it would have been
more interesting.

Anyway, the test would be to implement it and then time it. The vacuum code
looks scary though. I'm not sure if I want to mess with it...
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Support bacteria! They're the only culture some people have.


Re: 7.3.1 takes long time to vacuum table?

From
"Shridhar Daithankar"
Date:
On 20 Feb 2003 at 11:42, Jean-Luc Lachance wrote:

> Well, here is another case where partitioning would be usefull.

Not really, but the point is well understood.

> Lets all agree the that vaccuming a small table should be better done by
> copying to a new one.
> Now, if a larger table would be partitioned, it would allow vacuuming
> one partition at a time.

postgresql splits data files at 1 GB. If the DBA assures postgresql that there is
at least 1 gig of free space in the data directory, vacuum full could use these
data file segments as partitions and compact the space pretty quickly.

By adding a flag to vacuum full, to the effect of "use temporary table space to
compact aggressively", I think it should be possible to implement this pretty
quickly.

On the other hand, I think partitioning a table should be pretty easy, since the
logic for splitting things across files is already there. We would only need to
add logic for splitting the file locations as well.


Bye
 Shridhar

--
Air Force Inertia Axiom:    Consistency is always easier to defend than
correctness.