Re: 7.3.1 takes long time to vacuum table? - Mailing list pgsql-general
From: Mark Cave-Ayland
Subject: Re: 7.3.1 takes long time to vacuum table?
Date:
Msg-id: C1379626F9C09A4C821D6977AA6A54570632A0@webbased8.wb8.webbased.co.uk
In response to: 7.3.1 takes long time to vacuum table? ("Mark Cave-Ayland" <m.cave-ayland@webbased.co.uk>)
Responses: Re: 7.3.1 takes long time to vacuum table?
List: pgsql-general
Hi Tom,

Thanks for your reply.

> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: 19 February 2003 16:28
> To: Mark Cave-Ayland
> Cc: Martijn van Oosterhout; shridhar_daithankar@persistent.co.in; PostgreSQL General
> Subject: Re: [GENERAL] 7.3.1 takes long time to vacuum table?
>
> "Mark Cave-Ayland" <m.cave-ayland@webbased.co.uk> writes:
> > Interestingly this could be used to create a speedy vacuum - that is,
> > create a new table with a temporary name that is invisible to the
> > database (similar to dropped columns), then taking into account the disk
> > space left on the device, pick the last X pages from the old table and
> > write to the new table. Then truncate the file containing the table at
> > point X and repeat until finished. Finally kill the old table and make
> > the new one visible.
>
> And if you crash midway through?

I don't know if I'm looking at this too simplistically, but... each table could have an attribute to indicate that it is being vacuumed (if one does not already exist). I imagine the rest could be based on the existing transaction code, i.e. mark the last X rows of the source table as deleted and insert them (still invisible) into the destination table. On transaction commit, the destination copies become visible and the source table is truncated at the file level. Or is truncation not a transaction-safe activity? If a crash occurs during the vacuum, then postgres can tell from the attribute that the table was in the process of being vacuumed, and can then use the WAL to carry on from where it left off...

> > Why does vacuum bother with reordering rows?
>
> It's designed to be fast when there's not very much data motion required
> (ie, you only need to pull a relatively small number of rows off the end
> to fill in the holes elsewhere).
>
> I have not seen any actual evidence that doing it any other way would be
> faster.
> Yes, it's reading the source tuples backwards instead of
> forwards, but that's at most a third of the total I/O load (you've also
> got tuple output and WAL writes to think about). It's not clear that
> any kernel read-ahead optimization could get a chance to work anyhow.

I see - maybe I was a little premature in my 'vacuum bashing' :) So it's optimized for the 'few holes' case while we are using it for a 'many holes' case... things make a bit more sense now.

> > In fact, my colleague has just done a test with SELECT..INTO on our dev
> > version and it compacted 600Mb -> 400Mb in just 40s(!). We then did a
> > vacuum full on the same original 600Mb table which is still going after
> > 20mins.
>
> Are there indexes on the original table? If so, this isn't a fair
> comparison.

Fair point - I should have made it a better comparison. The source table has 5 btree indexes, each on a bigint field. However, it took just under a minute to recreate the first! The vacuum full on the original 600Mb table finished after 100mins, so it looks as if, had I used the SELECT..INTO method, I could have been up and done in 10mins! I can continue recreating the other indexes to get a proper final time comparison if you are interested?

Cheers,

Mark.

---
Mark Cave-Ayland
Webbased Ltd.
Tamar Science Park
Derriford
Plymouth PL6 8BX
England

Tel: +44 (0)1752 764445
Fax: +44 (0)1752 764446

This email and any attachments are confidential to the intended recipient and may also be privileged. If you are not the intended recipient please delete it from your system and notify the sender. You should not copy it or use it for any purpose nor disclose or distribute its contents to any other person.
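[For reference, the SELECT..INTO compaction discussed in this thread amounts to something like the sketch below. Table, column, and index names are illustrative only, not taken from the thread; this is one way to do it, not a definitive recipe.]

```sql
-- Rebuild the table compactly instead of running VACUUM FULL.
-- SELECT..INTO copies only live tuples into a fresh, hole-free table.
BEGIN;

SELECT * INTO compacted_table FROM original_table;

-- SELECT..INTO does not copy indexes; each must be recreated by hand,
-- e.g. one of the five btree indexes on a bigint field mentioned above:
CREATE INDEX compacted_table_id_idx ON compacted_table (id);

-- Swap the new table in under the old name.
DROP TABLE original_table;
ALTER TABLE compacted_table RENAME TO original_table;

COMMIT;
```

Note that the whole swap runs in one transaction, so readers see either the old or the new table, never a half-built one; the trade-off versus VACUUM FULL is that you temporarily need disk space for both copies.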