Re: 7.3.1 takes long time to vacuum table? - Mailing list pgsql-general

From Mark Cave-Ayland
Subject Re: 7.3.1 takes long time to vacuum table?
Date
Msg-id C1379626F9C09A4C821D6977AA6A54570632A0@webbased8.wb8.webbased.co.uk
Whole thread Raw
In response to 7.3.1 takes long time to vacuum table?  ("Mark Cave-Ayland" <m.cave-ayland@webbased.co.uk>)
Responses Re: 7.3.1 takes long time to vacuum table?  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
Hi Tom,

Thanks for your reply.

> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: 19 February 2003 16:28
> To: Mark Cave-Ayland
> Cc: Martijn van Oosterhout; shridhar_daithankar@persistent.co.in;
> PostgreSQL General
> Subject: Re: [GENERAL] 7.3.1 takes long time to vacuum table?
>
> "Mark Cave-Ayland" <m.cave-ayland@webbased.co.uk> writes:
> > Interestingly this could be used to create a speedy vacuum - that
is,
> > create a new table with a temporary name that is invisible to the
> > database (similar to dropped columns), then taking into account the
disk
> > space left on the device, pick the last X pages from the old table
and
> > write to the new table. Then truncate the file containing the table
at
> > point X and repeat until finished. Finally kill the old table and
make
> > the new one visible.
>
> And if you crash midway through?

I don't know if I'm looking at this too simplistically but....

Each table could have an attribute to indicate that it is being vacuumed
(if one does not already exist). I imagine the rest could be based on
the existing transaction code i.e. mark the last X rows of the source
table as deleted and insert them (still invisible) into the destination
table. On transaction commit, the destination copies become visible and
source table is truncated at the file level. Or is truncation not a
transaction safe activity? If a crash occurs during the vacuum then
postgres can tell from the attribute that the table was in the process
of being vacuumed and then can use the WAL to carry on from where it
left off....

> > Why does vacuum bother with reordering rows?
>
> It's designed to be fast when there's not very much data motion
required
> (ie, you only need to pull a relatively small number of rows off the
end
> to fill in the holes elsewhere).
>
> I have not seen any actual evidence that doing it any other way would
be
> faster.  Yes, it's reading the source tuples backwards instead of
> forwards, but that's at most a third of the total I/O load (you've
also
> got tuple output and WAL writes to think about).  It's not clear that
> any kernel read-ahead optimization could get a chance to work anyhow.

I see, maybe I was a little premature in my 'vacuum bashing' :) So it's
optimized for the 'few holes' case while we are using it for a 'many
holes' case..... things make a bit more sense now.

> > In fact, my colleague has just done a test with SELECT..INTO on our
dev
> > version and it compacted 600Mb -> 400Mb in just 40s(!). We then did
a
> > vacuum full on the same original 600Mb table which is still going
after
> > 20mins.
>
> Are there indexes on the original table?  If so, this isn't a fair
> comparison.

Fair point actually, I should have made it a better comparison. The
source table has 5 btree indexes, each on a bigint field. However, it
has taken just under a minute to recreate the first! The vacuum full on
the original 600Mb table has finished after 100mins, so it looks as if I
used the SELECT..INTO method could be up and done in 10mins! I can
continue recreating the other indexes to get a proper final time
comparison if you are interested?


Cheers,

Mark.


---

Mark Cave-Ayland
Webbased Ltd.
Tamar Science Park
Derriford
Plymouth
PL6 8BX
England

Tel: +44 (0)1752 764445
Fax: +44 (0)1752 764446


This email and any attachments are confidential to the intended
recipient and may also be privileged. If you are not the intended
recipient please delete it from your system and notify the sender. You
should not copy it or use it for any purpose nor disclose or distribute
its contents to any other person.

pgsql-general by date:

Previous
From: Tom Lane
Date:
Subject: Re: Concurrency and locks
Next
From: Dave Smith
Date:
Subject: Removing spaces