Re: 7.3.1 takes long time to vacuum table? - Mailing list pgsql-general

From Tom Lane
Subject Re: 7.3.1 takes long time to vacuum table?
Date
Msg-id 11914.1045672093@sss.pgh.pa.us
Whole thread Raw
In response to Re: 7.3.1 takes long time to vacuum table?  ("Mark Cave-Ayland" <m.cave-ayland@webbased.co.uk>)
Responses Re: 7.3.1 takes long time to vacuum table?  (Martijn van Oosterhout <kleptog@svana.org>)
List pgsql-general
"Mark Cave-Ayland" <m.cave-ayland@webbased.co.uk> writes:
> Interestingly this could be used to create a speedy vacuum - that is,
> create a new table with a temporary name that is invisible to the
> database (similar to dropped columns), then taking into account the disk
> space left on the device, pick the last X pages from the old table and
> write to the new table. Then truncate the file containing the table at
> point X and repeat until finished. Finally kill the old table and make
> the new one visible.

And if you crash midway through?

> Why does vacuum bother with reordering rows?

It's designed to be fast when there's not very much data motion required
(ie, you only need to pull a relatively small number of rows off the end
to fill in the holes elsewhere).

I have not seen any actual evidence that doing it any other way would be
faster.  Yes, it's reading the source tuples backwards instead of
forwards, but that's at most a third of the total I/O load (you've also
got tuple output and WAL writes to think about).  It's not clear that
any kernel read-ahead optimization could get a chance to work anyhow.

> In fact, my colleague has just done a test with SELECT..INTO on our dev
> version and it compacted 600Mb -> 400Mb in just 40s(!). We then did a
> vacuum full on the same original 600Mb table which is still going after
> 20mins.

Are there indexes on the original table?  If so, this isn't a fair
comparison.

            regards, tom lane

pgsql-general by date:

Previous
From: Greg Copeland
Date:
Subject: Re: Table Partitioning in Postgres:
Next
From: Steve Crawford
Date:
Subject: Re: continuous data from stdin