
Re: 7.3.1 takes long time to vacuum table? - Mailing list pgsql-general

From	Martijn van Oosterhout
Subject	Re: 7.3.1 takes long time to vacuum table?
Date	February 20, 2003 18:20:05
Msg-id	20030220023316.GE10807@svana.org
In response to	Re: 7.3.1 takes long time to vacuum table? (Tom Lane <tgl@sss.pgh.pa.us>)
List	pgsql-general


On Wed, Feb 19, 2003 at 08:53:42PM -0500, Tom Lane wrote:
> Martijn van Oosterhout <kleptog@svana.org> writes:
> > Well, consider that it's reading every single page in the table from the end
> > down to halfway (since every tuple was updated). If you went back in chunks
> > of 128K then the kernel may get a chance to cache the following
> > blocks.
>
> I fear this would be optimization with blinkers on :-(.  The big reason
> that VACUUM FULL scans backwards is that at the very first (last?) page
> where it cannot push all the tuples down to lower-numbered pages, it
> can abandon any attempt to move more tuples.  The file can't be made
> any shorter by internal shuffling, so we should stop.  If you back up
> multiple pages and then scan forward, you would usually find yourself
> moving the wrong tuples, ie ones that cannot help you shrink the file.
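Tom's stopping rule can be illustrated with a toy model. This is a sketch under stated assumptions, not PostgreSQL's vacuum code: each page is a list of live tuple sizes, every page has the same usable capacity, and `backward_shrink` (a hypothetical name) works from the last page down. It tries to place every tuple of that page into lower-numbered pages and stops at the first page that cannot be fully emptied, because from then on moving tuples cannot make the file shorter.

```python
def backward_shrink(pages: list[list[int]], capacity: int) -> int:
    """Toy model of a backward-scanning compaction pass (hypothetical helper).

    pages[i] holds the sizes of the live tuples on page i; capacity is the
    usable space per page. Mutates pages and returns the new page count.
    """
    free = [capacity - sum(p) for p in pages]
    last = len(pages) - 1
    while last > 0:
        # Plan the moves first, so a page that cannot be emptied is left alone.
        trial_free = free[:last]
        plan = []
        for size in pages[last]:
            target = next((i for i in range(last) if trial_free[i] >= size), None)
            if target is None:
                # This page cannot be pushed down entirely, so the file
                # cannot get any shorter: stop moving tuples.
                return last + 1
            trial_free[target] -= size
            plan.append((target, size))
        # Every tuple on the last page fits lower down: commit the moves.
        for target, size in plan:
            pages[target].append(size)
        free[:last] = trial_free
        pages[last] = []
        free[last] = capacity
        last -= 1
    return last + 1
```

For example, with capacity 100 and pages `[[50], [30], [90], [20]]`, the 20-byte tuple on the last page moves onto page 0, but the 90-byte tuple on page 2 fits nowhere lower, so the model stops at 3 pages.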

I agree with the general idea. However, in this case there are 40GB+ of tuples
to move; if you moved backwards in steps of 2MB it would make no significant
difference to the resulting table. It would only be a problem near the end
of the compaction. At that point you can stop; the remaining pages can surely
be tracked in the FSM.
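The "steps of 2MB" idea amounts to a different page visiting order: whole chunks are taken from the end of the file toward the start, but the pages inside each chunk are read in ascending order, which is the pattern kernel readahead handles well. A minimal sketch, assuming PostgreSQL's default 8KB block size; `chunked_backward_order` is a hypothetical helper that only computes the order and does nothing about Tom's concern near the end of the compaction.

```python
PAGE_SIZE = 8192                               # default PostgreSQL block size
CHUNK_PAGES = (2 * 1024 * 1024) // PAGE_SIZE   # a 2MB step is 256 pages


def chunked_backward_order(n_pages: int, chunk_pages: int) -> list[int]:
    """Visit chunks from the end of the file backwards, reading each chunk
    forwards (hypothetical helper illustrating the proposal)."""
    order: list[int] = []
    for chunk_end in range(n_pages, 0, -chunk_pages):
        chunk_start = max(0, chunk_end - chunk_pages)
        order.extend(range(chunk_start, chunk_end))
    return order
```

For instance, `chunked_backward_order(10, 4)` yields `[6, 7, 8, 9, 2, 3, 4, 5, 0, 1]`: the chunks still move towards the start of the file, but each read within a chunk is sequential.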

Next time you do a vacuum you can go back and do the compaction properly. On
tables of the size that matter here, I don't think anyone will care if the
last 2MB (about 0.0044% of the table) isn't optimally packed the first time
round.
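As a quick check of the percentage (assuming binary units, 1GB = 1024MB): a 2MB tail of a 40GB table is about 0.0049%, and the quoted 0.0044% corresponds to a table of roughly 44GB, consistent with the "40GB+" figure.

```latex
\frac{2\ \text{MB}}{40\ \text{GB}} = \frac{2}{40 \times 1024} \approx 4.9 \times 10^{-5} \approx 0.0049\%
\qquad
\frac{2\ \text{MB}}{0.0044\%} = \frac{2}{4.4 \times 10^{-5}}\ \text{MB} \approx 45{,}500\ \text{MB} \approx 44\ \text{GB}
```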

Does vacuum full have to produce the optimum result the first time?

> I suspect that what we really want here is a completely different
> algorithm (viz copy into a new file, like CLUSTER) when the initial scan
> reveals that there's more than X percent of free space in the file.

You could do the jump-back-in-blocks only if more than 30% of the table is
empty and the table is over 1GB. For the example here, a simple defragging
algorithm would suffice: start at the beginning and pack each tuple toward
the beginning of the file. It will move *every* tuple, but it's more cache
friendly. It's pretty extreme, though.
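Both ideas in the paragraph above can be sketched in a few lines, again as a toy model and not as PostgreSQL code. `use_chunked_backward` is a hypothetical predicate encoding the suggested thresholds (more than 30% free, larger than 1GB). `forward_compact` is a hypothetical sequential defragmenter: it walks the pages from the start and packs each tuple into the current output page, opening a new page when the next tuple does not fit. It builds a new page list rather than working in place, and it keeps the tuples in their original order.

```python
def use_chunked_backward(table_bytes: int, free_fraction: float) -> bool:
    # Thresholds suggested in the message: >30% empty and table over 1GB.
    return free_fraction > 0.30 and table_bytes > 1024 ** 3


def forward_compact(pages: list[list[int]], capacity: int) -> list[list[int]]:
    """Pack tuple sizes toward the front of the file, preserving order.
    Every tuple may move, but pages are read and filled sequentially."""
    out: list[list[int]] = [[]]
    used = 0
    for page in pages:
        for size in page:
            if size > capacity:
                raise ValueError("tuple larger than a page")
            if used + size > capacity:
                # Current output page is full for this tuple: start the next.
                out.append([])
                used = 0
            out[-1].append(size)
            used += size
    return out if out[0] else []
```

With capacity 100, `forward_compact([[50], [], [30], [40, 20]], 100)` gives `[[50, 30], [40, 20]]`: four pages shrink to two and the tuple sequence 50, 30, 40, 20 is unchanged.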

It does preserve table order, though, whereas the current algorithm will
reverse the order of all the tuples in the table, possibly causing similar
backward-scan problems later with your index scans.
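The order reversal follows from taking tuples from the end of the file and dropping each into the lowest free slot. A toy illustration, assuming (as in this thread's case, where every tuple was updated) that the live tuple versions sit in the back half of the file in their original order; `compact_backward` and `compact_forward` are hypothetical helpers operating on a list of slots, where `None` marks free space.

```python
from typing import Optional


def compact_backward(slots: list[Optional[str]]) -> list[str]:
    """Fill the lowest free slot with the highest-numbered live tuple."""
    s = slots[:]
    lo, hi = 0, len(s) - 1
    while True:
        while lo < len(s) and s[lo] is not None:
            lo += 1          # next hole from the front
        while hi >= 0 and s[hi] is None:
            hi -= 1          # next live tuple from the back
        if lo >= hi:
            break
        s[lo], s[hi] = s[hi], None
    return [t for t in s if t is not None]


def compact_forward(slots: list[Optional[str]]) -> list[str]:
    """Sequential defrag: live tuples keep their relative order."""
    return [t for t in slots if t is not None]
```

For slots `[None, None, None, "a", "b", "c"]`, the backward version produces `["c", "b", "a"]` while the forward version produces `["a", "b", "c"]`, so an index scan in key order over the backward-compacted table would walk the heap from the end toward the start.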
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Support bacteria! They're the only culture some people have.

Attachment

msg-32091-55364.dat
