Re: Still recommending daily vacuum... - Mailing list pgsql-hackers

From Alvaro Herrera
Subject Re: Still recommending daily vacuum...
Date
Msg-id 20070704221052.GL5035@alvh.no-ip.org
Whole thread Raw
In response to Re: Still recommending daily vacuum...  (Gregory Stark <stark@enterprisedb.com>)
Responses Re: Still recommending daily vacuum...  (Michael Paesold <mpaesold@gmx.at>)
List pgsql-hackers
Gregory Stark wrote:
> 
> "Alvaro Herrera" <alvherre@commandprompt.com> writes:
> 
> >> FWIW, I normally go with the 8.2 defaults, though I could see dropping
> >> vacuum_scale_factor down to 0.1 or 0.15. I also think the thresholds
> >> could be decreased further, maybe divide by 10.
> >
> > How about pushing thresholds all the way down to 0?
> 
> My intuition is that the thresholds should be lowered to about 5%.
> 
> I may be biased by the TPC-C schema where the largest table, stock, gets a
> little over 20 records per page so 5% represents an average of one update per
> page. But there's nothing unusual about a table like that. Waiting until 20%
> of the table is potentially dead --four dead tuples out of 20 per page in the
> stock table case-- seems extravagantly wasteful.
> 
> I find the idea of lowering the thresholds to 0 sort of intriguing though.
> That makes the vacuum delay parameters the primary method to control how
> frequently vacuum runs.

I think you are mixing thresholds with scale factors.

vacuum tuples = threshold + reltuples * scale factor

If dead tuples are more than vacuum tuples, autovac does a vacuum.

So what you are proposing above amounts to setting scale factor = 0.05.
The threshold is unimportant -- in the case of a big table it matters
not if it's 0 or 1000, it will be almost irrelevant in calculations.  In
the case of small tables, then the table will be vacuumed in almost
every iteration if the threshold is 0, which is fine because the table
is small anyway.  So why not let the threshold be 0 and be done with it?

I, too, find a 0.2 scale factor a bit high.  But since I don't run any
database, I fear I would be picking numbers out of thin air.

> In an ideal world autovacuum would be able to set the delay settings based on
> how many updates had happened since the last run started. If more than 5% of
> the table was cleaned by vacuum then decrease the delay settings to get this
> vacuum to finish sooner and allow fewer updates. If less than 5% of the table
> was cleaned by vacuum then increase the delay settings to reduce the
> unnecessary impact of vacuum. But that just leaves us back where we started.

Maybe we can construct some smarts based on something like this.  The
equations we currently use are just inherited from contrib autovac,
which didn't have access to much other info.  Integrated autovac can do
much better, I think.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


pgsql-hackers by date:

Previous
From: Gregory Stark
Date:
Subject: Re: Still recommending daily vacuum...
Next
From: Gregory Stark
Date:
Subject: Re: ACM Paper relevant to our buffer algorithm