Re: Partial vacuum versus pg_class.reltuples - Mailing list pgsql-hackers

From Greg Stark
Subject Re: Partial vacuum versus pg_class.reltuples
Date
Msg-id 4136ffa0906071137t1fb08b66hd717cec73d4cc7fd@mail.gmail.com
Whole thread Raw
In response to Re: Partial vacuum versus pg_class.reltuples  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Sun, Jun 7, 2009 at 7:11 PM, Robert Haas<robertmhaas@gmail.com> wrote:
> Am I wrong to be frightened by the implications of updating this value
> only once in a blue moon?  Doesn't this have the potential to result
> in really bad plans?  Do we have any reasonable manual way of forcing
> VACUUM to scan the entire heap?

It shouldn't be necessary to scan the entire heap to get a reasonable
estimate for the number of tuples. Analyze doesn't have to, for
example. Perhaps we should just make autovacuum periodically run
analyze even if it has run a vacuum recently -- i.e. not make vacuum
count as a recent analyze.

Actually it should be possible to get a more accurate estimate out of
ANALYZE than we used to as well. It could very quickly scan the entire
FSM and use that and the average tuple size to get a much more
accurate estimate for the number of tuples.

For VACUUM ANALYZE we could have it count the actual number of tuples
in the vacuumable pages and separately take a sample of non-vacuumable
pages and calculate an estimate based on the FSM and the average tuple
size in those non-vacuumable pages and add those two values together.
Since it just looked at every vacuumable page those FSM values are
precisely accurate and the estimate for average tuple size ought to be
pretty reliable.

--
greg
http://mit.edu/~gsstark/resume.pdf


pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: Partial vacuum versus pg_class.reltuples
Next
From: Peter Eisentraut
Date:
Subject: Re: Managing multiple branches in git