Re: Partial vacuum versus pg_class.reltuples - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Partial vacuum versus pg_class.reltuples
Date
Msg-id 17119.1244402651@sss.pgh.pa.us
Whole thread Raw
In response to Re: Partial vacuum versus pg_class.reltuples  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Partial vacuum versus pg_class.reltuples  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Robert Haas <robertmhaas@gmail.com> writes:
> Am I wrong to be frightened by the implications of updating this value
> only once in a blue moon?

It's not great, but I think it's probably not catastrophic either.
Keep in mind that all we need from reltuples is that the ratio
reltuples/relpages be a reasonable estimate of the density of live
tuples, because what the planner actually uses is GetRelationNumberOfBlocks() * reltuples / relpages.
So for example an append-only table isn't a big problem, even if it's
been quite a while since we updated reltuples and relpages.

There was some mention of having a partial vacuum extrapolate a value of
reltuples and update pg_class with that.  I'm afraid that that could be
a seriously bad idea; because there is no very good reason to suppose
that the subset of recently-modified pages forms a good sample of the
whole table as far as live-tuple density goes.

[ thinks a bit and reads the code some more ... ]  There is a
considerably safer alternative, which is to let ANALYZE update the
reltuples estimate based on the pages it sampled; which should be a
considerably less biased sample than the pages a partial vacuum would
have looked at.  And we have already got the code doing that, either
in a standalone ANALYZE or an ANALYZE attached to a vacuum that turned
out to be partial.

So actually I think we are in better shape than I realized, so far as
the heap reltuples numbers go.  The case that's a bit nasty is where
we are propagating the heap reltuples number to the index reltuples
number for a GIN index.  (Remember this only matters for a partial
index.)  As the code stands today, what we'll be propagating is the
reltuples estimate from the most recent ANALYZE, not the ANALYZE that
we might be about to conduct.  This is not great; but considering that
we are completely ignoring the first-order problem of the partial index
predicate's selectivity, quibbling about a second-order effect like the
estimate being out of date is pretty pointless.

> Do we have any reasonable manual way of forcing
> VACUUM to scan the entire heap?

You could use VACUUM FREEZE, for instance.
        regards, tom lane


pgsql-hackers by date:

Previous
From: Josh Berkus
Date:
Subject: Re: pg_migrator issue with contrib
Next
From: Stefan Kaltenbrunner
Date:
Subject: Re: pg_migrator issue with contrib