Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Oct 12, 2011 at 2:50 AM, Jeff Davis <pgsql@j-davis.com> wrote:
>> On Tue, 2011-10-11 at 13:22 -0400, Robert Haas wrote:
>>> The real issue is that the costing estimates need to be accurate, and
>>> that's where the rubber hits the road.
>> Can you send stats messages to keep track of when you unset a bit in
>> the VM? That might allow it to be more up-to-date.
> In theory, that seems like it would work, although I'm a little
> worried about the overhead.

I think it's overkill, and possibly unpleasantly unstable as well.
For the initial attack on this, we should just have VACUUM and ANALYZE
count the number of all-visible blocks and store that in pg_class along
with the tuple-count statistics. There's no reason to think that this
will be any worse than the way we deal with dead tuple counts, for
instance.
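
In standalone-C terms the bookkeeping amounts to no more than this
(a rough sketch only; the names are illustrative, not actual backend
symbols):

#include <stdbool.h>

typedef unsigned int BlockNumber;

/*
 * Count the heap pages whose visibility-map bit is set; VACUUM/ANALYZE
 * would store the result in pg_class next to reltuples/relpages.
 */
BlockNumber
count_all_visible_pages(BlockNumber nblocks,
                        bool (*vm_bit_is_set)(BlockNumber blkno))
{
    BlockNumber blkno;
    BlockNumber nvisible = 0;

    for (blkno = 0; blkno < nblocks; blkno++)
    {
        if (vm_bit_is_set(blkno))
            nvisible++;
    }
    return nvisible;
}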

What bothers me considerably more is the issue of how specific queries
might see an all-visible fraction that's very substantially different
from the table's overall ratio, especially in cases such as
historical-data tables where most of the update and query activity has
to do with recently-added rows. I don't see any practical way to attack
that problem with statistics; we're just going to have to adopt some
estimation rule. What I suggest as a first cut for that is: simply
derate the visibility fraction as the fraction of the table expected to
be scanned gets smaller. That is, if the query fetches nearly all of
the table, take the stored visibility ratio at face value; if it fetches
only one block, never assume that block will be all-visible;
and in general if we're expecting to read a fraction f of the pages,
multiply the whole-table visibility ratio by f before using it in the
cost estimate. This amounts to assuming that the historical-data case
is the usual case, but I'm not sure that's unfair.
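
Concretely, the derating I have in mind is nothing fancier than this
(a standalone sketch, not real planner code; the variable names are
made up):

/*
 * all_visible_ratio: whole-table fraction from the stats stored by
 * VACUUM/ANALYZE; pages_fetched and rel_pages: from the existing
 * cost model.
 */
double
derated_visibility(double all_visible_ratio,
                   double pages_fetched,
                   double rel_pages)
{
    double frac_scanned;

    if (rel_pages <= 0)
        return 0.0;

    frac_scanned = pages_fetched / rel_pages;
    if (frac_scanned > 1.0)
        frac_scanned = 1.0;

    /* scale the whole-table ratio by the fraction f we expect to read */
    return all_visible_ratio * frac_scanned;
}

So a scan of the whole table uses the stored ratio as-is, while a
single-block fetch gets essentially no credit for all-visible pages.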

			regards, tom lane