Re: COUNT(*) and index-only scans - Mailing list pgsql-hackers

From Tom Lane
Subject Re: COUNT(*) and index-only scans
Date
Msg-id 12693.1318427529@sss.pgh.pa.us
Whole thread Raw
In response to Re: COUNT(*) and index-only scans  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: COUNT(*) and index-only scans
Re: COUNT(*) and index-only scans
List pgsql-hackers
Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Oct 12, 2011 at 2:50 AM, Jeff Davis <pgsql@j-davis.com> wrote:
>> On Tue, 2011-10-11 at 13:22 -0400, Robert Haas wrote:
>>> The real issue is that the costing estimates need to be accurate, and
>>> that's where the rubber hits the road.

>> Can you send stats messages to keep track when you unset a bit in the
>> VM? That might allow it to be more up-to-date.

> In theory, that seems like it would work, although I'm a little
> worried about the overhead.

I think it's overkill, and possibly unpleasantly unstable as well.
For the initial attack on this we should just have VACUUM and ANALYZE
count the number of all-visible blocks and store that in pg_class along
with the tuple-count statistics.  There's no reason to think that this
will be any worse than the way we deal with dead tuple counts, for
instance.

What bothers me considerably more is the issue about how specific
queries might see an all-visible fraction that's very substantially
different from the table's overall ratio, especially in examples such as
historical-data tables where most of the update and query activity has
to do with recently-added rows.  I don't see any practical way to attack
that problem with statistics; we're just going to have to adopt some
estimation rule.  What I suggest as a first cut for that is: simply
derate the visibility fraction as the fraction of the table expected to
be scanned gets smaller.  That is, if the query fetches nearly all of
the table, take the stored visibility ratio at face value; if it fetches
only one block, never believe that that will be an all-visible block;
and in general if we're expecting to read a fraction f of the pages,
multiply the whole-table visibility ratio by f before using it in the
cost estimate.  This amounts to assuming that the historical-data case
is the usual case, but I'm not sure that's unfair.
        regards, tom lane


pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: [BUGS] *.sql contrib files contain unresolvable MODULE_PATHNAME
Next
From: Tom Lane
Date:
Subject: Re: [BUGS] *.sql contrib files contain unresolvable MODULE_PATHNAME