Home > mailing lists

Re: COUNT(*) and index-only scans - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: COUNT(*) and index-only scans
Date	October 12, 2011 13:52:20
Msg-id	12693.1318427529@sss.pgh.pa.us Whole thread Raw
In response to	Re: COUNT(*) and index-only scans (Robert Haas <robertmhaas@gmail.com>)
Responses	Re: COUNT() and index-only scans Re: COUNT() and index-only scans
List	pgsql-hackers

Tree view

Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Oct 12, 2011 at 2:50 AM, Jeff Davis <pgsql@j-davis.com> wrote:
>> On Tue, 2011-10-11 at 13:22 -0400, Robert Haas wrote:
>>> The real issue is that the costing estimates need to be accurate, and
>>> that's where the rubber hits the road.

>> Can you send stats messages to keep track when you unset a bit in the
>> VM? That might allow it to be more up-to-date.

> In theory, that seems like it would work, although I'm a little
> worried about the overhead.

I think it's overkill, and possibly unpleasantly unstable as well.
For the initial attack on this we should just have VACUUM and ANALYZE
count the number of all-visible blocks and store that in pg_class along
with the tuple-count statistics.  There's no reason to think that this
will be any worse than the way we deal with dead tuple counts, for
instance.

What bothers me considerably more is the issue about how specific
queries might see an all-visible fraction that's very substantially
different from the table's overall ratio, especially in examples such as
historical-data tables where most of the update and query activity has
to do with recently-added rows.  I don't see any practical way to attack
that problem with statistics; we're just going to have to adopt some
estimation rule.  What I suggest as a first cut for that is: simply
derate the visibility fraction as the fraction of the table expected to
be scanned gets smaller.  That is, if the query fetches nearly all of
the table, take the stored visibility ratio at face value; if it fetches
only one block, never believe that that will be an all-visible block;
and in general if we're expecting to read a fraction f of the pages,
multiply the whole-table visibility ratio by f before using it in the
cost estimate.  This amounts to assuming that the historical-data case
is the usual case, but I'm not sure that's unfair.
        regards, tom lane

pgsql-hackers by date:

From: Robert Haas
Date: 12 October 2011, 13:34:58
Subject: Re: [BUGS] *.sql contrib files contain unresolvable MODULE_PATHNAME

From: Tom Lane
Date: 12 October 2011, 14:12:49
Subject: Re: [BUGS] *.sql contrib files contain unresolvable MODULE_PATHNAME

Re: COUNT(*) and index-only scans - Mailing list pgsql-hackers

Previous

Next