Re: COUNT(*) and index-only scans - Mailing list pgsql-hackers

From Robert Haas
Subject Re: COUNT(*) and index-only scans
Date
Msg-id CA+TgmoY70NrR==uf4evussiPn+EWKaHTJVUDZR7bB+DesxnkvA@mail.gmail.com
Whole thread Raw
In response to Re: COUNT(*) and index-only scans  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: COUNT(*) and index-only scans
List pgsql-hackers
On Wed, Oct 12, 2011 at 9:52 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> What bothers me considerably more is the issue about how specific
> queries might see an all-visible fraction that's very substantially
> different from the table's overall ratio, especially in examples such as
> historical-data tables where most of the update and query activity has
> to do with recently-added rows.  I don't see any practical way to attack
> that problem with statistics; we're just going to have to adopt some
> estimation rule.  What I suggest as a first cut for that is: simply
> derate the visibility fraction as the fraction of the table expected to
> be scanned gets smaller.  That is, if the query fetches nearly all of
> the table, take the stored visibility ratio at face value; if it fetches
> only one block, never believe that that will be an all-visible block;
> and in general if we're expecting to read a fraction f of the pages,
> multiply the whole-table visibility ratio by f before using it in the
> cost estimate.  This amounts to assuming that the historical-data case
> is the usual case, but I'm not sure that's unfair.

I don't think that's an unfair assumption -- in fact I think it's
exactly the right assumption -- but I'm worried about how the math
works out with that specific proposal.

- Suppose VACUUM processes the table and makes it all-visible.  Then,
somebody comes along and updates one tuple on every page, making them
all not-all-visible, but not trigger VACUUM because we're nowhere
close the 20% threshold.  Now COUNT(*) will think it should use an
index-scan, but really... not so much.  In fact, even if it's only
that a tuple has been updated on 25% of the pages, we're probably in
trouble.

- Suppose the table has a million rows and we're going to read 100 of
them, or 0.01%.  Now it might appear that a covering index has a
negligible advantage over a non-covering index, but in fact I think we
still want to err on the side of trying to use the covering index.  In
fact, even if we're only reading a single row, we probably still
generally want to pick up the covering index, to cater to the case
where someone is doing primary key fetches against a gigantic table
and hoping that index-only scans will save them from random I/O hell.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


pgsql-hackers by date:

Previous
From: Greg Stark
Date:
Subject: Re: COUNT(*) and index-only scans
Next
From: Tom Lane
Date:
Subject: Re: COUNT(*) and index-only scans