On Wed, Oct 12, 2011 at 9:52 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> What bothers me considerably more is the issue about how specific
> queries might see an all-visible fraction that's very substantially
> different from the table's overall ratio, especially in examples such as
> historical-data tables where most of the update and query activity has
> to do with recently-added rows. I don't see any practical way to attack
> that problem with statistics; we're just going to have to adopt some
> estimation rule. What I suggest as a first cut for that is: simply
> derate the visibility fraction as the fraction of the table expected to
> be scanned gets smaller. That is, if the query fetches nearly all of
> the table, take the stored visibility ratio at face value; if it fetches
> only one block, never believe that that will be an all-visible block;
> and in general if we're expecting to read a fraction f of the pages,
> multiply the whole-table visibility ratio by f before using it in the
> cost estimate. This amounts to assuming that the historical-data case
> is the usual case, but I'm not sure that's unfair.
I don't think that's an unfair assumption -- in fact I think it's
exactly the right assumption -- but I'm worried about how the math
works out with that specific proposal.
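
Just to make sure we're talking about the same rule, here's a trivial
sketch of the proposal as I read it -- the names are made up for
illustration and this is obviously not the actual costsize.c logic:

    #include <stdio.h>

    /*
     * The quoted estimation rule: take the whole-table all-visible
     * ratio from the stats and derate it by the fraction of the table
     * the query is expected to scan.
     */
    static double
    derated_allvisfrac(double allvisfrac, double scanned_frac)
    {
        /* fetch nearly the whole table => trust the stored ratio  */
        /* fetch only a tiny slice => assume almost none is visible */
        return allvisfrac * scanned_frac;
    }

    int
    main(void)
    {
        double  allvisfrac = 0.90;  /* say the stats report 90% all-visible */

        printf("scan 100%% of pages:    %.4f\n",
               derated_allvisfrac(allvisfrac, 1.0));
        printf("scan  25%% of pages:    %.4f\n",
               derated_allvisfrac(allvisfrac, 0.25));
        printf("scan 1 page of 10000:  %.6f\n",
               derated_allvisfrac(allvisfrac, 1.0 / 10000.0));
        return 0;
    }
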
- Suppose VACUUM processes the table and makes it all-visible. Then
somebody comes along and updates one tuple on every page, making them
all not-all-visible, but without triggering VACUUM because we're
nowhere close to the 20% threshold. Now COUNT(*) will think it should
use an index-only scan, but really... not so much. In fact, even if a
tuple has been updated on only 25% of the pages, we're probably in
trouble. (Rough numbers for this are sketched after the next example.)
- Suppose the table has a million rows and we're going to read 100 of
them, or 0.01%. With the derating rule, the estimated all-visible
fraction for those fetches is nearly zero, so a covering index will
appear to have a negligible advantage over a non-covering index, but
in fact I think we still want to err on the side of trying to use the
covering index. In fact, even if we're only reading a single row, we
still generally want to pick up the covering index, to cater to the
case where someone is doing primary key fetches against a gigantic
table and hoping that index-only scans will save them from random I/O
hell.
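
Some back-of-the-envelope numbers for both of those cases -- again
purely illustrative: "heap fetches" here just means index tuples whose
heap page is not all-visible, ignoring caching and the rest of the
costing:

    #include <stdio.h>

    int
    main(void)
    {
        double  rows = 1000000.0;

        /*
         * Case 1: the stored ratio says 100% all-visible, but one tuple
         * has been updated on every page since the last VACUUM, so in
         * reality nothing is all-visible.  For COUNT(*) the scanned
         * fraction is 1.0, so the derating rule takes the stale ratio
         * at face value.
         */
        double  stored_allvisfrac = 1.00;
        double  real_allvisfrac = 0.00;
        double  scanned_frac = 1.0;
        double  derated = stored_allvisfrac * scanned_frac;

        printf("case 1: expected heap fetches %.0f, actual %.0f\n",
               rows * (1.0 - derated),
               rows * (1.0 - real_allvisfrac));

        /*
         * Case 2: fetch 100 rows out of a million, with 90% of the
         * pages genuinely all-visible.  Derating by the scanned
         * fraction makes the covering index look nearly worthless,
         * even though it would really skip about 90 of the 100 heap
         * visits.
         */
        double  allvisfrac = 0.90;
        double  fetched = 100.0;

        scanned_frac = fetched / rows;
        derated = allvisfrac * scanned_frac;

        printf("case 2: estimated heap fetches avoided %.3f, "
               "realistically %.0f of %.0f\n",
               fetched * derated, fetched * allvisfrac, fetched);
        return 0;
    }
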
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company