Robert Haas <robertmhaas@gmail.com> writes:
> I'm not concerned about an index scan vs. a sequential scan here. I'm
> concerned about it being impossible for the DBA to get an index-only
> scan when s/he wants it very badly. The current (stupid) formula
> handles this case just about perfectly - it will prefer a smaller
> index over a larger one, except when a covering index is available, in
> which case it will prefer the smallest covering index. That sounds
> exactly right to me. We get that behavior because the 10% of heap
> fetches that we're assuming we'll get to skip is larger than the
> penalty for using a bigger index. If we take out 10% and replace it
> by all_visible_percentage * fraction_of_tuples_fetched, then that 10%
> is going to drop to some infinitesmally small value on single row
> fetches from giant tables. But that's exactly one of the cases for
> which people want index-only scans in the first place. It's no better
> to be overly pessimistic here than it is to be overly optimistic. If
> the table is 90% all-visible, the probability of our finding an
> all-visible row is probably not 90%. But it's probably not 0.01% or
> 0.0001% either.
I think you're overstating the size of the problem. Given that fact
pattern, the thing will choose an indexscan anyway, because it's the
cheapest alternative even under traditional costing rules. And it will
choose to use an index-only scan if the index is covering. It doesn't
matter what the exact cost estimate is.
The place where the decision is actually somewhat hard, IMO, is where
you're pulling a small part of the table but significantly more than one
row, and the traditional best choice would be a bitmap scan. Now we
have to guess whether possibly avoiding heap fetches is better than
batching the fetches, and it doesn't seem open-and-shut to me.
But having said that, I see some credibility in Aidan's suggestion that
pages that actually have to be fetched from disk are disproportionately
likely to be all-visible. That would counteract the history-table
effect of correlation between current reads and recent changes,
probably not perfectly, but to a considerable extent. So right at the
moment I'm inclined to just apply the most-recently-measured visibility
fraction at face value. We shouldn't complicate matters more than that
until we have more field experience to tell us what really happens.
regards, tom lane