Jeff Hoffmann <jeff@propertykey.com> writes:
> Close, but no cigar. I kept on dropping that constant until it worked
> for all of my tables. I ended up at 0.0002, but I still haven't tried
> it on my biggest tables.
Urgh. That seems way too low to put in as a default estimate,
especially considering that the same estimator routine is used for
several different operators. We need to look at this some more.
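
For reference, the constant in question is just the fraction of the
table's rows that the estimator falls back on when it has no statistics
to work with, and the same fallback gets used for more than one
operator. Here's a toy sketch of the situation (the names and the
0.0002 value are illustrative assumptions, not the actual source):

/*
 * Toy sketch, not the real code: one shared fallback estimator returns
 * a fixed fraction of the table's rows, and several geometric operators
 * all route through it, so tuning the constant for one operator changes
 * the estimate for all of them.
 */
#include <stdio.h>

#define DEFAULT_GEO_SEL 0.0002          /* assumed fallback fraction */

/* shared fallback when there are no usable statistics */
static double
geo_default_selectivity(void)
{
    return DEFAULT_GEO_SEL;
}

/* several operators all fall back to the same routine */
static double overlaps_sel(void)  { return geo_default_selectivity(); }
static double contains_sel(void)  { return geo_default_selectivity(); }
static double contained_sel(void) { return geo_default_selectivity(); }

int
main(void)
{
    double ntuples = 1000000.0;

    printf("overlaps : ~%.0f rows\n", ntuples * overlaps_sel());
    printf("contains : ~%.0f rows\n", ntuples * contains_sel());
    printf("contained: ~%.0f rows\n", ntuples * contained_sel());
    return 0;
}
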
> I assumed a fairly linear relationship between # of records and the
> value of that constant, so that value should work into the low 1
> million record range at least.
Actually, the cost estimator for indexscans is deliberately not linear,
since it's trying to model the effects of hits in a buffer cache ...
perhaps that's a pointless refinement when we have no accurate idea of
the size of the kernel's buffer cache, but certainly the real-world
behavior is not going to be linear.
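
The intuition, roughly: if pages you've already touched stay in the
buffer cache, re-reading them is free, so the expected number of
physical page reads for T random tuple fetches from a P-page table looks
something like P * (1 - (1 - 1/P)^T), which saturates rather than
growing linearly. A toy illustration of that shape (not the actual cost
estimator code, and the table size is made up):

/*
 * Toy model only: expected number of distinct pages touched when
 * fetching T tuples at random from a P-page table, assuming a touched
 * page stays cached.  Nonlinear in T, flattening out near P.
 */
#include <math.h>
#include <stdio.h>

static double
expected_page_reads(double tuples_fetched, double table_pages)
{
    return table_pages *
        (1.0 - pow(1.0 - 1.0 / table_pages, tuples_fetched));
}

int
main(void)
{
    double pages = 10000.0;                 /* assumed table size */

    for (double t = 1000.0; t <= 64000.0; t *= 2.0)
        printf("%6.0f tuples -> %7.1f page reads\n",
               t, expected_page_reads(t, pages));
    return 0;
}
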
> why did it change so much from 6.5.3? IIRC, it was somewhere around
> 0.25 in 6.5.3.
The old code had a *drastic* underestimate of the costs of indexscans
versus sequential scans, so it would tend to choose an indexscan even
for a query with a very large selectivity ratio. Believe me, if you
were running a query that actually returned a quarter of the rows in
your table, you would not want an indexscan --- but 6.5 would give you
one. 7.0 won't, which means that there's now a premium on making a
selectivity estimate that has something to do with reality.
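
Some made-up numbers to show why: if a random page read costs several
times a sequential one, pulling 25% of the rows through the index loses
badly to just reading the table in order. This is a back-of-envelope
sketch with assumed cost constants, not the planner's actual arithmetic:

/*
 * Back-of-envelope comparison, all constants assumed for illustration.
 * Worst case for the indexscan: every fetched row costs one random heap
 * page read.
 */
#include <stdio.h>

int
main(void)
{
    double table_pages = 10000.0;
    double table_rows  = 1000000.0;
    double seq_page_cost = 1.0;             /* assumed relative costs */
    double random_page_cost = 4.0;

    double selectivity = 0.25;              /* "a quarter of the rows" */
    double rows_fetched = table_rows * selectivity;

    double seqscan_cost   = table_pages * seq_page_cost;
    double indexscan_cost = rows_fetched * random_page_cost;

    printf("seqscan   ~ %.0f\n", seqscan_cost);
    printf("indexscan ~ %.0f\n", indexscan_cost);
    return 0;
}
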
> Without understanding how selectivity functions work, would it even be
> possible to come up with meaningful functions for geometric types &
> rtrees?
Good question. I haven't looked at the literature at all, but a first
thought is that you might be able to do something useful given the
bounding box of all data in the table ... which is a stat that VACUUM
does *not* compute, but perhaps could be taught to.
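
For instance, if VACUUM recorded the bounding box of everything in the
column, an overlap qual's selectivity could be guessed as the fraction
of that box covered by the query box (which assumes the data is spread
more or less uniformly inside it). A sketch with purely hypothetical
types and names:

/*
 * Hypothetical sketch: estimate selectivity of an overlap qual as
 *    area(query box intersect table bbox) / area(table bbox)
 * assuming a roughly uniform spatial distribution.  Nothing here is
 * real backend code.
 */
#include <stdio.h>

typedef struct { double xlo, ylo, xhi, yhi; } BBox;

static double
bbox_area(BBox b)
{
    double w = b.xhi - b.xlo, h = b.yhi - b.ylo;
    return (w > 0.0 && h > 0.0) ? w * h : 0.0;
}

static double
overlap_selectivity(BBox table_bbox, BBox query)
{
    /* intersection of the query box with the table's bounding box */
    BBox isect = {
        table_bbox.xlo > query.xlo ? table_bbox.xlo : query.xlo,
        table_bbox.ylo > query.ylo ? table_bbox.ylo : query.ylo,
        table_bbox.xhi < query.xhi ? table_bbox.xhi : query.xhi,
        table_bbox.yhi < query.yhi ? table_bbox.yhi : query.yhi
    };
    double total = bbox_area(table_bbox);

    return (total > 0.0) ? bbox_area(isect) / total : 0.0;
}

int
main(void)
{
    BBox all   = {0.0, 0.0, 100.0, 100.0};  /* stat VACUUM might keep */
    BBox query = {10.0, 10.0, 30.0, 30.0};

    printf("estimated selectivity: %.3f\n", overlap_selectivity(all, query));
    return 0;
}
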
regards, tom lane