On Fri, Mar 16, 2012 at 9:39 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> On Fri, 2012-03-16 at 18:25 +0000, Simon Riggs wrote:
>> Any time we apply a LIMIT clause to a plan with a SeqScan or
>> unqualified IndexScan, we shouldn't assume the scan will do less than
>> say 10% of the table. It might, but it's an unsafe assumption because
>> as the selectivity decreases so does the safety of the assumption that
>> rows are uniformly distributed.
>
> Just trying to follow along. You mean "as the selectivity _increases_
> the safety of the assumption that the rows are uniformly distributed
> decreases", right?
By "selectivity" I mean the value between 0 and 1 that the planner
uses for the fraction of rows a scan will return: 1.0 means 100% of
rows, and low selectivity means very few rows will come back. I think
you are using "high selectivity" to mean "returns few of the rows",
so you understand me; just flip the meaning of the words.
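
To make that concrete, here's a hypothetical sketch (table, value and
numbers all invented): the planner's estimated row count is just
selectivity times the table size, and you can read it straight out of
EXPLAIN.

    -- Imagine a 1,000,000-row table; the planner estimates that
    -- 10% of the rows match, i.e. selectivity = 0.1:
    EXPLAIN SELECT * FROM orders WHERE status = 'shipped';
                              QUERY PLAN
    ------------------------------------------------------------------
     Seq Scan on orders  (cost=0.00..18334.00 rows=100000 width=32)
       Filter: (status = 'shipped'::text)
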
When you have lots of matching rows, it's a good assumption that they
are spread out and a scan will find some of them quickly.
When you have very few rows, assuming they are evenly spaced is just
weird. Clumpiness of some kind seems much more likely. That's much
easier to see when the values are dates, for example.
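
A sketch of the date case (table name and data invented): rows arrive
in date order, so all the rows for one date sit together in the heap,
while the LIMIT costing assumes the matches are spread uniformly.

    -- Rows are inserted in date order, so every row for a given
    -- date is physically clumped at one spot in the heap.
    CREATE TABLE events (id serial, created date, payload text);
    INSERT INTO events (created, payload)
      SELECT d, 'x'
      FROM generate_series(date '2011-01-01', date '2011-12-31',
                           interval '1 day') AS d,
           generate_series(1, 1000);

    -- With no index, LIMIT 1 makes a seqscan look cheap: the
    -- planner expects to hit a matching row almost immediately.
    -- But the rows for the last date are at the very end of the
    -- heap, so the scan reads nearly the whole table first.
    SELECT * FROM events WHERE created = '2011-12-31' LIMIT 1;
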
Given that the estimated number of rows is deliberately a worst case
(i.e. high), that sounds like the scan will work. Yet the reality is
that doing the scan is incredibly costly when those assumptions
break, which they do, often. Especially when the values don't exist
at all because the table is sparse.
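
The sparse case, in the same invented schema: when no rows match at
all, the LIMIT never kicks in and the "cheap" scan reads every row
before returning nothing.

    -- No rows exist for this date.  The planner still assumes an
    -- early match, but the scan runs to the end of the table
    -- before giving up.
    SELECT * FROM events WHERE created = '2013-06-01' LIMIT 1;
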
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services