Re: Incorrect assumptions with low LIMITs - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Incorrect assumptions with low LIMITs
Date
Msg-id CA+U5nMLSvw5wuickWAoHu629BQHUt26ws=3MEAe+0x_788PwUw@mail.gmail.com
Whole thread Raw
In response to Re: Incorrect assumptions with low LIMITs  (Jeff Davis <pgsql@j-davis.com>)
List pgsql-hackers
On Fri, Mar 16, 2012 at 9:39 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> On Fri, 2012-03-16 at 18:25 +0000, Simon Riggs wrote:
>> Any time we apply a LIMIT clause to a plan with a SeqScan or
>> unqualified IndexScan, we shouldn't assume the scan will do less than
>> say 10% of the table. It might, but its an unsafe assumption because
>> as the selectivity decreases so does the safety of the assumption that
>> rows are uniformly distributed.
>
> Just trying to follow along. You mean "as the selectivity _increases_
> the safety of the assumption that the rows are uniformly distributed
> decreases", right?

Selectivity meaning the value between 0 and 1 that describes, in the
planner, the fraction of rows we will get back from a scan. 1.0 means
100% of rows. When selectivity is low, that means very few rows will
come back. I think you are using "high selectivity" as meaning
"returns few of the rows", so you understand me, but just flip the
meaning of the words.

When you have lots of rows, its a good assumption they are spread out
and a scan will find some of them quickly.

When you have very few rows, assuming they are evenly spaced is just
weird. Clumpiness of some kind seems much more likely. Much more
easily understood if the values are dates, for example.

Given the estimated number of rows is deliberately a worst case (i,e.
high), that sounds like the scan will work. Yet the reality is that
doing the scan is incredibly costly when those assumptions break,
which they do, often. Especially when the values don't exist at all
because the table is sparse.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Command Triggers, patch v11
Next
From: Tareq Aljabban
Date:
Subject: Re: Storage Manager crash at mdwrite()