Re: SQL select query becomes slow when using limit (with no offset) - Mailing list pgsql-performance

From Robert Haas
Subject Re: SQL select query becomes slow when using limit (with no offset)
Date
Msg-id 603c8f070908072037k1f09c6fdxc04aeb0e07f87208@mail.gmail.com
In response to Re: SQL select query becomes slow when using limit (with no offset)  (Scott Carey <scott@richrelevance.com>)
Responses Re: SQL select query becomes slow when using limit (with no offset)  ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
List pgsql-performance
On Fri, Aug 7, 2009 at 5:09 PM, Scott Carey<scott@richrelevance.com> wrote:
> On 8/7/09 5:53 AM, "Robert Haas" <robertmhaas@gmail.com> wrote:
>
>> On Fri, Aug 7, 2009 at 4:00 AM, Kees van Dieren<keesvandieren@gmail.com>
>> wrote:
>>> Would it get attention if I submit this to
>>> http://www.postgresql.org/support/submitbug ? (in fact it is not really a
>>> bug, but an improvement request).
>>
>> I think that many of the people who read that mailing list also read
>> this one, including, critically, Tom Lane, and you're not the first
>> person to run into a problem caused by lack of cross-column
>> statistics.
>
> Critically, it should be understood that this general problem is not just
> born from lack of cross-column statistics.
>
> It is also one of using the statistical expected value to calculate cost
> without consideration of the variance.  Often, the cost of a plan varies
> widely and nonlinearly with a small change in the expected value the stats
> used to estimate cost.
>
> The problem is acute with LIMIT and various other boundary conditions where
> there is a steep 'cost cliff' for certain plan types.  When LIMIT is
> applied, the planner changes its estimates, but does not take into account
> the _greatly_ increased uncertainty of those estimates.
>
> Imagine a case where the planner's estimate is 100% correct, and on average
> one side of a join will have 2 tuples.  The planner chooses nested loops to
> do that join.  But the distribution of the number of tuples at this node is
> skewed, so although the expected value is 2, values of 10 and 0 are both
> common.  When values of 10 occur, the execution time goes up significantly.
> An alternate plan might be slower for the case where the actual values in
> execution equal the expected values, but faster in the average case!
> The average cost of a plan is NOT the cost of the query with average
> statistics, due to variance, nonlinearity, and skew.  And even if they were
> equal, it might not be preferable to choose the plan that is best on average
> over the one that is best at the 90th percentile.
>
> Getting back to the case above with the nestloop -- if the planner estimated
> the cost of the nestloop join versus other joins with some idea of the
> variance in the estimations it could favor more 'stable' execution plans.
>
> So in short, cross-column statistics don't have to be gathered and used to
> make this problem less acute.  The planner just has to be more aware of
> variance and the things that lead to it, such as column correlation.
> Thus with a LIMIT applied, the expected value for the number of tuples
> scanned before a match will shrink, but the uncertainty of this estimate
> grows significantly as well, so the right plan to choose is one that hedges
> against the uncertainty, not the one that assumes the expected value is
> correct.
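
To make the quoted point concrete -- that a plan's expected cost is not
its cost at the expected row count -- here is a minimal sketch.  The
row-count distribution and the cost numbers are invented purely for
illustration:

# Outer row count is 0 with probability 0.8 and 10 with probability 0.2,
# so the planner's point estimate of 2 is exactly right on average.
dist = [(0, 0.8), (10, 0.2)]
expected_rows = sum(n * p for n, p in dist)   # = 2.0

def nestloop_cost(n):
    # Cheap per outer row while the inner relation stays cached, much
    # more expensive once it no longer does -- the "cost cliff".
    per_row = 100 if n <= 5 else 500
    return n * per_row

def hashjoin_cost(n):
    # Large fixed cost to build the hash table, tiny cost per probe.
    return 600 + 2 * n

for name, cost in [("nestloop", nestloop_cost), ("hashjoin", hashjoin_cost)]:
    at_expected = cost(expected_rows)
    expected = sum(p * cost(n) for n, p in dist)
    print(f"{name}: cost at E[n] = {at_expected:.0f}, expected cost = {expected:.0f}")

# The nested loop looks cheapest at the expected value (200 vs. 604),
# but its expected cost is actually much higher (1000 vs. 604).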

This is a good analysis.  I think I proposed some kind of idea about
tracking uncertainty in the planner a while back, but I never got very
far with it.  The problem is, first, figuring out how to estimate the
uncertainty, and second, figuring out what to do with the result once
you've estimated it.  The concept of hedging against uncertainty is
the right one, I think, but it's not obvious how to fit that into the
cost-comparison algorithm that the planner uses.  A sticking point
here too is that the comparison of path costs is already a hot spot;
making it more complex will likely have a noticeable negative impact
on query planning time.  For queries against huge databases that may
not matter much, but for OLTP queries it certainly does.
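
One loose sketch of how the hedging could be expressed (this is not how
the planner represents costs; the numbers and the penalty constant are
made up): carry a variance alongside each path's cost estimate and
compare a risk-adjusted figure rather than the bare mean.

from dataclasses import dataclass
import math

@dataclass
class PathCost:
    mean: float      # estimated total cost
    variance: float  # how uncertain that estimate is

def risk_adjusted(c, k=1.0):
    # Penalize uncertain estimates: mean + k * stddev.
    # k = 0 reproduces the current behaviour of comparing bare means.
    return c.mean + k * math.sqrt(c.variance)

nestloop = PathCost(mean=200.0, variance=250_000.0)   # cheap but volatile
hashjoin = PathCost(mean=604.0, variance=100.0)       # dearer but stable

best = min((nestloop, hashjoin), key=risk_adjusted)
print(best)   # with k = 1 the stable hash join wins: 604 + 10 < 200 + 500

Even something this simple adds a sqrt() and a few extra arithmetic ops
per comparison, which is exactly the planning-time worry above.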

There are a couple of methods that have been proposed to deal with
this in the past.  The most obvious one that's been talked about a
couple of times is switching a nested loop to a hash join if the
number of iterations exceeds some bound, which would require some
executor support.
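
As a toy illustration of that idea (in-memory lists standing in for
executor nodes; nothing here resembles the real executor): start out as
a nested loop and, once the outer side has produced more rows than some
bound, build a hash table on the inner side and finish by probing it.

def adaptive_join(outer_rows, inner_rows, key_outer, key_inner, bound=10):
    # outer_rows / inner_rows are lists of tuples; key_* extract the join key.
    for i, o in enumerate(outer_rows):
        if i >= bound:
            # Too many outer rows for a nested loop: build a hash table on
            # the inner side once and probe it for the remaining outer rows.
            table = {}
            for r in inner_rows:
                table.setdefault(key_inner(r), []).append(r)
            for o2 in outer_rows[i:]:
                for match in table.get(key_outer(o2), []):
                    yield (o2, match)
            return
        # Still under the bound: rescan the inner side for this outer row.
        for r in inner_rows:
            if key_inner(r) == key_outer(o):
                yield (o, r)

# e.g.: list(adaptive_join([(1,), (2,)], [(1, "a"), (2, "b")],
#                          key_outer=lambda t: t[0],
#                          key_inner=lambda t: t[0], bound=10))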

Empirically, it seems to me that the planner generally follows a
pretty consistent pattern.  If the inner relation is tiny or the
number of loops is estimated to be very small, it uses a nested loop.
When the inner rel is a bit bigger, or the number of iterations is
nontrivial, it switches to a hash join.  When the inner relation gets
too big to fit in work_mem, or just big enough that hashing it looks
too slow, it switches to a nested loop with inner index-scan or,
especially if a useful sort is available, a merge join.
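
A caricature of that pattern, with invented thresholds (the real planner
derives this from cost comparisons, not a decision tree):

def typical_choice(inner_rows, inner_bytes, est_loops, work_mem_bytes,
                   has_inner_index, has_useful_sort):
    # Tiny inner relation or very few loops: plain nested loop.
    if inner_rows < 10 or est_loops < 10:
        return "plain nested loop"
    # Inner relation fits in work_mem: hash join.
    if inner_bytes <= work_mem_bytes:
        return "hash join"
    # Too big or too slow to hash: merge join if a useful sort is
    # available, otherwise a nested loop with inner index scan.
    if has_useful_sort:
        return "merge join"
    if has_inner_index:
        return "nested loop with inner index scan"
    return "hash join (multiple batches)"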

Just handling better the case where we pick a straight nested loop
rather than a hash join would help a lot of people.  Some basic
conservatism about the number of outer rows would be useful here (in
particular, we should probably assume that there will be at least 2
when costing a nested loop, unless the outer side is known unique), and
it's also worth thinking about the fact that a hash join won't build
the table unless there is at least 1 outer row, which I don't think
the current costing algorithm takes into account.  Or maybe we should
estimate the amount by which the nested loop figures to beat out the
hash join and only accept the nested loop plan if the win exceeds
some threshold percentage.
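
Those three tweaks, sketched as plain arithmetic over made-up cost
functions (none of this is the actual costing code):

def choose_join(est_outer_rows, outer_known_unique,
                nestloop_cost, hashjoin_cost, threshold=0.20):
    # Be conservative about small estimates: assume at least 2 outer rows
    # when costing the nested loop, unless the outer side is known unique.
    nl_rows = est_outer_rows if outer_known_unique else max(est_outer_rows, 2)
    nl = nestloop_cost(nl_rows)

    # Credit the hash join for not building its table when the outer side
    # turns out to be empty.
    hj = hashjoin_cost(est_outer_rows) if est_outer_rows > 0 else 0.0

    # Accept the riskier nested loop only if its estimated win over the
    # hash join exceeds a threshold percentage.
    if nl < hj * (1.0 - threshold):
        return "nested loop", nl
    return "hash join", hj

# e.g.: choose_join(2, False, lambda n: 100 * n, lambda n: 600 + 2 * n)
#       -> ('nested loop', 200)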

...Robert
