From: Merlin Moncure
Subject: Re: Yet another abort-early plan disaster on 9.3
Msg-id: CAHyXU0yjuCwH47qv=Lnx3BaSu69CfdKadc11oveTcB9T4K0hGQ@mail.gmail.com
In response to: Re: Yet another abort-early plan disaster on 9.3 (Simon Riggs <simon@2ndQuadrant.com>)
List: pgsql-performance
On Fri, Dec 5, 2014 at 12:46 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 30 September 2014 at 05:53, Simon Riggs <simon@2ndquadrant.com> wrote:
>> On 29 September 2014 16:00, Merlin Moncure <mmoncure@gmail.com> wrote:
>>> On Fri, Sep 26, 2014 at 3:06 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>>> The problem, as I see it, is different. We assume that if there are
>>>> 100 distinct values and you use LIMIT 1 that you would only need to
>>>> scan 1% of rows. We assume that the data is arranged in the table in a
>>>> very homogeneous layout. When data is not, and it seldom is, we get
>>>> problems.
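
For a concrete picture of that (table and column names are made up):
100 distinct values, each packed into its own contiguous run of the heap,
so for the last value the "scan 1% of rows" assumption is about as wrong
as it can get:

  create table t as
    select i / 10000 as grp, i as pk
    from generate_series(0, 999999) i;
  create index on t (pk);
  analyze t;

  -- matches for grp = 99 are assumed to be sprinkled evenly through pk
  -- order; in this layout they all sit in the last 1% of the table
  explain analyze
  select * from t where grp = 99 order by pk limit 1;

If the planner picks the abort-early plan (a Limit over an index scan on
pk, with grp = 99 left as a filter), it walks ~99% of the index before it
can return a single row.
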
>>>
>>> Hm, good point -- 'data proximity'.  At least in theory, can't this be
>>> measured and quantified?  For example, given a number of distinct
>>> values, you could estimate the % of pages (or maybe the number of
>>> non-sequential seeks relative to the number of pages) you'd have to
>>> read to hit all instances of a particular value in the average (or
>>> perhaps the worst) case.  One way of trying to calculate that would be
>>> to look at the proximity of values in sampled pages (and maybe assign
>>> a penalty for high update activity relative to table size).  Data
>>> proximity would then become a cost coefficient applied to the benefit
>>> of LIMIT.
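
ANALYZE already computes something in this neighbourhood:
pg_stats.correlation says how closely a column's values track the
physical row order (same made-up table as above). It is one number per
column rather than per-value clustering, so it wouldn't catch a single
badly placed value on its own, but it's the sort of statistic such a
coefficient could start from:

  select attname, n_distinct, correlation
  from pg_stats
  where tablename = 't' and attname in ('grp', 'pk');
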
>>
>> The necessary first step to this is to realise that we can't simply
>> apply the LIMIT as a reduction in query cost in all cases.
>>
>> The way I'm seeing it, you can't assume the LIMIT will apply to any
>> IndexScan that doesn't have an index condition. If it has just a
>> filter, or no qual at all (only an ordering), then it could easily
>> scan the whole index if the stats are wrong.
>>
>> So plans like this could be wrong, by assuming the scan will end
>> earlier because of the LIMIT than it actually will.
>>
>> Limit
>>   IndexScan (no index cond)
>>
>> Limit
>>   NestJoin
>>     IndexScan (no index cond)
>>     SomeScan
>>
>> Limit
>>   NestJoin
>>     NestJoin
>>       IndexScan (no index cond)
>>       SomeScan
>>     SomeScan
>>
>> and deeper...
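
In EXPLAIN terms, the distinction is between these two shapes (sketches
only; names and quals invented). With an Index Cond the scan visits only
qualifying rows, so stopping after the first row really is cheap; with
only a Filter, how far the scan runs depends entirely on where the
matching rows happen to fall in index order:

  Limit
    ->  Index Scan using t_pk_idx on t
          Index Cond: (pk > 990000)

  Limit
    ->  Index Scan using t_pk_idx on t
          Filter: (grp = 99)
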
>>
>> I'm looking for a way to identify and exclude such plans, assuming
>> that this captures at least some of the problem plans.
>
> After looking at this for some time I now have a patch that solves this.
>
> It relies on the observation that index scans with no bounded quals
> don't play nicely with LIMIT, and on the fact that LIMIT reduces only
> the total cost of a plan, never its startup cost. So we can keep the
> total cost estimate unchanged and simply move some of it into startup
> cost, so that LIMIT no longer discounts the cost as much as it did
> before.
>
> It's a simple patch, but it solves the test cases I know about and
> adds almost nothing to planning time.
>
> I tried much less subtle approaches involving direct prevention of
> LIMIT pushdown but the code was much too complex for my liking.

Neat -- got any test cases (would this have prevented OP's problem)?
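
If I'm reading the approach right, the effect on the usual Limit costing
works out roughly like this (formula simplified, numbers invented):

  limit_cost ~= startup + (rows_wanted / rows_estimated) * (total - startup)

  filter-only index scan today: startup 0, total 100000, 10000 rows
  estimated, LIMIT 1:
      0 + (1/10000) * 100000 = 10          -> looks absurdly cheap
  with half the total shifted into startup (50000 / 100000):
      50000 + (1/10000) * 50000 = 50005    -> abort-early no longer a free win

So the grand total stays honest for the no-LIMIT case, but the discount
the Limit node can claim shrinks.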

merlin

