Re: Yet another abort-early plan disaster on 9.3 - Mailing list pgsql-performance

From Simon Riggs
Subject Re: Yet another abort-early plan disaster on 9.3
Date
Msg-id CA+U5nMKnwZVkvdaQP7Sm15-Emdwt+TgnW7-0xQJvZPtHu2Wsvw@mail.gmail.com
Whole thread Raw
In response to Re: Yet another abort-early plan disaster on 9.3  (Merlin Moncure <mmoncure@gmail.com>)
Responses Re: Yet another abort-early plan disaster on 9.3
Re: Yet another abort-early plan disaster on 9.3
List pgsql-performance
On 29 September 2014 16:00, Merlin Moncure <mmoncure@gmail.com> wrote:
> On Fri, Sep 26, 2014 at 3:06 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> The problem, as I see it, is different. We assume that if there are
>> 100 distinct values and you use LIMIT 1 that you would only need to
>> scan 1% of rows. We assume that the data is arranged in the table in a
>> very homogenous layout. When data is not, and it seldom is, we get
>> problems.
>
> Hm, good point -- 'data proximity'.  At least in theory, can't this be
> measured and quantified?  For example, given a number of distinct
> values, you could estimate the % of pages read (or maybe non
> sequential seeks relative to the number of pages) you'd need to read
> all instances of a particular value in the average (or perhaps the
> worst) case.   One way of trying to calculate that would be to look at
> proximity of values in sampled pages (and maybe a penalty assigned for
> high update activity relative to table size).  Data proximity would
> then become a cost coefficient to the benefits of LIMIT.

The necessary first step to this is to realise that we can't simply
apply the LIMIT as a reduction in query cost, in all cases.

The way I'm seeing it, you can't assume the LIMIT will apply to any
IndexScan that doesn't have an index condition. If it has just a
filter, or nothing at all, just an ordering then it could easily scan
the whole index if the stats are wrong.

So plans like this could be wrong, by assuming the scan will end
earlier because of the LIMIT than it actually will.

Limit
  IndexScan (no index cond)

Limit
  NestJoin
    IndexScan (no index cond)
    SomeScan

Limit
  NestJoin
    NestJoin
      IndexScan (no index cond)
      SomeScan
   SomeScan

and deeper...

I'm looking for a way to identify and exclude such plans, assuming
that this captures at least some of the problem plans.

Comments? Test Cases?

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


pgsql-performance by date:

Previous
From: "Graeme B. Bell"
Date:
Subject: Re: Very slow postgreSQL 9.3.4 query
Next
From: Josh Berkus
Date:
Subject: Re: Yet another abort-early plan disaster on 9.3