Re: Yet another abort-early plan disaster on 9.3 - Mailing list pgsql-performance

From Merlin Moncure
Subject Re: Yet another abort-early plan disaster on 9.3
Date
Msg-id CAHyXU0wwGNcDVE8pcXeYPr9+mdUG0Yy1pbgDg6hWQMF+er4uog@mail.gmail.com
Whole thread Raw
In response to Re: Yet another abort-early plan disaster on 9.3  (Jeff Janes <jeff.janes@gmail.com>)
List pgsql-performance
On Tue, Sep 30, 2014 at 11:54 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
> On Mon, Sep 29, 2014 at 7:12 PM, Gavin Flower
> <GavinFlower@archidevsys.co.nz> wrote:
>>
>>
>> Would it be feasible to get a competent statistician to advise what data
>> to collect, and to analyze it?  Maybe it is possible to get a better
>> estimate on how much of a table needs to be scanned, based on some fairly
>> simple statistics.  But unless research is done, it is probably impossible
>> to determine what statistics might be useful, and how effective a better
>> estimate could be.
>>
>> I have a nasty feeling that assuming a uniform distribution, may still end
>> up being the best we can do - but I maybe being unduly pessimistic!.
>
> As a semi-competent statistician, my gut feeling is that our best bet would
> be not to rely on the competence of statisticians for too much, and instead
> try to give the executor the ability to abandon a fruitless path and pick a
> different plan instead. Of course this option is foreclosed once a tuple is
> returned to the client (unless the ctid is also cached, so we can make sure
> not to send it again on the new plan).
>
> I think that the exponential explosion of possibilities is going to be too
> great to analyze in any rigorous way.

Call it the 'Parking in Manhattan' strategy -- you know when it's time
to pull forward when you've smacked into the car behind you.

Kidding aside, this might be the path forward since it's A. more
general and can catch all kinds of problem cases that our statistics
system won't/can't catch and B. At least in my case it seems like more
complicated plans tend to not return much data until the inner most
risky parts have been involved.  Even if that wasn't the case,
withholding data to the client until a user configurable time
threshold had been passed (giving the planner time to back up if
necessary) would be a reasonable user facing tradeoff via GUC:
'max_planner_retry_time'.

merlin


pgsql-performance by date:

Previous
From: Gavin Flower
Date:
Subject: Re: Yet another abort-early plan disaster on 9.3
Next
From: Simon Riggs
Date:
Subject: Re: Yet another abort-early plan disaster on 9.3