Re: An Idea for planner hints - Mailing list pgsql-hackers

From: Florian G. Pflug
Subject: Re: An Idea for planner hints
Msg-id: 44D9BFA3.1090800@phlo.org
In response to: Re: An Idea for planner hints (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers

Tom Lane wrote:
> Martijn van Oosterhout <kleptog@svana.org> writes:
>> ISTM that the easiest way would be to introduce a sort of predicate
>> like so:
> 
>> SELECT * FROM foo, bar WHERE pg_selectivity(foo.a = bar.a, 0.1);
> 
> The one saving grace of Florian's proposal was that you could go hack
> the statistics *without* changing your queries.  This throws that away
> again.

I think for this to be really effective, you'd actually need both - a
query-independent way of specifying selectivities, and a way to influence
the estimates for a _single_ query.
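
(For comparison: as far as I know, the closest query-independent knob
that exists today only changes how much data ANALYZE samples for a
column - something like

-- raise the per-column statistics target, then gather fresh stats
ALTER TABLE foo ALTER COLUMN a SET STATISTICS 1000;
ANALYZE foo;

That helps when an estimate is off because the sample was too small,
but not when the planner's model of a whole expression is wrong.)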

Imagine a complex, autogenerated query which looks something like this:
select ....
from t1
join t2 on ...
join t3 on ...
join t4 on ...
...
...
where  <big, complicated expression derived from some user input>.

This big, complicated expression looks different for every query - and
currently, postgres often vastly overestimates the selectivity of this
expression, i.e. it believes the expression filters out far more rows
than it actually does. This leads to weird join orders, and generally
very bad performance. Of course, *I* don't know the selectivity of this
expression myself - but experience tells me that on average it's
something like 50%, and not 1% as postgres believes. So, in that case,
being able to write

select ... join .... where pg_selectivity(<expression>, 0.5)

would be a big win.
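
To make that concrete - the table and column names below are made up,
and pg_selectivity is only the predicate Martijn proposed upthread,
nothing that actually exists - the generated query would simply wrap
the problematic expression:

select ...
from t1
join t2 on t1.id = t2.t1_id
join t3 on t2.id = t3.t2_id
where pg_selectivity(
        (t1.a > 10 and t2.b like '%foo%')
        or (t3.c in (1, 2, 3) and t1.d is not null),
        0.5);  -- planner should assume 50% of the joined rows match

The planner would then order the joins around an assumed 50%
selectivity instead of whatever it derives from the individual clauses.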

> The thing I object to about the "I want to decorate my queries with
> planner hints" mindset is that it's coming at it from the wrong
> direction.  You should never be thinking in terms of "fix this one
> query", because that just leads back into the same dead end that your
> fix doesn't work tomorrow.  What you *should* be thinking about is "why
> did the planner get this wrong, and how do I fix the generic problem?".
> If you attack it that way then your fix is much more likely to work on
> the next slightly-different query.

Fixing the generic problem is surely the best approach _if_ there is a
fix for the generic problem at all. But if your where-condition involves
fields from 10 different tables, then IMHO there is no way to _ever_
guarantee that postgres will get correct selectivity estimates. And
since (at least for me) overestimating selectivity hurts far more than
underestimating it, forcing postgres to just assume a certain
selectivity could help.

I'm not in any way saying that there should _only_ be selectivity
annotations inside the query - a query-independent mechanism would
be a very nice thing to have. But a query-independent mechanism
won't be sufficient in all cases, IMHO.
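
Just to illustrate - with completely invented syntax, nothing like this
exists - a query-independent mechanism might look like an override
attached to a stable expression:

-- invented syntax, purely to sketch the idea of a stored override
ALTER TABLE t1 SET SELECTIVITY FOR (t1.a = t1.b) TO 0.5;

But the autogenerated expressions above touch many tables and look
different every time, so there is nothing stable to attach such an
override to - which is exactly why I think the in-query form is needed
as well.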

greetings, Florian Pflug

