Re: An Idea for planner hints - Mailing list pgsql-hackers

From Arturo Pérez
Subject Re: An Idea for planner hints
Date
Msg-id 32EA4E1D-7005-4ED4-8D49-C7A172EFF3F5@ethicist.net
Whole thread Raw
In response to Re: An Idea for planner hints  ("Jim C. Nasby" <jnasby@pervasive.com>)
Responses Re: An Idea for planner hints
List pgsql-hackers
On Aug 15, 2006, at 10:40 AM, Jim C. Nasby wrote:

> On Mon, Aug 14, 2006 at 11:41:29PM +0300, Hannu Krosing wrote:
>> ??hel kenal p??eval, E, 2006-08-14 kell 18:21, kirjutas Peter  
>> Eisentraut:
>>> Perez wrote:
>>>> I thought, from watching the list for a while, that the planner
>>>> statistics needed were known but that how to gather the statistics
>>>> was not?
>>>
>>> I think over the course of the discussion we have figured out  
>>> that we
>>> would like to have cross-column correlation statistics.  The precise
>>> mathematical incarnation hasn't been determined yet, as far as I can
>>> see.  Collecting the statistics thereafter isn't that hard, but  
>>> there
>>> needs to be a way to not collect an exponential volume of  
>>> statistics on
>>> all column combinations.
>>
>> I understood that the proposal was to collect only the stats where
>> needed (determined by user/dba) and use some rule-of-thumb values  
>> if no
>> collected stats were available.
>
> Yeah, unless someone comes up with some kind of 'magic', I think  
> trying
> to handle every cross-column possibility is a non-starter. IIRC, that
> argument is what's stalled cross-column stats every time in the  
> past. It
> makes a lot more sense to allow defining what combinations of  
> columns we
> need stats for.
>
> After that's done, it'd be easy to then write a script that will tell
> the database to collect stats on all multi-column indexes, RI, etc.  
> Down
> the road, the planner could even be made to log (in a machine-readable
> format) every time it needs cross-column stats, and that data could be
> used to add stats that are needed.

If we're talking about my random neuron firing then I think the  
responses have gone off
a bit.  My thought was to just tell the planner the statistics that  
are of interest.

An example of what I'm thinking would probably be helpful.  Let's say  
that the
DBA knows, through whatever means at his/her disposal (heck! the  
magic you mention)
that column a & column b have some sort of correlation that the  
planner can't determine
on its own but can use if it had it.  The DBA therefore pokes the  
right information into
the planner's statistical tables (or, perhaps, a more human- 
manageable one that gets
"compiled" into the planner's stats).

For this to work we'd have to
1.  Define the types of statistics that the planner could use in its  
planning that
it cannot currently (or ever) collect itself.  Cross-column  
correlations, suitable
join selectivity, anything that would be useful to the planner.
2. Create a table or other data structure to contain this planner  
information.  Modify
the planner to use this information.
3. Document what these stats are, and the influence they have in a  
format suitable
for use by DBAs, and how to add the stats to the above table.  Mere  
mortals can tinker
with this feature at their own peril :-)

Now, when a DBA has information that could steer the planner in the  
right direction
he/she has a mechanism to do so that does not involve hinting the  
specific query.  My
hope would be that this information wouldn't go stale as fast as a  
query hint would.
Furthermore, the DBA can improve an application's performance without  
having to go
into every query it executes.

The planner would look in that table and say Ah! there's information  
in here that says
that when a is joined to be it's going to eliminate 90% of my I/O.

Seems to me that such a feature would be a cool knob and address most/ 
all of the need for
query hints.

One other possibility for the above information would be just to have  
a place for
the planner to save information for itself when it finds a plan to be  
either horribly
over-optimistic or pessimistic.

Hope this blathering makes some kind of sense...
-arturo




pgsql-hackers by date:

Previous
From: Kenneth Marshall
Date:
Subject: Re: BugTracker (Was: Re: 8.2 features status)
Next
From: Jens-Wolfhard Schicke
Date:
Subject: Re: insert/update/delete returning and rules