Re: An Idea for planner hints - Mailing list pgsql-hackers

From Florian G. Pflug
Subject Re: An Idea for planner hints
Date
Msg-id 44E4B10B.4090409@phlo.org
Whole thread Raw
In response to Re: An Idea for planner hints  (Peter Eisentraut <peter_e@gmx.net>)
List pgsql-hackers
Peter Eisentraut wrote:
> Arturo Pérez wrote:
>> The DBA therefore pokes the
>> right information into
>> the planner's statistical tables (or, perhaps, a more human-
>> manageable one that gets
>> "compiled" into the planner's stats).
> 
> I think we're perfectly capable of producing a system that can collect 
> the statistics.  We just don't want to collect every possible 
> statistic, but just those that someone declared to be interesting 
> beforehand.  There need not be any manual poking.  Just manual 
> declaring.
> 
> But we need to work this from the other end anyway.  We need to 
> determine first, what sort of statistics the planner could make use of.  
> Then we can figure out the difficulties in collecting them.

I've been told that oracle has an interesting feature regarding
materialized views that gave me an idea how to declare what statistics
to gather. It seems as if oracle is able to figure out that it can
use a certain materialized view to speed up execution of a certain
query, even if the query doesn't use that view explicitly. So, e.g.
if you do

1) create materialized view v as select * from t1 join t2 on t1.t2_id = 
t2.id.
2) select * from t1 join t2 on t1.t2_id = t2.id join t3 on t3.t2_id = t2.id

then oracle seems to be able to use the already-joined tuples in v, and
only needs to join t3 to those, instead of having to rejoin t1 and t2.

That gave me the idea that something similar could be used to declare
what statistics to gather, in a very general way. Imagine that I could
do.

1) create statistics for select * from t1 join t2 on t1.t2_id and 
t1.flag = TRUE.
2) select * from t1 join t2 on t1.t2_id and t1.flag = TRUE join t3 on 
...  join t4 on ...

The command 1) would basically gather the same statistics for the result
of the query as it would gather for a normal table with the same signature.
When planning 2), postgres would recognize that it can use those
statistics (similar to how oracle recognizes that it can use a certain
materialized view), and would thus. know the selectivity of that
particular join very accurately.

I think there might even be a way to do (1) without actually executing 
the (whole) query. If every access-method in the query plan could be
told to deliver only say 10% of the rows it would deliver "normally",
but the rest of the plan was executed normally, then the result should
have the same statistical properties as the complete result would have.

greetings, Florian Pflug



pgsql-hackers by date:

Previous
From: Chris Mair
Date:
Subject: Re: [PATCHES] selecting large result sets in psql using
Next
From: "Jim C. Nasby"
Date:
Subject: Re: BugTracker (Was: Re: 8.2 features status)