Re: Design: Escort info from WHERE clause to executor? - Mailing list pgsql-hackers

From peter.trautmeier@gmx.de
Subject Re: Design: Escort info from WHERE clause to executor?
Date
Msg-id 20070725165646.77860@gmx.net
Whole thread Raw
In response to Re: Design: Escort info from WHERE clause to executor?  (Heikki Linnakangas <heikki@enterprisedb.com>)
List pgsql-hackers
Thanks Heikki,

Von: Heikki Linnakangas <heikki@enterprisedb.com>
> > I am implementing a technique that sorts a result set according to
> weight annotations in the WHERE.
> > 
> > The query
> > 
> > SELECT * FROM cars 
> > WHERE (cdChanger=1){2} 
> >    OR (mp3player=1){1} 
> > 
> > would be sorted according to partial conditions that hold.
> 
> You could do that like this, with no need to hack the backend:
> 
> SELECT * FROM cars
> WHERE (cdChanger=1)
>    OR (mp3player=1)
> ORDER BY (CASE WHEN cdchanger=1 THEN 2 ELSE 0 END) +
>          (CASE WHEN mp3player=1 THEN 1 ELSE 0 END) DESC;

that is certainly a nice, straigthforward solution, however, I forgot to mention my most important goal: I want to
optimizethose weighted queries within pg.
 

As I believe these weights add - in certain cases - additional info that can be exploited to gain nice performance
improvementsin the face of top-k queries.
 

Just 2 examples for top-k query optimization:

1) Consider a query like

SELECT * FROM cars 
WHERE (cdChanger=1){1}   OR (mp3player=1){42}   OR (resemblesBatmobile=1){4711}
LIMIT 10;

where the weights differ considerably. Here it might suffice to consider tuples with (resemblesBatmobile=1){4711} at
firstand filter with that condition. If this pre filtering yields (at least) 10 tuples, we're left off with the task to
sortjust the remaining set according to the remaining conditions and their weights (cdChanger=1){1} and
(mp3player=1){42}.
 

In a case like 
...
WHERE (cdChanger=1){1}   OR (mp3player=1){42}   OR (resemblesBatmobile=1){4711}  OR (resemblesKITT=1){4711}

where two weights are equal the selectivities of resemblesBatmobile=1 and resemblesKITT=1 could aid the optimizers
decisionwhich condition nicest for a pre filtering.
 
(Consider how great the benefit is in case of expensive predicates like subqueries, contrary to these toy comparison
opsof mine.)
 

And so on - your idea here.

2) The second example is a form of join optimization. Consider a query with a simple join

SELECT *
FROM   R,S
WHERE  R.foobar = S.foobar  AND ((R.foo = 1){10}    OR (S.bar = 6){20})
LIMIT  10;

where again just 10 tuples are needed. Normally, the planner would generate a join that evaluates the R.foo = 1 or
S.bar= 6 - there's no other choice.
 

However by rewriting the condition in an equivalent form we can exploit the weights by creating 3 Joins instead of a
singleone:
 

WHERE  R.foobar = S.foobar  AND ((R.foo  = 1){10}    AND (S.bar = 6){20})     --> cream of the crop: weight 30ORii  AND
((R.foo<> 1){10}    AND (S.bar = 6){20})     --> second class citizens: weight 20OR  AND ((R.foo = 1){10}    AND (S.bar
<>6){20})    --> still ok: weight 10
 

By rewriting we gain 2 nice properties:

1. The conditions can be pushed from the (now 3) joins to the selection at the leaf-relation.

2. The tuples are disjunctive regarding their weights: The first join yields the best tuples with weight 30. If it
gainsthe needed 10 tuples, we're set - voila. Otherwise, well, let's consider the next join with the w20-tuples.
Iterate.

I believe there are many more ways to speed up execution with the help of weights.

That's why I need the weights around in the executor, desperately :)

Regards,
Peter



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Machine available for community use
Next
From: Stefan Kaltenbrunner
Date:
Subject: Re: Machine available for community use