Thread: Design: Escort info from WHERE clause to executor?

Design: Escort info from WHERE clause to executor?

From
peter.trautmeier@gmx.de
Date:
Hi all,

I want to pass additional weight info from the WHERE clause to the executor and I hope someone can help me with this.

I accept clauses like the following

WHERE (foo='a'){1}
WHERE (foo='a'){1} OR (bar='b'){2}
WHERE ((foo='a'){1} OR (bar='b'){2})){42} OR (baz='c'){3}

where the {} takes an integer as a weight that is attached to the preceding (partial) condition.

In the executor, I need to access (1) the logical value of and (2) the weight associated with _each_ subexpression that
wasentered. (Getting the weight from the parser to the executor is in itself a journey it seems, as some expression
typesare created anew - and not copied - and lose their annotated weight over and over again.)
 

Furthermore I need the structure of OR to be preserved; the OR-of-OR structure from the last WHERE must be preserved or
atleast be  reconstructible and must not be folded into a 3-valued OR (as canonicalize_qual and friends do.)
 

To sum up, I am looking for a (decently efficient) scheme that is able to

(1) pass arbitrary conditional expressions from WHERE to the executor in a structure preserving way. 
(2) annotate arbitrary expressions with weights that survive on its way from the parser to the executor.
(3) access the logical value of particular subexpressions.

I have some basic ideas how at least some of the requirements might be achieved. But as I am not totally satisfied with
myideas I hope you can provide me with some fresh input.
 

ANY ideas are welcome.

Regards,
Peter


Re: Design: Escort info from WHERE clause to executor?

From
imad
Date:
It looks like you need a customized version of AExpr Node.
In the backend parser, an AExpr Node is constructed against each given
WHERE expression. You can store the weight along with the expression.
Further, don't forget to upgrade the copy functions and equal
functions for AExpr if you want to take this weight value all the way
upto the executor.


--Imad
www.EnterpriseDB.com


On 7/24/07, peter.trautmeier@gmx.de <peter.trautmeier@gmx.de> wrote:
> Hi all,
>
> I want to pass additional weight info from the WHERE clause to the executor and I hope someone can help me with
this.
>
> I accept clauses like the following
>
> WHERE (foo='a'){1}
> WHERE (foo='a'){1} OR (bar='b'){2}
> WHERE ((foo='a'){1} OR (bar='b'){2})){42} OR (baz='c'){3}
>
> where the {} takes an integer as a weight that is attached to the preceding (partial) condition.
>
> In the executor, I need to access (1) the logical value of and (2) the weight associated with _each_ subexpression
thatwas entered. (Getting the weight from the parser to the executor is in itself a journey it seems, as some
expressiontypes are created anew - and not copied - and lose their annotated weight over and over again.)
 
>
> Furthermore I need the structure of OR to be preserved; the OR-of-OR structure from the last WHERE must be preserved
orat least be  reconstructible and must not be folded into a 3-valued OR (as canonicalize_qual and friends do.)
 
>
> To sum up, I am looking for a (decently efficient) scheme that is able to
>
> (1) pass arbitrary conditional expressions from WHERE to the executor in a structure preserving way.
> (2) annotate arbitrary expressions with weights that survive on its way from the parser to the executor.
> (3) access the logical value of particular subexpressions.
>
> I have some basic ideas how at least some of the requirements might be achieved. But as I am not totally satisfied
withmy ideas I hope you can provide me with some fresh input.
 
>
> ANY ideas are welcome.
>
> Regards,
> Peter
>
> ---------------------------(end of broadcast)---------------------------
> TIP 7: You can help support the PostgreSQL project by donating at
>
>                 http://www.postgresql.org/about/donate
>


Re: Design: Escort info from WHERE clause to executor?

From
Heikki Linnakangas
Date:
peter.trautmeier@gmx.de wrote:
> To sum up, I am looking for a (decently efficient) scheme that is able to
> 
> (1) pass arbitrary conditional expressions from WHERE to the executor in a structure preserving way. 
> (2) annotate arbitrary expressions with weights that survive on its way from the parser to the executor.
> (3) access the logical value of particular subexpressions.
> 
> I have some basic ideas how at least some of the requirements might be achieved. But as I am not totally satisfied
withmy ideas I hope you can provide me with some fresh input.
 

Why? What are you trying to achieve?

-- Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: Design: Escort info from WHERE clause to executor?

From
peter.trautmeier@gmx.de
Date:
Thanks imad,

Von: imad <immaad@gmail.com>
> It looks like you need a customized version of AExpr Node.
> In the backend parser, an AExpr Node is constructed against each given
> WHERE expression. You can store the weight along with the expression.
> Further, don't forget to upgrade the copy functions and equal
> functions for AExpr if you want to take this weight value all the way
> upto the executor.

I have already done that, alas, it doesn't suffice: During parse analysis transformExpr() turns the AExpr into other
types,e.g. an OpExpr in case of foo = 42.
 

These newly created OpExprs won't last until they reach the executor in every case, though: They are sometimes
recreatedanew during planning 'manually', i.e. not by using the standard copy functions but by creating a new OpExpr
nodeand copying the fields on an as-seen-before basis - these are the places where my weight gets lost.
 

To be honest, I consider not using a special copy function a minor design flaw :) *sigh*

Anyway, thanks again. But giving a weight to AExpr is just the tip of the iceberg when it comes to my final goal - let
myweights enter the executor ;)
 

Regards,
Peter


Re: Design: Escort info from WHERE clause to executor?

From
peter.trautmeier@gmx.de
Date:
Von: Heikki Linnakangas <heikki@enterprisedb.com>
> peter.trautmeier@gmx.de wrote:
> > To sum up, I am looking for a (decently efficient) scheme that is able
> to
> > 
> > (1) pass arbitrary conditional expressions from WHERE to the executor in
> a structure preserving way. 
> > (2) annotate arbitrary expressions with weights that survive on its way
> from the parser to the executor.
> > (3) access the logical value of particular subexpressions.
> > 
> > I have some basic ideas how at least some of the requirements might be
> achieved. But as I am not totally satisfied with my ideas I hope you can
> provide me with some fresh input.
> 
> Why? What are you trying to achieve?

I am implementing a technique that sorts a result set according to weight annotations in the WHERE.

The query

SELECT * FROM cars 
WHERE (cdChanger=1){2}   OR (mp3player=1){1} 

would be sorted according to partial conditions that hold.

Cars that have both a CD changer AND a MP3 player get a weight of 3, i.e. (2+1).
Cars that only have a CD changer get a weight of 2.
Cars that only have a MP3 player get a weight of 1.
Cars that have neither a CD changer nor a MP3 player do not belong to the result set anyway.

I have to sort the tuples according to their individual weight - that is why I need to annotate arbitrary expressions
withweights.
 

This is a simple example, but in case of cascaded ORs and ANDs the semantics gets slightly trickier - that is why I
needthe structure of the original WHERE clause preserved.
 

The executor as it stands now is evaluating quals in a short circuit manner, and that is totally feasible: As soon as a
singlesubexpression of and ANDed qual is false, it stops. 
 

However, in order to sort the tuples in the result set I generally need to know the logical values of all
subexpressions,or at least more of them as the executor needs for its rather coarsely grained decision
'belongs-to-result-set'or 'does-not-belong'.
 

In general, I have to evaluate the original WHERE expr tree level by level, starting at the top(root) that represents
thewhole condition and possibly visiting and evaluating every single subexpression in the tree to every leaf, until I
haveenough information to compare two tuples with each other.
 

I hope I got the basic idea across, but please don't hesitate to ask for more details if I failed to paint the picture
clearlyenough. I appreciate your interest.
 

(I know that seems complicated, and indeed I am getting the feeling that this _is_ complicated. But hey, then again
it'smy duty ;) )
 

Regards,
Peter


Re: Design: Escort info from WHERE clause to executor?

From
Heikki Linnakangas
Date:
peter.trautmeier@gmx.de wrote:
> Von: Heikki Linnakangas <heikki@enterprisedb.com>
>> peter.trautmeier@gmx.de wrote:
>>> To sum up, I am looking for a (decently efficient) scheme that is able
>> to
>>> (1) pass arbitrary conditional expressions from WHERE to the executor in
>> a structure preserving way. 
>>> (2) annotate arbitrary expressions with weights that survive on its way
>> from the parser to the executor.
>>> (3) access the logical value of particular subexpressions.
>>>
>>> I have some basic ideas how at least some of the requirements might be
>> achieved. But as I am not totally satisfied with my ideas I hope you can
>> provide me with some fresh input.
>>
>> Why? What are you trying to achieve?
> 
> I am implementing a technique that sorts a result set according to weight annotations in the WHERE.
> 
> The query
> 
> SELECT * FROM cars 
> WHERE (cdChanger=1){2} 
>    OR (mp3player=1){1} 
> 
> would be sorted according to partial conditions that hold.

You could do that like this, with no need to hack the backend:

SELECT * FROM cars
WHERE (cdChanger=1)  OR (mp3player=1)
ORDER BY (CASE WHEN cdchanger=1 THEN 2 ELSE 0 END) +        (CASE WHEN mp3player=1 THEN 1 ELSE 0 END) DESC;

Or a bit shorter version, exploiting the cast from boolean to int:

SELECT * FROM cars
WHERE (cdChanger=1)  OR (mp3player=1)
ORDER BY (cdchanger=1)*2 + (mp3player=1)*1 DESC;

-- Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: Design: Escort info from WHERE clause to executor?

From
"Dawid Kuroczko"
Date:
On 7/25/07, peter.trautmeier@gmx.de <peter.trautmeier@gmx.de> wrote:
> >
> > Why? What are you trying to achieve?
>
> I am implementing a technique that sorts a result set according to weight annotations in the WHERE.
>
> The query
>
> SELECT * FROM cars
> WHERE (cdChanger=1){2}
>    OR (mp3player=1){1}
>
> would be sorted according to partial conditions that hold.
>
> Cars that have both a CD changer AND a MP3 player get a weight of 3, i.e. (2+1).
> Cars that only have a CD changer get a weight of 2.
> Cars that only have a MP3 player get a weight of 1.

Hmm, any particular reason why not doing it this way: ?

SELECT * FROM carsWHERE cdChanger=1 OR mp3player=1ORDER BY CASE WHEN cdChanger=1 THEN 2 ELSE 0 END           + CASE
WHENmp3player=1 THEN 1 ELSE 0 END DESC;
 

...perhaps wrapping the CASE into something like:
CREATE FUNCTION weight_if(boolean,int) RETURNS int AS $$ SELECT CASE
WHEN $1  THEN $2 ELSE 0 END $$ IMMUTABLE LANGUAGE SQL;

...and using it like:

SELECT * FROM carsWHERE cdChanger=1 OR mp3player=1ORDER BY weight_if(cdChanger=1,2) + weight_if(mp3player=1, 1) DESC;
  Regards,    Dawid


Re: Design: Escort info from WHERE clause to executor?

From
Gregory Stark
Date:
<peter.trautmeier@gmx.de> writes:

>> Why? What are you trying to achieve?
>
> I am implementing a technique that sorts a result set according to weight annotations in the WHERE.
>
> The query
>
> SELECT * FROM cars 
> WHERE (cdChanger=1){2} 
>    OR (mp3player=1){1} 
>
> would be sorted according to partial conditions that hold.
>
> Cars that have both a CD changer AND a MP3 player get a weight of 3, i.e. (2+1).
> Cars that only have a CD changer get a weight of 2.
> Cars that only have a MP3 player get a weight of 1.
> Cars that have neither a CD changer nor a MP3 player do not belong to the result set anyway.

You're trying to do it by annotating the clauses early on in the parse stage,
and then looking at the annotations in the executor. But I think you'll find
that there are too many steps in between those two which risk destroying or
making it impossible to make heads or tails of the annotations.

An alternative approach would be to notice the annotations early in the parse
stage and construct copies of those expressions to put into a constructed
ORDER BY clause. Then allow the planner, optimizer, and executor to proceed as
normal. This means it may have to re-evaluate those expressions twice, once
for the query and once for the ordering.

So in the above case it would construct an ORDER BY clause like:
ORDER BY((CASE WHEN cdChanger=1 THEN 2 ELSE 0 END) +  (CASE WHEN mp3Player=1 THEN 1 ELSE 0 END))

Then it can forget about the annotations and allow them to get destroyed.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com



Re: Design: Escort info from WHERE clause to executor?

From
"Jonah H. Harris"
Date:
On 7/25/07, Gregory Stark <stark@enterprisedb.com> wrote:
> You're trying to do it by annotating the clauses early on in the parse stage,
> and then looking at the annotations in the executor. But I think you'll find
> that there are too many steps in between those two which risk destroying or
> making it impossible to make heads or tails of the annotations.

Ditto.

-- 
Jonah H. Harris, Software Architect | phone: 732.331.1324
EnterpriseDB Corporation            | fax: 732.331.1301
33 Wood Ave S, 3rd Floor            | jharris@enterprisedb.com
Iselin, New Jersey 08830            | http://www.enterprisedb.com/


Re: Design: Escort info from WHERE clause to executor?

From
Tom Lane
Date:
Heikki Linnakangas <heikki@enterprisedb.com> writes:
>> I am implementing a technique that sorts a result set according to weight annotations in the WHERE.
>> 
>> The query
>> 
>> SELECT * FROM cars 
>> WHERE (cdChanger=1){2} 
>> OR (mp3player=1){1} 
>> 
>> would be sorted according to partial conditions that hold.

> You could do that like this, with no need to hack the backend:

> SELECT * FROM cars
> WHERE (cdChanger=1)
>    OR (mp3player=1)
> ORDER BY (CASE WHEN cdchanger=1 THEN 2 ELSE 0 END) +
>          (CASE WHEN mp3player=1 THEN 1 ELSE 0 END) DESC;

If that's an accurate description of the goal, then the OP is nuts to
insist on doing it in the executor.  A reasonable implementation would
be to generate the required ORDER BY at rewrite time, which is before
the planner starts to fold spindle & mutilate the WHERE clause, so the
problem of needing access to the original WHERE structure is solved.
I see no need to change either the planner or the executor one bit.

As for the tree representation of this, rather than modifying AExpr
or OpExpr and trying to deal with the resulting fallout, it might be
better to invent a new expression node type: something{w} becomes
struct OrderingWeight {    Node *something;    int   weight;}

Or maybe weight is another Node --- are run-time-variable weights
allowed?  Anyway, plugging a new node type into the system is a
straightforward if tedious exercise, and there are plenty of past
patches to look at as models.
        regards, tom lane


Re: Updated tsearch documentation

From
"Erikjan"
Date:
In
http://momjian.us/expire/fulltext/HTML/textsearch-intro.html#TEXTSEARCH-DOCUMENT

it says:

"A document is any text file that can be opened, read, and modified."

Is this an openfts docs relic? tsearch2 is not meant to be be reading
out-of-database *files*, or is it?

If it is actually the case that the present tsearch2 implementation (for
8.3) is going to be able to store pointers into external files, maybe this
should be made more explicitly clear?


oh, and another little derussification (russians don't seem to like
articles, be they definite or indefinite):
"is seen as different function" should be "is seen as a different function"


Thanks,

Erik Rijkers










Re: Updated tsearch documentation

From
Oleg Bartunov
Date:
On Wed, 25 Jul 2007, Erikjan wrote:

> In
> http://momjian.us/expire/fulltext/HTML/textsearch-intro.html#TEXTSEARCH-DOCUMENT
>
> it says:
>
> "A document is any text file that can be opened, read, and modified."

OOps, in my original documentation it was:
"Document, in usual meaning, is a text file, that one could open, read and modify."
I stress that in database document is something another.

http://www.sai.msu.su/~megera/postgres/fts/doc/fts-whatdb.html


>
> Is this an openfts docs relic? tsearch2 is not meant to be be reading
> out-of-database *files*, or is it?
>
> If it is actually the case that the present tsearch2 implementation (for
> 8.3) is going to be able to store pointers into external files, maybe this
> should be made more explicitly clear?
>
>
> oh, and another little derussification (russians don't seem to like
> articles, be they definite or indefinite):
> "is seen as different function" should be "is seen as a different function"
>
>
> Thanks,
>
> Erik Rijkers
>
>
>
>
>
>
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
>               http://archives.postgresql.org
>
    Regards,        Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83


Re: Design: Escort info from WHERE clause to executor?

From
peter.trautmeier@gmx.de
Date:
Thanks Heikki,

Von: Heikki Linnakangas <heikki@enterprisedb.com>
> > I am implementing a technique that sorts a result set according to
> weight annotations in the WHERE.
> > 
> > The query
> > 
> > SELECT * FROM cars 
> > WHERE (cdChanger=1){2} 
> >    OR (mp3player=1){1} 
> > 
> > would be sorted according to partial conditions that hold.
> 
> You could do that like this, with no need to hack the backend:
> 
> SELECT * FROM cars
> WHERE (cdChanger=1)
>    OR (mp3player=1)
> ORDER BY (CASE WHEN cdchanger=1 THEN 2 ELSE 0 END) +
>          (CASE WHEN mp3player=1 THEN 1 ELSE 0 END) DESC;

that is certainly a nice, straigthforward solution, however, I forgot to mention my most important goal: I want to
optimizethose weighted queries within pg.
 

As I believe these weights add - in certain cases - additional info that can be exploited to gain nice performance
improvementsin the face of top-k queries.
 

Just 2 examples for top-k query optimization:

1) Consider a query like

SELECT * FROM cars 
WHERE (cdChanger=1){1}   OR (mp3player=1){42}   OR (resemblesBatmobile=1){4711}
LIMIT 10;

where the weights differ considerably. Here it might suffice to consider tuples with (resemblesBatmobile=1){4711} at
firstand filter with that condition. If this pre filtering yields (at least) 10 tuples, we're left off with the task to
sortjust the remaining set according to the remaining conditions and their weights (cdChanger=1){1} and
(mp3player=1){42}.
 

In a case like 
...
WHERE (cdChanger=1){1}   OR (mp3player=1){42}   OR (resemblesBatmobile=1){4711}  OR (resemblesKITT=1){4711}

where two weights are equal the selectivities of resemblesBatmobile=1 and resemblesKITT=1 could aid the optimizers
decisionwhich condition nicest for a pre filtering.
 
(Consider how great the benefit is in case of expensive predicates like subqueries, contrary to these toy comparison
opsof mine.)
 

And so on - your idea here.

2) The second example is a form of join optimization. Consider a query with a simple join

SELECT *
FROM   R,S
WHERE  R.foobar = S.foobar  AND ((R.foo = 1){10}    OR (S.bar = 6){20})
LIMIT  10;

where again just 10 tuples are needed. Normally, the planner would generate a join that evaluates the R.foo = 1 or
S.bar= 6 - there's no other choice.
 

However by rewriting the condition in an equivalent form we can exploit the weights by creating 3 Joins instead of a
singleone:
 

WHERE  R.foobar = S.foobar  AND ((R.foo  = 1){10}    AND (S.bar = 6){20})     --> cream of the crop: weight 30ORii  AND
((R.foo<> 1){10}    AND (S.bar = 6){20})     --> second class citizens: weight 20OR  AND ((R.foo = 1){10}    AND (S.bar
<>6){20})    --> still ok: weight 10
 

By rewriting we gain 2 nice properties:

1. The conditions can be pushed from the (now 3) joins to the selection at the leaf-relation.

2. The tuples are disjunctive regarding their weights: The first join yields the best tuples with weight 30. If it
gainsthe needed 10 tuples, we're set - voila. Otherwise, well, let's consider the next join with the w20-tuples.
Iterate.

I believe there are many more ways to speed up execution with the help of weights.

That's why I need the weights around in the executor, desperately :)

Regards,
Peter



Re: Updated tsearch documentation

From
Bruce Momjian
Date:
Oleg Bartunov wrote:
> On Wed, 25 Jul 2007, Erikjan wrote:
> 
> > In
> > http://momjian.us/expire/fulltext/HTML/textsearch-intro.html#TEXTSEARCH-DOCUMENT
> >
> > it says:
> >
> > "A document is any text file that can be opened, read, and modified."
> 
> OOps, in my original documentation it was:
> "Document, in usual meaning, is a text file, that one could open, read and modify."
> I stress that in database document is something another.
> 
> http://www.sai.msu.su/~megera/postgres/fts/doc/fts-whatdb.html

I have updated the documentation:
http://momjian.us/expire/fulltext/HTML/textsearch-intro.html#TEXTSEARCH-DOCUMENT

--  Bruce Momjian  <bruce@momjian.us>          http://momjian.us EnterpriseDB
http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: Updated tsearch documentation

From
Oleg Bartunov
Date:
On Thu, 26 Jul 2007, Bruce Momjian wrote:

> Oleg Bartunov wrote:
>> On Wed, 25 Jul 2007, Erikjan wrote:
>>
>>> In
>>> http://momjian.us/expire/fulltext/HTML/textsearch-intro.html#TEXTSEARCH-DOCUMENT
>>>
>>> it says:
>>>
>>> "A document is any text file that can be opened, read, and modified."
>>
>> OOps, in my original documentation it was:
>> "Document, in usual meaning, is a text file, that one could open, read and modify."
>> I stress that in database document is something another.
>>
>> http://www.sai.msu.su/~megera/postgres/fts/doc/fts-whatdb.html
>
> I have updated the documentation:
>
>     http://momjian.us/expire/fulltext/HTML/textsearch-intro.html#TEXTSEARCH-DOCUMENT
>

Is't worth to reference OpenFTS which used for indexing file system ?

    Regards,        Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83


Re: Updated tsearch documentation

From
Bruce Momjian
Date:
Oleg Bartunov wrote:
> On Thu, 26 Jul 2007, Bruce Momjian wrote:
> 
> > Oleg Bartunov wrote:
> >> On Wed, 25 Jul 2007, Erikjan wrote:
> >>
> >>> In
> >>> http://momjian.us/expire/fulltext/HTML/textsearch-intro.html#TEXTSEARCH-DOCUMENT
> >>>
> >>> it says:
> >>>
> >>> "A document is any text file that can be opened, read, and modified."
> >>
> >> OOps, in my original documentation it was:
> >> "Document, in usual meaning, is a text file, that one could open, read and modify."
> >> I stress that in database document is something another.
> >>
> >> http://www.sai.msu.su/~megera/postgres/fts/doc/fts-whatdb.html
> >
> > I have updated the documentation:
> >
> >     http://momjian.us/expire/fulltext/HTML/textsearch-intro.html#TEXTSEARCH-DOCUMENT
> >
> 
> Is't worth to reference OpenFTS which used for indexing file system ?

Uh, not sure.  I don't think so but we can add a URL to it if you can
find the right place.

--  Bruce Momjian  <bruce@momjian.us>          http://momjian.us EnterpriseDB
http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +