Thread: Design: Escort info from WHERE clause to executor?
Hi all, I want to pass additional weight info from the WHERE clause to the executor and I hope someone can help me with this. I accept clauses like the following WHERE (foo='a'){1} WHERE (foo='a'){1} OR (bar='b'){2} WHERE ((foo='a'){1} OR (bar='b'){2})){42} OR (baz='c'){3} where the {} takes an integer as a weight that is attached to the preceding (partial) condition. In the executor, I need to access (1) the logical value of and (2) the weight associated with _each_ subexpression that wasentered. (Getting the weight from the parser to the executor is in itself a journey it seems, as some expression typesare created anew - and not copied - and lose their annotated weight over and over again.) Furthermore I need the structure of OR to be preserved; the OR-of-OR structure from the last WHERE must be preserved or atleast be reconstructible and must not be folded into a 3-valued OR (as canonicalize_qual and friends do.) To sum up, I am looking for a (decently efficient) scheme that is able to (1) pass arbitrary conditional expressions from WHERE to the executor in a structure preserving way. (2) annotate arbitrary expressions with weights that survive on its way from the parser to the executor. (3) access the logical value of particular subexpressions. I have some basic ideas how at least some of the requirements might be achieved. But as I am not totally satisfied with myideas I hope you can provide me with some fresh input. ANY ideas are welcome. Regards, Peter
It looks like you need a customized version of AExpr Node. In the backend parser, an AExpr Node is constructed against each given WHERE expression. You can store the weight along with the expression. Further, don't forget to upgrade the copy functions and equal functions for AExpr if you want to take this weight value all the way upto the executor. --Imad www.EnterpriseDB.com On 7/24/07, peter.trautmeier@gmx.de <peter.trautmeier@gmx.de> wrote: > Hi all, > > I want to pass additional weight info from the WHERE clause to the executor and I hope someone can help me with this. > > I accept clauses like the following > > WHERE (foo='a'){1} > WHERE (foo='a'){1} OR (bar='b'){2} > WHERE ((foo='a'){1} OR (bar='b'){2})){42} OR (baz='c'){3} > > where the {} takes an integer as a weight that is attached to the preceding (partial) condition. > > In the executor, I need to access (1) the logical value of and (2) the weight associated with _each_ subexpression thatwas entered. (Getting the weight from the parser to the executor is in itself a journey it seems, as some expressiontypes are created anew - and not copied - and lose their annotated weight over and over again.) > > Furthermore I need the structure of OR to be preserved; the OR-of-OR structure from the last WHERE must be preserved orat least be reconstructible and must not be folded into a 3-valued OR (as canonicalize_qual and friends do.) > > To sum up, I am looking for a (decently efficient) scheme that is able to > > (1) pass arbitrary conditional expressions from WHERE to the executor in a structure preserving way. > (2) annotate arbitrary expressions with weights that survive on its way from the parser to the executor. > (3) access the logical value of particular subexpressions. > > I have some basic ideas how at least some of the requirements might be achieved. But as I am not totally satisfied withmy ideas I hope you can provide me with some fresh input. > > ANY ideas are welcome. > > Regards, > Peter > > ---------------------------(end of broadcast)--------------------------- > TIP 7: You can help support the PostgreSQL project by donating at > > http://www.postgresql.org/about/donate >
peter.trautmeier@gmx.de wrote: > To sum up, I am looking for a (decently efficient) scheme that is able to > > (1) pass arbitrary conditional expressions from WHERE to the executor in a structure preserving way. > (2) annotate arbitrary expressions with weights that survive on its way from the parser to the executor. > (3) access the logical value of particular subexpressions. > > I have some basic ideas how at least some of the requirements might be achieved. But as I am not totally satisfied withmy ideas I hope you can provide me with some fresh input. Why? What are you trying to achieve? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Thanks imad, Von: imad <immaad@gmail.com> > It looks like you need a customized version of AExpr Node. > In the backend parser, an AExpr Node is constructed against each given > WHERE expression. You can store the weight along with the expression. > Further, don't forget to upgrade the copy functions and equal > functions for AExpr if you want to take this weight value all the way > upto the executor. I have already done that, alas, it doesn't suffice: During parse analysis transformExpr() turns the AExpr into other types,e.g. an OpExpr in case of foo = 42. These newly created OpExprs won't last until they reach the executor in every case, though: They are sometimes recreatedanew during planning 'manually', i.e. not by using the standard copy functions but by creating a new OpExpr nodeand copying the fields on an as-seen-before basis - these are the places where my weight gets lost. To be honest, I consider not using a special copy function a minor design flaw :) *sigh* Anyway, thanks again. But giving a weight to AExpr is just the tip of the iceberg when it comes to my final goal - let myweights enter the executor ;) Regards, Peter
Von: Heikki Linnakangas <heikki@enterprisedb.com> > peter.trautmeier@gmx.de wrote: > > To sum up, I am looking for a (decently efficient) scheme that is able > to > > > > (1) pass arbitrary conditional expressions from WHERE to the executor in > a structure preserving way. > > (2) annotate arbitrary expressions with weights that survive on its way > from the parser to the executor. > > (3) access the logical value of particular subexpressions. > > > > I have some basic ideas how at least some of the requirements might be > achieved. But as I am not totally satisfied with my ideas I hope you can > provide me with some fresh input. > > Why? What are you trying to achieve? I am implementing a technique that sorts a result set according to weight annotations in the WHERE. The query SELECT * FROM cars WHERE (cdChanger=1){2} OR (mp3player=1){1} would be sorted according to partial conditions that hold. Cars that have both a CD changer AND a MP3 player get a weight of 3, i.e. (2+1). Cars that only have a CD changer get a weight of 2. Cars that only have a MP3 player get a weight of 1. Cars that have neither a CD changer nor a MP3 player do not belong to the result set anyway. I have to sort the tuples according to their individual weight - that is why I need to annotate arbitrary expressions withweights. This is a simple example, but in case of cascaded ORs and ANDs the semantics gets slightly trickier - that is why I needthe structure of the original WHERE clause preserved. The executor as it stands now is evaluating quals in a short circuit manner, and that is totally feasible: As soon as a singlesubexpression of and ANDed qual is false, it stops. However, in order to sort the tuples in the result set I generally need to know the logical values of all subexpressions,or at least more of them as the executor needs for its rather coarsely grained decision 'belongs-to-result-set'or 'does-not-belong'. In general, I have to evaluate the original WHERE expr tree level by level, starting at the top(root) that represents thewhole condition and possibly visiting and evaluating every single subexpression in the tree to every leaf, until I haveenough information to compare two tuples with each other. I hope I got the basic idea across, but please don't hesitate to ask for more details if I failed to paint the picture clearlyenough. I appreciate your interest. (I know that seems complicated, and indeed I am getting the feeling that this _is_ complicated. But hey, then again it'smy duty ;) ) Regards, Peter
peter.trautmeier@gmx.de wrote: > Von: Heikki Linnakangas <heikki@enterprisedb.com> >> peter.trautmeier@gmx.de wrote: >>> To sum up, I am looking for a (decently efficient) scheme that is able >> to >>> (1) pass arbitrary conditional expressions from WHERE to the executor in >> a structure preserving way. >>> (2) annotate arbitrary expressions with weights that survive on its way >> from the parser to the executor. >>> (3) access the logical value of particular subexpressions. >>> >>> I have some basic ideas how at least some of the requirements might be >> achieved. But as I am not totally satisfied with my ideas I hope you can >> provide me with some fresh input. >> >> Why? What are you trying to achieve? > > I am implementing a technique that sorts a result set according to weight annotations in the WHERE. > > The query > > SELECT * FROM cars > WHERE (cdChanger=1){2} > OR (mp3player=1){1} > > would be sorted according to partial conditions that hold. You could do that like this, with no need to hack the backend: SELECT * FROM cars WHERE (cdChanger=1) OR (mp3player=1) ORDER BY (CASE WHEN cdchanger=1 THEN 2 ELSE 0 END) + (CASE WHEN mp3player=1 THEN 1 ELSE 0 END) DESC; Or a bit shorter version, exploiting the cast from boolean to int: SELECT * FROM cars WHERE (cdChanger=1) OR (mp3player=1) ORDER BY (cdchanger=1)*2 + (mp3player=1)*1 DESC; -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On 7/25/07, peter.trautmeier@gmx.de <peter.trautmeier@gmx.de> wrote: > > > > Why? What are you trying to achieve? > > I am implementing a technique that sorts a result set according to weight annotations in the WHERE. > > The query > > SELECT * FROM cars > WHERE (cdChanger=1){2} > OR (mp3player=1){1} > > would be sorted according to partial conditions that hold. > > Cars that have both a CD changer AND a MP3 player get a weight of 3, i.e. (2+1). > Cars that only have a CD changer get a weight of 2. > Cars that only have a MP3 player get a weight of 1. Hmm, any particular reason why not doing it this way: ? SELECT * FROM carsWHERE cdChanger=1 OR mp3player=1ORDER BY CASE WHEN cdChanger=1 THEN 2 ELSE 0 END + CASE WHENmp3player=1 THEN 1 ELSE 0 END DESC; ...perhaps wrapping the CASE into something like: CREATE FUNCTION weight_if(boolean,int) RETURNS int AS $$ SELECT CASE WHEN $1 THEN $2 ELSE 0 END $$ IMMUTABLE LANGUAGE SQL; ...and using it like: SELECT * FROM carsWHERE cdChanger=1 OR mp3player=1ORDER BY weight_if(cdChanger=1,2) + weight_if(mp3player=1, 1) DESC; Regards, Dawid
<peter.trautmeier@gmx.de> writes: >> Why? What are you trying to achieve? > > I am implementing a technique that sorts a result set according to weight annotations in the WHERE. > > The query > > SELECT * FROM cars > WHERE (cdChanger=1){2} > OR (mp3player=1){1} > > would be sorted according to partial conditions that hold. > > Cars that have both a CD changer AND a MP3 player get a weight of 3, i.e. (2+1). > Cars that only have a CD changer get a weight of 2. > Cars that only have a MP3 player get a weight of 1. > Cars that have neither a CD changer nor a MP3 player do not belong to the result set anyway. You're trying to do it by annotating the clauses early on in the parse stage, and then looking at the annotations in the executor. But I think you'll find that there are too many steps in between those two which risk destroying or making it impossible to make heads or tails of the annotations. An alternative approach would be to notice the annotations early in the parse stage and construct copies of those expressions to put into a constructed ORDER BY clause. Then allow the planner, optimizer, and executor to proceed as normal. This means it may have to re-evaluate those expressions twice, once for the query and once for the ordering. So in the above case it would construct an ORDER BY clause like: ORDER BY((CASE WHEN cdChanger=1 THEN 2 ELSE 0 END) + (CASE WHEN mp3Player=1 THEN 1 ELSE 0 END)) Then it can forget about the annotations and allow them to get destroyed. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
On 7/25/07, Gregory Stark <stark@enterprisedb.com> wrote: > You're trying to do it by annotating the clauses early on in the parse stage, > and then looking at the annotations in the executor. But I think you'll find > that there are too many steps in between those two which risk destroying or > making it impossible to make heads or tails of the annotations. Ditto. -- Jonah H. Harris, Software Architect | phone: 732.331.1324 EnterpriseDB Corporation | fax: 732.331.1301 33 Wood Ave S, 3rd Floor | jharris@enterprisedb.com Iselin, New Jersey 08830 | http://www.enterprisedb.com/
Heikki Linnakangas <heikki@enterprisedb.com> writes: >> I am implementing a technique that sorts a result set according to weight annotations in the WHERE. >> >> The query >> >> SELECT * FROM cars >> WHERE (cdChanger=1){2} >> OR (mp3player=1){1} >> >> would be sorted according to partial conditions that hold. > You could do that like this, with no need to hack the backend: > SELECT * FROM cars > WHERE (cdChanger=1) > OR (mp3player=1) > ORDER BY (CASE WHEN cdchanger=1 THEN 2 ELSE 0 END) + > (CASE WHEN mp3player=1 THEN 1 ELSE 0 END) DESC; If that's an accurate description of the goal, then the OP is nuts to insist on doing it in the executor. A reasonable implementation would be to generate the required ORDER BY at rewrite time, which is before the planner starts to fold spindle & mutilate the WHERE clause, so the problem of needing access to the original WHERE structure is solved. I see no need to change either the planner or the executor one bit. As for the tree representation of this, rather than modifying AExpr or OpExpr and trying to deal with the resulting fallout, it might be better to invent a new expression node type: something{w} becomes struct OrderingWeight { Node *something; int weight;} Or maybe weight is another Node --- are run-time-variable weights allowed? Anyway, plugging a new node type into the system is a straightforward if tedious exercise, and there are plenty of past patches to look at as models. regards, tom lane
In http://momjian.us/expire/fulltext/HTML/textsearch-intro.html#TEXTSEARCH-DOCUMENT it says: "A document is any text file that can be opened, read, and modified." Is this an openfts docs relic? tsearch2 is not meant to be be reading out-of-database *files*, or is it? If it is actually the case that the present tsearch2 implementation (for 8.3) is going to be able to store pointers into external files, maybe this should be made more explicitly clear? oh, and another little derussification (russians don't seem to like articles, be they definite or indefinite): "is seen as different function" should be "is seen as a different function" Thanks, Erik Rijkers
On Wed, 25 Jul 2007, Erikjan wrote: > In > http://momjian.us/expire/fulltext/HTML/textsearch-intro.html#TEXTSEARCH-DOCUMENT > > it says: > > "A document is any text file that can be opened, read, and modified." OOps, in my original documentation it was: "Document, in usual meaning, is a text file, that one could open, read and modify." I stress that in database document is something another. http://www.sai.msu.su/~megera/postgres/fts/doc/fts-whatdb.html > > Is this an openfts docs relic? tsearch2 is not meant to be be reading > out-of-database *files*, or is it? > > If it is actually the case that the present tsearch2 implementation (for > 8.3) is going to be able to store pointers into external files, maybe this > should be made more explicitly clear? > > > oh, and another little derussification (russians don't seem to like > articles, be they definite or indefinite): > "is seen as different function" should be "is seen as a different function" > > > Thanks, > > Erik Rijkers > > > > > > > > > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Have you searched our list archives? > > http://archives.postgresql.org > Regards, Oleg _____________________________________________________________ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83
Thanks Heikki, Von: Heikki Linnakangas <heikki@enterprisedb.com> > > I am implementing a technique that sorts a result set according to > weight annotations in the WHERE. > > > > The query > > > > SELECT * FROM cars > > WHERE (cdChanger=1){2} > > OR (mp3player=1){1} > > > > would be sorted according to partial conditions that hold. > > You could do that like this, with no need to hack the backend: > > SELECT * FROM cars > WHERE (cdChanger=1) > OR (mp3player=1) > ORDER BY (CASE WHEN cdchanger=1 THEN 2 ELSE 0 END) + > (CASE WHEN mp3player=1 THEN 1 ELSE 0 END) DESC; that is certainly a nice, straigthforward solution, however, I forgot to mention my most important goal: I want to optimizethose weighted queries within pg. As I believe these weights add - in certain cases - additional info that can be exploited to gain nice performance improvementsin the face of top-k queries. Just 2 examples for top-k query optimization: 1) Consider a query like SELECT * FROM cars WHERE (cdChanger=1){1} OR (mp3player=1){42} OR (resemblesBatmobile=1){4711} LIMIT 10; where the weights differ considerably. Here it might suffice to consider tuples with (resemblesBatmobile=1){4711} at firstand filter with that condition. If this pre filtering yields (at least) 10 tuples, we're left off with the task to sortjust the remaining set according to the remaining conditions and their weights (cdChanger=1){1} and (mp3player=1){42}. In a case like ... WHERE (cdChanger=1){1} OR (mp3player=1){42} OR (resemblesBatmobile=1){4711} OR (resemblesKITT=1){4711} where two weights are equal the selectivities of resemblesBatmobile=1 and resemblesKITT=1 could aid the optimizers decisionwhich condition nicest for a pre filtering. (Consider how great the benefit is in case of expensive predicates like subqueries, contrary to these toy comparison opsof mine.) And so on - your idea here. 2) The second example is a form of join optimization. Consider a query with a simple join SELECT * FROM R,S WHERE R.foobar = S.foobar AND ((R.foo = 1){10} OR (S.bar = 6){20}) LIMIT 10; where again just 10 tuples are needed. Normally, the planner would generate a join that evaluates the R.foo = 1 or S.bar= 6 - there's no other choice. However by rewriting the condition in an equivalent form we can exploit the weights by creating 3 Joins instead of a singleone: WHERE R.foobar = S.foobar AND ((R.foo = 1){10} AND (S.bar = 6){20}) --> cream of the crop: weight 30ORii AND ((R.foo<> 1){10} AND (S.bar = 6){20}) --> second class citizens: weight 20OR AND ((R.foo = 1){10} AND (S.bar <>6){20}) --> still ok: weight 10 By rewriting we gain 2 nice properties: 1. The conditions can be pushed from the (now 3) joins to the selection at the leaf-relation. 2. The tuples are disjunctive regarding their weights: The first join yields the best tuples with weight 30. If it gainsthe needed 10 tuples, we're set - voila. Otherwise, well, let's consider the next join with the w20-tuples. Iterate. I believe there are many more ways to speed up execution with the help of weights. That's why I need the weights around in the executor, desperately :) Regards, Peter
Oleg Bartunov wrote: > On Wed, 25 Jul 2007, Erikjan wrote: > > > In > > http://momjian.us/expire/fulltext/HTML/textsearch-intro.html#TEXTSEARCH-DOCUMENT > > > > it says: > > > > "A document is any text file that can be opened, read, and modified." > > OOps, in my original documentation it was: > "Document, in usual meaning, is a text file, that one could open, read and modify." > I stress that in database document is something another. > > http://www.sai.msu.su/~megera/postgres/fts/doc/fts-whatdb.html I have updated the documentation: http://momjian.us/expire/fulltext/HTML/textsearch-intro.html#TEXTSEARCH-DOCUMENT -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
On Thu, 26 Jul 2007, Bruce Momjian wrote: > Oleg Bartunov wrote: >> On Wed, 25 Jul 2007, Erikjan wrote: >> >>> In >>> http://momjian.us/expire/fulltext/HTML/textsearch-intro.html#TEXTSEARCH-DOCUMENT >>> >>> it says: >>> >>> "A document is any text file that can be opened, read, and modified." >> >> OOps, in my original documentation it was: >> "Document, in usual meaning, is a text file, that one could open, read and modify." >> I stress that in database document is something another. >> >> http://www.sai.msu.su/~megera/postgres/fts/doc/fts-whatdb.html > > I have updated the documentation: > > http://momjian.us/expire/fulltext/HTML/textsearch-intro.html#TEXTSEARCH-DOCUMENT > Is't worth to reference OpenFTS which used for indexing file system ? Regards, Oleg _____________________________________________________________ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83
Oleg Bartunov wrote: > On Thu, 26 Jul 2007, Bruce Momjian wrote: > > > Oleg Bartunov wrote: > >> On Wed, 25 Jul 2007, Erikjan wrote: > >> > >>> In > >>> http://momjian.us/expire/fulltext/HTML/textsearch-intro.html#TEXTSEARCH-DOCUMENT > >>> > >>> it says: > >>> > >>> "A document is any text file that can be opened, read, and modified." > >> > >> OOps, in my original documentation it was: > >> "Document, in usual meaning, is a text file, that one could open, read and modify." > >> I stress that in database document is something another. > >> > >> http://www.sai.msu.su/~megera/postgres/fts/doc/fts-whatdb.html > > > > I have updated the documentation: > > > > http://momjian.us/expire/fulltext/HTML/textsearch-intro.html#TEXTSEARCH-DOCUMENT > > > > Is't worth to reference OpenFTS which used for indexing file system ? Uh, not sure. I don't think so but we can add a URL to it if you can find the right place. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +