Re: subselects - Mailing list pgsql-hackers
From | Vadim B. Mikheev |
---|---|
Subject | Re: subselects |
Date | |
Msg-id | 34B63DCD.73AA70C7@sable.krasnoyarsk.su Whole thread Raw |
In response to | subselects (Bruce Momjian <maillist@candle.pha.pa.us>) |
Responses |
Re: subselects
|
List | pgsql-hackers |
Bruce Momjian wrote: > > Vadim, I know you are still thinking about subselects, but I have some > more clarification that may help. > > We have to add phantom range table entries to correlated subselects so > they will pass the parser. We might as well add those fields to the > target list of the subquery at the same time: > > select * > from taba > where col1 = (select col2 > from tabb > where taba.col3 = tabb.col4) > > becomes: > > select * > from taba > where col1 = (select col2, tabb.col4 <--- > from tabb, taba <--- > where taba.col3 = tabb.col4) > > We add a field to TargetEntry and RangeTblEntry to mark the fact that it > was entered as a correlation entry: > > bool isCorrelated; No, I don't like to add anything in parser. Example: select * from tabA where col1 = (select col2 from tabB where tabA.col3 = tabB.col4 and exists (select * from tabC where tabB.colX = tabC.colX and tabC.colY = tabA.col2) ) : a column of tabA is referenced in sub-subselect (is it allowable by standards ?) - in this case it's better to don't add tabA to 1st subselect but add tabA to second one and change tabA.col3 in 1st to reference col3 in 2nd subquery temp table - this gives us 2-tables join in 1st subquery instead of 3-tables join. (And I'm still not sure that using temp tables is best of what can be done in all cases...) Instead of using isCorrelated in TE & RTE we can add Index varlevel; to Var node to reflect (sub)query from where this Var is come (where is range table to find var's relation using varno). Upmost query will have varlevel = 0, all its (dirrect) children - varlevel = 1 and so on. ^^^ ^^^^^^^^^^^^ (I don't see problems with distinguishing Vars of different children on the same level...) > > Second, we need to hook the subselect to the main query. I recommend we > add two fields to Query for this: > > Query *parentQuery; > List *subqueries; Agreed. And maybe Index queryLevel. > In the parent query, to parse the WHERE clause, we create a new operator > type, called IN or NOT_IN, or ALL, where the left side is a Var, and the ^^^^^^^^^^^^^^^^^^ No. We have to handle (a,b,c) OP (select x, y, z ...) and '_a_constant_' OP (select ...) - I don't know is last in standards, Sybase has this. Well, typedef enum OpType { OP_EXPR, FUNC_EXPR, OR_EXPR, AND_EXPR, NOT_EXPR + OP_EXISTS, OP_ALL, OP_ANY } OpType; typedef struct Expr { NodeTag type; Oid typeOid; /* oid of the type of this expr */ OpType opType; /* type of the op */ Node *oper; /* could be Oper or Func */ List *args; /* list of argument nodes */ } Expr; OP_EXISTS: oper is NULL, lfirst(args) is SubSelect (index in subqueries List, following your suggestion) OP_ALL, OP_ANY: oper is List of Oper nodes. We need in list because of data types of a, b, c (above) can be different and so Oper nodes will be different too. lfirst(args) is List of expression nodes (Const, Var, Func ?, a + b ?) - left side of subquery' operator. lsecond(args) is SubSelect. Note, that there are no OP_IN, OP_NOTIN in OpType-s for Expr. We need in IN, NOTIN in A_Expr (parser node), but both of them have to be transferred by parser into corresponding ANY and ALL. At the moment we can do: IN --> = ANY, NOT IN --> <> ALL but this will be "known bug": this breaks OO-nature of Postgres, because of operators can be overrided and '=' can mean s o m e t h i n g (not equality). Example: box data type. For boxes, = means equality of _areas_ and =~ means that boxes are the same ==> =~ ANY should be used for IN. > right side is an index to a slot in the subqueries List. Vadim
pgsql-hackers by date: