Thread: Join syntax
I've been playing with selects using explicit join syntax, and have some initial results for inner joins (haven't done anything more about outer joins yet; inner joins are tough enough for now). It's a real pita to flatten the join expressions into the traditional Postgres query tree. It would be nice to start thinking about how to represent general subqueries or intermediate queries in the parse tree. Anyway, some examples below... - Thomas postgres=> select * from t1; i| j -+-- 1|10 2|20 3|30 (3 rows) postgres=> select * from t2; i| x -+--- 1|100 3|300 (2 rows) postgres=> select * from t1 natural join t2; i| j| x -+--+--- 1|10|100 3|30|300 (2 rows) postgres=> select * from t1 join t2 using (i); i| j| x -+--+--- 1|10|100 3|30|300 (2 rows) postgres=> select * from t1 join t2 on (t1.i = t2.i); i| j|i| x -+--+-+--- 1|10|1|100 3|30|3|300 (2 rows) postgres=> select * from t1 natural join t2 natural join t1; pqReadData() -- backend closed the channel unexpectedly. This probably means the backend terminated abnormally before or while processing the request. We have lost the connection to the backend, so further processing is impossible. Terminating. Oh well. Was on a roll 'til then ;) -- Thomas Lockhart lockhart@alumni.caltech.edu South Pasadena, California
Thomas Lockhart <lockhart@alumni.caltech.edu> writes: > It's a real pita to flatten the join expressions into the traditional > Postgres query tree. It would be nice to start thinking about how to > represent general subqueries or intermediate queries in the parse > tree. Yes. Jan has been saying for some time that he needs that for rules. Also, I have found some squirrely cases in INSERT ... SELECT ... that can't really be done right unless the INSERT and SELECT targetlists are kept separate, which seems to mean a two-level parsetree structure. The UNION/INTERSECT/EXCEPT code has a really klugy approach to multi-query parse trees, which maybe could be cleaned up if we supported them in a more general fashion. Maybe it's time to bite the bullet and do it. You have any thoughts on what the representation should look like? regards, tom lane
> > ... represent general subqueries or intermediate queries in the > > parse tree. > Maybe it's time to bite the bullet and do it. You have any thoughts > on what the representation should look like? I was hoping you would tell me ;) I don't have a good feel for the current parse tree (which of course hasn't kept me from fooling around with it). But I'll definitely need something extra or different to implement outer joins. If I were keeping everything else the same, I was thinking of propagating a "join expression" into the planner/optimizer in the same area as the existing qualification nodes. One of the differences would be that the JE marks a node around which the optimizer is not allowed to reorder the plan (since outer joins must be evaluated in a specific order to get the right result). But I could just as easily represent this as a subquery node somewhere else in the parse tree. afaik the planner/optimizer already has the notion of merging/joining/scanning intermediate results, so teaching it to invoke these explicitly from the query tree rather than just implicitly may not be a huge stretch. btw I'm currently rewriting the join syntax in gram.y to conform better to a closer reading of the SQL92 standard. One annoyance is that the standard allows table *and* column aliasing *everywhere*. e.g. select * from (t1 as x1 (i,j,k) join t2 using (i)) as r1 (a,b,c,d) is (apparently) legal syntax, resulting in rows labeled a-d. Ugh. - Thomas -- Thomas Lockhart lockhart@alumni.caltech.edu South Pasadena, California
Thomas Lockhart wrote: > > > > ... represent general subqueries or intermediate queries in the > > > parse tree. > > Maybe it's time to bite the bullet and do it. You have any thoughts > > on what the representation should look like? > > I was hoping you would tell me ;) > > I don't have a good feel for the current parse tree (which of course > hasn't kept me from fooling around with it). But I'll definitely need > something extra or different to implement outer joins. If I were > keeping everything else the same, I was thinking of propagating a > "join expression" into the planner/optimizer in the same area as the > existing qualification nodes. One of the differences would be that the > JE marks a node around which the optimizer is not allowed to reorder ^^^^^^^^^^^^^^^^^^^^^^^^^ > the plan (since outer joins must be evaluated in a specific order to ^^^^^^^^ > get the right result). But I could just as easily represent this as a And this is what we need to have subqueries in FROM!.. (One a great thing which I want to have so much -:)) > subquery node somewhere else in the parse tree. Vadim
Thomas Lockhart <lockhart@alumni.caltech.edu> writes: >> Maybe it's time to bite the bullet and do it. You have any thoughts >> on what the representation should look like? > I was thinking of propagating a "join expression" into the > planner/optimizer in the same area as the existing qualification > nodes. I think it would be best to keep it out of the regular expression-tree stuff, for a number of reasons including the issue of not allowing reordering. The thing I was visualizing was a tree of Query nodes, not queries as items in ordinary expressions within queries. Essentially, we'd allow a sub-Query as an entry in the rangetable (the FROM list) of another Query. I *think* this is what Jan has been saying he wants in order to do view rule rewrites more cleanly. It could also solve my problems with INSERT ... SELECT. Aside from plain Query nodes (representing a sub-Select) we'd need node types that represent UNION/INTERSECT/EXCEPT combinations of Queries. I don't like the way the current UNION/INTERSECT code overloads AND/OR/NOT nodes to do this duty, not least because there's noplace to represent the "ALL" modifier cleanly. I'd rather see a separate set of node types. I still don't understand the semantics of all those join types you are working on, but I suppose they would be additional node types in this Query tree structure. Should the rangetable itself (which represents a regular Cartesian-product join of the source tables) become some kind of explicit join node? If so, I guess the WHERE clause would be attached to this join node, and not to the Query node referencing the join. (Actually, the rangetable should probably continue to exist as a list of all the tables referenced anywhere in the Query tree, but we should separate out its implicit use as a representation of a Cartesian product join and make an explicit node that says what to join, how, and with what restriction clauses. The "in From clause" flag in RTEs would go away...) Another thing it'd be nice to think about while we are at it is how to implement SQL92's DISTINCT-inside-an-aggregate-function feature, eg, "SELECT COUNT(DISTINCT x), COUNT(DISTINCT y) FROM table". My thought here is that the cleanest implementation is to have sub-Queries like "SELECT DISTINCT x FROM table" and then apply the aggregates over the outputs of those subqueries. Not sure about details here. > afaik the planner/optimizer already has the notion of > merging/joining/scanning intermediate results, so teaching it to > invoke these explicitly from the query tree rather than just > implicitly may not be a huge stretch. Yes, the output of the planner is a tree of plan node types, so there would probably be very little change needed there or in the executor. We might need to generalize the notion that a plan node only has one or two descendants ("lefttree/righttree") into N descendants. regards, tom lane PS: Has anyone heard from Jan lately? Seems like he's been awfully quiet...