Re: [HACKERS] Join syntax - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: [HACKERS] Join syntax |
Date | |
Msg-id | 29288.937579774@sss.pgh.pa.us |
In response to | Re: [HACKERS] Join syntax (Thomas Lockhart <lockhart@alumni.caltech.edu>) |
List | pgsql-hackers |
Thomas Lockhart <lockhart@alumni.caltech.edu> writes:
>> Maybe it's time to bite the bullet and do it. You have any thoughts
>> on what the representation should look like?

> I was thinking of propagating a "join expression" into the
> planner/optimizer in the same area as the existing qualification
> nodes.

I think it would be best to keep it out of the regular expression-tree stuff, for a number of reasons including the issue of not allowing reordering.

The thing I was visualizing was a tree of Query nodes, not queries as items in ordinary expressions within queries. Essentially, we'd allow a sub-Query as an entry in the rangetable (the FROM list) of another Query. I *think* this is what Jan has been saying he wants in order to do view rule rewrites more cleanly. It could also solve my problems with INSERT ... SELECT.

Aside from plain Query nodes (representing a sub-Select) we'd need node types that represent UNION/INTERSECT/EXCEPT combinations of Queries. I don't like the way the current UNION/INTERSECT code overloads AND/OR/NOT nodes to do this duty, not least because there's no place to represent the "ALL" modifier cleanly. I'd rather see a separate set of node types.

I still don't understand the semantics of all those join types you are working on, but I suppose they would be additional node types in this Query tree structure.

Should the rangetable itself (which represents a regular Cartesian-product join of the source tables) become some kind of explicit join node? If so, I guess the WHERE clause would be attached to this join node, and not to the Query node referencing the join.

(Actually, the rangetable should probably continue to exist as a list of all the tables referenced anywhere in the Query tree, but we should separate out its implicit use as a representation of a Cartesian product join and make an explicit node that says what to join, how, and with what restriction clauses. The "in From clause" flag in RTEs would go away...)

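To make the shape of that tree a bit more concrete, here is a rough sketch in Python rather than in the backend's C. None of this is existing PostgreSQL code: every class, field, and function name below (`RangeTblEntry`, `JoinNode`, `SetOp`, `all_flag`, `referenced_tables`) is made up for illustration, and quals are kept as plain strings only to keep the example short. It shows a sub-Query sitting in a rangetable, a separate set-operation node that has an obvious home for "ALL", and an explicit join node that carries the restriction clauses.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class RangeTblEntry:
    """One rangetable entry: a base table or (hypothetically) a sub-Query."""
    alias: str
    relname: Optional[str] = None            # set for a plain table
    subquery: Optional["QueryNode"] = None   # set for a sub-Select in FROM


@dataclass
class JoinNode:
    """Explicit join: what to join, how, and with which restriction clauses."""
    join_type: str                            # e.g. "cross", "inner", "left"
    inputs: List[int]                         # indexes into the owning rangetable
    quals: List[str] = field(default_factory=list)


@dataclass
class Query:
    """A plain Select; the WHERE clause lives on the join node, not here."""
    rangetable: List[RangeTblEntry]
    join: Optional[JoinNode]
    targetlist: List[str]


@dataclass
class SetOp:
    """UNION/INTERSECT/EXCEPT of two Query trees, with room for ALL."""
    op: str                                   # "union", "intersect", "except"
    all_flag: bool
    left: "QueryNode"
    right: "QueryNode"


QueryNode = Union[Query, SetOp]


def referenced_tables(node: QueryNode) -> List[str]:
    """Collect every base table referenced anywhere in the Query tree."""
    if isinstance(node, SetOp):
        return referenced_tables(node.left) + referenced_tables(node.right)
    names: List[str] = []
    for rte in node.rangetable:
        if rte.subquery is not None:
            names.extend(referenced_tables(rte.subquery))
        else:
            names.append(rte.relname)
    return names


# SELECT u.x FROM (SELECT x FROM t1 UNION ALL SELECT x FROM t2) u, t3 c
# WHERE u.x = c.x
inner = SetOp(
    op="union",
    all_flag=True,
    left=Query([RangeTblEntry("a", relname="t1")], None, ["x"]),
    right=Query([RangeTblEntry("b", relname="t2")], None, ["x"]),
)
outer = Query(
    rangetable=[RangeTblEntry("u", subquery=inner),
                RangeTblEntry("c", relname="t3")],
    join=JoinNode("inner", [0, 1], ["u.x = c.x"]),
    targetlist=["u.x"],
)
```

Walking `outer` with `referenced_tables` visits the sub-Query's tables as well as the outer one, which is roughly the "list of all the tables referenced anywhere in the Query tree" role suggested for the rangetable above.
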
Another thing it'd be nice to think about while we are at it is how to implement SQL92's DISTINCT-inside-an-aggregate-function feature, e.g., "SELECT COUNT(DISTINCT x), COUNT(DISTINCT y) FROM table". My thought here is that the cleanest implementation is to have sub-Queries like "SELECT DISTINCT x FROM table" and then apply the aggregates over the outputs of those subqueries. Not sure about details here.

> afaik the planner/optimizer already has the notion of
> merging/joining/scanning intermediate results, so teaching it to
> invoke these explicitly from the query tree rather than just
> implicitly may not be a huge stretch.

Yes, the output of the planner is a tree of plan node types, so there would probably be very little change needed there or in the executor. We might need to generalize the notion that a plan node only has one or two descendants ("lefttree/righttree") into N descendants.

			regards, tom lane

PS: Has anyone heard from Jan lately? Seems like he's been awfully quiet...
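
Going back to the COUNT(DISTINCT ...) idea above, the "aggregate over a DISTINCT sub-Query" approach can be sketched as follows. This is only a toy model in Python over in-memory rows, not a description of how the executor would do it; the helper names (`select_distinct`, `count_non_null`, `count_distinct`) and the sample table are assumptions made up for the example. NULL is modeled as `None`, and the count skips it, as SQL's COUNT(expr) does.

```python
def select_distinct(rows, column):
    """Model of the sub-Query "SELECT DISTINCT column FROM table"."""
    seen = set()
    out = []
    for row in rows:
        value = row[column]
        if value not in seen:
            seen.add(value)
            out.append(value)
    return out


def count_non_null(values):
    """COUNT applied to a sub-Query's output; NULLs (None) are not counted."""
    return sum(1 for v in values if v is not None)


def count_distinct(rows, column):
    """COUNT(DISTINCT column) as an aggregate over the DISTINCT sub-Query."""
    return count_non_null(select_distinct(rows, column))


# Hypothetical sample data standing in for "table".
table = [
    {"x": 1, "y": "a"},
    {"x": 1, "y": "b"},
    {"x": 2, "y": None},
    {"x": 2, "y": "b"},
]

# SELECT COUNT(DISTINCT x), COUNT(DISTINCT y) FROM table
result = (count_distinct(table, "x"), count_distinct(table, "y"))
```

Each aggregate gets its own DISTINCT sub-Query over the same source rows, which is why the two counts can be computed independently of each other.
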