Re: [HACKERS] Join syntax - Mailing list pgsql-hackers

From Tom Lane
Subject Re: [HACKERS] Join syntax
Date
Msg-id 29288.937579774@sss.pgh.pa.us
In response to Re: [HACKERS] Join syntax  (Thomas Lockhart <lockhart@alumni.caltech.edu>)
List pgsql-hackers
Thomas Lockhart <lockhart@alumni.caltech.edu> writes:
>> Maybe it's time to bite the bullet and do it.  You have any thoughts
>> on what the representation should look like?

> I was thinking of propagating a "join expression" into the
> planner/optimizer in the same area as the existing qualification
> nodes.

I think it would be best to keep it out of the regular expression-tree
stuff, for a number of reasons including the issue of not allowing
reordering.

The thing I was visualizing was a tree of Query nodes, not queries as
items in ordinary expressions within queries.  Essentially, we'd allow a
sub-Query as an entry in the rangetable (the FROM list) of another Query.
I *think* this is what Jan has been saying he wants in order to do view
rule rewrites more cleanly.  It could also solve my problems with INSERT
... SELECT.
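
As a rough illustration of a sub-Query appearing as a rangetable entry, here is a minimal sketch in Python. It is not PostgreSQL's actual node layout: every class and field name (RelationRef, RangeTableEntry, Query, depth) is hypothetical, and expressions are kept as plain strings.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class RelationRef:
    """A plain table named in FROM (hypothetical stand-in)."""
    name: str


@dataclass
class RangeTableEntry:
    """One FROM-list entry: either a base relation or a nested Query."""
    alias: str
    source: Union[RelationRef, "Query"]


@dataclass
class Query:
    """A SELECT whose rangetable may itself contain Query nodes."""
    targets: List[str]
    rangetable: List[RangeTableEntry] = field(default_factory=list)
    where: Optional[str] = None


def depth(q: Query) -> int:
    """Nesting depth of the Query tree (1 for a flat query)."""
    subs = [depth(e.source) for e in q.rangetable
            if isinstance(e.source, Query)]
    return 1 + max(subs, default=0)


# A view body used as a FROM item of an outer query.
view_body = Query(
    targets=["id", "total"],
    rangetable=[RangeTableEntry("o", RelationRef("orders"))],
    where="total > 0",
)
outer = Query(targets=["v.id"], rangetable=[RangeTableEntry("v", view_body)])
```

With this shape, a view expansion or an INSERT ... SELECT source could be attached as a nested Query instead of being flattened into the outer query.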

Aside from plain Query nodes (representing a sub-Select) we'd need node
types that represent UNION/INTERSECT/EXCEPT combinations of Queries.
I don't like the way the current UNION/INTERSECT code overloads
AND/OR/NOT nodes to do this duty, not least because there's no place to
represent the "ALL" modifier cleanly.  I'd rather see a separate set of
node types.
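
A separate set-operation node could look something like the following Python sketch. The names (SetOp, Select, SetOperation, all_rows, deparse) are assumptions made for illustration only; the point is simply that a dedicated node has an obvious home for the ALL modifier, which a boolean AND/OR/NOT node lacks.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Union


class SetOp(Enum):
    UNION = "UNION"
    INTERSECT = "INTERSECT"
    EXCEPT = "EXCEPT"


@dataclass
class Select:
    """Stand-in for a full Query node; only the SQL text is kept."""
    sql: str


@dataclass
class SetOperation:
    """Combines two inputs; all_rows records the ALL modifier."""
    op: SetOp
    all_rows: bool
    left: Union[Select, "SetOperation"]
    right: Union[Select, "SetOperation"]


def deparse(node) -> str:
    """Turn the tree back into SQL text, parenthesizing each input."""
    if isinstance(node, Select):
        return node.sql
    keyword = node.op.value + (" ALL" if node.all_rows else "")
    return f"({deparse(node.left)}) {keyword} ({deparse(node.right)})"


tree = SetOperation(
    SetOp.UNION, True,
    Select("SELECT a FROM t1"),
    Select("SELECT a FROM t2"),
)
```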

I still don't understand the semantics of all those join types you are
working on, but I suppose they would be additional node types in this
Query tree structure.  Should the rangetable itself (which represents
a regular Cartesian-product join of the source tables) become some
kind of explicit join node?  If so, I guess the WHERE clause would be
attached to this join node, and not to the Query node referencing the
join.  (Actually, the rangetable should probably continue to exist
as a list of all the tables referenced anywhere in the Query tree,
but we should separate out its implicit use as a representation of
a Cartesian product join and make an explicit node that says what to
join, how, and with what restriction clauses.  The "in From clause"
flag in RTEs would go away...) 
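
To make the parenthetical concrete, here is one possible shape, sketched in Python under the assumption that the rangetable stays a flat list and a separate join tree refers to its entries by index. TableRef, JoinNode and referenced are hypothetical names, and quals are plain strings.

```python
from dataclasses import dataclass, field
from typing import List, Set, Union


@dataclass
class TableRef:
    """Points at a flat rangetable entry by 1-based index."""
    rtindex: int


@dataclass
class JoinNode:
    """Says what to join, how, and with which restriction clauses."""
    jointype: str  # e.g. "CROSS", "INNER", "LEFT"
    inputs: List[Union[TableRef, "JoinNode"]]
    quals: List[str] = field(default_factory=list)


def referenced(node) -> Set[int]:
    """Collect the rangetable indexes that a join tree actually joins."""
    if isinstance(node, TableRef):
        return {node.rtindex}
    found: Set[int] = set()
    for child in node.inputs:
        found |= referenced(child)
    return found


# Flat list of every table mentioned anywhere in the query.
rangetable = ["a", "b", "c"]

# (a CROSS JOIN b, restricted by a.x = b.x) LEFT JOIN c ON b.y = c.y
join_tree = JoinNode(
    "LEFT",
    [JoinNode("CROSS", [TableRef(1), TableRef(2)], ["a.x = b.x"]),
     TableRef(3)],
    ["b.y = c.y"],
)
```

Membership in the join tree then plays the role of the per-entry "in FROM clause" flag: an entry that no join node references is simply not part of the join.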

Another thing it'd be nice to think about while we are at it is how
to implement SQL92's DISTINCT-inside-an-aggregate-function feature,
e.g., "SELECT COUNT(DISTINCT x), COUNT(DISTINCT y) FROM table".
My thought here is that the cleanest implementation is to have 
sub-Queries like "SELECT DISTINCT x FROM table" and then apply the
aggregates over the outputs of those subqueries.  Not sure about
details here.
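
The intended semantics can be sketched in a few lines of Python, with an in-memory list standing in for the table. The helper names (select_distinct, count) and the sample rows are made up for illustration; None plays the role of NULL, which COUNT ignores.

```python
rows = [
    {"x": 1, "y": "a"},
    {"x": 1, "y": "b"},
    {"x": 2, "y": "b"},
    {"x": None, "y": "b"},
]


def select_distinct(rows, column):
    """Emulates SELECT DISTINCT column FROM table, keeping first-seen order."""
    seen = []
    for row in rows:
        if row[column] not in seen:
            seen.append(row[column])
    return seen


def count(values):
    """Emulates COUNT(col): NULLs (None here) are not counted."""
    return sum(1 for v in values if v is not None)


# Each aggregate runs over the output of its own DISTINCT sub-query.
count_distinct_x = count(select_distinct(rows, "x"))
count_distinct_y = count(select_distinct(rows, "y"))
```

Each DISTINCT aggregate needs its own de-duplicated input, which is why one sub-Query per aggregate argument is a natural fit.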

> afaik the planner/optimizer already has the notion of
> merging/joining/scanning intermediate results, so teaching it to
> invoke these explicitly from the query tree rather than just
> implicitly may not be a huge stretch.

Yes, the output of the planner is a tree of plan node types, so there
would probably be very little change needed there or in the executor.
We might need to generalize the notion that a plan node only has
one or two descendants ("lefttree/righttree") into N descendants.
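
A minimal Python sketch of that generalization, assuming a node simply carries a list of children and keeps lefttree/righttree as views onto the first two. PlanNode, children and count_nodes are hypothetical names, not the executor's real structures.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PlanNode:
    """A plan node with any number of child plans."""
    kind: str
    children: List["PlanNode"] = field(default_factory=list)

    @property
    def lefttree(self) -> Optional["PlanNode"]:
        """First child, for code that still expects a binary tree."""
        return self.children[0] if self.children else None

    @property
    def righttree(self) -> Optional["PlanNode"]:
        """Second child, or None if there are fewer than two."""
        return self.children[1] if len(self.children) > 1 else None


def count_nodes(plan: PlanNode) -> int:
    """Walk the whole plan tree regardless of fan-out."""
    return 1 + sum(count_nodes(child) for child in plan.children)


# A three-way node, which a strict left/right pair cannot express directly.
plan = PlanNode("Append", [
    PlanNode("SeqScan"),
    PlanNode("SeqScan"),
    PlanNode("IndexScan"),
])
```
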
        regards, tom lane

PS: Has anyone heard from Jan lately?  Seems like he's been awfully
quiet...

