Thread: Join syntax

Join syntax

From
Thomas Lockhart
Date:
I've been playing with selects using explicit join syntax, and have
some initial results for inner joins (haven't done anything more about
outer joins yet; inner joins are tough enough for now).

It's a real pita to flatten the join expressions into the traditional
Postgres query tree. It would be nice to start thinking about how to
represent general subqueries or intermediate queries in the parse
tree.

Anyway, some examples below...
                       - Thomas

postgres=> select * from t1;
i| j
-+--
1|10
2|20
3|30
(3 rows)

postgres=> select * from t2;
i|  x
-+---
1|100
3|300
(2 rows)

postgres=> select * from t1 natural join t2;
i| j|  x
-+--+---
1|10|100
3|30|300
(2 rows)

postgres=> select * from t1 join t2 using (i);
i| j|  x
-+--+---
1|10|100
3|30|300
(2 rows)

postgres=> select * from t1 join t2 on (t1.i = t2.i);
i| j|i|  x
-+--+-+---
1|10|1|100
3|30|3|300
(2 rows)

postgres=> select * from t1 natural join t2 natural join t1;
pqReadData() -- backend closed the channel unexpectedly.       This probably means the backend terminated abnormally
  before or while processing the request.
 
We have lost the connection to the backend, so further processing is
impossible.  Terminating.

Oh well. Was on a roll 'til then ;)

-- 
Thomas Lockhart                lockhart@alumni.caltech.edu
South Pasadena, California


Re: [HACKERS] Join syntax

From
Tom Lane
Date:
Thomas Lockhart <lockhart@alumni.caltech.edu> writes:
> It's a real pita to flatten the join expressions into the traditional
> Postgres query tree. It would be nice to start thinking about how to
> represent general subqueries or intermediate queries in the parse
> tree.

Yes.  Jan has been saying for some time that he needs that for rules.
Also, I have found some squirrely cases in INSERT ... SELECT ... that
can't really be done right unless the INSERT and SELECT targetlists
are kept separate, which seems to mean a two-level parsetree structure.

The UNION/INTERSECT/EXCEPT code has a really klugy approach to
multi-query parse trees, which maybe could be cleaned up if we
supported them in a more general fashion.

Maybe it's time to bite the bullet and do it.  You have any thoughts
on what the representation should look like?
        regards, tom lane


Re: [HACKERS] Join syntax

From
Thomas Lockhart
Date:
> > ... represent general subqueries or intermediate queries in the
> > parse tree.
> Maybe it's time to bite the bullet and do it.  You have any thoughts
> on what the representation should look like?

I was hoping you would tell me ;)

I don't have a good feel for the current parse tree (which of course
hasn't kept me from fooling around with it). But I'll definitely need
something extra or different to implement outer joins. If I were
keeping everything else the same, I was thinking of propagating a
"join expression" into the planner/optimizer in the same area as the
existing qualification nodes. One of the differences would be that the
JE marks a node around which the optimizer is not allowed to reorder
the plan (since outer joins must be evaluated in a specific order to
get the right result). But I could just as easily represent this as a
subquery node somewhere else in the parse tree.

afaik the planner/optimizer already has the notion of
merging/joining/scanning intermediate results, so teaching it to
invoke these explicitly from the query tree rather than just
implicitly may not be a huge stretch.

btw I'm currently rewriting the join syntax in gram.y to conform
better to a closer reading of the SQL92 standard. One annoyance is
that the standard allows table *and* column aliasing *everywhere*.
e.g.
 select * from (t1 as x1 (i,j,k) join t2 using (i)) as r1 (a,b,c,d)

is (apparently) legal syntax, resulting in rows labeled a-d. Ugh.
                       - Thomas

-- 
Thomas Lockhart                lockhart@alumni.caltech.edu
South Pasadena, California


Re: [HACKERS] Join syntax

From
Vadim Mikheev
Date:
Thomas Lockhart wrote:
> 
> > > ... represent general subqueries or intermediate queries in the
> > > parse tree.
> > Maybe it's time to bite the bullet and do it.  You have any thoughts
> > on what the representation should look like?
> 
> I was hoping you would tell me ;)
> 
> I don't have a good feel for the current parse tree (which of course
> hasn't kept me from fooling around with it). But I'll definitely need
> something extra or different to implement outer joins. If I were
> keeping everything else the same, I was thinking of propagating a
> "join expression" into the planner/optimizer in the same area as the
> existing qualification nodes. One of the differences would be that the
> JE marks a node around which the optimizer is not allowed to reorder
^^^^^^^^^^^^^^^^^^^^^^^^^
> the plan (since outer joins must be evaluated in a specific order to ^^^^^^^^
> get the right result). But I could just as easily represent this as a

And this is what we need to have subqueries in FROM!..
(One a great thing which I want to have so much -:))

> subquery node somewhere else in the parse tree.

Vadim


Re: [HACKERS] Join syntax

From
Tom Lane
Date:
Thomas Lockhart <lockhart@alumni.caltech.edu> writes:
>> Maybe it's time to bite the bullet and do it.  You have any thoughts
>> on what the representation should look like?

> I was thinking of propagating a "join expression" into the
> planner/optimizer in the same area as the existing qualification
> nodes.

I think it would be best to keep it out of the regular expression-tree
stuff, for a number of reasons including the issue of not allowing
reordering.

The thing I was visualizing was a tree of Query nodes, not queries as
items in ordinary expressions within queries.  Essentially, we'd allow a
sub-Query as an entry in the rangetable (the FROM list) of another Query.
I *think* this is what Jan has been saying he wants in order to do view
rule rewrites more cleanly.  It could also solve my problems with INSERT
... SELECT.

Aside from plain Query nodes (representing a sub-Select) we'd need node
types that represent UNION/INTERSECT/EXCEPT combinations of Queries.
I don't like the way the current UNION/INTERSECT code overloads
AND/OR/NOT nodes to do this duty, not least because there's noplace to
represent the "ALL" modifier cleanly.  I'd rather see a separate set of
node types.

I still don't understand the semantics of all those join types you are
working on, but I suppose they would be additional node types in this
Query tree structure.  Should the rangetable itself (which represents
a regular Cartesian-product join of the source tables) become some
kind of explicit join node?  If so, I guess the WHERE clause would be
attached to this join node, and not to the Query node referencing the
join.  (Actually, the rangetable should probably continue to exist
as a list of all the tables referenced anywhere in the Query tree,
but we should separate out its implicit use as a representation of
a Cartesian product join and make an explicit node that says what to
join, how, and with what restriction clauses.  The "in From clause"
flag in RTEs would go away...) 

Another thing it'd be nice to think about while we are at it is how
to implement SQL92's DISTINCT-inside-an-aggregate-function feature,
eg, "SELECT COUNT(DISTINCT x), COUNT(DISTINCT y) FROM table".
My thought here is that the cleanest implementation is to have 
sub-Queries like "SELECT DISTINCT x FROM table" and then apply the
aggregates over the outputs of those subqueries.  Not sure about
details here.

> afaik the planner/optimizer already has the notion of
> merging/joining/scanning intermediate results, so teaching it to
> invoke these explicitly from the query tree rather than just
> implicitly may not be a huge stretch.

Yes, the output of the planner is a tree of plan node types, so there
would probably be very little change needed there or in the executor.
We might need to generalize the notion that a plan node only has
one or two descendants ("lefttree/righttree") into N descendants.
        regards, tom lane

PS: Has anyone heard from Jan lately?  Seems like he's been awfully
quiet...