Re: [HACKERS] Another reason to redesign querytree representation - Mailing list pgsql-hackers

From Thomas Lockhart
Subject Re: [HACKERS] Another reason to redesign querytree representation
Date
Msg-id 379729CE.91905BB3@alumni.caltech.edu
Whole thread Raw
In response to Re: [HACKERS] Another reason to redesign querytree representation  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
> >   select * from t1 left join t2 using (i)
> >                 left join t1 on (i = t1.j);
> Is that actually a valid query?  Wouldn't you at least need to rename
> one or the other appearance of t1?  (Nitpick, probably, but maybe I
> am not understanding something...)

afaik it is a valid query. The first outer join combines t1 and t2,
and by definition the intermediate result loses its table-specific
labeling.

> > For a query like
> >   select * from t1 left join t2 using (i) where t2.j = 3;
> > istm that the outer join must be done before the t2 qualification is
> > applied, and that another ordering may produce the wrong result.
> It's not immediately obvious what the semantics of that ought to be...
> but I agree it makes a difference if you eliminate t2 rows before
> rather than after the join.
> This looks to me like the syntactic notion is that t1-left-join-t2
> is already a single table as far as the rest of the SELECT is
> concerned.  (But what happens if they have column names in common,
> other than i?)  In which case you're right, join first then apply
> the WHERE condition is presumably what's supposed to happen.
> If that's the way it works, I think that an RTE describing the joined
> table is the natural way to handle it.  Obviously this would not be
> a primitive node; it would have to be some kind of structure of nodes.

Your statements are all correct. Maybe defining an RTE which does not
refer to a single specific table is the way to go; is there anything
like that already? If not, then putting the equivalent down in the
qualifications might work.

> > I was thinking of having a new qualification node to carry this 
> > info,
> You would need a qual clause to carry the join condition (t1.i = t2.i
> in your first example, i = t1.j in your second).  This would have to
> dangle off a node that represents the specially joined tables, I 
> think.
> There's no such thing as a "qualification node"; qual clauses are just
> expressions that happen to be in WHERE.  If the "specially joined
> tables" node isn't in the RTE then I think we need to invent some new
> place to put it.  The WHERE expression isn't a natural place for it.

Right, I wasn't remembering the right terminology.

> > From what I understand Bruce to say, the planner/optimizer is 
> > allowed to try all kinds of permutations of plans, choosing the one 
> > with the lowest cost. But if the info for the join is carried in a
> > qualification node, then the planner/optimizer must know that it 
> > can't reorder the query as freely as it does now.
> Yes, the join order would be forced W.R.T. the outer-joined tables,
> at least.
> The other alternative we should consider is the notion that the parser
> outputs are already a multilevel plan structure, where we'd have a 
> whole lower plan item representing the outer-join table result.  This 
> might end up being the same thing as above, since quite possibly the 
> RTE would be the natural place for the upper plan's link to the lower 
> one.
> We need to get Jan involved in this, since this sounds like the same
> kind of stuff he's been saying is needed for rules.  In fact Jan
> probably ought to be leading the discussion, not me...
> > A hashjoin method may be possible for queries which are structured 
> > as a left outer join; other outer joins will need to use the 
> > mergejoin method.
> I don't see why plain ol' nested loop couldn't be used too.  mergejoin
> is not always better than nested loop, or even always feasible.  It
> requires the availability of sort operators, for one thing.

Right. I'm learning as I go. I did put some code into the mergejoin
routines to walk the tables properly for outer joins (it is marked by
#ifdef ENABLE_OUTER_JOINS). But the flags for outer joins are not yet
passed in, and the result tuples are not constructed (they need some
null fields added to the left- or right-side table).

> > Also, some poorly-qualified outer joins reduce to inner joins,
> > and perhaps the optimizer can be smart enough to realize this.
> OK, now tell me about inner joins...

As you know that is what Postgres already does. You can specify inner
joins using this newer join syntax:
 select * from t1 join t2 using (i);

which I just convert to normal query tree nodes in the parser so it's
equivalent to
 select * from t1, t2 where t1.i = t2.i;

I probably don't do a complete job, but it's been a while since I've
looked at it.
                    - Thomas

-- 
Thomas Lockhart                lockhart@alumni.caltech.edu
South Pasadena, California


pgsql-hackers by date:

Previous
From: Malcolm Beattie
Date:
Subject: Phantom row from aggregate in self-join in 6.5
Next
From: "F.J.Cuberos"
Date:
Subject: