Re: [HACKERS] Another reason to redesign querytree representation - Mailing list pgsql-hackers
From | Thomas Lockhart |
---|---|
Subject | Re: [HACKERS] Another reason to redesign querytree representation |
Date | |
Msg-id | 379729CE.91905BB3@alumni.caltech.edu Whole thread Raw |
In response to | Re: [HACKERS] Another reason to redesign querytree representation (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
> > select * from t1 left join t2 using (i) > > left join t1 on (i = t1.j); > Is that actually a valid query? Wouldn't you at least need to rename > one or the other appearance of t1? (Nitpick, probably, but maybe I > am not understanding something...) afaik it is a valid query. The first outer join combines t1 and t2, and by definition the intermediate result loses its table-specific labeling. > > For a query like > > select * from t1 left join t2 using (i) where t2.j = 3; > > istm that the outer join must be done before the t2 qualification is > > applied, and that another ordering may produce the wrong result. > It's not immediately obvious what the semantics of that ought to be... > but I agree it makes a difference if you eliminate t2 rows before > rather than after the join. > This looks to me like the syntactic notion is that t1-left-join-t2 > is already a single table as far as the rest of the SELECT is > concerned. (But what happens if they have column names in common, > other than i?) In which case you're right, join first then apply > the WHERE condition is presumably what's supposed to happen. > If that's the way it works, I think that an RTE describing the joined > table is the natural way to handle it. Obviously this would not be > a primitive node; it would have to be some kind of structure of nodes. Your statements are all correct. Maybe defining an RTE which does not refer to a single specific table is the way to go; is there anything like that already? If not, then putting the equivalent down in the qualifications might work. > > I was thinking of having a new qualification node to carry this > > info, > You would need a qual clause to carry the join condition (t1.i = t2.i > in your first example, i = t1.j in your second). This would have to > dangle off a node that represents the specially joined tables, I > think. > There's no such thing as a "qualification node"; qual clauses are just > expressions that happen to be in WHERE. If the "specially joined > tables" node isn't in the RTE then I think we need to invent some new > place to put it. The WHERE expression isn't a natural place for it. Right, I wasn't remembering the right terminology. > > From what I understand Bruce to say, the planner/optimizer is > > allowed to try all kinds of permutations of plans, choosing the one > > with the lowest cost. But if the info for the join is carried in a > > qualification node, then the planner/optimizer must know that it > > can't reorder the query as freely as it does now. > Yes, the join order would be forced W.R.T. the outer-joined tables, > at least. > The other alternative we should consider is the notion that the parser > outputs are already a multilevel plan structure, where we'd have a > whole lower plan item representing the outer-join table result. This > might end up being the same thing as above, since quite possibly the > RTE would be the natural place for the upper plan's link to the lower > one. > We need to get Jan involved in this, since this sounds like the same > kind of stuff he's been saying is needed for rules. In fact Jan > probably ought to be leading the discussion, not me... > > A hashjoin method may be possible for queries which are structured > > as a left outer join; other outer joins will need to use the > > mergejoin method. > I don't see why plain ol' nested loop couldn't be used too. mergejoin > is not always better than nested loop, or even always feasible. It > requires the availability of sort operators, for one thing. Right. I'm learning as I go. I did put some code into the mergejoin routines to walk the tables properly for outer joins (it is marked by #ifdef ENABLE_OUTER_JOINS). But the flags for outer joins are not yet passed in, and the result tuples are not constructed (they need some null fields added to the left- or right-side table). > > Also, some poorly-qualified outer joins reduce to inner joins, > > and perhaps the optimizer can be smart enough to realize this. > OK, now tell me about inner joins... As you know that is what Postgres already does. You can specify inner joins using this newer join syntax: select * from t1 join t2 using (i); which I just convert to normal query tree nodes in the parser so it's equivalent to select * from t1, t2 where t1.i = t2.i; I probably don't do a complete job, but it's been a while since I've looked at it. - Thomas -- Thomas Lockhart lockhart@alumni.caltech.edu South Pasadena, California
pgsql-hackers by date: